✅ Why you as a researcher should use Git and GitHub
❌ How to install and use Git and GitHub
holds battle-tested instructions honed over several years in STAT 545 at the University of British Columbia.
By using common mechanics across work modes (research, teaching, analysis), you achieve basic competence quickly and avoid the demoralizing forget-relearn cycle.
A user doesn't have to be actively pushing and pulling code.
Git is a version control system.
Git manages the evolution of a set of files – called a repository or repo – in a sane, highly structured way.
It is like the “Track Changes” feature from Microsoft Word, but more rigorous, powerful, and scaled up to multiple files.
Its original purpose was to help groups of developers work collaboratively on big software projects.
Many people who don’t use Git unwittingly re-invent a poor man’s version of it.
Git is the "track changes" of Microsoft Word
GitHub is like DropBox or Google Drive
It is easy to create a hyperlink to a specific file or location in a file, at a specific version, which can make meta-conversations about project code or reports much more productive.
GitHub also offers granular control over who can see, edit, and administer a project.
Think of the issues for a project as its bug tracker.
You can also use issues to organize your to-do list more generally.
Issues can be assigned to specific people and they can be labeled, e.g. “bug”, “simulation-study”, or “final-exam”.
Coupled with the ability to cross-link issues and the project files or file changes, you have extraordinary power to document why things have happened in the past and to organize what needs to happen in the future.
For new or existing projects, you will:
This setup happens once per project and can happen at project inception or at any later point. Chances are your project already lives in a dedicated directory. Making this directory an RStudio Project and Git repository boils down to allowing those applications to leave notes for themselves in hidden files or directories. The project is still a regular directory on your computer that you can locate, name, move, and generally interact with as you wish.
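As a minimal sketch of the Git half of that setup, the commands below turn an ordinary directory into a repository. The directory name `my-analysis` and the file `data.csv` are hypothetical, and a temporary folder stands in for wherever your project really lives.

```shell
# Hypothetical project directory; a temporary folder stands in for your real one.
project="$(mktemp -d)/my-analysis"
mkdir -p "$project"
echo "x,y" > "$project/data.csv"

# One-time setup: Git creates a hidden .git directory to keep its notes in.
git init --quiet "$project"

# The project is still an ordinary directory; the only addition is .git.
ls -A "$project"
```

Everything Git knows about the project lives inside `.git`, so the directory can still be moved or renamed like any other.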
If you have ever versioned a file by adding your initials or the date, you have effectively made a commit, albeit only for a single file. It is a version that is significant to you and that you might want to inspect or revert to later.
Pushing is like sharing a document with colleagues on DropBox or sending it out as an email attachment. By pushing to GitHub, you make your work and all your accumulated progress accessible to others.
Recall that a repository or repo is just a directory of files that Git manages holistically.
A commit functions like a snapshot of all the files in the repo, at a specific moment.
Commits are the "most recent update v2" files you send via email.
Every commit needs a message.
The message doesn't have to be informative, but an informative one is highly recommended.
https://github.com/EmilHvitfeldt/textdata/commit/e435125cb2e35615c0d14c5dd19bdbd54ed40c9f
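On the command line, taking two snapshots with messages might look like the sketch below. The file `methods.md`, the commit messages, and the name and email are placeholders chosen for illustration. The identity is set for this throwaway repo only.

```shell
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"

# Git records who made each commit; placeholder identity for this repo only.
git config user.name "Example Researcher"
git config user.email "researcher@example.com"

# First snapshot.
echo "First draft of the methods section." > methods.md
git add methods.md
git commit --quiet -m "Add first draft of methods section"

# Second snapshot, after further work on the same file.
echo "Describe the sampling design." >> methods.md
git add methods.md
git commit --quiet -m "Describe sampling design in methods"

# One line per commit, newest first.
git log --oneline
```

Each line of the log pairs a commit identifier with its message, which is why a short, informative message pays off later.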
A diff is the change that happened between commits.
Diff inspection is not limited to adjacent commits.
You can inspect the diffs between any two commits.
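A small sketch of both cases, using a hypothetical `params.txt` that changes across three commits. `HEAD` refers to the latest commit, `HEAD~1` to the one before it, and `HEAD~2` to the one before that.

```shell
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"
git config user.name "Example Researcher"
git config user.email "researcher@example.com"

# Three commits, each changing the same hypothetical file.
printf 'alpha = 0.05\n' > params.txt
git add params.txt
git commit --quiet -m "Set alpha"

printf 'alpha = 0.05\nn_sims = 100\n' > params.txt
git commit --quiet -am "Add n_sims"

printf 'alpha = 0.01\nn_sims = 1000\n' > params.txt
git commit --quiet -am "Tighten alpha, run more simulations"

# Adjacent commits: what changed in the most recent step.
git diff HEAD~1 HEAD

# Non-adjacent commits: everything that changed since the first snapshot.
git diff HEAD~2 HEAD
```

In the output, lines starting with `-` come from the older commit and lines starting with `+` from the newer one.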
You can also designate certain snapshots as special with a tag, which is a name of your choosing.
Software: "v1.0.0", "v6.3.0"
Writing: "draft-1", "draft-2", "review-1", "publication"
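A sketch of tagging two writing milestones, reusing the tag names from the list above. The file `paper.md` and the commit messages are made up for the example.

```shell
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"
git config user.name "Example Researcher"
git config user.email "researcher@example.com"

echo "Introduction" > paper.md
git add paper.md
git commit --quiet -m "Start paper"
# Name the current snapshot.
git tag draft-1

echo "Methods" >> paper.md
git commit --quiet -am "Add methods"
git tag draft-2

# List all tags in the repo.
git tag --list

# A tag can stand in for a commit, for example when inspecting a diff.
git diff draft-1 draft-2
```

Because a tag is just a readable name for a commit, "what changed between draft 1 and draft 2" becomes a single command.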
Is it useful to someone? If so, track and share!
Will it play nicely with Git/GitHub?
✅ Small-to-medium plain text files with hard line breaks are ideal
✅ .csv and .tsv files
Will it actively cause problems with Git/GitHub?
❌ A file that is large and is changing often.
❌ A binary file, such as a Word document or Excel spreadsheet.
❌❌❌ A large binary file that changes often.
Multiple people, including your past and future self.
Participants include
Everyone has one (or more!) copies of the document, which circulate as email attachments, accumulating initials and dates in the filename.
If you have multiple edits to the same file, how do you fix it?
There is only one copy of the document and it lives in the cloud.
Anyone can edit or comment or propose a change and this is immediately available to everyone else.
Files are handled individually.
Git is a decentralized version control system, meaning each collaborator has their own complete copy of the repo and its history. Everyone can work offline and/or simultaneously.
You pull regularly from GitHub, to receive and integrate changes made by your collaborators.
The joke is that GitHub puts the “central” in decentralized version control.
But sometimes it’s not clear how to reconcile your changes with the new ones from GitHub and you get a merge conflict.
Merge conflicts are the most frustrating thing about using Git and GitHub.
At each location of conflict, you must pick one version or the other – or create a hybrid – and mark it as resolved.
Many Git clients have special tooling for this specific task, which can be very convenient.
Once you’ve resolved all conflicts, you will be able to finalize the merge and push a version integrating your recent changes to GitHub.
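The resolution steps can be sketched in a single local repo, where a branch named `colleague` (hypothetical, as is `design.txt`) stands in for changes arriving from GitHub. The final push is left out because this sketch has no remote.

```shell
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"
git config user.name "Example Researcher"
git config user.email "researcher@example.com"

echo "Sample size: 50" > design.txt
git add design.txt
git commit --quiet -m "Record sample size"

# A colleague's line of work changes the line one way...
git checkout --quiet -b colleague
echo "Sample size: 100" > design.txt
git commit --quiet -am "Increase sample size to 100"

# ...while you, back on the original branch, change it another way.
git checkout --quiet -
echo "Sample size: 80" > design.txt
git commit --quiet -am "Increase sample size to 80"

# Git cannot reconcile the two edits, so the merge stops with a conflict.
git merge colleague || echo "Merge stopped: conflict in design.txt"

# The file now holds both versions between conflict markers.
cat design.txt

# Resolve: pick one version or write a hybrid, then mark it as resolved.
echo "Sample size: 100" > design.txt
git add design.txt

# Finalize the merge.
git commit --quiet -m "Merge colleague's work, settle on sample size 100"
```

Between the `<<<<<<<` and `>>>>>>>` markers Git shows both candidate versions. Overwriting the file with the agreed text and running `git add` is what "mark it as resolved" means on the command line.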
The best way to deal with merge conflicts is to prevent them.
Everyone should commit, pull, and push often.
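The commit, pull, push rhythm can be tried without a GitHub account. In the sketch below, a bare repository on disk named `hub.git` plays the role of GitHub, and `alice` and `bob` are two hypothetical collaborators. With a real project, the clone URL would point at GitHub instead.

```shell
work="$(mktemp -d)"
cd "$work"

# Assumption for this sketch: a local bare repo stands in for the GitHub copy.
git init --quiet --bare hub.git

# Alice gets her own complete copy of the repo.
git clone --quiet hub.git alice
git -C alice config user.name "Alice"
git -C alice config user.email "alice@example.com"

# Alice commits locally, then pushes so others can see her work.
echo "Project notes" > alice/notes.md
git -C alice add notes.md
git -C alice commit --quiet -m "Start project notes"
git -C alice push --quiet -u origin HEAD

# Bob clones, which gives him the full history so far.
git clone --quiet hub.git bob
git -C bob config user.name "Bob"
git -C bob config user.email "bob@example.com"

# Bob commits and pushes a small change.
echo "Bob's addition" >> bob/notes.md
git -C bob commit --quiet -am "Add Bob's notes"
git -C bob push --quiet origin HEAD

# Alice pulls to receive and integrate Bob's change.
git -C alice pull --quiet --ff-only
cat alice/notes.md
```

Because each change is small and shared promptly, Alice's pull simply moves her copy forward and nothing needs reconciling.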
Simply having a project on GitHub gives it a web presence!
Non-users of Git/GitHub can visit the project in the browser and interact with it like a webpage.
GitHub also offers several ways to host a proper website directly from a repository, collectively known as GitHub Pages.
Any file written in Markdown is rendered in an HTML-like way on GitHub.
Files named README.md will be rendered as the starting page.
Continuous Integration
GitHub is not the only place to host a repo. Bitbucket and GitLab are alternatives.
Many companies and even universities are starting to make GitHub Enterprise or GitLab available internally.