+ - 0:00:00
Notes for current slide
Notes for next slide

Git & Github

What is it and why should I use it?

Emil Hvitfeldt

2020-1-23

1 / 35

What will this presentation be about?

2 / 35

What will this presentation be about?

✅ Why should you as a researcher should use Git and Github

2 / 35

What will this presentation be about?

✅ Why should you as a researcher should use Git and Github

❌ How to install and use Git and Github

2 / 35

happygitwithr.com

holds battle-tested instructions honed over several years in STAT 545 at the University of British Columbia.

3 / 35

4 / 35

Why should you (researcher) use Git? (and Github)

  • Collaboration is much more structured with powerful tools for asynchronous work and managing versions.
  • Web presence comes with minimal effort.
  • GitHub makes a fantastic course management system for courses that use R.
5 / 35

Why should you (researcher) use Git? (and Github)

By using common mechanics across work modes (research, teaching, analysis), you achieve basic competence quickly and avoid the demoralizing forget-relearn cycle.

A user doesn't have to be actively pushing and pull code.

6 / 35

What is Git?

Git is a version control system.

Git manages the evolution of a set of files – called a repository or repo – in a sane, highly structured way.

It is like the “Track Changes” feature from Microsoft Word, but more rigorous, powerful, and scaled up to multiple files.

Its original purpose was to help groups of developers work collaboratively on big software projects.

7 / 35

What is Git?

Many people who don’t use Git unwittingly re-invent a poor man’s version of it.

8 / 35

What is GitHub?

  • Git is a powerful structure for file management.
  • GitHub complements Git by providing a slick user interface and distribution mechanism for Git repositories.
  • Git is the software you will use locally to record changes to a set of files.
  • GitHub is a hosting service that provides a Git-aware home for such projects on the internet.
9 / 35

What is GitHub?

Git is the "track changes" of Microsoft Word

GitHub is like DropBox or Google Drive

10 / 35

What is GitHub?

Git is the "track changes" of Microsoft Word

GitHub is like DropBox or Google Drive

It is easy to create a hyperlink to a specific file or location in a file, at a specific version, which can make meta-conversations about project code or reports much more productive.

GitHub also offers granular control over who can see, edit, and administer a project.

10 / 35

GitHub Issues

Think of the issues for a project as its bug tracker.

organize our to-do list more generally.

Issues can be assigned to specific people and they can be labeled, e.g. “bug”, “simulation- study”, or “final-exam”.

Coupled with the ability to cross-link issues and the project files or file changes, you have extraordinary power to document why things have happened in the past and to organize what needs to happen in the future.

11 / 35

Initial Setup

  • Register for a free account with GitHub.
  • Install Git.
  • (optional) Install a local Git client.
  • Confirm, with a practice repository, that local Git can send and receive the current version of the repository on GitHub, known as pushing and pulling, respectively.
12 / 35

Repositories and Workflow

For new or existing projects, you will:

  • Dedicate a local directory or folder to it.
  • (Optional) Make it an RStudio Project.
  • Make it a Git repository.
13 / 35

This setup happens once per project and can happen at project inception or at any later point. Chances are your project already lives in a dedicated directory. Making this directory an RStudio Project and Git repository boils down to allowing those applications to leave notes for themselves in hidden files or directories. The project is still a regular directory on your computer, that you can locate, name, move, and generally interact with as you wish.

Daily Workflow

  • Go about your usual business.
  • But instead of only saving individual files, periodically you make a commit.
  • Push commits to GitHub periodically.
14 / 35

If you have ever versioned a file by adding your initials or the date, you have effectively made a commit, albeit only for a single file. It is a version that is significant to you and that you might want to inspect or revert to later.

– This is like sharing a document with colleagues on DropBox or sending it out as an email attachment. By pushing to GitHub, you make your work and all your accumulated progress accessible to others.

Commits, Diffs, and Tags

Recall that a repository or repo is just a directory of files that Git manages holistically.

A commit functions like a snapshot of all the files in the repo, at a specific moment.

Commits are the "most recent update v2" files you send via email.

15 / 35

Commits, Diffs, and Tags

Commits need a message to be included.

Doesn't need to be informative but is highly recommended.

16 / 35

Commits, Diffs, and Tags

17 / 35

Commits, Diffs, and Tags

18 / 35

Commits, Diffs, and Tags

A diff is the change that happened between commits.

Diff inspection is not limited to adjacent commits.

You can inspect the diffs between any two commits.

20 / 35

Commits, Diffs, and Tags

You can also designate certain snapshots as special with a tag, which is a name of your choosing.

Software: "v1.0.0", "v6.3.0"

Writing: "draft-1", "draft-2", "review-1", "publication"

21 / 35

Which files to commit

Is it useful to someone? If so, track and share!

Will it play nicely with Git/GitHub?

✅ Small-to-medium plain text files with hard line breaks are ideal

✅ .csv and .tsv files

22 / 35

Which files to commit

Will it actively cause problems with Git/GitHub?

❌ A file that is large and is changing often.

❌ A file is binary, such as a Word document or Excel spreadsheet.

❌❌❌ A large binary file that changes often.

23 / 35

Collaboration

Multiple people, including your past and future self.

Participants include

  • Active workers (people who write stuff)
  • People who only read and review
24 / 35

Collaboration - Edit, save, attach

Everyone has one (or more!) copies of the document, which circulate as email attachments, accumulating initials and dates in the filename.

If you have multiple edits to the same file how do you fix it?

25 / 35

Collaboration - Google Doc.

There is only one copy of the document and it lives in the cloud.

Anyone can edit or comment or propose a change and this is immediately available to everyone else.

Files are handled individually.

26 / 35

Collaboration - Git & Github

Git is a decentralized version control system, meaning each collaborator has its own complete copy of the repo and its history. Everyone can work offline and/or simultaneously.

You pull regularly from GitHub, to receive and integrate changes made by your collaborators.

The joke is that GitHub puts the “central” in decentralized version control.

27 / 35

Collaboration - Git & Github conflicts

But sometimes it’s not clear how to reconcile your changes with the new ones from GitHub and you get a merge conflict.

Merge conflicts are the most frustrating thing about using Git and GitHub.

28 / 35

Collaboration - Git & Github conflicts

At each location of conflict, you must pick one version or the other – or create a hybrid – and mark it as resolved.

Many Git clients have special tooling for this specific task, which can be very convenient.

Once you’ve resolved all conflicts, you will be able to finalize the merge and push a version integrating your recent changes to GitHub.

29 / 35

Collaboration - Git & Github conflicts

The best way to deal with merge conflicts is to prevent them.

Everyone should commit, pull, and push often.

30 / 35

GitHub as a Web Presence

Simply having a project on GitHub gives it a web presence!

Non-users of Git/GitHub can visit the project in the browser and interact with it like a webpage.

GitHub also offers several ways to host a proper website directly from a repository, collectively known as GitHub Pages.

31 / 35

Markdown is special on GitHub

Any file written in Markdown is rendered in an HTML-like way on GitHub.

files named README.md will be rendered as the starting page.

32 / 35

Extra Things You Get with minimal Effort

Continuous Integration

  • Travis-CI
  • Appveyor
  • Azure Pipelines
  • Circle CI
  • GitHub Actions
33 / 35

Extra Things You Get with minimal Effort

  • Codecov
  • Code checking
  • Webhosting
34 / 35

Last Things

Github is not the only place. Bitbucket or GitLab

Many companies and even universities are starting to make GitHub Enterprise or GitLab available internally.

35 / 35

What will this presentation be about?

2 / 35
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow