3  Collaboration Using Git

Git has powerful capabilities for reproducible research, particularly in large, complicated analyses. That said, using Git comes with an overhead that may not be justified for small projects unless you consider collaboration with future analysts, including yourself. To demonstrate Git’s collaborative potential I created a remote repository on GitHub called git_practice.

3.0.1 Interacting with your Remote Repository

3.0.1.1 git push

To link your local repository to a remote repository use git remote. In the terminal session below I added a remote repository named “origin” and provided a URL where the repository is located.

amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git remote add origin https://github.com/adamreimer/git_practice.git

Then git push is used to “push” my local repository to my remote repository. Files associated with this repository are now stored in a location where they can be accessed by others for viewing, download, or used for collaborative work.

amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git push -u origin main
Enumerating objects: 11, done.
Counting objects: 100% (11/11), done.
Delta compression using up to 16 threads
Compressing objects: 100% (10/10), done.
Writing objects: 100% (11/11), 1.53 KiB | 8.00 KiB/s, done.
Total 11 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), done.
To https://github.com/adamreimer/git_practice.git
 * [new branch]      main -> main
branch 'main' set up to track 'origin/main'.
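The same two commands can be tried without a GitHub account. The sketch below is a minimal sandbox rather than a record of the session above: it assumes a bare repository in a temporary directory as a stand-in for GitHub, and the directory names, the identity, and the file contents are all hypothetical placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: a bare repository on disk plays the role of the GitHub remote.
sandbox="$(mktemp -d)"
trap 'rm -rf "$sandbox"' EXIT
git init -q --bare "$sandbox/remote.git"
git --git-dir="$sandbox/remote.git" symbolic-ref HEAD refs/heads/main

# A throwaway local repository with one commit (placeholder identity and file contents).
git init -q "$sandbox/local"
cd "$sandbox/local"
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" on any Git version
git config user.name "Example Analyst"
git config user.email "analyst@example.com"
echo 'fib_seq <- c(0, 1)' > fib_seq.R
git add fib_seq.R
git commit -q -m "Initialize Fibonacci sequence"

# Link the local repository to the remote, then publish main and track origin/main.
git remote add origin "$sandbox/remote.git"
git push -q -u origin main

remote_url="$(git remote get-url origin)"
upstream="$(git rev-parse --abbrev-ref --symbolic-full-name '@{u}')"
echo "origin is $remote_url"
echo "main tracks $upstream"
```

If the link worked, the last two lines report where origin lives and name origin/main as the branch that main tracks.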

After pushing to GitHub your repository now looks like Figure 3.1.

Figure 3.1: The Git workspace after your local repository has been pushed to a remote repository.

Now that the remote repository has been updated we have to keep the local and remote repositories synced. To illustrate this workflow I’ll change the fib_seq.R file by adding the fifth value of the Fibonacci sequence (fib_seq[5] <- fib_seq[3] + fib_seq[4]) as a new line. After this change the Git workspace contains an unstaged change which is not reflected in either repository.

Figure 3.2: The Git workspace after the working directory has been changed leaving the local and remote repositories out-of-date.

In the terminal session below I stage and commit the file fib_seq.R. Notice that the first git status command tells us the local and remote repositories are synced but that there are unstaged changes.

amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git status
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   fib_seq.R

no changes added to commit (use "git add" and/or "git commit -a")

After the modified file was added,

amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git add fib_seq.R

and committed,

amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git commit -m "Fifth entry in the Fibonacci sequence" -m "A long and descriptive description"
[main 5139049] Fifth entry in the Fibonacci sequence
 1 file changed, 2 insertions(+), 1 deletion(-)

the second call to git status tells us our remote repository is one commit behind our local repository.

amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git status
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean

The Git workspace at this moment is illustrated by Figure 3.3.

Figure 3.3: The Git workspace after a local change has been staged and committed, leaving the remote repository one commit behind.

In the terminal session below I use git push to update the remote repository.

amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git push
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 16 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 414 bytes | 8.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To https://github.com/adamreimer/git_practice.git
   0c92881..5139049  main -> main

Notice git status verifies the repositories are now synced.

amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git status
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

The Git workspace at this moment is illustrated by Figure 3.4.

Figure 3.4: The Git workspace after a local change has been staged, committed, and pushed.
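The "ahead by 1 commit" message from git status can also be read as a number. The sketch below is one way to watch that number change across the edit, stage, commit, and push cycle. It assumes a sandbox in which a bare repository on disk stands in for GitHub; the paths, identity, and file contents are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sandbox setup (all names hypothetical): a bare "remote" plus a local repository
# whose first commit has already been pushed.
sandbox="$(mktemp -d)"
trap 'rm -rf "$sandbox"' EXIT
git init -q --bare "$sandbox/remote.git"
git --git-dir="$sandbox/remote.git" symbolic-ref HEAD refs/heads/main
git init -q "$sandbox/local"
cd "$sandbox/local"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Analyst"
git config user.email "analyst@example.com"
echo 'fib_seq <- c(0, 1)' > fib_seq.R
git add fib_seq.R
git commit -q -m "Initialize Fibonacci sequence"
git remote add origin "$sandbox/remote.git"
git push -q -u origin main

# Edit, stage, and commit: the local branch moves ahead of origin/main.
echo 'fib_seq[3] <- fib_seq[1] + fib_seq[2]' >> fib_seq.R
git add fib_seq.R
git commit -q -m "Third entry in fib_seq"
ahead_before_push="$(git rev-list --count origin/main..HEAD)"

# Push: the remote catches up and the count drops back to zero.
git push -q
ahead_after_push="$(git rev-list --count origin/main..HEAD)"

echo "commits ahead before push: $ahead_before_push"
echo "commits ahead after push:  $ahead_after_push"
```

The expression origin/main..HEAD selects commits reachable from the local branch but not from the remote-tracking branch, which is the same comparison git status reports in words.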

3.0.1.2 git clone

Imagine a situation where you would like to work on your analysis from a home computer¹. If your analysis is stored as a remote Git repository it is easy to obtain a copy. In the terminal session below I obtain a copy of my repository in a new location (note that in the previous file paths I have been working on a network S drive). The first step is to switch to my computer’s C drive.

amreimer@DFGSXQDSF223076 MINGW64 ~/Documents
$ cd C:/

The command git clone copies (clones) the remote repository to my C drive.

amreimer@DFGSXQDSF223076 MINGW64 /c
$ git clone https://github.com/adamreimer/git_practice.git
Cloning into 'git_practice'...
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 14 (delta 3), reused 14 (delta 3), pack-reused 0
Receiving objects: 100% (14/14), done.
Resolving deltas: 100% (3/3), done.

After which I can navigate to the new local repository,

amreimer@DFGSXQDSF223076 MINGW64 /c
$ cd C:/git_practice

and check the repository status. Notice that I made a typo on the git status command the first time and nothing terrible happened.

amreimer@DFGSXQDSF223076 MINGW64 /c/git_practice (main)
$ git_status
bash: git_status: command not found
amreimer@DFGSXQDSF223076 MINGW64 /c/git_practice (main)
$ git status
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

After cloning my remote repository to my C drive I have two local repositories associated with the same remote (see Figure 3.5).

Figure 3.5: The Git workspace when you have two local repositories associated with the same remote.
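A small sandbox can confirm that a clone arrives already linked to its remote. The sketch below assumes a bare repository on disk in place of GitHub, with the hypothetical directories s_drive and c_drive standing in for the two drives; the identity and file contents are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

sandbox="$(mktemp -d)"
trap 'rm -rf "$sandbox"' EXIT

# Hypothetical stand-ins: remote.git for GitHub, s_drive for the first local repository.
git init -q --bare "$sandbox/remote.git"
git --git-dir="$sandbox/remote.git" symbolic-ref HEAD refs/heads/main
git init -q "$sandbox/s_drive"
cd "$sandbox/s_drive"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Analyst"
git config user.email "analyst@example.com"
echo 'fib_seq <- c(0, 1)' > fib_seq.R
git add fib_seq.R
git commit -q -m "Initialize Fibonacci sequence"
git remote add origin "$sandbox/remote.git"
git push -q -u origin main

# Clone the remote into a second location; no `git remote add` is needed
# because clone records the source as "origin" automatically.
git clone -q "$sandbox/remote.git" "$sandbox/c_drive"

s_head="$(git -C "$sandbox/s_drive" rev-parse HEAD)"
c_head="$(git -C "$sandbox/c_drive" rev-parse HEAD)"
c_origin="$(git -C "$sandbox/c_drive" remote get-url origin)"
echo "s_drive HEAD:   $s_head"
echo "c_drive HEAD:   $c_head"
echo "c_drive origin: $c_origin"
```

Both local repositories should report the same commit hash, and the clone should list the remote as its origin without any further configuration.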

If I change the file fib_seq.R in the working directory of my C drive by adding a new line (fib_seq[6] <- fib_seq[4] + fib_seq[5]), stage and commit those changes, and push the local repository on my C drive to the remote repository, the local repository on my S drive will be one commit behind. The terminal session below demonstrates these commands (all of which we have seen before) and the current state of the Git workspace is shown in Figure 3.6.

amreimer@DFGSXQDSF223076 MINGW64 /c/git_practice (main)
$ git status
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   fib_seq.R

no changes added to commit (use "git add" and/or "git commit -a")
amreimer@DFGSXQDSF223076 MINGW64 /c/git_practice (main)
$ git add fib_seq.R
amreimer@DFGSXQDSF223076 MINGW64 /c/git_practice (main)
$ git commit -m "Sixth number in the Fibonacci seqence" -m "This commit is slightly different as it was made from a different computer in my house. It still represents a single author working on their own repository but demonstrated the flexibility accorded by storing your analysis on the cloud. Working on this analysis from a new machine was seamless provided the new machine had the appropriate software."
[main 9db5478] Sixth number in the Fibonacci seqence
 1 file changed, 2 insertions(+), 1 deletion(-)
amreimer@DFGSXQDSF223076 MINGW64 /c/git_practice (main)
$ git status
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean
amreimer@DFGSXQDSF223076 MINGW64 /c/git_practice (main)
$ git push
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 16 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 586 bytes | 586.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To https://github.com/adamreimer/git_practice.git
   5139049..9db5478  main -> main

Figure 3.6: The Git workspace after one local repository has pushed a new commit to the remote repository.

3.0.1.3 git pull

As Figure 3.6 demonstrates, the local repository on my S drive is now one commit behind the remote (and the local repository on my C drive). In the terminal session below we try git status but are told the local and remote repositories are synced, which we know to be false.

amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git status
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

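The git status command does not contact the remote. It compares the local branch with origin/main, a locally stored record of where the remote branch was the last time this repository communicated with the remote, and the repository on the S drive was blind to the last commit. Instead we use git remote update to refresh that record, after which git status works as before.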

amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git remote update
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 1), reused 3 (delta 1), pack-reused 0
Unpacking objects: 100% (3/3), 566 bytes | 0 bytes/s, done.
From https://github.com/adamreimer/git_practice
   5139049..9db5478  main       -> origin/main
amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git status
On branch main
Your branch is behind 'origin/main' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)

nothing to commit, working tree clean

Finally, git pull brings the local repository on the S drive into sync with the remote (and the local repository on my C drive). At this point the local and remote repositories have the structure of Figure 3.5 but will include an additional commit (9db5478) not shown in Figure 3.5.

amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git pull
Updating 5139049..9db5478
Fast-forward
 fib_seq.R | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git status
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean
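The stale status report and its fix can be reproduced in a sandbox. The sketch below assumes a bare repository on disk in place of GitHub and two hypothetical clones named s_drive and c_drive. It uses git fetch, which refreshes origin/main in the same way git remote update did above, and adds the --ff-only flag to git pull as a cautious choice that was not part of the original session.

```shell
#!/usr/bin/env bash
set -euo pipefail

sandbox="$(mktemp -d)"
trap 'rm -rf "$sandbox"' EXIT

# Hypothetical setup: one bare remote and two clones, mirroring the S and C drives.
git init -q --bare "$sandbox/remote.git"
git --git-dir="$sandbox/remote.git" symbolic-ref HEAD refs/heads/main
git init -q "$sandbox/s_drive"
cd "$sandbox/s_drive"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Analyst"
git config user.email "analyst@example.com"
echo 'fib_seq <- c(0, 1)' > fib_seq.R
git add fib_seq.R
git commit -q -m "Initialize Fibonacci sequence"
git remote add origin "$sandbox/remote.git"
git push -q -u origin main
git clone -q "$sandbox/remote.git" "$sandbox/c_drive"

# A new commit is pushed from the second clone.
cd "$sandbox/c_drive"
git config user.name "Example Analyst"
git config user.email "analyst@example.com"
echo 'fib_seq[3] <- fib_seq[1] + fib_seq[2]' >> fib_seq.R
git add fib_seq.R
git commit -q -m "Third entry in fib_seq"
git push -q

# Back in the first clone, origin/main is only a cached pointer until we fetch.
cd "$sandbox/s_drive"
behind_before_fetch="$(git rev-list --count HEAD..origin/main)"
git fetch -q origin            # refreshes origin/main, as `git remote update` does
behind_after_fetch="$(git rev-list --count HEAD..origin/main)"

# Fast-forward the local branch; --ff-only refuses anything but a fast-forward.
git pull -q --ff-only
behind_after_pull="$(git rev-list --count HEAD..origin/main)"

echo "behind before fetch: $behind_before_fetch"
echo "behind after fetch:  $behind_after_fetch"
echo "behind after pull:   $behind_after_pull"
```

The count only becomes non-zero after the fetch, which is why the first git status in this section reported that everything was up to date.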

3.0.2 Interacting with a Peer’s Remote Repository

How you interact with a peer’s remote repository depends on your goals. We will discuss three typical use cases below.

3.0.2.1 git clone - To Copy/Modify Code

Imagine a situation where a peer has written some code which you would like to modify for a similar project². Use git clone as described above. You will be able to create a copy of their repository and work on your local machine as usual, but you will not be able to push changes back to their remote.

3.0.2.2 git clone, git push, git pull - To Collaborate (closely)

If you and a peer are working closely on an analysis it may be appropriate for the owner to add their peer as a collaborator to the project. This is a point-and-click task from your GitHub repository page, Settings>Collaborators>Add people>(enter the username). The collaborator can push and pull changes to the remote as if they were the owner. This arrangement is only appropriate for peers who you trust to commit changes of which you both approve. In practice, this likely means there will be personal communication to coordinate each person’s efforts.

To demonstrate this process I added my wife (Carly) as a collaborator to the git_practice repository. Carly then cloned the repository, modified the fib_seq.R file by adding a new line (fib_seq[7] <- fib_seq[5] + fib_seq[6]), staged the file, committed the changes, and pushed her local repository back to the git_practice remote. Afterwards I pulled those changes back to the local repository on my S drive. The terminal session and figures associated with these actions would closely mirror those shown for git clone, git push, and git pull above, although the local repositories have different owners in this case. To demonstrate that commits were made by both collaborators I ran a specially formatted call³ to git log, which shows that the latest commit to this repository did come from a new author.

amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git log --pretty=format:"%h%x09%an%x09%ad%x09%s"
f732cdb Carly Reimer    Sun Jul 2 22:01:17 2023 -0800   Seventh Fibonacci number
9db5478 Adam Reimer     Sun Jul 2 20:39:19 2023 -0800   Sixth number in the Fibonacci seqence
5139049 Adam Reimer     Sun Jul 2 16:15:52 2023 -0800   Fifth entry in the Fibonacci sequence
0c92881 Adam Reimer     Sun Jul 2 14:53:05 2023 -0800   Fourth entry in the Fibonacci sequence
3bb6c98 Adam Reimer     Sun Jul 2 14:31:04 2023 -0800   Third entry in fib_seq
e17181f Adam Reimer     Sun Jul 2 12:59:06 2023 -0800   Initialize Fibonacci sequence
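The format string passed to git log is built from placeholders, and the sketch below annotates each one. It runs in a throwaway repository with two commits by two hypothetical authors (Analyst A and Analyst B, with placeholder email addresses) and then tallies commits per author from the same history.

```shell
#!/usr/bin/env bash
set -euo pipefail

sandbox="$(mktemp -d)"
trap 'rm -rf "$sandbox"' EXIT
git init -q "$sandbox/repo"
cd "$sandbox/repo"

# Two commits by two hypothetical authors; -c sets the identity for one command only.
echo 'fib_seq <- c(0, 1)' > fib_seq.R
git add fib_seq.R
git -c user.name="Analyst A" -c user.email="a@example.com" \
    commit -q -m "Initialize Fibonacci sequence"
echo 'fib_seq[3] <- fib_seq[1] + fib_seq[2]' >> fib_seq.R
git add fib_seq.R
git -c user.name="Analyst B" -c user.email="b@example.com" \
    commit -q -m "Third entry in fib_seq"

# %h   abbreviated commit hash     %an  author name
# %ad  author date                 %s   subject (first line of the commit message)
# %x09 a literal tab character, used here as the column separator
git log --pretty=format:"%h%x09%an%x09%ad%x09%s"
echo

# Count commits per author from the same history.
git log --format='%an' | sort | uniq -c
n_authors="$(git log --format='%an' | sort -u | wc -l | tr -d ' ')"
latest_author="$(git log -1 --format='%an')"
echo "distinct authors: $n_authors, latest commit by: $latest_author"
```

Because the columns are separated by tabs, the output can be pasted into a spreadsheet or read by other tools without further cleaning.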

3.0.2.3 fork - To Collaborate (formally)

Fork is a GitHub operation which creates a copy of another user’s remote repository under your GitHub ID. After the fork is created you can clone it to a local repository as described above. Your local repository can be configured to sync with the original (upstream) repository so that it can track changes the original author made after the fork. If you make significant changes to the repository that the original author may be interested in, you can submit a pull request, which notifies the original author about the changes you have made and gives them the opportunity to include your code in the repository. GitHub has great documentation of this process.
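Keeping a fork in step with the original repository can be sketched in a sandbox. The example below is an illustration under stated assumptions, not GitHub's own procedure: bare repositories on disk stand in for the original repository and the fork, a bare clone imitates the Fork button, the remote name upstream is a common convention rather than a requirement, and all paths and identities are hypothetical. Pull requests happen in the GitHub web interface and are not covered here.

```shell
#!/usr/bin/env bash
set -euo pipefail

sandbox="$(mktemp -d)"
trap 'rm -rf "$sandbox"' EXIT

# Hypothetical stand-ins: original.git is the author's GitHub repository and
# author_work is the author's local repository.
git init -q --bare "$sandbox/original.git"
git --git-dir="$sandbox/original.git" symbolic-ref HEAD refs/heads/main
git init -q "$sandbox/author_work"
cd "$sandbox/author_work"
git symbolic-ref HEAD refs/heads/main
git config user.name "Original Author"
git config user.email "author@example.com"
echo 'fib_seq <- c(0, 1)' > fib_seq.R
git add fib_seq.R
git commit -q -m "Initialize Fibonacci sequence"
git remote add origin "$sandbox/original.git"
git push -q -u origin main

# GitHub's Fork button copies the repository on the server; a bare clone imitates that here.
git clone -q --bare "$sandbox/original.git" "$sandbox/fork.git"

# Clone the fork (it becomes "origin") and register the original as "upstream".
git clone -q "$sandbox/fork.git" "$sandbox/my_clone"
git -C "$sandbox/my_clone" remote add upstream "$sandbox/original.git"

# The original author keeps working after the fork was made.
echo 'fib_seq[3] <- fib_seq[1] + fib_seq[2]' >> fib_seq.R
git add fib_seq.R
git commit -q -m "Third entry in fib_seq"
git push -q

# Bring the author's new commit into the local clone, then update the fork.
cd "$sandbox/my_clone"
git fetch -q upstream
behind_upstream="$(git rev-list --count HEAD..upstream/main)"
git merge -q --ff-only upstream/main
git push -q origin main

fork_head="$(git --git-dir="$sandbox/fork.git" rev-parse main)"
original_head="$(git --git-dir="$sandbox/original.git" rev-parse main)"
echo "commits behind upstream before merge: $behind_upstream"
echo "fork main:     $fork_head"
echo "original main: $original_head"
```

After the final push the fork and the original should point at the same commit, which is the state a fork needs to be in before new work is started on top of it.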

As an example I revoked my wife’s collaborator status on the git_practice repository associated with my GitHub account. Carly then forked the git_practice repository in my account. Afterwards, the forked version of the git_practice repository in her account looked something like this:

The forked git_practice repository in Carly Reimer’s GitHub account

Using the same commands described above, Carly cloned the forked repository, made changes, added the changed file, committed the changes, and pushed the result back to her forked repository on GitHub. Pull requests are so named because Carly is asking me to pull her forked repository back into my original repository. To initiate a pull request the owner of the forked repository (Carly) navigates to the original repository and presses the Pull request button. The pull request looked like this when viewed from my account:

The pull request summary screen.

Navigating to the Files changed button allows the repository owner to review the line-by-line changes associated with the pull request. In this case I deemed the suggestions reasonable and accepted them without comment, but it is also possible to comment on and modify the changes before they are accepted.

The pull request review/approval screen

After the request is approved the original owner can merge the pull request from within GitHub.

Merging a pull request

After merging the pull request, Carly’s local repository, the forked repository, and the original remote repository are synced while my local repository is behind. This situation can be fixed with git pull.

amreimer@DFGSXQDSF206801 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git pull
remote: Enumerating objects: 12, done.
remote: Counting objects: 100% (12/12), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 10 (delta 3), reused 9 (delta 3), pack-reused 0
Unpacking objects: 100% (10/10), 2.60 KiB | 1024 bytes/s, done.
From https://github.com/adamreimer/git_practice
   f732cdb..22dcfea  main       -> origin/main
Updating f732cdb..22dcfea
Fast-forward
 fib_seq.R | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)