3 Collaboration Using Git
Git has strong reproducible research capabilities that become especially powerful in large, complicated analyses. That said, using Git comes with an overhead that may not be justified for small projects unless you consider collaboration with future analysts, including yourself. To demonstrate Git’s collaborative potential I created a remote repository on GitHub called git_practice.
3.0.1 Interacting with your Remote Repository
3.0.1.1 git push
To link your local repository to a remote repository use git remote. In the terminal session below I added a remote repository named “origin” and provided the URL where the repository is located.
amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice ({main})
$ git remote add origin https://github.com/adamreimer/git_practice.git
Then git push is used to “push” my local repository to my remote repository. Files associated with this repository are now stored in a location where others can view them, download them, or use them for collaborative work.
amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git push -u origin main
Enumerating objects: 11, done.
Counting objects: 100% (11/11), done.
Delta compression using up to 16 threads
Compressing objects: 100% (10/10), done.
Writing objects: 100% (11/11), 1.53 KiB | 8.00 KiB/s, done.
Total 11 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), done.
To https://github.com/adamreimer/git_practice.git
 * [new branch]      main -> main
branch 'main' set up to track 'origin/main'.
After pushing to GitHub, your repository looks like Figure 3.1.
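The linking and pushing steps above can be tried without a GitHub account. The sketch below is a minimal example, assuming a bare repository in a temporary directory stands in for the GitHub remote. The directory names, the file contents, and the placeholder identity are hypothetical and are not part of the original session.

```shell
set -eu

# Placeholder identity (an assumption) so commits work on a fresh machine.
export GIT_AUTHOR_NAME="Example Analyst" GIT_AUTHOR_EMAIL="analyst@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Hypothetical sandbox: everything lives in a throwaway directory.
sandbox="$(mktemp -d)"

# A bare repository plays the role of the GitHub remote.
git init -q --bare "$sandbox/remote.git"
git -C "$sandbox/remote.git" symbolic-ref HEAD refs/heads/main

# A local repository with one commit on a branch named main.
git init -q "$sandbox/local"
cd "$sandbox/local"
git symbolic-ref HEAD refs/heads/main
echo 'fib_seq <- c(0, 1)' > fib_seq.R
git add fib_seq.R
git commit -q -m "Initialize Fibonacci sequence"

# Link the local repository to the remote under the conventional name "origin".
git remote add origin "$sandbox/remote.git"

# -u records origin/main as the upstream of main, so later calls can be
# a plain "git push" or "git pull".
git push -q -u origin main

remote_url="$(git remote get-url origin)"
upstream="$(git rev-parse --abbrev-ref 'main@{upstream}')"
echo "origin   -> $remote_url"
echo "upstream -> $upstream"
```

The only difference from the session above is the location of the remote: a path on disk instead of a GitHub URL.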
Now that the remote repository has been updated, we have to keep the local and remote repositories synced. To illustrate this workflow I’ll change the fib_seq.R file by adding the fifth value of the Fibonacci sequence (fib_seq[5] <- fib_seq[3] + fib_seq[4]) as a new line. After this change the Git work space contains an unstaged change that is not reflected in either repository.
In the terminal sessions below I stage and commit the file fib_seq.R. Notice that the first git status call reports that the local and remote repositories are synced but that there are unstaged changes.
amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git status
On branch main
Your branch is up to date with 'origin/main'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: fib_seq.R
no changes added to commit (use "git add" and/or "git commit -a")
After the modified file was added,
amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git add fib_seq.R
and committed,
amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git commit -m "Fifth entry in the Fibonacci sequence" -m "A long and descriptive description"
[main 5139049] Fifth entry in the Fibonacci sequence
 1 file changed, 2 insertions(+), 1 deletion(-)
the second call to git status tells us our remote repository is one commit behind our local repository.
amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git status
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
(use "git push" to publish your local commits)
nothing to commit, working tree clean
The Git work space at this moment is illustrated by Figure 3.3.
In the terminal session below I use git push to update the remote repository.
amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git push
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 16 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 414 bytes | 8.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To https://github.com/adamreimer/git_practice.git
   0c92881..5139049  main -> main
Notice that git status verifies the repositories are now synced.
amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git status
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
The Git work space at this moment is illustrated by Figure 3.4.
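The modify, stage, commit, push cycle above can be condensed into one script. This is a minimal sketch, assuming a bare repository in a temporary directory stands in for GitHub. The helper name ahead, the file contents, and the placeholder identity are hypothetical. Instead of reading git status by eye, it counts the commits that exist locally but not on origin/main.

```shell
set -eu

# Placeholder identity (an assumption) so commits work on a fresh machine.
export GIT_AUTHOR_NAME="Example Analyst" GIT_AUTHOR_EMAIL="analyst@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

sandbox="$(mktemp -d)"

# Stand-in remote plus a local repository that has already been pushed once.
git init -q --bare "$sandbox/remote.git"
git -C "$sandbox/remote.git" symbolic-ref HEAD refs/heads/main
git init -q "$sandbox/local"
cd "$sandbox/local"
git symbolic-ref HEAD refs/heads/main
echo 'fib_seq <- c(0, 1)' > fib_seq.R
git add fib_seq.R
git commit -q -m "Initialize Fibonacci sequence"
git remote add origin "$sandbox/remote.git"
git push -q -u origin main

# Hypothetical helper: number of local commits that origin/main lacks.
ahead() { git rev-list --count origin/main..main; }

# 1. Modify the file: the change is unstaged and in neither repository.
echo 'fib_seq[3] <- fib_seq[1] + fib_seq[2]' >> fib_seq.R
unstaged="$(git status --porcelain)"
ahead_before_commit="$(ahead)"

# 2. Stage and commit: the local repository is now one commit ahead.
git add fib_seq.R
git commit -q -m "Third entry in fib_seq"
ahead_after_commit="$(ahead)"

# 3. Push: the remote catches up.
git push -q
ahead_after_push="$(ahead)"

echo "ahead: $ahead_before_commit -> $ahead_after_commit -> $ahead_after_push"
```

The three counts correspond to the three git status reports in the sessions above: up to date with unstaged changes, ahead by one commit, and up to date again.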
3.0.1.2 git clone
Imagine a situation where you would like to work on your analysis from a home computer¹. If your analysis is stored as a remote Git repository it is easy to obtain a copy. In the terminal sessions below I obtain a copy of my repository in a new location (note that in the previous file paths I have been working on a network S drive). The first step is to switch to my computer’s C drive.
amreimer@DFGSXQDSF223076 MINGW64 ~/Documents
$ cd C:/
The command git clone copies (clones) the remote repository to my C drive.
amreimer@DFGSXQDSF223076 MINGW64 /c
$ git clone https://github.com/adamreimer/git_practice.git
Cloning into 'git_practice'...
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 14 (delta 3), reused 14 (delta 3), pack-reused 0
Receiving objects: 100% (14/14), done.
Resolving deltas: 100% (3/3), done.
After which I can navigate to the new local repository,
amreimer@DFGSXQDSF223076 MINGW64 /c
$ cd C:/git_practice
and check the repository status. Notice that I made a typo on the git status command the first time and nothing terrible happened.
amreimer@DFGSXQDSF223076 MINGW64 /c/git_practice (main)
$ git_status
bash: git_status: command not found
amreimer@DFGSXQDSF223076 MINGW64 /c/git_practice (main)
$ git status
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
After cloning my remote repository to my C drive with git clone, I have two local repositories associated with the same remote (see Figure 3.5).
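One detail worth seeing for yourself is that a clone arrives already linked to its remote, so neither git remote add nor git push -u is needed in the new location. The sketch below is a minimal example, assuming a bare repository in a temporary directory stands in for GitHub. The directory names s_drive and c_drive are hypothetical stand-ins for the two locations used above.

```shell
set -eu

# Placeholder identity (an assumption) so commits work on a fresh machine.
export GIT_AUTHOR_NAME="Example Analyst" GIT_AUTHOR_EMAIL="analyst@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

sandbox="$(mktemp -d)"

# Stand-in remote, seeded from the first working copy.
git init -q --bare "$sandbox/remote.git"
git -C "$sandbox/remote.git" symbolic-ref HEAD refs/heads/main
git init -q "$sandbox/s_drive"
cd "$sandbox/s_drive"
git symbolic-ref HEAD refs/heads/main
echo 'fib_seq <- c(0, 1)' > fib_seq.R
git add fib_seq.R
git commit -q -m "Initialize Fibonacci sequence"
git remote add origin "$sandbox/remote.git"
git push -q -u origin main

# Clone the remote into a second location.
git clone -q "$sandbox/remote.git" "$sandbox/c_drive"
cd "$sandbox/c_drive"

# The clone already knows its remote, its upstream branch, and the history.
clone_remote="$(git remote get-url origin)"
clone_upstream="$(git rev-parse --abbrev-ref '@{upstream}')"
clone_commits="$(git rev-list --count HEAD)"
echo "origin   -> $clone_remote"
echo "upstream -> $clone_upstream"
echo "commits  -> $clone_commits"
```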
If I change the file fib_seq.R in the working directory on my C drive by adding a new line (fib_seq[6] <- fib_seq[4] + fib_seq[5]), stage and commit those changes, and push the local repository on my C drive to the remote repository, the local repository on my S drive will be one commit behind. The terminal session below demonstrates these commands (all of which we have seen before), and the current state of the Git work space is shown in Figure 3.6.
amreimer@DFGSXQDSF223076 MINGW64 /c/git_practice (main)
$ git status
On branch main
Your branch is up to date with 'origin/main'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: fib_seq.R
no changes added to commit (use "git add" and/or "git commit -a")
amreimer@DFGSXQDSF223076 MINGW64 /c/git_practice (main)
$ git add fib_seq.R
amreimer@DFGSXQDSF223076 MINGW64 /c/git_practice (main)
$ git commit -m "Sixth number in the Fibonacci seqence" -m "This commit is slightly different as it was made from a different computer in my house. It still represents a single author working on their own repository but demonstrated the flexibility accorded by storing your analysis on the cloud. Working on this analysis from a new machine was seamless provided the new machine had the appropriate software."
[main 9db5478] Sixth number in the Fibonacci seqence
 1 file changed, 2 insertions(+), 1 deletion(-)
amreimer@DFGSXQDSF223076 MINGW64 /c/git_practice (main)
$ git status
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
(use "git push" to publish your local commits)
nothing to commit, working tree clean
amreimer@DFGSXQDSF223076 MINGW64 /c/git_practice (main)
$ git push
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 16 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 586 bytes | 586.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To https://github.com/adamreimer/git_practice.git
   5139049..9db5478  main -> main
3.0.1.3 git pull
As Figure 3.6 demonstrates, the local repository on my S drive is now one commit behind the remote (and the local repository on my C drive). In the terminal session below we try git status but are told the local and remote repositories are synced, which we know to be false.
amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git status
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
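The repository on the S drive was blind to the last commit because git status does not contact the remote; it compares the local branch against the locally stored record of origin/main, which has not been refreshed since the last push from this location. Instead we use git remote update to refresh that record, after which git status works as before.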
amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git remote update
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 1), reused 3 (delta 1), pack-reused 0
Unpacking objects: 100% (3/3), 566 bytes | 0 bytes/s, done.
From https://github.com/adamreimer/git_practice
   5139049..9db5478  main -> origin/main
amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git status
On branch main
Your branch is behind 'origin/main' by 1 commit, and can be fast-forwarded.
(use "git pull" to update your local branch)
nothing to commit, working tree clean
Finally, git pull brings the local repository on the S drive into sync with the remote (and the local repository on my C drive). At this point the local and remote repositories have the structure of Figure 3.5 but include an additional commit (9db5478) not shown in Figure 3.5.
amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git pull
Updating 5139049..9db5478
Fast-forward
 fib_seq.R | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git status
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
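The stale git status report is easy to reproduce. The sketch below is a minimal example, assuming a bare repository in a temporary directory stands in for GitHub and two clones stand in for the S and C drives. The directory names and the helper behind are hypothetical. It uses git fetch, a commonly used alternative to the git remote update call shown above; both refresh the local record of origin/main without touching your files.

```shell
set -eu

# Placeholder identity (an assumption) so commits work on a fresh machine.
export GIT_AUTHOR_NAME="Example Analyst" GIT_AUTHOR_EMAIL="analyst@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

sandbox="$(mktemp -d)"

# Stand-in remote, seeded from the first working copy ("S drive").
git init -q --bare "$sandbox/remote.git"
git -C "$sandbox/remote.git" symbolic-ref HEAD refs/heads/main
git init -q "$sandbox/s_drive"
cd "$sandbox/s_drive"
git symbolic-ref HEAD refs/heads/main
echo 'fib_seq <- c(0, 1)' > fib_seq.R
git add fib_seq.R
git commit -q -m "Initialize Fibonacci sequence"
git remote add origin "$sandbox/remote.git"
git push -q -u origin main

# Second working copy ("C drive") commits and pushes a change.
git clone -q "$sandbox/remote.git" "$sandbox/c_drive"
cd "$sandbox/c_drive"
echo 'fib_seq[3] <- fib_seq[1] + fib_seq[2]' >> fib_seq.R
git commit -q -a -m "Third entry in fib_seq"
git push -q

# Back on the "S drive": how many commits does origin/main have that main lacks?
cd "$sandbox/s_drive"
behind() { git rev-list --count main..origin/main; }

behind_stale="$(behind)"      # local record of origin/main not yet refreshed
git fetch -q origin           # refresh the record; working files are untouched
behind_fetched="$(behind)"
git pull -q --ff-only         # fast-forward main to match origin/main
behind_pulled="$(behind)"

echo "behind: $behind_stale -> $behind_fetched -> $behind_pulled"
```

The first count is the misleading "up to date" report, the second is the corrected report after the refresh, and the third confirms the pull brought the repository back into sync.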
3.0.2 Interacting with a Peer’s Remote Repository
How you interact with a peer’s remote repository depends on your goals. We will discuss three typical use cases below.
3.0.2.1 git clone - To Copy/Modify Code
Imagine a situation where a peer has written some code that you would like to modify for a similar project². Use git clone as described above. You will be able to create a copy of their repository and work on your local machine as usual, but you will not be able to push changes back to the remote.
3.0.2.2 git clone, git push, git pull - To Collaborate (closely)
If you and a peer are working closely on an analysis it may be appropriate for the owner to add the peer as a collaborator on the project. This is a point-and-click task from your GitHub repository page: Settings>Collaborators>Add people>(enter the username). The collaborator can push and pull changes to the remote as if they were the owner. This arrangement is only appropriate for peers whom you trust to commit changes of which you both approve. In practice, this likely means there will be personal communication to coordinate each person’s efforts. To demonstrate this process I added my wife (Carly) as a collaborator to the git_practice repository. Carly then cloned the repository, modified the fib_seq.R file by adding a new line (fib_seq[7] <- fib_seq[5] + fib_seq[6]), staged the file, committed the changes, and pushed her local repository back to the git_practice remote. Afterwards I pulled those changes back to the local repository on my S drive. The terminal session and figures associated with these actions would closely mirror those shown for git clone, git push, and git pull above, although the local repositories have different owners in this case. To demonstrate that commits were made by both collaborators I ran a specially formatted call³ to git log, which shows that the latest commit to this repository did come from a new author.
amreimer@DFGSXQDSF223076 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git log --pretty=format:"%h%x09%an%x09%ad%x09%s"
f732cdb Carly Reimer Sun Jul 2 22:01:17 2023 -0800 Seventh Fibonacci number
9db5478 Adam Reimer Sun Jul 2 20:39:19 2023 -0800 Sixth number in the Fibonacci seqence
5139049 Adam Reimer Sun Jul 2 16:15:52 2023 -0800 Fifth entry in the Fibonacci sequence
0c92881 Adam Reimer Sun Jul 2 14:53:05 2023 -0800 Fourth entry in the Fibonacci sequence
3bb6c98 Adam Reimer Sun Jul 2 14:31:04 2023 -0800 Third entry in fib_seq
e17181f Adam Reimer Sun Jul 2 12:59:06 2023 -0800 Initialize Fibonacci sequence
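The format string in that git log call is built from placeholders: %h is the abbreviated commit hash, %an the author name, %ad the author date, %s the commit subject, and %x09 a tab character used as the separator. The sketch below is a minimal example of the same call in a throwaway repository. The two author names, the helper commit_as, and the use of empty commits are assumptions made to keep it short.

```shell
set -eu

sandbox="$(mktemp -d)"
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: commit under a given identity.
#   $1 = name, $2 = email, $3 = commit subject
commit_as() {
  GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$2" \
  GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="$2" \
    git commit -q --allow-empty -m "$3"   # empty commits keep the sketch short
}

commit_as "Analyst One" "one@example.com" "Sixth Fibonacci number"
commit_as "Analyst Two" "two@example.com" "Seventh Fibonacci number"

# One line per commit, newest first: hash, author, date, subject (tab separated).
log_lines="$(git log --pretty=format:'%h%x09%an%x09%ad%x09%s')"
printf '%s\n' "$log_lines"

# Because the fields are tab separated, cut can pull out a single column.
latest_author="$(printf '%s\n' "$log_lines" | head -n 1 | cut -f 2)"
echo "latest commit by: $latest_author"
```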
3.0.2.3 fork - To Collaborate (formally)
Fork is a GitHub operation which creates a copy of another user’s remote repository under your GitHub ID. After the fork is created you can clone it to a local repository as described above. Your local repository can be configured to sync with the original (upstream) repository so that it can track changes the original author makes after the fork. If you make significant changes that the original author may be interested in, you can submit a pull request, which notifies the original author about the changes you have made and gives them the opportunity to include your code in the repository. GitHub has great documentation of this process.
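Configuring a clone of a fork to follow the original repository usually means adding a second remote, conventionally named upstream. The sketch below is a minimal example, assuming bare repositories in a temporary directory stand in for the original GitHub repository and the fork. The fork itself is imitated with a bare clone, since forking is a GitHub operation rather than a Git command, and all directory names are hypothetical. Pull requests happen on GitHub and are not part of this sketch.

```shell
set -eu

# Placeholder identity (an assumption) so commits work on a fresh machine.
export GIT_AUTHOR_NAME="Example Analyst" GIT_AUTHOR_EMAIL="analyst@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

sandbox="$(mktemp -d)"

# The original repository, seeded from the original author's working copy.
git init -q --bare "$sandbox/original.git"
git -C "$sandbox/original.git" symbolic-ref HEAD refs/heads/main
git init -q "$sandbox/author"
cd "$sandbox/author"
git symbolic-ref HEAD refs/heads/main
echo 'fib_seq <- c(0, 1)' > fib_seq.R
git add fib_seq.R
git commit -q -m "Initialize Fibonacci sequence"
git remote add origin "$sandbox/original.git"
git push -q -u origin main

# Imitation of the GitHub fork: a second bare copy of the original.
git clone -q --bare "$sandbox/original.git" "$sandbox/fork.git"

# The contributor clones the fork and registers the original as "upstream".
git clone -q "$sandbox/fork.git" "$sandbox/contributor"
git -C "$sandbox/contributor" remote add upstream "$sandbox/original.git"

# After the fork, the original author publishes a new commit.
cd "$sandbox/author"
echo 'fib_seq[3] <- fib_seq[1] + fib_seq[2]' >> fib_seq.R
git commit -q -a -m "Third entry in fib_seq"
git push -q

# The contributor brings the fork up to date with the original.
cd "$sandbox/contributor"
git fetch -q upstream                  # learn about the new upstream commit
git merge -q --ff-only upstream/main   # fast-forward the local main branch
git push -q origin main                # publish the update to the fork

original_head="$(git -C "$sandbox/original.git" rev-parse main)"
fork_head="$(git -C "$sandbox/fork.git" rev-parse main)"
remotes="$(git remote | sort | tr '\n' ' ')"
echo "remotes: $remotes"
```

In this arrangement origin points at your fork, where you can push, and upstream points at the original, which you only read from.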
As an example I revoked my wife’s collaborator status on the git_practice repository associated with my GitHub account. Carly then forked the git_practice repository in my account. Afterwards, the forked version of the git_practice repository in her account looked something like this:
Using the same commands described above, Carly cloned the forked repository, made changes, added the changed file, committed the changes, and pushed the result back to her forked repository on GitHub. Pull requests are so named because Carly is asking me to pull her forked repository back into my original repository. To initiate a pull request the owner of the forked repository (Carly) navigates to the original repository and presses the Pull request button. The pull request looked like this when viewed from my account:
Navigating to the Files changed button allows the repository owner to review line-by-line changes associated with the pull request. In this case, I deemed the suggestions reasonable and accepted them without comment, but there are capabilities to comment on and modify the changes before they are accepted.
After the request is approved the original owner can merge the pull request from within GitHub.
After merging the pull request, Carly’s local repository, the forked repository, and the original remote repository are synced while Adam’s local repository is behind. This situation can be fixed with git pull.
amreimer@DFGSXQDSF206801 MINGW64 /s/RTS/Reimer/Research_Best_Practices/git_practice (main)
$ git pull
remote: Enumerating objects: 12, done.
remote: Counting objects: 100% (12/12), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 10 (delta 3), reused 9 (delta 3), pack-reused 0
Unpacking objects: 100% (10/10), 2.60 KiB | 1024 bytes/s, done.
From https://github.com/adamreimer/git_practice
f732cdb..22dcfea main -> origin/main
Updating f732cdb..22dcfea
Fast-forward
 fib_seq.R | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)