Feel free to interrupt me and ask questions anytime!
There is an endless amount of information online that can help you learn the basics of git and how to use it.
This presentation is mostly based on the git-scm book (Pro Git).
Git was born in 2005 out of the need for an open source tool for Linux kernel development that is more efficient than the commercial applications of the time.
It is a version control system (VCS) that allows you to track the history of files and enables people to work together on the same code.
Git is a distributed VCS; we use it together with hosting platforms such as GitLab or GitHub.
Every clone is really a backup of all the data.
Repositories can have multiple remotes.
Does Git store the changes to files (deltas) each time?
No.
Git basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot.
To be efficient, if files have not changed, Git doesn’t store the file again, just a link to the previous identical file it has already stored.
Git thinks about its data more like a stream of snapshots.
This is different from most other VCS, which store a list of file-based changes.
Think of it as a mini file system. This is also one reason why it is fast!
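A small sketch of why unchanged files cost nothing extra, assuming a Unix shell with git installed. The file names and contents are made up for the demo. Git identifies stored content by a checksum of that content, so two byte-identical files get the same ID and only need to be stored once.

```shell
set -eu

# Scratch repository in a temporary directory (demo only).
repo=$(mktemp -d)
cd "$repo"
git init -q

# Two different file names with byte-identical content, one with other content.
printf 'same content\n' > a.txt
cp a.txt b.txt
printf 'other content\n' > c.txt

# git hash-object prints the ID under which git would store the content.
id_a=$(git hash-object a.txt)
id_b=$(git hash-object b.txt)
id_c=$(git hash-object c.txt)

echo "a.txt $id_a"
echo "b.txt $id_b"
echo "c.txt $id_c"
```

The IDs of a.txt and b.txt are equal, while c.txt gets a different one.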
You need to install git.
You need to configure git.
You need to have credentials (gitlab-ssh-key)
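The configuration step could look like this. It is a minimal sketch with a placeholder name and e-mail address, run in a scratch repository so that it does not touch your real settings.

```shell
set -eu

# Scratch repository in a temporary directory (demo only).
repo=$(mktemp -d)
cd "$repo"
git init -q

# Placeholder identity; replace it with your own name and e-mail address.
# With --global instead of --local the setting applies to all your repositories.
git config --local user.name "Ada Example"
git config --local user.email "ada@example.org"

# Read the values back.
name=$(git config user.name)
email=$(git config user.email)
echo "$name <$email>"
```

On your own machine you would typically use --global once, so that every repository picks up the same identity.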
Since we work with a distributed VCS, let’s first create the repository on GitLab (Personal/Group, Private/Internal/Public) and then add files.
What we are going to do in the next few slides:
git init
git add
git log or git diff
git rm or git restore
git commit
The first step in working with git is to add files.
The next step is to modify these files.
$ vim CONTRIBUTING.md
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: README
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: CONTRIBUTING.md
Then add these modified files again.
Let’s modify it again.
$ vim CONTRIBUTING.md
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
new file: README
modified: CONTRIBUTING.md
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: CONTRIBUTING.md
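CONTRIBUTING.md is listed as both staged and not staged? Git stages a file exactly as it was when you ran git add, so later edits are not included. Run git add CONTRIBUTING.md again to stage the latest version.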
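The situation can be reproduced in a scratch repository. This is a sketch, assuming a Unix shell with git installed; the identity and file contents are placeholders. In the short status format the first column describes the staging area and the second column the working directory.

```shell
set -eu

# Scratch repository with a placeholder identity (demo only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.org"

echo "first version" > CONTRIBUTING.md
git add CONTRIBUTING.md
git commit -q -m "Add CONTRIBUTING.md"

echo "second version" > CONTRIBUTING.md
git add CONTRIBUTING.md                  # stages the second version
echo "third version" > CONTRIBUTING.md   # edited again after staging

# Staged (first column) and modified in the working directory (second column).
state=$(git status --short)
echo "$state"

# Staging again picks up the latest content.
git add CONTRIBUTING.md
state_after=$(git status --short)
echo "$state_after"
```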
$ git diff
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 8ebb991..643e24f 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -65,7 +65,8 @@ branch directly, things can get messy.
Please include a nice description of your changes when you submit your PR;
if we have to read the whole diff to figure out why you're contributing
in the first place, you're less likely to get feedback and have your change
-merged in.
+merged in. Also, split your changes into comprehensive chunks if your patch is
+longer than a dozen lines.
git diff compares what is in your working directory with what is in your staging area.
Run git diff --staged to see what you have staged compared with your last commit.
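To see the difference between the two commands side by side, here is a sketch with one staged and one unstaged change. It assumes a Unix shell with git installed and uses placeholder contents; --name-only just keeps the output short.

```shell
set -eu

# Scratch repository with a placeholder identity (demo only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.org"

echo "readme v1" > README
echo "guide v1" > CONTRIBUTING.md
git add README CONTRIBUTING.md
git commit -q -m "Initial commit"

echo "readme v2, longer" > README
git add README                               # this change is staged
echo "guide v2, longer" > CONTRIBUTING.md    # this change is not staged

unstaged=$(git diff --name-only)             # working directory vs. staging area
staged=$(git diff --staged --name-only)      # staging area vs. last commit
echo "git diff:          $unstaged"
echo "git diff --staged: $staged"
```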
Finally.
$ git commit -m "Some fixes."
[master 463dc4f] Some fixes.
2 files changed, 2 insertions(+)
create mode 100644 README
Now you’ve created your first commit! You can see that the commit has given you some output about itself: which branch you committed to (master), what SHA-1 checksum the commit has (463dc4f), how many files were changed, and statistics about lines added and removed in the commit.
If you commit without -m, git opens your editor so that you can write the commit message:
$ git commit
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# modified: README
$ git commit -a -m "Fast, add all known file changes."
The -a flag stages every tracked file that was modified or deleted and commits in one command. New (untracked) files are not included.
Try to commit often. Every commit is a state that you want to record and can return to. Keep commits small and write a descriptive message, e.g. “Fixed typo in README”.
$ rm PROJECTS.md
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
(use "git add/rm <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
deleted: PROJECTS.md
no changes added to commit (use "git add" and/or "git commit -a")
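Run git rm PROJECTS.md to stage the removal. The next time you commit, the file will be gone and no longer tracked. If you had modified the file or already added it to the staging area, force the removal with git rm -f <file>. You can use wildcards, e.g. git rm log/\*.log (the backslash keeps the shell from expanding the pattern, so git does it).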
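The steps of removing a tracked file can be sketched as follows, assuming a Unix shell with git installed and a scratch repository with placeholder content. A plain rm only deletes the file in the working directory; git rm stages the deletion.

```shell
set -eu

# Scratch repository with a placeholder identity (demo only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.org"

echo "project list" > PROJECTS.md
git add PROJECTS.md
git commit -q -m "Add PROJECTS.md"

rm PROJECTS.md
before=$(git status --short)    # deletion is not staged yet
git rm -q PROJECTS.md
after=$(git status --short)     # deletion is staged

git commit -q -m "Remove PROJECTS.md"
tracked=$(git ls-files)         # empty: nothing is tracked any more

echo "before git rm: '$before'"
echo "after git rm:  '$after'"
```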
Useful commands to look into the change log of a repository
# show full log
$ git log
# show the patch of the last two commits
$ git log -p -2
# statistics
$ git log --stat
# very short
$ git log --pretty=oneline
# who, when and what?
$ git log --pretty=format:"%h - %an, %ar : %s"
# ascii graph (e.g. branches)
$ git log --pretty=format:"%h %s" --graph
# limit by time (--since, --until); a date such as 2025-01-23 works too
$ git log --since=2.weeks
# filter by path (changes to this file)
$ git log -- path/to/file
One of the common undos takes place when you commit too early and possibly forget to add some files, or you mess up your commit message.
$ git commit -m 'Initial commit'
$ git add forgotten_file
# amend to the rescue!
$ git commit --amend
You end up with a single commit — the second commit replaces the results of the first.
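A quick way to convince yourself is to count the commits before and after. This is a sketch in a scratch repository, assuming a Unix shell with git installed; the file names and identity are placeholders, and --no-edit simply reuses the old commit message.

```shell
set -eu

# Scratch repository with a placeholder identity (demo only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.org"

echo "main file" > main.txt
git add main.txt
git commit -q -m "Initial commit"
first_id=$(git rev-parse HEAD)

# Oops, one file was forgotten: stage it and amend the commit.
echo "forgotten" > forgotten_file
git add forgotten_file
git commit -q --amend --no-edit
second_id=$(git rev-parse HEAD)

count=$(git rev-list --count HEAD)
echo "commits in history: $count"
echo "files in the commit:"
git ls-tree --name-only HEAD
```

There is still only one commit, but it has a new ID, so only amend commits that you have not pushed yet.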
You accidentally added a bunch of files. How to unstage this mess?
# unstage the file again
# (git before 2.23: git reset HEAD CONTRIBUTING.md)
$ git restore --staged CONTRIBUTING.md
...
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: CONTRIBUTING.md
What if you realize that you don’t want to keep your changes to the CONTRIBUTING.md file?
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: CONTRIBUTING.md
# discard the changes in the working directory
$ git checkout -- CONTRIBUTING.md
# or, with git 2.23 or newer: git restore CONTRIBUTING.md
$ git status
On branch master
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
renamed: README.md -> README
Remember, anything that is committed in Git can almost always be recovered. However, anything you lose that was never committed is likely never to be seen again.
There might be files that you do not want git to track, and you want to make sure that you do not add them by mistake. For this there is a hidden file, .gitignore, which usually lives in your project root.
Standard glob patterns work, and will be applied recursively throughout the entire working tree.
# ignore all .a files
*.a
# but do track lib.a, even though you're ignoring .a files above
!lib.a
# ignore all files in any directory named build
build/
# ignore doc/notes.txt, but not doc/server/arch.txt
doc/*.txt
# ignore all .pdf files in the doc/ directory and any of its subdirectories
doc/**/*.pdf
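The patterns above can be tried out in a scratch repository. This is a sketch, assuming a Unix shell with git installed; the example files (foo.a, build/out.o, doc/server/manual.pdf) are made up for the demo. After git add . only the files that are not ignored end up being tracked.

```shell
set -eu

# Scratch repository in a temporary directory (demo only).
repo=$(mktemp -d)
cd "$repo"
git init -q

# The same patterns as in the example above.
cat > .gitignore <<'EOF'
*.a
!lib.a
build/
doc/*.txt
doc/**/*.pdf
EOF

# Made-up files that match or do not match the patterns.
mkdir -p build doc/server
touch foo.a lib.a build/out.o doc/notes.txt doc/server/arch.txt doc/server/manual.pdf

# Stage everything that is not ignored and list what git now tracks.
git add .
tracked=$(git ls-files)
echo "$tracked"
```

Only .gitignore, doc/server/arch.txt and lib.a are listed; everything else was ignored.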
We talked about:
What we are going to do in the next few slides:
git remote
git tag (releases)
git branch, git merge
We need remotes because our repositories are hosted on a server such as GitLab.
# I use my SSH Key to access the repo
$ git clone git@gitlab.phaidra.org:imgw/gitlab-tutorial.git
Cloning into 'gitlab-tutorial'...
remote: Enumerating objects: 23, done.
remote: Total 23 (delta 0), reused 0 (delta 0), pack-reused 23 (from 1)
Receiving objects: 100% (23/23), 19.74 KiB | 19.74 MiB/s, done.
Resolving deltas: 100% (5/5), done.
$ cd gitlab-tutorial
$ git remote -v
origin git@gitlab.phaidra.org:imgw/gitlab-tutorial.git (fetch)
origin git@gitlab.phaidra.org:imgw/gitlab-tutorial.git (push)
# or maybe have more
$ git remote -v
origin git@gitlab.phaidra.org:imgw/gitlab-tutorial.git (fetch)
origin git@gitlab.phaidra.org:imgw/gitlab-tutorial.git (push)
upstream https://github.com/mblaschek/gitlab-tutorial.git (fetch)
upstream https://github.com/mblaschek/gitlab-tutorial.git (push)
Working with remotes:
# add the above remote on GitHub
$ git remote add upstream https://github.com/mblaschek/gitlab-tutorial.git
# fetch (retrieve) the information on that remote
$ git fetch upstream
# send (push) the commits of a branch to a remote, e.g. git push origin master
$ git push <remote> <branch>
# a new branch needs an upstream on its first push
$ git push --set-upstream origin <branch>
# inspect a remote (url, branches)
$ git remote show origin
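You can practice these commands without any server. In the sketch below two local bare repositories stand in for the hosted remotes; the directory names and identity are placeholders, and a Unix shell with git installed is assumed.

```shell
set -eu

work=$(mktemp -d)
cd "$work"

# Two bare repositories play the role of the remotes (demo only).
git init -q --bare origin.git
git init -q --bare upstream.git

# Cloning sets up the remote "origin" automatically.
git clone -q origin.git project
cd project
git config user.name "Demo User"
git config user.email "demo@example.org"

# Add a second remote by hand.
git remote add upstream "$work/upstream.git"

echo "hello" > README
git add README
git commit -q -m "Initial commit"

# The default branch name depends on your git configuration, so ask git for it.
branch=$(git symbolic-ref --short HEAD)

git push -q --set-upstream origin "$branch"   # first push: set the upstream
git push -q upstream "$branch"
git fetch -q upstream

remotes=$(git remote)
echo "$remotes"
```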
Git has the ability to tag specific points in a repository’s history as being important. Typically, people use this functionality to mark release points (v1.0, v2.0 and so on).
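Marking release points could look like this. It is a sketch in a scratch repository, assuming a Unix shell with git installed; file names, messages and identity are placeholders. An annotated tag (-a) stores its own message, a lightweight tag is just a name for a commit.

```shell
set -eu

# Scratch repository with a placeholder identity (demo only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.org"

echo "release 1" > notes.txt
git add notes.txt
git commit -q -m "First release"
first=$(git rev-parse HEAD)
git tag -a v1.0 -m "Version 1.0"     # annotated tag with a message

echo "release 2" >> notes.txt
git commit -q -a -m "Second release"
git tag v2.0                         # lightweight tag

tags=$(git tag)
echo "$tags"

# The tag still points at the commit it was created on.
tagged=$(git rev-parse "v1.0^{commit}")
```

Tags are not pushed by default; push them explicitly, e.g. git push origin v1.0.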
A branch is essentially a pointer to a specific commit in the repository’s history.
The tag v1.0 and the branch master point to the same commit.
We add a new branch, testing. (Notice where HEAD points.)
Start working on the testing branch…
$ git checkout -b hotfix
Switched to a new branch 'hotfix'
$ vim index.html
$ git commit -a -m 'Fix broken email address'
[hotfix 1fb7853] Fix broken email address
1 file changed, 2 insertions(+)
$ git checkout master
$ git merge hotfix
Updating f42c576..3a0874c
Fast-forward
index.html | 2 ++
1 file changed, 2 insertions(+)
$ git checkout iss53
Switched to branch "iss53"
$ vim index.html
$ git commit -a -m 'Finish the new footer [issue 53]'
[iss53 ad82d7a] Finish the new footer [issue 53]
1 file changed, 1 insertion(+)
$ git checkout master
Switched to branch 'master'
$ git merge iss53
Merge made by the 'recursive' strategy.
index.html | 1 +
1 file changed, 1 insertion(+)
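The two kinds of merge shown above can be reproduced in a scratch repository. This is a sketch, assuming a Unix shell with git installed; the file contents and identity are placeholders. A fast-forward only moves the branch pointer, while merging diverged branches creates a merge commit.

```shell
set -eu

# Scratch repository with a placeholder identity (demo only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.org"

echo "<html></html>" > index.html
git add index.html
git commit -q -m "Initial commit"
main_branch=$(git symbolic-ref --short HEAD)   # master or main, depending on config

# Case 1: only hotfix has new commits, so the merge is a fast-forward.
git checkout -q -b hotfix
echo "<!-- fixed email -->" >> index.html
git commit -q -a -m "Fix broken email address"
git checkout -q "$main_branch"
git merge -q hotfix
merges_after_ff=$(git rev-list --count --merges HEAD)

# Case 2: both branches have new commits, so git creates a merge commit.
git checkout -q -b iss53
echo "footer" > footer.html
git add footer.html
git commit -q -m "Finish the new footer [issue 53]"
git checkout -q "$main_branch"
echo "readme" > README
git add README
git commit -q -m "Add README"
git merge -q --no-edit iss53
merges_after_merge=$(git rev-list --count --merges HEAD)

echo "merge commits after fast-forward: $merges_after_ff"
echo "merge commits after real merge:   $merges_after_merge"
```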
$ git merge iss53
Auto-merging index.html
CONFLICT (content): Merge conflict in index.html
Automatic merge failed; fix conflicts and then commit the result.
$ git status
On branch master
You have unmerged paths.
(fix conflicts and run "git commit")
Unmerged paths:
(use "git add <file>..." to mark resolution)
both modified: index.html
no changes added to commit (use "git add" and/or "git commit -a")
The file index.html now contains:
<<<<<<< HEAD:index.html
<div id="footer">contact : email.support@github.com</div>
=======
<div id="footer">
please contact us at support@github.com
</div>
>>>>>>> iss53:index.html
The part above ======= is what is in your current HEAD; the part below comes from the iss53 branch. Because both commits edit the same lines, there is a merge conflict that you need to resolve yourself.
To resolve it, edit the file so that it has the content you want and remove the <<<<<<<, ======= and >>>>>>> marker lines.
Then run git add index.html to mark the conflict as resolved and git commit to complete the merge.
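The whole conflict cycle can be practiced safely in a scratch repository. This is a sketch, assuming a Unix shell with git installed; the addresses, messages and identity are placeholders. Here the resolution simply overwrites the file with the wanted content; in practice you would edit it by hand.

```shell
set -eu

# Scratch repository with a placeholder identity (demo only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.org"

echo "contact: old@example.org" > index.html
git add index.html
git commit -q -m "Initial commit"
main_branch=$(git symbolic-ref --short HEAD)

# Both branches change the same line in different ways.
git checkout -q -b iss53
echo "please contact us at support@example.org" > index.html
git commit -q -a -m "Reword footer"
git checkout -q "$main_branch"
echo "contact: email.support@example.org" > index.html
git commit -q -a -m "Change contact address"

# The merge stops with a conflict and a non-zero exit status.
if git merge -q iss53 >/dev/null 2>&1; then
    echo "unexpected: merge succeeded without conflict"
fi
conflicted=$(git diff --name-only --diff-filter=U)
echo "conflicted: $conflicted"

# Resolve: write the wanted content without marker lines, then add and commit.
echo "please contact us at email.support@example.org" > index.html
git add index.html
git commit -q -m "Merge branch 'iss53'"
merges=$(git rev-list --count --merges HEAD)
echo "merge commits: $merges"
```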
We talked about:
Topics that we did not cover: git rebase (docs)
We will continue with the following topics:
The University Library and the ZID have a long-term archive for data, called PHAIDRA (Permanent Hosting, Archiving and Indexing of Digital Resources and Assets).
Part of this service is the GitLab instance: gitlab.phaidra.org as well as the Mattermost chat: discuss.phaidra.org
You need a separate account, but you can keep that account forever (requires: valid mail address)
Features:
It is possible in GitLab to create releases for your software, if that is part of your publication.
These releases can then be linked or stored in the PHAIDRA catalog. As this is one of many trusted repositories (re3data), you can get a DOI and apply special licenses to your software. Assets in the catalog can be versioned and linked directly to GitLab.
You can set up some CI/CD yourself, but GitLab will automatically do this for you:
https://<github username>@github.com/path/to/your/repo.git
The best Microsoft product: Visual Studio Code
Useful extensions:
It is critical that you start using git on your scientific code. You might not be a developer, or foresee any need to share your code with someone, but this will help you.
How to structure my thoughts and my code?
The motivation is to create a template master thesis project as a first step towards making the work of students and others easier to reuse.
There are some rules:
#
Jupyter Notebooks with example plots
README.md
requirements.txt
pytest
There is a clear benefit in sharing code when people work together on similar topics. Since you are not code developers, the idea is to write Python code that follows a few simple rules:
#, docstrings
main.py, reading.py, calculation.py
(str, float, int) and defaults (directory:str='/jetfs/...')
pigar to create the requirements.txt
CI/CD can be useful for automated testing, building packages, or deploying websites. There is a clear structure to how things work in GitLab:
Note: There are some examples on GitLab in the IMGW group, e.g. imgw/example-ci
What we talked about:
Git and Workflow(s)