Git and Workflow(s)

Michael Blaschek

What to expect?

  • Introduction to git 🚀
  • Important git commands
  • GitLab and GitHub
  • Workflows, Phaidra
  • Continuous Integration

Feel free to interrupt me and ask questions anytime!

Resources

There are numerous sources online that can help you learn the basics of git and how to use it.

This presentation is mostly based on the git-scm book (Pro Git).

git?

Git was born in 2005 out of the need for an open-source tool that supports Linux kernel development and is more efficient than the commercial applications.

What is git?

It is a version control system (VCS) that lets you track the history of files and enables people to work together on the same code.

What is git?

Git is a distributed VCS; the repositories are hosted on platforms such as GitLab or GitHub.

Every clone is really a backup of all the data.

Repositories can have multiple remotes.

How does git work?

Changes of files (deltas) each time?

No.

Git basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot.

To be efficient, if files have not changed, Git doesn’t store the file again, just a link to the previous identical file it has already stored.

Git thinks about its data more like a stream of snapshots.
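The snapshot idea can be checked in a throwaway repository. The following is a minimal sketch, not part of the original slides: the file names and the placeholder identity are made up for illustration. It commits twice and compares the object IDs git stored for each file.

```shell
# Throwaway repository; identity values are placeholders for this sketch
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Wind Cloudy"
git config user.email "wcloudy@example.org"

# First snapshot: two files
echo "never touched again" > stable.txt
echo "version 1" > changing.txt
git add .
git commit -q -m "First snapshot"

# Second snapshot: only one file changes
echo "version 2" > changing.txt
git commit -q -a -m "Second snapshot"

# Object IDs of both files in both snapshots
stable_old=$(git rev-parse HEAD~1:stable.txt)
stable_new=$(git rev-parse HEAD:stable.txt)
changing_old=$(git rev-parse HEAD~1:changing.txt)
changing_new=$(git rev-parse HEAD:changing.txt)

echo "stable.txt:   $stable_old -> $stable_new"
echo "changing.txt: $changing_old -> $changing_new"
```

The unchanged file keeps the same object ID in both snapshots, so git only links to the object it already stored; the modified file gets a new one.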

How does git work?

This is different from most other VCSs, which store a list of file-based changes (deltas).

Think of git as a mini file system. This is also one reason why it is fast!

How does git handle changes?

  • works offline on your local files
  • creates checksums of your files
  • almost all things can be undone :) (commit?)
  • steps: modify, add (stage), commit, …

Getting started with git

  • You need to install git.

  • You need to configure git.

    $ git config --global user.name "Wind Cloudy"
    $ git config --global user.email wcloudy@univie.ac.at
    # merge (instead of rebase) when a pull finds diverged branches
    $ git config --global pull.rebase false
    # set your default editor
    $ git config --global core.editor [vim/nano/...]
    # edit your global configuration
    $ git config --global -e
  • You need to have credentials (gitlab-ssh-key)

    # Create a separate key for git (for example)
    ssh-keygen -t ed25519 -C "wcloudy@univie.ac.at" -f $HOME/.ssh/id_git
    # For HTTPS remotes: store credentials (plain text in ~/.git-credentials)
    # or cache them; a personal access token can be used instead of a
    # password (it can only be used for git operations)
    git config --global credential.helper store
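One possible way to make a repository actually use the separate key is git's core.sshCommand setting. This is a sketch under assumptions: the key path is the one from the ssh-keygen example above, and the repository is a throwaway one created only for the demonstration.

```shell
# Throwaway repository standing in for your project
repo=$(mktemp -d)
cd "$repo"
git init -q

# Tell git (for this repository only) which key ssh should offer.
# Assumes the key was created as $HOME/.ssh/id_git.
git config core.sshCommand "ssh -i $HOME/.ssh/id_git -o IdentitiesOnly=yes"

# Show what was configured
git config --get core.sshCommand
```

An entry in ~/.ssh/config for the GitLab host would achieve the same for all repositories; which one fits better depends on how many keys you manage.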

Getting Started with git

Since we work with distributed VCS, let’s first create the repository on GitLab (Personal/Group, Private/Internal/Public) and secondly add files.

On your local computer, add an existing directory:

cd existing-project/
git init
git remote add origin git@gitlab.phaidra.org:[group/user]/[Name].git
git add .
git commit -m "Initial commit"
git push -u origin master

or just clone the empty one

git clone git@gitlab.phaidra.org:[group/user]/[Name]
cd [Name]

Git Basics - Outline

What we are going to do in the next few slides:

  1. Configuration of git (name, email, credentials)
  2. Initializing a git repo: git init
  3. Adding files (new, modified), git add
  4. Looking at differences, Logs: git log or git diff
  5. Changing files (unstage, remove, restore), git rm or git restore
  6. Committing files: git commit

Git Basics - Add

The first step in working with git is to add files.

$ git add README
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)

    new file:   README
# short
$ git status -s
A  README

Git Basics - Modify

The next step is to modify a file that git already tracks.

$ vim CONTRIBUTING.md
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)

    new file:   README
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)

    modified:   CONTRIBUTING.md

Git Basics - Add

Then add these modified files again.

$ git add CONTRIBUTING.md
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

    new file:   README
    modified:   CONTRIBUTING.md

Git Basics - Modify

Let’s modify it again.

$ vim CONTRIBUTING.md
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

    new file:   README
    modified:   CONTRIBUTING.md

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

    modified:   CONTRIBUTING.md

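Both staged and not staged?

Git stages a file exactly as it was when you ran git add; changes made afterwards are not part of the staged version. Run git add CONTRIBUTING.md again to stage the latest version.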
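The "staged and not staged" state can be reproduced in a throwaway repository. This is a minimal sketch with a placeholder identity and made-up file content. It uses git status --porcelain, which prints the same two-column layout as git status -s but is stable for scripts.

```shell
# Throwaway repository; identity values are placeholders for this sketch
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Wind Cloudy"
git config user.email "wcloudy@example.org"

echo "first draft" > CONTRIBUTING.md
git add CONTRIBUTING.md
git commit -q -m "Initial commit"

# Modify and stage, then modify again without staging
echo "second line" >> CONTRIBUTING.md
git add CONTRIBUTING.md
echo "third line" >> CONTRIBUTING.md

# Left column: staged state, right column: working directory state
state=$(git status --porcelain)
echo "$state"

# Staging again picks up the latest version
git add CONTRIBUTING.md
state_after=$(git status --porcelain)
echo "$state_after"
```

The first status shows M in both columns; after the second git add only the left (staged) column is set.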

Git Basics - Diff

$ git diff
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 8ebb991..643e24f 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -65,7 +65,8 @@ branch directly, things can get messy.
 Please include a nice description of your changes when you submit your PR;
 if we have to read the whole diff to figure out why you're contributing
 in the first place, you're less likely to get feedback and have your change
-merged in.
+merged in. Also, split your changes into comprehensive chunks if your patch is
+longer than a dozen lines.

This command compares what is in your working directory with what is in your staging area.

Run git diff --staged to check what changes are in your staging area as compared to your last commit.
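The difference between the two commands can be sketched with two files, one staged and one not. The file names and the identity are hypothetical and only serve this illustration.

```shell
# Throwaway repository; identity values are placeholders for this sketch
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Wind Cloudy"
git config user.email "wcloudy@example.org"

echo "a" > staged.txt
echo "b" > unstaged.txt
git add .
git commit -q -m "Initial commit"

# Change both files, but stage only one of them
echo "change" >> staged.txt
echo "change" >> unstaged.txt
git add staged.txt

# Working directory vs. staging area
unstaged=$(git diff --name-only)
# Staging area vs. last commit
staged=$(git diff --staged --name-only)

echo "git diff:          $unstaged"
echo "git diff --staged: $staged"
```

Each change shows up in exactly one of the two diffs, depending on whether it has been staged.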

Git Basics - Commit

Finally.

$ git commit -m "Some fixes."
[master 463dc4f] Some fixes.
 2 files changed, 2 insertions(+)
 create mode 100644 README

Now you’ve created your first commit! You can see that the commit has given you some output about itself: which branch you committed to (master), what SHA-1 checksum the commit has (463dc4f), how many files were changed, and statistics about lines added and removed in the commit.

Git Basics - Commit

If you commit without -m, git opens your editor so that you can write the commit message:

$ git commit
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       modified:   README
$ git commit -a -m "Fast, add all known file changes."

The -a option stages all changes to already tracked files and commits them in one command. New (untracked) files still need git add.

Try to commit often. Every commit is a state that you want to record and can return to. Try not to create commits that are too large, and be descriptive in your message: “Fixed typo in README”
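What git commit -a does and does not pick up can be sketched as follows. The file names and the identity are made up for illustration.

```shell
# Throwaway repository; identity values are placeholders for this sketch
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Wind Cloudy"
git config user.email "wcloudy@example.org"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"

# One change to a tracked file, one brand-new file
echo "v2" >> tracked.txt
echo "new" > new.txt

# -a stages and commits changes to tracked files only
git commit -q -a -m "Update tracked file"

# What is left over afterwards?
leftover=$(git status --porcelain)
echo "$leftover"
```

The change to tracked.txt is committed, while new.txt is still reported as untracked (??) and needs an explicit git add.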

Git Basics - Remove

$ rm PROJECTS.md
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
        deleted:    PROJECTS.md
no changes added to commit (use "git add" and/or "git commit -a")
$ git rm PROJECTS.md
rm 'PROJECTS.md'
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
    deleted:    PROJECTS.md

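Next time you commit, the file will be gone and no longer tracked. If you had modified or staged the file before, force the removal with git rm -f PROJECTS.md. You can use wildcards, e.g. git rm log/\*.log (the backslash keeps the shell from expanding the pattern, so git does it).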

Git Basics - Moving

# rename
$ git mv README.md README
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
    renamed:    README.md -> README
# equivalent to
$ mv README.md README
$ git rm README.md
$ git add README

There is no difference. It is just one command instead of three.

Git Basics - Log

Useful commands to look into the change log of a repository

# show full log
$ git log
# show the patch of the last two commits
$ git log -p -2
# statistics
$ git log --stat
# very short
$ git log --pretty=oneline
# who, when and what?
$ git log --pretty=format:"%h - %an, %ar : %s"
# ascii graph (e.g. branches)
$ git log --pretty=format:"%h %s" --graph
# limit by time (--since, --until); a date such as 2025-01-23 works too
$ git log --since=2.weeks
# filter by path (changes to this file)
$ git log -- path/to/file

Git Basics - Forgotten

One of the common undos takes place when you commit too early and possibly forget to add some files, or you mess up your commit message.

$ git commit -m 'Initial commit'
$ git add forgotten_file
# amend to the rescue!
$ git commit --amend

You end up with a single commit — the second commit replaces the results of the first.
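That the amended commit replaces the first one can be verified by counting commits. This is a minimal sketch in a throwaway repository; --no-edit keeps the original message so no editor opens, and the identity is a placeholder.

```shell
# Throwaway repository; identity values are placeholders for this sketch
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Wind Cloudy"
git config user.email "wcloudy@example.org"

echo "readme" > README
git add README
git commit -q -m "Initial commit"

# Oops, one file was forgotten
echo "forgotten" > forgotten_file
git add forgotten_file
git commit -q --amend --no-edit

# Still a single commit, now containing both files
count=$(git rev-list --count HEAD)
files=$(git ls-tree --name-only HEAD)
echo "commits: $count"
echo "$files"
```

Because the amended commit has a new checksum, this is best done before the commit has been pushed.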

Git Basics - Ahhhhhhh

You accidentally added a bunch of files. How to unstage this mess?

# add everything... or maybe not?
$ git add *     # or: git add .
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
    renamed:    README.md -> README
    modified:   CONTRIBUTING.md         <<<
# unstage the file (older git versions: git reset HEAD CONTRIBUTING.md)
$ git restore --staged CONTRIBUTING.md
...
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   CONTRIBUTING.md

Git Basics - Unmodify

What if you realize that you don’t want to keep your changes to the CONTRIBUTING.md file?

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
    modified:   CONTRIBUTING.md
# discard the changes and go back to the last committed version
$ git checkout -- CONTRIBUTING.md
# or (newer git versions): git restore CONTRIBUTING.md
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
    renamed:    README.md -> README

Remember, anything that is committed in Git can almost always be recovered. However, anything you lose that was never committed is likely never to be seen again.
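One way to see that committed work can be recovered is the reflog, which records where HEAD has been. This sketch goes slightly beyond the slides and is only an illustration; file names and identity are made up.

```shell
# Throwaway repository; identity values are placeholders for this sketch
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Wind Cloudy"
git config user.email "wcloudy@example.org"

echo "one" > notes.txt
git add notes.txt
git commit -q -m "First"
echo "two" >> notes.txt
git commit -q -a -m "Second"

# "Lose" the second commit by moving the branch back
git reset -q --hard HEAD~1
after_reset=$(git log -1 --format=%s)

# The reflog still remembers the previous position of HEAD
git --no-pager reflog
git reset -q --hard 'HEAD@{1}'
recovered=$(git log -1 --format=%s)

echo "after reset: $after_reset, recovered: $recovered"
```

Changes that were never committed have no such safety net, which is another reason to commit often.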

Git Basics - Ignore

There might be files that you do not want git to track and that you want to make sure you never add by mistake. List them in a hidden file called .gitignore, which usually lives in your project root.

$ cat .gitignore
*.pyc
*~

Standard glob patterns work, and will be applied recursively throughout the entire working tree.

# ignore all .a files
*.a
# but do track lib.a, even though you're ignoring .a files above
!lib.a
# ignore all files in any directory named build
build/
# ignore doc/notes.txt, but not doc/server/arch.txt
doc/*.txt
# ignore all .pdf files in the doc/ directory and any of its subdirectories
doc/**/*.pdf
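The effect of such patterns can be tried out by adding everything and listing what git actually tracks. This is a small sketch using three of the patterns above; the file names are made up for illustration.

```shell
# Throwaway repository with a few files that match the patterns
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p doc/server

printf '%s\n' '*.a' '!lib.a' 'doc/*.txt' > .gitignore
touch foo.a lib.a doc/notes.txt doc/server/arch.txt

# Stage everything that is not ignored
git add .

# List what ended up in the staging area
tracked=$(git ls-files)
echo "$tracked"
```

foo.a and doc/notes.txt are skipped, while lib.a (negated pattern) and doc/server/arch.txt (not directly inside doc/) are staged.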

Git Basics - Recap

We talked about:

  1. Configuration of git (name, email, credentials)
  2. Initializing a git repo
  3. Adding files (new, modified)
  4. Looking at differences, Logs
  5. Changing files (unstage, remove, restore)
  6. Committing files

Git Basics - Outline

What we are going to do in the next few slides:

  1. Remote repos: git remote
  2. Tags (releases): git tag
  3. Branches (merge): git branch, git merge

Git Basics - Remotes

Because git is a distributed VCS, we work with remote repositories, e.g. on GitLab.

# I use my SSH Key to access the repo
$ git clone git@gitlab.phaidra.org:imgw/gitlab-tutorial.git
Cloning into 'gitlab-tutorial'...
remote: Enumerating objects: 23, done.
remote: Total 23 (delta 0), reused 0 (delta 0), pack-reused 23 (from 1)
Receiving objects: 100% (23/23), 19.74 KiB | 19.74 MiB/s, done.
Resolving deltas: 100% (5/5), done.
$ cd gitlab-tutorial
$ git remote -v
origin  git@gitlab.phaidra.org:imgw/gitlab-tutorial.git (fetch)
origin  git@gitlab.phaidra.org:imgw/gitlab-tutorial.git (push)
# or maybe have more
$ git remote -v
origin  git@gitlab.phaidra.org:imgw/gitlab-tutorial.git (fetch)
origin  git@gitlab.phaidra.org:imgw/gitlab-tutorial.git (push)
upstream   https://github.com/mblaschek/gitlab-tutorial.git (fetch)
upstream   https://github.com/mblaschek/gitlab-tutorial.git (push)

Git Basics - Remotes

Working with remotes:

# add the above remote on GitHub
$ git remote add upstream https://github.com/mblaschek/gitlab-tutorial.git
# fetch (retrieve) the information on that remote
$ git fetch upstream
# send (push) changes to a branch on a remote, e.g. origin master
$ git push <remote> <branch>
# a new local branch needs an upstream, set it with the first push
$ git push --set-upstream origin <branch>
# inspect a remote (url, branches)
$ git remote show origin
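The remote commands can be practised without any server by using a local bare repository as the remote. This is a sketch under that assumption; the directory names hub.git and project and the identity are made up.

```shell
# A bare repository in a temporary directory plays the role of GitLab
work=$(mktemp -d)
cd "$work"
git init -q --bare hub.git

# A normal working repository
git init -q project
cd project
git config user.name "Wind Cloudy"
git config user.email "wcloudy@example.org"
echo "hello" > README
git add README
git commit -q -m "Initial commit"

# Register the remote and push, setting the upstream on the first push
git remote add origin "$work/hub.git"
branch=$(git symbolic-ref --short HEAD)
git push -q --set-upstream origin "$branch"

# Inspect the result
git remote -v
upstream=$(git rev-parse --abbrev-ref "$branch@{upstream}")
echo "upstream of $branch: $upstream"
```

With a real GitLab project only the URL given to git remote add changes; the rest works the same way.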

Git Basics - Tags

Git has the ability to tag specific points in a repository’s history as being important. Typically, people use this functionality to mark release points (v1.0, v2.0 and so on).

$ git tag
v1.0
v2.0
$ git tag -a v1.4 -m "my version 1.4"
$ git tag
v1.0
v1.4
v2.0
# tag a specific commit 
$ git tag -a v1.2 9fceb02
# need to share the tag, by pushing it
$ git push origin v1.4
# check out the state of a tag (this leaves you in "detached HEAD" state)
$ git checkout v1.4
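Tagging the current commit and tagging an earlier one can be sketched like this. The repository is a throwaway one and the identity is a placeholder; an explicit -m is passed so that no editor opens.

```shell
# Throwaway repository; identity values are placeholders for this sketch
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Wind Cloudy"
git config user.email "wcloudy@example.org"

echo "one" > code.txt
git add code.txt
git commit -q -m "First"
first=$(git rev-parse HEAD)
echo "two" >> code.txt
git commit -q -a -m "Second"

# Tag the current commit and, afterwards, the earlier one
git tag -a v1.4 -m "my version 1.4"
git tag -a v1.2 -m "my version 1.2" "$first"

# Resolve each tag to the commit it marks
tagged_v12=$(git rev-parse 'v1.2^{commit}')
tagged_v14=$(git rev-parse 'v1.4^{commit}')
tags=$(git tag)
echo "$tags"
```

The list is sorted by name, not by creation time, so v1.2 appears before v1.4 although it was created later.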

Git Branches

A branch is essentially a pointer to a specific commit in the repository’s history.

Git Branches

The tag v1.0 and the branch master point to the same commit.

Git Branches

We add a new branch testing. Notice that HEAD still points to master: git branch only creates the branch, it does not switch to it.

$ git branch testing

Git Branches

Start working on the testing branch…

$ git checkout testing
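Where HEAD points before and after these two commands can be checked with git symbolic-ref. This is a minimal sketch in a throwaway repository with a placeholder identity; the name of the initial branch depends on your git configuration, so it is read from the repository.

```shell
# Throwaway repository; identity values are placeholders for this sketch
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Wind Cloudy"
git config user.email "wcloudy@example.org"
echo "program" > test.f90
git add test.f90
git commit -q -m "Initial commit"

# Name of the branch we start on (master or main, depending on the setup)
start=$(git symbolic-ref --short HEAD)

# Creating a branch does not move HEAD ...
git branch testing
after_branch=$(git symbolic-ref --short HEAD)

# ... checking it out does
git checkout -q testing
after_checkout=$(git symbolic-ref --short HEAD)

echo "start: $start, after branch: $after_branch, after checkout: $after_checkout"
```

git checkout -b testing combines both steps, as used for the hotfix branch later on.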

Git Branches

$ vim test.f90
$ git commit -a -m 'Make a change'

Git Branches

$ git checkout master
$ vim test.f90
$ git commit -a -m 'Make other changes'

Git Branches - Merge

$ git checkout -b hotfix
Switched to a new branch 'hotfix'
$ vim index.html
$ git commit -a -m 'Fix broken email address'
[hotfix 1fb7853] Fix broken email address
 1 file changed, 2 insertions(+)

Git Branches - Merge

$ git checkout master
$ git merge hotfix
Updating f42c576..1fb7853
Fast-forward
 index.html | 2 ++
 1 file changed, 2 insertions(+)

Git Branches - Merge

$ git checkout iss53
Switched to branch "iss53"
$ vim index.html
$ git commit -a -m 'Finish the new footer [issue 53]'
[iss53 ad82d7a] Finish the new footer [issue 53]
 1 file changed, 1 insertion(+)

Git Branches - Merge

$ git checkout master
Switched to branch 'master'
$ git merge iss53
Merge made by the 'recursive' strategy.
 index.html |    1 +
 1 file changed, 1 insertion(+)
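The two kinds of merge shown on the last slides, fast-forward and a real merge commit, can be reproduced side by side. This is a sketch in a throwaway repository; the branch names follow the slides, while the file contents and identity are made up.

```shell
# Throwaway repository; identity values are placeholders for this sketch
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Wind Cloudy"
git config user.email "wcloudy@example.org"
base=$(git symbolic-ref --short HEAD)
echo "page" > index.html
git add index.html
git commit -q -m "Initial commit"

# Case 1: the base branch has not moved, so the merge is a fast-forward
git checkout -q -b hotfix
echo "fix" >> index.html
git commit -q -a -m "Fix broken email address"
git checkout -q "$base"
git merge -q hotfix
ff_head=$(git rev-parse HEAD)
hotfix_head=$(git rev-parse hotfix)

# Case 2: both branches have new commits, so git creates a merge commit
git checkout -q -b iss53
echo "footer" > footer.html
git add footer.html
git commit -q -m "Finish the new footer [issue 53]"
git checkout -q "$base"
echo "notes" > NOTES
git add NOTES
git commit -q -m "Make other changes"
git merge -q --no-edit iss53
second_parent=$(git rev-parse 'HEAD^2')
iss53_head=$(git rev-parse iss53)

git --no-pager log --oneline --graph
```

After the fast-forward the base branch simply points to the hotfix commit; after the second merge HEAD is a new commit with two parents.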

Git Branches - Merge Conflict

$ git merge iss53
Auto-merging index.html
CONFLICT (content): Merge conflict in index.html
Automatic merge failed; fix conflicts and then commit the result.
$ git status
On branch master
You have unmerged paths.
  (fix conflicts and run "git commit")

Unmerged paths:
  (use "git add <file>..." to mark resolution)

    both modified:      index.html

no changes added to commit (use "git add" and/or "git commit -a")

Git Branches - Merge Conflict

The index.html shows this:

<<<<<<< HEAD:index.html
<div id="footer">contact : email.support@github.com</div>
=======
<div id="footer">
 please contact us at support@github.com
</div>
>>>>>>> iss53:index.html

The part above ======= is what is in your current HEAD (master); the part below is what comes from the iss53 branch. Because both commits edit the same lines, git cannot merge them automatically and you need to resolve the conflict yourself.

Git Branches - Merge Conflict

The resolution is to edit the file as you want it and to remove the <<<<<<<, ======= and >>>>>>> marker lines, e.g.

<div id="footer">
please contact us at email.support@github.com
</div>

To mark the conflict as resolved, run git add index.html, then run git commit to complete the merge.
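The whole conflict cycle (provoke, inspect, resolve, commit) can be rehearsed safely in a throwaway repository. This is a sketch with simplified file content and a placeholder identity; the resolution is written with echo here, whereas in practice you would edit the file by hand.

```shell
# Throwaway repository; identity values are placeholders for this sketch
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Wind Cloudy"
git config user.email "wcloudy@example.org"
base=$(git symbolic-ref --short HEAD)
echo "contact: support" > index.html
git add index.html
git commit -q -m "Initial commit"

# Both branches change the same line
git checkout -q -b iss53
echo "please contact us at support" > index.html
git commit -q -a -m "Finish the new footer [issue 53]"
git checkout -q "$base"
echo "contact: email.support" > index.html
git commit -q -a -m "Change contact address"

# The merge stops with a conflict (non-zero exit status)
git merge iss53 || echo "merge stopped, conflict needs to be resolved"
conflicted=$(git diff --name-only --diff-filter=U)
markers=$(grep -c '^<<<<<<<' index.html)

# Resolve: write the wanted content, mark as resolved, finish the merge
echo "please contact us at email.support" > index.html
git add index.html
git commit -q --no-edit
remaining=$(git diff --name-only --diff-filter=U)
echo "conflicted before: $conflicted, remaining after: ${remaining:-none}"
```

If you get stuck in the middle of a conflict, git merge --abort returns to the state before the merge.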

Git Branches - Recap

We talked about:

  1. Remotes
  2. Tags
  3. Branches
  4. Merging Branches
  5. Merge Conflicts

Topics that we did not cover: git rebase (see the git documentation)

Git - Outline

We will continue with the following topics:

  1. GitLab and PHAIDRA
  2. Workflows
  3. Guidelines
  4. Continuous Integration (CI)

Gitlab

The University Library and the ZID run a long-term archive for data called PHAIDRA (Permanent Hosting, Archiving and Indexing of Digital Resources and Assets).

Part of this service is the GitLab instance: gitlab.phaidra.org as well as the Mattermost chat: discuss.phaidra.org

You need a separate account, but you can keep that account forever (it only requires a valid e-mail address).

Gitlab @UNIVIE

Features:

  • available to everybody at the University of Vienna (employees, students, collaborators, externals).
  • University rules apply (e.g. privacy, data policy, copyright, …)
  • Recommended platform over GitHub, gitlab.com or any other externally hosted git instance.
  • no limitations on collaborators or access to the projects (private, internal (group) or public).

Gitlab - Groups/Projects

Gitlab & PHAIDRA

It is possible in GitLab to create releases for your software, if that is part of your publication.

These releases can then be linked to or stored in the PHAIDRA catalog. As PHAIDRA is one of many trusted repositories (re3data), you can get a DOI and apply special licenses to your software. Assets in the catalog can be versioned and linked directly to GitLab.

GitLab to GitHub Sync

You could set up the sync yourself with CI/CD, but GitLab can do it for you automatically through repository mirroring:

  1. Go to “Settings > Repository > Mirroring repositories”.
  2. Enter your GitHub repo URL with your username in front: https://<github username>@github.com/path/to/your/repo.git
  3. In the password field, enter your GitHub token.
  4. Select Push as the mirror direction (the only option on our GitLab).
  5. Press Mirror repository.

Git and Tools - VSCode

The best Microsoft product: Visual Studio Code

Useful extensions:

  • Remote Development (WSL, Dev Containers, SSH, Tunnels)
  • Remote Explorer (.ssh/config)
  • Python/Jupyter
  • Prettier - Code Formatter
  • Shell Syntax
  • Git graph (git branches, merges, …)
  • Fortran
  • GitHub Copilot (GDPR ✅)

Workflows

It is critical that you start using git for your scientific code. You might not be a developer, or foresee ever needing to share your code with someone, but it will still help you:

  1. It is a backup
  2. You can evolve your code and look back. (Commit often)
  3. Using git and some basic rules will make your code more organized.
  4. It sounds silly, but your successful git projects should be on your job applications.

Gitlab & Workflows

How to structure my thoughts and my code?

  • Try not to write a git project that does everything!
  • Start separate projects, for example for
    • a paper
    • an analysis of results
    • a specific tool that you need to use (e.g. RTTOV, FLEXPART, …)

Thoughts on Master Thesis

The motivation is to create a template project for master theses, as a first step towards making the code and information created by students and others easier to reuse.

There are some rules:

  1. Develop in Python (if you can)
  2. Write your code with comments (#)
  3. Add Jupyter Notebooks with example plots
  4. Add information to the README.md
  5. Add a requirements.txt
  6. Write tests for your code (pytest)

Thoughts on Code Projects and Collaboration

There is a clear benefit in sharing code when people work together on similar topics. Since most of you are not software developers, the idea is to write Python code that follows a few simple rules:

  1. add comments #
  2. write functions with clear arguments:
     def rechunk_lara(directory:str, outputdir:str, chunks:tuple)  -> None:
  3. use automated documentation strings (vscode extension: docstrings)
  4. structure code into separate files: e.g. main.py, reading.py, calculation.py
  5. use types (str, float, int) and defaults (directory:str='/jetfs/...')
  6. use e.g. pigar to create the requirements.txt

Continuous Integration - Gitlab

CI can be useful for automated testing, building packages or deploying websites. In GitLab it works in two steps:

  1. Add a GitLab CI configuration (.gitlab-ci.yml) that details what should be done and when.
  2. Create container images that include the specific software your jobs need.
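A first GitLab CI configuration could look roughly like the one written by the commands below. This is only a sketch under assumptions: the job name run-tests, the python:3.12 image and a pytest-based test suite with a requirements.txt are illustrative choices, not the setup of the IMGW examples.

```shell
# Throwaway directory standing in for your project root
repo=$(mktemp -d)
cd "$repo"

# Write a minimal pipeline: one stage, one job that runs the tests
cat > .gitlab-ci.yml <<'EOF'
stages:
  - test

run-tests:
  stage: test
  # container image that provides the software the job needs
  image: python:3.12
  script:
    - pip install -r requirements.txt pytest
    - pytest
EOF

cat .gitlab-ci.yml
```

Once this file is committed and pushed, GitLab runs the job on every push, provided a runner is available for the project.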

Note

There are some examples on GitLab in the IMGW group: imgw/example-ci

Recap

What we talked about:

  1. git basics
  2. git advanced (branches, merge)
  3. workflows

FIN