Using Git to Manage File History

Overview

Version control systems allow programmers to keep track of their programs. They work by creating backups of source files incrementally. This allows us to do several things:

Track changes to code and see what changes were made when.
Go back to previous versions.
Recover files if they are deleted or corrupted.
Collaborate amongst multiple programmers.
Keep code synchronized between multiple computers.

Git is the most popular version control program.

When you use Git, you create a repository for your project, which contains all of the files as well as their history. As you work, you commit your changes, which saves the current state of the project.

Configuration

Before you use Git, it needs to know who you are. Run these commands to tell it (replacing the arguments with your actual name and email, of course).

ifinlay@cpsc:~$ git config --global user.name "Your Name"
ifinlay@cpsc:~$ git config --global user.email "you@example.com"

Git records this name and email address in every commit you make, so that when several people work on a project it is clear who made each change.

You also should tell Git which editor you want to use for writing commit messages:

ifinlay@cpsc:~$ git config --global core.editor vim

If you like, you can also tell Git to use vimdiff for displaying differences between files, and for merging multiple copies of files - we will see how to use this later on, but you can set it up now as you're configuring Git:

ifinlay@cpsc:~$ git config --global diff.tool vimdiff
ifinlay@cpsc:~$ git config --global merge.tool vimdiff

By default, Git also prompts you before opening each file when you view differences with git difftool, which I find annoying. To turn the prompt off, use the following setting:

ifinlay@cpsc:~$ git config --global difftool.prompt false

These commands store their settings in a file in your home directory called ".gitconfig". You can edit this file directly if you wish. Note that this is one of those "hidden files" whose names start with a '.'.
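To give a sense of what ends up in that file, here is a small Python sketch. The sample text is an assumption about roughly what ".gitconfig" looks like after the commands above (your file may differ), and the function name read_settings is made up for illustration. Python's configparser is used only because the format resembles an INI file. It is not how Git itself reads the file, and it will not handle every feature of Git's config format.

```python
import configparser

# Assumed contents of ~/.gitconfig after running the commands above.
# Git indents each key with a tab under its [section] header.
SAMPLE_GITCONFIG = """\
[user]
\tname = Your Name
\temail = you@example.com
[core]
\teditor = vim
[diff]
\ttool = vimdiff
[merge]
\ttool = vimdiff
[difftool]
\tprompt = false
"""


def read_settings(text):
    """Parse INI-style config text into a {"section.key": value} dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    settings = {}
    for section in parser.sections():
        for key, value in parser[section].items():
            settings[f"{section}.{key}"] = value
    return settings


if __name__ == "__main__":
    for name, value in read_settings(SAMPLE_GITCONFIG).items():
        print(name, "=", value)
```

Each "section.key" name printed here corresponds to the argument given to git config, such as user.name or difftool.prompt.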

Creating a Repository

Git works by creating a repository to store your code changes. A repository is created in a directory by entering the command:

ifinlay@cpsc:~$ git init

This will create a hidden sub-directory called ".git". This directory contains the repository (or "repo" for short), in which Git stores the current state and history of the files that make up your project.

I'd really recommend creating a Git repository for each project that you work on. Even a few hours of work is too much to lose. When starting a new project, first create a directory for it, then go into the directory and create a Git repository:

ifinlay@cpsc:~$ mkdir project1
ifinlay@cpsc:~$ cd project1
ifinlay@cpsc:~$ git init
Initialized empty Git repository in /home/faculty/ifinlay/project1/.git/

Note that a directory can only contain one Git repository. This is one reason to make a directory for each project you work on.

Adding and Committing

Git does not track your changes automatically. You need to tell it which files to keep track of. Suppose we have three files, "input.py", "output.py", and "main.py".

To add these files to the repository, we'll use the git add command:

ifinlay@cpsc:project1$ git add input.py main.py output.py

Generally, everything you create yourself (code, input files, tests etc.) should be added. Compiled files (such as Java .class files) should not be added as they can be easily re-created even if they are lost.

Once we have added files, we can commit changes:

ifinlay@cpsc:project1$ git commit -a

The "-a" flag tells Git to commit every tracked file that has been modified or deleted since the last commit. New files are not included until you git add them. Each time you run git commit it creates a checkpoint of your work.

When you run this command, Git launches your editor (Vim, if you configured it as shown earlier) so that you can write a commit message describing what changed, for your own reference. Write the message at the top of the file, and then save and quit Vim.

If you quit Vim without writing a commit message, Git will abort the commit.

You should perform a git commit every time you want to checkpoint your work.
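The relationship between adding and committing can be sketched with a toy model in Python. This is only an illustration, not how Git is implemented: the ToyRepo class, its method names, and the file "notes.txt" are all hypothetical, and the "working directory" is just a dictionary mapping file names to contents.

```python
class ToyRepo:
    """A toy stand-in for a repository: tracked names plus saved snapshots."""

    def __init__(self):
        self.tracked = set()   # names registered with add()
        self.commits = []      # (message, snapshot) pairs, oldest first

    def add(self, *names):
        """Start tracking the given file names (loosely like git add)."""
        self.tracked.update(names)

    def commit_all(self, working_dir, message):
        """Snapshot every tracked file (loosely like git commit -a)."""
        if not message.strip():
            raise ValueError("empty commit message, commit aborted")
        snapshot = {name: working_dir[name]
                    for name in self.tracked if name in working_dir}
        self.commits.append((message, snapshot))
        return snapshot


if __name__ == "__main__":
    # notes.txt is never added, so it is left out of the snapshot.
    working_dir = {"main.py": "print('hi')\n", "notes.txt": "scratch\n"}
    repo = ToyRepo()
    repo.add("main.py")
    saved = repo.commit_all(working_dir, "Add main program")
    print(sorted(saved))
```

The point of the model is that a commit only captures files you have told the repository about, and that a commit without a message does not happen.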

Git Revisions

Each commit that you make creates a revision of your project. A revision is a state of your project files at a point in time. We can see the revision history with the git log command:

ifinlay@cpsc:project1$ git log
commit 9cf99039f4d1a4de2acda4dc2dea80f0d8389f08
Author: Ian Finlayson <ifinlay@umw.edu>
Date:   Tue Jul 11 13:00:05 2018 -0400

    Just added blank files

Because this project only has one revision so far, that is all that is displayed. The commit string "9cf99039f4d1a4de2acda4dc2dea80f0d8389f08" is a hash, which is how Git identifies individual revisions. A hash is a number computed from some data in such a way that even a tiny change to the data almost certainly produces a completely different number. Git runs the contents of the commit (the state of the files, along with the author, date, message, and parent commit) through the SHA-1 hash function, which produces a 160-bit number that Git writes as 40 base-16 digits.
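As a concrete illustration of hashing, this Python sketch computes the identifier Git assigns to the contents of a single file (Git calls this a "blob"). Git hashes a short header followed by the file's bytes using SHA-1, and commit hashes like the one above are produced the same way from the commit's own data. The function name git_blob_hash is made up for this example, and the sketch assumes a repository that uses Git's default SHA-1 object format.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the hash Git would assign to a file with these contents."""
    # Git prepends the header "blob <size in bytes>\0" before hashing.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    print(git_blob_hash(b"hello\n"))
    # Changing a single byte gives a completely different hash.
    print(git_blob_hash(b"hellp\n"))
```

The first value printed should match what git hash-object reports for a file containing the same bytes.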

The log also shows the author, date of the commit, and the commit message.

If we add some more revisions, those are shown under git log as well:

ifinlay@cpsc:project1$ git log
commit 18325e137dd43af4857c1501cb2df849ec0a73e3
Author: Ian Finlayson <ifinlay@umw.edu>
Date:   Tue Jul 11 13:15:02 2018 -0400

    Add sorting feature for output.

commit e18012e8cacc16044f20a9c0b3d7636b91c31cba
Author: Ian Finlayson <ifinlay@umw.edu>
Date:   Tue Jul 11 13:14:28 2018 -0400

    Add error checking of user input.

commit 9cf99039f4d1a4de2acda4dc2dea80f0d8389f08
Author: Ian Finlayson <ifinlay@umw.edu>
Date:   Tue Jul 11 13:00:05 2018 -0400

    Just added blank files

The oldest revisions are shown at the bottom, and the newest are on top. Note that these are not very good commit messages, but are just put in as an example. A good commit message is more detailed and lets you know what the change accomplishes.

Seeing History

We can ask Git for the differences between any two revisions with git diff:

ifinlay@cpsc:project1$ git diff 18325e137dd43af4857c1501cb2df849ec0a73e3 e18012e8cacc16044f20a9c0b3d7636b91c31cba
diff --git a/input.py b/input.py
index 7756d0a..5d7c164 100644
--- a/input.py
+++ b/input.py
@@ -1,5 +1,4 @@
 
-# get a number and return it
 def get_input():
     return int(input("Enter a number N: "))
 
diff --git a/output.py b/output.py
index 8379835..021ac16 100644
--- a/output.py
+++ b/output.py
@@ -1,5 +1,4 @@
 
-# show the output
 def show_output(value):
     print("The value is:", value)

The output of git diff can be hard to read. We will talk about comparing files with diff tools later on in this course.

Using the full hashes is actually not necessary. We only need to include the first four characters (or more if that would be ambiguous):

ifinlay@cpsc:project1$ git diff 1832 e180
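The idea behind abbreviated hashes can be sketched in Python: a prefix is accepted as long as exactly one known revision starts with it. This is a simplified model rather than Git's actual lookup code, and the function name resolve_prefix is made up for illustration. The hashes are the three from the log above.

```python
def resolve_prefix(prefix, known_hashes):
    """Return the one full hash that starts with prefix, else raise ValueError."""
    if len(prefix) < 4:
        raise ValueError("prefix must be at least 4 characters")
    matches = [h for h in known_hashes if h.startswith(prefix.lower())]
    if not matches:
        raise ValueError(f"unknown revision: {prefix}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous prefix: {prefix}")
    return matches[0]


# The three revisions shown by git log earlier.
HASHES = [
    "18325e137dd43af4857c1501cb2df849ec0a73e3",
    "e18012e8cacc16044f20a9c0b3d7636b91c31cba",
    "9cf99039f4d1a4de2acda4dc2dea80f0d8389f08",
]

if __name__ == "__main__":
    print(resolve_prefix("1832", HASHES))
    print(resolve_prefix("e180", HASHES))
```

If a second revision also began with "1832", the prefix would be ambiguous and we would need to type more characters, such as "18325".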

The git status command is also helpful to see what changes have been made since the last commit:

ifinlay@cpsc:project1$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

  modified:   main.py

no changes added to commit (use "git add" and/or "git commit -a")

This tells us that we have changed the main.py file since our last commit. If we are up to date, git status will tell us:

ifinlay@cpsc:project1$ git status 
On branch master
nothing to commit, working directory clean

Recovering Files

If we accidentally delete a file, we can recover the latest committed version of it with git checkout:

ifinlay@cpsc:project1$ rm main.py 
ifinlay@cpsc:project1$ git checkout main.py
ifinlay@cpsc:project1$ ls
input.py  main.py  output.py

Git is worth using even just for this. You will accidentally delete files at some point, and you will be very happy if they are under Git.

Looking at Past Versions

We can also move through our history. If we want to get to a previous revision, we can do so by checking it out.

Below I go all the way back to the initial version with empty files:

ifinlay@cpsc:project1$ cat main.py 
import input
import output

# the simplest three file program ever
value = input.get_input()
output.show_output(value)

ifinlay@cpsc:project1$ git checkout 9cf9
Note: checking out '9cf99039f4d1a4de2acda4dc2dea80f0d8389f08'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at 9cf9903... Just added blank files
ifinlay@cpsc:project1$ cat main.py

All of the files in the project1 directory have now been set back to the state at the first commit. Git gives us some good information in the output. We will not discuss branches in detail, but they allow you to have separate versions of the same code base at once.

We can look at this version and test it. Getting back to the current state is accomplished with:

ifinlay@cpsc:project1$ git checkout master 
Previous HEAD position was 9cf9903... Just added blank files
Switched to branch 'master'
ifinlay@cpsc:project1$ cat main.py  
import input
import output

# the simplest three file program ever
value = input.get_input()
output.show_output(value)

"master" is the main "branch" of our project, so checking it out moves us back to the current state.

Being able to go back and forth through our history is very valuable when working on larger programs. If you notice a bug and aren't sure where it was introduced, you can check out past versions and test each one to find the commit where the bug first appears.

Another use is for when you removed code, but decide later that you want it. You can go back in time, copy the code someplace, and then paste it into the project's current state.
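When hunting for the revision that introduced a bug, you do not have to test every past version in order. If the oldest revision is known to work and the newest is known to be broken, you can halve the search each time. The Python sketch below shows that strategy under those assumptions. The function first_bad_revision, the has_bug callback, and the revision labels r1 through r8 are all hypothetical, and in practice "has_bug" would mean checking out that revision and testing it yourself. Git's own git bisect command automates a search of this kind.

```python
def first_bad_revision(revisions, has_bug):
    """Find the earliest revision for which has_bug(revision) is True.

    revisions is ordered oldest to newest. Assumes the oldest is good,
    the newest is bad, and the bug never disappears once introduced.
    """
    low, high = 0, len(revisions) - 1   # low is known good, high is known bad
    while high - low > 1:
        mid = (low + high) // 2
        if has_bug(revisions[mid]):
            high = mid                  # bug already present at mid
        else:
            low = mid                   # still working at mid
    return revisions[high]


if __name__ == "__main__":
    history = ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8"]
    broken = {"r6", "r7", "r8"}         # pretend the bug arrived in r6
    print(first_bad_revision(history, lambda rev: rev in broken))
```

With eight revisions this needs only three tests instead of up to six.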

Undoing Commits

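We can also undo a commit with git revert. Rather than erasing history, it creates a new commit that reverses the changes made by the one being reverted: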

ifinlay@cpsc:project1$ git revert e18012
[master e21dbb9] Revert "Add error checking of user input."
 2 files changed, 2 deletions(-)
ifinlay@cpsc:project1$ git log
commit e21dbb9f16e81cb4c50d636732f3f2a3de51c0a1
Author: Ian Finlayson <ifinlay@umw.edu>
Date:   Tue Jul 11 13:43:30 2018 -0400

    Revert "Add error checking of user input."
    
    This reverts commit e18012e8cacc16044f20a9c0b3d7636b91c31cba.

commit 7bb278f9fe1529db9a16f80beba46209ae9ae462
Author: Ian Finlayson <ifinlay@umw.edu>
Date:   Tue Jul 11 13:27:53 2018 -0400

    Fix bug in sorting feature.

commit 18325e137dd43af4857c1501cb2df849ec0a73e3
Author: Ian Finlayson <ifinlay@umw.edu>
Date:   Tue Jul 11 13:15:02 2018 -0400

    Add sorting feature for output.

commit e18012e8cacc16044f20a9c0b3d7636b91c31cba
Author: Ian Finlayson <ifinlay@umw.edu>
Date:   Tue Jul 11 13:14:28 2018 -0400

    Add error checking of user input.

commit 9cf99039f4d1a4de2acda4dc2dea80f0d8389f08
Author: Ian Finlayson <ifinlay@umw.edu>
Date:   Tue Jul 11 13:00:05 2018 -0400

    Just added blank files

You must commit everything you have before performing a revert. This way if you change your mind about reverting, you can "undo" the revert by reverting the revert commit itself. Git makes it really hard to lose your work accidentally!

You can revert your most recent commit, or prior ones as well. git revert undoes only the specific changes made by the commit you are reverting.

In order for this to work well, it's best to make each commit one discrete thing. For example, if you have a commit which adds a couple of features to the program, fixes a few bugs, and also changes some output messages, then it won't make as much sense to revert that commit. If you want to remove one of the bug fixes from that commit, you will need to do some extra work. However, if each feature/bug fix/change is its own commit, reverting them is easy.

Ignoring Files

Generated binary files, such as compiled executables or compiled Java .class files, should not be put under version control. For one, it does not matter if we lose them, because they are generated from files which will be under Git. Also, their changes will show up in git diff, making it hard to see important changes.

To tell Git to ignore these files, we can create a file in our project directory called ".gitignore". This file contains a list of file names or wildcard patterns, one per line. Each one is something Git will ignore.

For example, by default, Git will warn us about files not being managed:

ifinlay@cpsc:project1$ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

  Main.class

Here Git is warning us about Main.class being untracked. To fix this, we can add the pattern *.class to .gitignore:

ifinlay@cpsc:project1$ vim .gitignore
ifinlay@cpsc:project1$ cat .gitignore 
*.class
ifinlay@cpsc:project1$ git status 
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

  .gitignore

Now Main.class is not listed as being untracked. Git is now ignoring it. Of course, now the ".gitignore" file itself is untracked. We should add it to the repository to fix this:

ifinlay@cpsc:project1$ git add .gitignore
ifinlay@cpsc:project1$ git commit
[master 948c205] Added the .gitignore file to the repository
 1 file changed, 1 insertion(+)
 create mode 100644 .gitignore
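The wildcard matching that ".gitignore" relies on can be approximated in Python with the standard fnmatch module. This is a sketch of simple file name patterns only. Git's real rules also cover directories, comments, and negated patterns, none of which are modeled here, and the function name is_ignored is made up for illustration.

```python
from fnmatch import fnmatchcase


def is_ignored(filename, patterns):
    """Return True if filename matches any simple wildcard pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


if __name__ == "__main__":
    patterns = ["*.class"]          # the single line from .gitignore above
    for name in ["Main.class", "main.py", ".gitignore"]:
        status = "ignored" if is_ignored(name, patterns) else "not ignored"
        print(name, status)
```

Here "*" stands for any run of characters, so "*.class" covers every compiled class file without listing each one by name.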

Conclusion

Git can help make programming easier by:

Providing a simple way to back up all of your work.
Allowing you to look at different versions of your code.
Allowing you to track your progress.

The basic workflow is:

git init
git add every file you want to track.
git commit -a each time you want to save your progress.

It is good to know that you can see and revert to older versions, but you will not normally need to do it very often.

As we will see, Git can also be used to:

Keep a project in sync between multiple computers.
Work on multiple versions of a program.
Manage updates from multiple programmers at once.

Version control tools are used universally by professional programmers. Taking the time to get used to Git now will make your life easier and prepare you well for working as a programmer.