2016-10-21

Simple Experiment on Git Scalability

Have you ever noticed a big git repository becoming slow? I was curious to learn more about how git scales and ran the following simple experiment:

Using a script, I made 300 commits. In every commit I added one new file with random data to the repository and also changed one randomly chosen existing file. Then I checked out the old commits in a non-consecutive order. The time of every git operation (git status, git add, git checkout) was measured and is visualized in the following plots. The time required for git commit is negligible compared to the other operations and is therefore omitted from the plots. The size of the file added in every commit was varied between runs of the experiment (80 chars x 100 lines, 80 chars x 1000 lines, 80 chars x 10,000 lines). The times presented for git checkout are averaged over the checkouts of all the individual commits.
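The procedure above can be sketched roughly as follows. This is a hypothetical reconstruction, not the original downloadable script: the helper names (`random_text`, `timed_git`, `run_experiment`), the file naming scheme, the local commit identity and the use of `git add -A` are all assumptions. It measures checkout only once, after all commits have been made, and averages over all commits in shuffled order.

```python
import random
import string
import subprocess
import time
from pathlib import Path


def random_text(lines: int, width: int = 80) -> str:
    """Return `lines` lines of `width` random alphanumeric characters each."""
    alphabet = string.ascii_letters + string.digits
    return "\n".join(
        "".join(random.choices(alphabet, k=width)) for _ in range(lines)
    ) + "\n"


def timed_git(repo: Path, *args: str) -> float:
    """Run one git command in `repo` and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(["git", *args], cwd=repo, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def run_experiment(repo: Path, n_commits: int = 300, lines: int = 100):
    """Hypothetical reconstruction: returns (status times, add times, mean checkout time)."""
    timed_git(repo, "init", "-q")
    # Assumed local identity so commits work without a global git config.
    timed_git(repo, "config", "user.email", "bench@example.com")
    timed_git(repo, "config", "user.name", "bench")
    status_t, add_t, shas = [], [], []
    for i in range(n_commits):
        # Add one new file with random data ...
        (repo / f"file_{i:04d}.txt").write_text(random_text(lines))
        # ... and overwrite one randomly chosen, previously committed file.
        if i > 0:
            victim = repo / f"file_{random.randrange(i):04d}.txt"
            victim.write_text(random_text(lines))
        status_t.append(timed_git(repo, "status"))
        add_t.append(timed_git(repo, "add", "-A"))
        timed_git(repo, "commit", "-q", "-m", f"commit {i}")  # timed but not kept
        sha = subprocess.run(["git", "rev-parse", "HEAD"], cwd=repo, check=True,
                             capture_output=True, text=True).stdout.strip()
        shas.append(sha)
    # Check out the old commits in a non-consecutive (shuffled) order, then average.
    order = random.sample(shas, len(shas))
    checkout_t = [timed_git(repo, "checkout", "-q", sha) for sha in order]
    return status_t, add_t, sum(checkout_t) / len(checkout_t)
```

A call such as `run_experiment(Path(some_empty_dir), n_commits=300, lines=1000)` would correspond to the second run; pointing `some_empty_dir` at a ramdisk mount would mirror the setup described below.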

1. File with 80 chars x 100 lines added to the repository in every commit.

2. File with 80 chars x 1000 lines added to the repository in every commit.

3. File with 80 chars x 10,000 lines added to the repository in every commit.

The experiment was run on a ramdisk to reduce its overall running time. I assume that with a hard disk the measured times would be higher by a roughly constant factor (~10).

In the first run of the experiment with 'small' files (80 chars x 100 lines), the times for git status, git add and git checkout appear to grow linearly with the number of commits.

In the second run of the experiment with 'medium' file sizes (80 chars x 1000 lines), the times for git add and git status are noisier, which makes it hard to identify an overall trend. Interestingly, for 'large' file sizes (80 chars x 10,000 lines) the time required for git add and git status seems to stay roughly constant as the number of commits grows. In all three runs, the average time required for git checkout appears to grow linearly with the number of commits in the repository.
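To put numbers on "grows linearly" versus "stays roughly constant", one option (not part of the original analysis) is an ordinary least-squares line fit of time against the number of commits. In this minimal sketch, the function name `linear_fit` is hypothetical. A slope close to zero indicates roughly constant times, while an R² close to 1 indicates that a straight line explains most of the variation; noisy measurements like those of the second run would show up as a lower R².

```python
from statistics import mean


def linear_fit(xs, ys):
    """Ordinary least squares y = slope * x + intercept.

    Returns (slope, intercept, r_squared). r_squared is NaN when all ys are
    equal, because the total variance is then zero and R² is undefined.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return slope, intercept, r_squared
```

For example, `linear_fit(range(1, len(times) + 1), times)` would fit a list of per-commit `times` against the commit number.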

In the second (third) run of the experiment, the file size is increased by a factor of 10 (100) compared to the first run. As the plots show, this increases the times measured for git checkout, but only by a factor of ~3 (~6).
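These ratios can be turned into an implied scaling exponent. Assuming, purely for illustration, that checkout time follows a power law t ∝ s^k in the file size s (two data points cannot confirm this), the exponent is k = log(time ratio) / log(size ratio). The function name `scaling_exponent` is hypothetical.

```python
import math


def scaling_exponent(size_ratio: float, time_ratio: float) -> float:
    """Exponent k in time ∝ size**k implied by a pair of (size, time) ratios."""
    return math.log(time_ratio) / math.log(size_ratio)


# Ratios read off the runs described above: x10 size -> ~x3 time, x100 size -> ~x6 time.
for size_ratio, time_ratio in [(10, 3), (100, 6)]:
    k = scaling_exponent(size_ratio, time_ratio)
    print(f"size x{size_ratio} -> time x{time_ratio}: k ≈ {k:.2f}")
# size x10 -> time x3: k ≈ 0.48
# size x100 -> time x6: k ≈ 0.39
```

Both exponents are well below 1, so in these runs checkout time grew much more slowly than the size of the added files; a linear dependence would give k = 1.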

The Python scripts that were used for the experiment can be downloaded here. Git version 2.7.4 was used for the experiment.