How does Git-Linearize alter commits to make them linear?

Posted on Nov 30, 2022

I find Extremely Linear Git History (GitHub) unreasonably amusing.

It’s also unreasonably easy to install with Homebrew (brew install zegl/tap/git-linearize) - so I just had to give it a try when initializing a new project.

This led me into a huge detour. Just what was it doing to my Git repo to get the linear hashes?

Come on a journey with me!

A Git-Linearize Journey

jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter> git init
Initialized empty Git repository in /Users/jamie/Development/SwitchControllerAdapter/.git/

jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git commit --allow-empty -m "Initial empty commit"
[main (root-commit) 665bfb3] Initial empty commit

jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git-linearize
error: branch 'extremely-linear' not found.
Switched to a new branch 'extremely-linear'
HEAD is now at 665bfb3 Initial empty commit
[x] 665bfb39d568457d3515a9eaf6b6c735e7756c53 is now 00000000e3cfe735ae4dc8efa511b58016010a98
Switched to branch 'main'
[x] All done, have a good day

Great! But what did git-linearize actually do to my commit? It says it uses lucky-commit, which in turn says:

lucky-commit amends your commit messages by adding a few characters of various types of whitespace, and keeps trying new messages until it finds a good hash. By default, it will look for a commit hash starting with “0000000”.
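To make that description concrete, here is a minimal Python sketch of the idea. It is not lucky-commit's actual implementation: the padding scheme, the function names, the stand-in commit contents and the short two-digit prefix are all assumptions chosen so the example runs in a moment. It tries ever-longer combinations of spaces and tabs at the end of the message until the resulting object ID starts with the wanted prefix.

```python
import hashlib
from itertools import count, product


def git_object_id(kind: bytes, body: bytes) -> str:
    """SHA-1 of b'<kind> <length>\\0<body>', assuming a SHA-1 repository."""
    header = kind + b" " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def pad_until_prefix(commit_body: bytes, prefix: str) -> bytes:
    """Add spaces/tabs before the final newline until the ID starts with prefix."""
    assert commit_body.endswith(b"\n")
    stem = commit_body[:-1]
    for length in count(0):                      # 0, 1, 2, ... padding bytes
        for combo in product(b" \t", repeat=length):
            candidate = stem + bytes(combo) + b"\n"
            if git_object_id(b"commit", candidate).startswith(prefix):
                return candidate


# A made-up stand-in for a commit's contents; a real one also has author and
# committer lines.
body = b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n\nInitial empty commit\n"
padded = pad_until_prefix(body, "00")
print(repr(padded[len(body) - 1:]))      # the added whitespace, plus the newline
print(git_object_id(b"commit", padded))  # begins with "00"
```

Each extra hex digit in the prefix multiplies the expected number of attempts by 16, so a seven-digit prefix needs a much faster search than this loop.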

Let’s take a look at the commit message in the log and see what’s been added!

jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git log
commit 00000000e3cfe735ae4dc8efa511b58016010a98 (HEAD -> main, extremely-linear)
Author: Jamie Montgomerie <jamie@montgomerie.net>
Date:   Mon Nov 28 20:57:53 2022 -0800

    Initial empty commit

Hmm. I can select that with my mouse, and it doesn’t appear to have any trailing whitespace at all!

Maybe git show will tell me more?

jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git show HEAD
commit 00000000e3cfe735ae4dc8efa511b58016010a98 (HEAD -> main, extremely-linear)
Author: Jamie Montgomerie <jamie@montgomerie.net>
Date:   Mon Nov 28 20:57:53 2022 -0800

    Initial empty commit

Okay, this is weird. Let’s check against the original commit to see what the difference is.

jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git show 665bfb39d568457d3515a9eaf6b6c735e7756c53
commit 665bfb39d568457d3515a9eaf6b6c735e7756c53
Author: Jamie Montgomerie <jamie@montgomerie.net>
Date:   Mon Nov 28 20:57:53 2022 -0800

    Initial empty commit

That looks the same! But git-linearize must’ve done something or the SHAs would be the same.

I think I’d be able to select trailing whitespace if there was any. Maybe it’s modifying the existing inline whitespace, not adding new whitespace? Like, changing spaces to other spacing characters or something? It doesn’t feel like that would be ‘enough’ to generate a specific hash…

Let’s check the raw output.

jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git show HEAD | hexdump  -C
00000000  63 6f 6d 6d 69 74 20 30  30 30 30 30 30 30 30 65  |commit 00000000e|
00000010  33 63 66 65 37 33 35 61  65 34 64 63 38 65 66 61  |3cfe735ae4dc8efa|
00000020  35 31 31 62 35 38 30 31  36 30 31 30 61 39 38 0a  |511b58016010a98.|
00000030  41 75 74 68 6f 72 3a 20  4a 61 6d 69 65 20 4d 6f  |Author: Jamie Mo|
00000040  6e 74 67 6f 6d 65 72 69  65 20 3c 6a 61 6d 69 65  |ntgomerie <jamie|
00000050  40 6d 6f 6e 74 67 6f 6d  65 72 69 65 2e 6e 65 74  |@montgomerie.net|
00000060  3e 0a 44 61 74 65 3a 20  20 20 4d 6f 6e 20 4e 6f  |>.Date:   Mon No|
00000070  76 20 32 38 20 32 30 3a  35 37 3a 35 33 20 32 30  |v 28 20:57:53 20|
00000080  32 32 20 2d 30 38 30 30  0a 0a 20 20 20 20 49 6e  |22 -0800..    In|
00000090  69 74 69 61 6c 20 65 6d  70 74 79 20 63 6f 6d 6d  |itial empty comm|
000000a0  69 74 0a                                          |it.|
000000a3

Don’t see any trailing whitespace or weird characters! How does the hexdump compare with the original commit?

jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git show 665bfb39d568457d3515a9eaf6b6c735e7756c53 | hexdump  -C
00000000  63 6f 6d 6d 69 74 20 36  36 35 62 66 62 33 39 64  |commit 665bfb39d|
00000010  35 36 38 34 35 37 64 33  35 31 35 61 39 65 61 66  |568457d3515a9eaf|
00000020  36 62 36 63 37 33 35 65  37 37 35 36 63 35 33 0a  |6b6c735e7756c53.|
00000030  41 75 74 68 6f 72 3a 20  4a 61 6d 69 65 20 4d 6f  |Author: Jamie Mo|
00000040  6e 74 67 6f 6d 65 72 69  65 20 3c 6a 61 6d 69 65  |ntgomerie <jamie|
00000050  40 6d 6f 6e 74 67 6f 6d  65 72 69 65 2e 6e 65 74  |@montgomerie.net|
00000060  3e 0a 44 61 74 65 3a 20  20 20 4d 6f 6e 20 4e 6f  |>.Date:   Mon No|
00000070  76 20 32 38 20 32 30 3a  35 37 3a 35 33 20 32 30  |v 28 20:57:53 20|
00000080  32 32 20 2d 30 38 30 30  0a 0a 20 20 20 20 49 6e  |22 -0800..    In|
00000090  69 74 69 61 6c 20 65 6d  70 74 79 20 63 6f 6d 6d  |itial empty comm|
000000a0  69 74 0a                                          |it.|
000000a3

Uh, it’s still exactly the same! Both commits look perfectly normal.

Maybe there are other places in the commit that it’s altering. I read somewhere that Git commits can carry other, hidden fields.

[Much man page reading]

Ah-ha! I can show the raw commit!

jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git show --format=raw  HEAD
commit 00000000e3cfe735ae4dc8efa511b58016010a98
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
author Jamie Montgomerie <jamie@montgomerie.net> 1669697873 -0800
committer Jamie Montgomerie <jamie@montgomerie.net> 1669697873 -0800

    Initial empty commit

Uh…

jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git show --format=raw  665bfb39d568457d3515a9eaf6b6c735e7756c53
commit 665bfb39d568457d3515a9eaf6b6c735e7756c53
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
author Jamie Montgomerie <jamie@montgomerie.net> 1669697873 -0800
committer Jamie Montgomerie <jamie@montgomerie.net> 1669697873 -0800

    Initial empty commit

Um… Hexdump again?…

jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git show 00000000e3cfe735ae4dc8efa511b58016010a98 --pretty=raw --no-abbrev | hexdump -C

00000000  63 6f 6d 6d 69 74 20 30  30 30 30 30 30 30 30 65  |commit 00000000e|
00000010  33 63 66 65 37 33 35 61  65 34 64 63 38 65 66 61  |3cfe735ae4dc8efa|
00000020  35 31 31 62 35 38 30 31  36 30 31 30 61 39 38 0a  |511b58016010a98.|
00000030  74 72 65 65 20 34 62 38  32 35 64 63 36 34 32 63  |tree 4b825dc642c|
00000040  62 36 65 62 39 61 30 36  30 65 35 34 62 66 38 64  |b6eb9a060e54bf8d|
00000050  36 39 32 38 38 66 62 65  65 34 39 30 34 0a 61 75  |69288fbee4904.au|
00000060  74 68 6f 72 20 4a 61 6d  69 65 20 4d 6f 6e 74 67  |thor Jamie Montg|
00000070  6f 6d 65 72 69 65 20 3c  6a 61 6d 69 65 40 6d 6f  |omerie <jamie@mo|
00000080  6e 74 67 6f 6d 65 72 69  65 2e 6e 65 74 3e 20 31  |ntgomerie.net> 1|
00000090  36 36 39 36 39 37 38 37  33 20 2d 30 38 30 30 0a  |669697873 -0800.|
000000a0  63 6f 6d 6d 69 74 74 65  72 20 4a 61 6d 69 65 20  |committer Jamie |
000000b0  4d 6f 6e 74 67 6f 6d 65  72 69 65 20 3c 6a 61 6d  |Montgomerie <jam|
000000c0  69 65 40 6d 6f 6e 74 67  6f 6d 65 72 69 65 2e 6e  |ie@montgomerie.n|
000000d0  65 74 3e 20 31 36 36 39  36 39 37 38 37 33 20 2d  |et> 1669697873 -|
000000e0  30 38 30 30 0a 0a 20 20  20 20 49 6e 69 74 69 61  |0800..    Initia|
000000f0  6c 20 65 6d 70 74 79 20  63 6f 6d 6d 69 74 0a     |l empty commit.|
000000ff

jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git show 665bfb39d568457d3515a9eaf6b6c735e7756c53 --pretty=raw --no-abbrev | hexdump -C

00000000  63 6f 6d 6d 69 74 20 36  36 35 62 66 62 33 39 64  |commit 665bfb39d|
00000010  35 36 38 34 35 37 64 33  35 31 35 61 39 65 61 66  |568457d3515a9eaf|
00000020  36 62 36 63 37 33 35 65  37 37 35 36 63 35 33 0a  |6b6c735e7756c53.|
00000030  74 72 65 65 20 34 62 38  32 35 64 63 36 34 32 63  |tree 4b825dc642c|
00000040  62 36 65 62 39 61 30 36  30 65 35 34 62 66 38 64  |b6eb9a060e54bf8d|
00000050  36 39 32 38 38 66 62 65  65 34 39 30 34 0a 61 75  |69288fbee4904.au|
00000060  74 68 6f 72 20 4a 61 6d  69 65 20 4d 6f 6e 74 67  |thor Jamie Montg|
00000070  6f 6d 65 72 69 65 20 3c  6a 61 6d 69 65 40 6d 6f  |omerie <jamie@mo|
00000080  6e 74 67 6f 6d 65 72 69  65 2e 6e 65 74 3e 20 31  |ntgomerie.net> 1|
00000090  36 36 39 36 39 37 38 37  33 20 2d 30 38 30 30 0a  |669697873 -0800.|
000000a0  63 6f 6d 6d 69 74 74 65  72 20 4a 61 6d 69 65 20  |committer Jamie |
000000b0  4d 6f 6e 74 67 6f 6d 65  72 69 65 20 3c 6a 61 6d  |Montgomerie <jam|
000000c0  69 65 40 6d 6f 6e 74 67  6f 6d 65 72 69 65 2e 6e  |ie@montgomerie.n|
000000d0  65 74 3e 20 31 36 36 39  36 39 37 38 37 33 20 2d  |et> 1669697873 -|
000000e0  30 38 30 30 0a 0a 20 20  20 20 49 6e 69 74 69 61  |0800..    Initia|
000000f0  6c 20 65 6d 70 74 79 20  63 6f 6d 6d 69 74 0a     |l empty commit.|
000000ff

Aargh!

Cue detour into the little-known git notes - but it turns out notes aren’t actually included in the SHA (for in-hindsight obvious reasons - you can add notes to existing commits).

Okay. Maybe when it says “added to the commit message” it doesn’t really mean the commit message, it really means it does it somehow internally? Like it’s added to the commit file in a way that only the checksumming notices it? [Doesn’t that seem a bit fragile?…]

Let’s look at Git’s raw object files on disk!

jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> cd .git

jamie@Jamies-MacBook-Air ~/D/S/.git (GIT_DIR!)> ls
COMMIT_EDITMSG  ORIG_HEAD       description     index           logs/           refs/
HEAD            config          hooks/          info/           objects/

jamie@Jamies-MacBook-Air ~/D/S/.git (GIT_DIR!)> cd objects/

jamie@Jamies-MacBook-Air ~/D/S/.g/objects (GIT_DIR!)> ls
00/   4b/   66/   info/ pack/

jamie@Jamies-MacBook-Air ~/D/S/.g/objects (GIT_DIR!)> cd 00/

jamie@Jamies-MacBook-Air ~/D/S/.g/o/00 (GIT_DIR!)> ls
000000e3cfe735ae4dc8efa511b58016010a98

There it is! At last! What’s really in it?

jamie@Jamies-MacBook-Air ~/D/S/.g/o/00 (GIT_DIR!)> cat 000000e3cfe735ae4dc8efa511b58016010a98 
x??A
?0E?6??
       (1??D?*x?$?j?i??
                       oom
??????ü	9?Ƞ?^?H???
?['QR?}gZ?ʘ?i+?pO???.E?K???S???꘾?f >?ѢmL???4R?0?f?
                                                  "NC???@??/(L?E?,???dU?b

Uh, I guess that’s binary. Really I should’ve expected that - I knew that Git compresses objects when it stores them (the SHA is calculated first, over the uncompressed bytes).

jamie@Jamies-MacBook-Air ~/D/S/.git (GIT_DIR!)> gunzip < 000000e3cfe735ae4dc8efa511b58016010a98 
gunzip: unknown compression format

Not that easy.
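In hindsight the failure makes sense: zlib streams and gzip files both use deflate compression, but they start with different headers, and gunzip expects the gzip one. The sketch below shows the difference on a made-up payload rather than a real object file. Note that the zlib output begins with the byte "x", just like the garbage that cat printed above.

```python
import gzip
import zlib

payload = b"commit 5\x00hello"  # made-up stand-in for an object's contents

as_zlib = zlib.compress(payload)  # zlib framing, first byte 0x78 ("x")
as_gzip = gzip.compress(payload)  # gzip framing, magic bytes 0x1f 0x8b

print(as_zlib[:1])  # b'x'
print(as_gzip[:2])  # b'\x1f\x8b'

# The gzip reader rejects zlib-framed data outright.
gzip_rejected = False
try:
    gzip.decompress(as_zlib)
except OSError:  # gzip.BadGzipFile is a subclass of OSError
    gzip_rejected = True
print(gzip_rejected)  # True

# The zlib reader round-trips it.
print(zlib.decompress(as_zlib) == payload)  # True
```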

[Research into Git’s compression system and how to uncompress object files]

jamie@Jamies-MacBook-Air ~/D/S/.git (GIT_DIR!)> python3
Python 3.9.6 (default, Sep 26 2022, 11:37:49) 
[Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import zlib

>>> compressed_contents = open("objects/00/00000b2305dfff0893b366ff01b49b54b8e167", 'rb').read()

>>> decompressed_contents = zlib.decompress(compressed_contents)

>>> decompressed_contents
b'commit 294\x00tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\nauthor Jamie Montgomerie <jamie@montgomerie.net> 1669697873 -0800\ncommitter Jamie Montgomerie <jamie@montgomerie.net> 1669697873 -0800\n\nInitial empty commit                                              \t    \t   \t \t\t\t \t \t\t        \t\t                \n'

There it is! The whitespace is at the end! There’s lots of it!

Let’s compare with the original commit:

>>> compressed_contents = open("objects/66/5bfb39d568457d3515a9eaf6b6c735e7756c53", 'rb').read()

>>> decompressed_contents = zlib.decompress(compressed_contents)

>>> decompressed_contents
b'commit 203\x00tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\nauthor Jamie Montgomerie <jamie@montgomerie.net> 1669697873 -0800\ncommitter Jamie Montgomerie <jamie@montgomerie.net> 1669697873 -0800\n\nInitial empty commit\n'

Yup, definitely different. Some more reading tells me that in the commit XXX at the start, the XXX is the byte count of the uncompressed content that follows the NUL byte (\x00), and the commit message is the last thing in that content. So the whitespace is indeed in the commit message - the linearized version is 294 bytes vs 203, and, as we saw, there is a run of spaces and tabs tacked on the end, just before the final newline.
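Those pieces fit together into a small reader for loose object files. This is a sketch under a few assumptions: the repository uses SHA-1 object names, the object has not been packed, and the function name read_loose_object is made up for illustration. It decompresses the file, splits the header from the body at the NUL byte, checks the byte count, and checks that the SHA-1 of the decompressed bytes (header included) matches the object's name.

```python
import hashlib
import zlib
from pathlib import Path


def read_loose_object(git_dir, object_id):
    """Return (kind, body) for a loose object, checking its header and name."""
    # Loose objects live at objects/<first 2 hex digits>/<remaining 38>.
    path = Path(git_dir) / "objects" / object_id[:2] / object_id[2:]
    raw = zlib.decompress(path.read_bytes())

    # The header is b"<kind> <size>", terminated by a single NUL byte.
    header, _, body = raw.partition(b"\x00")
    kind, _, size = header.partition(b" ")

    if int(size) != len(body):
        raise ValueError("header byte count does not match the body")
    if hashlib.sha1(raw).hexdigest() != object_id:
        raise ValueError("contents do not hash to the object's name")
    return kind.decode(), body


# Possible use from the top of a repository:
#   kind, body = read_loose_object(".git", "665bfb39d568457d3515a9eaf6b6c735e7756c53")
#   print(kind, len(body))
```

If the second check passes, the file name really is the hash of the uncompressed bytes, which is why padding the message is enough to change the commit's ID.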

Conclusion

I guess the command-line Git tools like git log and git show just don’t show trailing whitespace in commit messages, even if you ask for raw output.
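For a look at the exact bytes without digging through .git by hand, git cat-file commit <rev> prints a commit object's contents as stored. The sketch below wraps it with Python's subprocess module. The helper names are made up for illustration, and it assumes git is on the PATH and that the script runs from inside the repository.

```python
import subprocess


def raw_commit(rev="HEAD"):
    """Return the commit object's contents as bytes, via git cat-file."""
    result = subprocess.run(
        ["git", "cat-file", "commit", rev],
        check=True,
        capture_output=True,
    )
    return result.stdout


def trailing_whitespace(raw):
    """Return the spaces and tabs sitting just before the final newline."""
    text = raw[:-1] if raw.endswith(b"\n") else raw
    stripped = text.rstrip(b" \t")
    return text[len(stripped):]


# Possible use from inside the repository:
#   print(repr(trailing_whitespace(raw_commit("HEAD"))))
```

Printing with repr keeps the tabs visible as \t instead of letting the terminal swallow them.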

Anticlimax?