How does Git-Linearize alter commits to make them linear?
I find Extremely Linear Git History (GitHub) unreasonably amusing.
It’s also unreasonably easy to install with Homebrew (brew install zegl/tap/git-linearize), so I just had to give it a try when initializing a new project.
This led me into a huge detour. Just what was it doing to my Git repo to get the linear hashes?
Come on a journey with me!
A Git-Linearize Journey
jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter> git init
Initialized empty Git repository in /Users/jamie/Development/SwitchControllerAdapter/.git/
jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git commit --allow-empty -m "Initial empty commit"
[main (root-commit) 665bfb3] Initial empty commit
jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git-linearize
error: branch 'extremely-linear' not found.
Switched to a new branch 'extremely-linear'
HEAD is now at 665bfb3 Initial empty commit
[x] 665bfb39d568457d3515a9eaf6b6c735e7756c53 is now 00000000e3cfe735ae4dc8efa511b58016010a98
Switched to branch 'main'
[x] All done, have a good day
Great! But what did git-linearize actually do to my commit? It says it uses lucky-commit, which in turn says:
lucky-commit amends your commit messages by adding a few characters of various types of whitespace, and keeps trying new messages until it finds a good hash. By default, it will look for a commit hash starting with “0000000”.
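To make that description concrete, here is a minimal Python sketch of the general idea: hash the commit the way Git does, then keep trying different runs of spaces and tabs until the hash starts with the wanted prefix. This is an illustration only, not lucky-commit's actual implementation. The function names are made up for this example, placing the padding at the very end of the message is an assumption, and the demo asks for just two leading zeros so that it finishes quickly.

```python
import hashlib
import itertools


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 Git assigns to an object: hash of '<type> <size>', a NUL byte, then the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def find_lucky_suffix(commit_body: bytes, prefix: str, max_len: int = 16):
    """Try space/tab padding before the final newline until the ID starts with prefix.

    Hypothetical helper: returns (padding, object_id), or None if no padding
    of up to max_len characters produced a match.
    """
    if not commit_body.endswith(b"\n"):
        raise ValueError("commit body is expected to end with a newline")
    stem = commit_body[:-1]
    for length in range(max_len + 1):
        # Every combination of spaces and tabs of this length.
        for combo in itertools.product(b" \t", repeat=length):
            padding = bytes(combo)
            oid = git_object_id("commit", stem + padding + b"\n")
            if oid.startswith(prefix):
                return padding, oid
    return None


# The commit body from this post, rebuilt by hand for the demo.
BODY = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author Jamie Montgomerie <jamie@montgomerie.net> 1669697873 -0800\n"
    b"committer Jamie Montgomerie <jamie@montgomerie.net> 1669697873 -0800\n"
    b"\n"
    b"Initial empty commit\n"
)

result = find_lucky_suffix(BODY, "00")
print(result)
```

Each extra hex digit in the prefix multiplies the expected number of attempts by 16, so an eight-digit prefix needs billions of tries and a far more optimized search than this loop.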
Let’s take a look at the commit message in the log and see what’s been added!
jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git log
commit 00000000e3cfe735ae4dc8efa511b58016010a98 (HEAD -> main, extremely-linear)
Author: Jamie Montgomerie <jamie@montgomerie.net>
Date: Mon Nov 28 20:57:53 2022 -0800
Initial empty commit
Hmm. I can select that with my mouse, and it doesn’t appear to have any trailing whitespace at all!
Maybe git show will tell me more?
jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git show HEAD
commit 00000000e3cfe735ae4dc8efa511b58016010a98 (HEAD -> main, extremely-linear)
Author: Jamie Montgomerie <jamie@montgomerie.net>
Date: Mon Nov 28 20:57:53 2022 -0800
Initial empty commit
Okay, this is weird. Let’s check against the original commit to see what the difference is.
jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git show 665bfb39d568457d3515a9eaf6b6c735e7756c53
commit 665bfb39d568457d3515a9eaf6b6c735e7756c53
Author: Jamie Montgomerie <jamie@montgomerie.net>
Date: Mon Nov 28 20:57:53 2022 -0800
Initial empty commit
That looks the same! But git-linearize must’ve done something or the SHAs would be the same.
I think I’d be able to select trailing whitespace if there was any. Maybe it’s modifying the existing inline whitespace, not adding new whitespace? Like, changing spaces to other spacing characters or something? It doesn’t feel like that would be ’enough’ to generate a specific hash…
Let’s check the raw output.
jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git show HEAD | hexdump -C
00000000 63 6f 6d 6d 69 74 20 30 30 30 30 30 30 30 30 65 |commit 00000000e|
00000010 33 63 66 65 37 33 35 61 65 34 64 63 38 65 66 61 |3cfe735ae4dc8efa|
00000020 35 31 31 62 35 38 30 31 36 30 31 30 61 39 38 0a |511b58016010a98.|
00000030 41 75 74 68 6f 72 3a 20 4a 61 6d 69 65 20 4d 6f |Author: Jamie Mo|
00000040 6e 74 67 6f 6d 65 72 69 65 20 3c 6a 61 6d 69 65 |ntgomerie <jamie|
00000050 40 6d 6f 6e 74 67 6f 6d 65 72 69 65 2e 6e 65 74 |@montgomerie.net|
00000060 3e 0a 44 61 74 65 3a 20 20 20 4d 6f 6e 20 4e 6f |>.Date: Mon No|
00000070 76 20 32 38 20 32 30 3a 35 37 3a 35 33 20 32 30 |v 28 20:57:53 20|
00000080 32 32 20 2d 30 38 30 30 0a 0a 20 20 20 20 49 6e |22 -0800.. In|
00000090 69 74 69 61 6c 20 65 6d 70 74 79 20 63 6f 6d 6d |itial empty comm|
000000a0 69 74 0a |it.|
000000a3
Don’t see any trailing whitespace or weird characters! How does the hexdump compare with the original commit?
jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git show 665bfb39d568457d3515a9eaf6b6c735e7756c53 | hexdump -C
00000000 63 6f 6d 6d 69 74 20 36 36 35 62 66 62 33 39 64 |commit 665bfb39d|
00000010 35 36 38 34 35 37 64 33 35 31 35 61 39 65 61 66 |568457d3515a9eaf|
00000020 36 62 36 63 37 33 35 65 37 37 35 36 63 35 33 0a |6b6c735e7756c53.|
00000030 41 75 74 68 6f 72 3a 20 4a 61 6d 69 65 20 4d 6f |Author: Jamie Mo|
00000040 6e 74 67 6f 6d 65 72 69 65 20 3c 6a 61 6d 69 65 |ntgomerie <jamie|
00000050 40 6d 6f 6e 74 67 6f 6d 65 72 69 65 2e 6e 65 74 |@montgomerie.net|
00000060 3e 0a 44 61 74 65 3a 20 20 20 4d 6f 6e 20 4e 6f |>.Date: Mon No|
00000070 76 20 32 38 20 32 30 3a 35 37 3a 35 33 20 32 30 |v 28 20:57:53 20|
00000080 32 32 20 2d 30 38 30 30 0a 0a 20 20 20 20 49 6e |22 -0800.. In|
00000090 69 74 69 61 6c 20 65 6d 70 74 79 20 63 6f 6d 6d |itial empty comm|
000000a0 69 74 0a |it.|
000000a3
Uh, it’s still exactly the same! Both commits look perfectly normal.
Maybe it’s altering other places in the commit. I read somewhere that Git commits can carry extra header fields that aren’t normally shown.
[Much man page reading]
Ah-ha! I can show the raw commit!
jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git show --format=raw HEAD
commit 00000000e3cfe735ae4dc8efa511b58016010a98
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
author Jamie Montgomerie <jamie@montgomerie.net> 1669697873 -0800
committer Jamie Montgomerie <jamie@montgomerie.net> 1669697873 -0800
Initial empty commit
Uh…
jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git show --format=raw 665bfb39d568457d3515a9eaf6b6c735e7756c53
commit 665bfb39d568457d3515a9eaf6b6c735e7756c53
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
author Jamie Montgomerie <jamie@montgomerie.net> 1669697873 -0800
committer Jamie Montgomerie <jamie@montgomerie.net> 1669697873 -0800
Initial empty commit
Um… Hexdump again?…
jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git show 00000000e3cfe735ae4dc8efa511b58016010a98 --pretty=raw --no-abbrev | hexdump -C
00000000 63 6f 6d 6d 69 74 20 30 30 30 30 30 30 30 30 65 |commit 00000000e|
00000010 33 63 66 65 37 33 35 61 65 34 64 63 38 65 66 61 |3cfe735ae4dc8efa|
00000020 35 31 31 62 35 38 30 31 36 30 31 30 61 39 38 0a |511b58016010a98.|
00000030 74 72 65 65 20 34 62 38 32 35 64 63 36 34 32 63 |tree 4b825dc642c|
00000040 62 36 65 62 39 61 30 36 30 65 35 34 62 66 38 64 |b6eb9a060e54bf8d|
00000050 36 39 32 38 38 66 62 65 65 34 39 30 34 0a 61 75 |69288fbee4904.au|
00000060 74 68 6f 72 20 4a 61 6d 69 65 20 4d 6f 6e 74 67 |thor Jamie Montg|
00000070 6f 6d 65 72 69 65 20 3c 6a 61 6d 69 65 40 6d 6f |omerie <jamie@mo|
00000080 6e 74 67 6f 6d 65 72 69 65 2e 6e 65 74 3e 20 31 |ntgomerie.net> 1|
00000090 36 36 39 36 39 37 38 37 33 20 2d 30 38 30 30 0a |669697873 -0800.|
000000a0 63 6f 6d 6d 69 74 74 65 72 20 4a 61 6d 69 65 20 |committer Jamie |
000000b0 4d 6f 6e 74 67 6f 6d 65 72 69 65 20 3c 6a 61 6d |Montgomerie <jam|
000000c0 69 65 40 6d 6f 6e 74 67 6f 6d 65 72 69 65 2e 6e |ie@montgomerie.n|
000000d0 65 74 3e 20 31 36 36 39 36 39 37 38 37 33 20 2d |et> 1669697873 -|
000000e0 30 38 30 30 0a 0a 20 20 20 20 49 6e 69 74 69 61 |0800.. Initia|
000000f0 6c 20 65 6d 70 74 79 20 63 6f 6d 6d 69 74 0a |l empty commit.|
000000ff
jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> git show 665bfb39d568457d3515a9eaf6b6c735e7756c53 --pretty=raw --no-abbrev | hexdump -C
00000000 63 6f 6d 6d 69 74 20 36 36 35 62 66 62 33 39 64 |commit 665bfb39d|
00000010 35 36 38 34 35 37 64 33 35 31 35 61 39 65 61 66 |568457d3515a9eaf|
00000020 36 62 36 63 37 33 35 65 37 37 35 36 63 35 33 0a |6b6c735e7756c53.|
00000030 74 72 65 65 20 34 62 38 32 35 64 63 36 34 32 63 |tree 4b825dc642c|
00000040 62 36 65 62 39 61 30 36 30 65 35 34 62 66 38 64 |b6eb9a060e54bf8d|
00000050 36 39 32 38 38 66 62 65 65 34 39 30 34 0a 61 75 |69288fbee4904.au|
00000060 74 68 6f 72 20 4a 61 6d 69 65 20 4d 6f 6e 74 67 |thor Jamie Montg|
00000070 6f 6d 65 72 69 65 20 3c 6a 61 6d 69 65 40 6d 6f |omerie <jamie@mo|
00000080 6e 74 67 6f 6d 65 72 69 65 2e 6e 65 74 3e 20 31 |ntgomerie.net> 1|
00000090 36 36 39 36 39 37 38 37 33 20 2d 30 38 30 30 0a |669697873 -0800.|
000000a0 63 6f 6d 6d 69 74 74 65 72 20 4a 61 6d 69 65 20 |committer Jamie |
000000b0 4d 6f 6e 74 67 6f 6d 65 72 69 65 20 3c 6a 61 6d |Montgomerie <jam|
000000c0 69 65 40 6d 6f 6e 74 67 6f 6d 65 72 69 65 2e 6e |ie@montgomerie.n|
000000d0 65 74 3e 20 31 36 36 39 36 39 37 38 37 33 20 2d |et> 1669697873 -|
000000e0 30 38 30 30 0a 0a 20 20 20 20 49 6e 69 74 69 61 |0800.. Initia|
000000f0 6c 20 65 6d 70 74 79 20 63 6f 6d 6d 69 74 0a |l empty commit.|
000000ff
Aargh!
Cue a detour into the little-known git notes, but it turns out notes aren’t included in the commit’s SHA (for reasons that are obvious in hindsight: you can add notes to existing commits).
Okay. Maybe when it says “added to the commit message” it doesn’t really mean the commit message, it really means it does it somehow internally? Like it’s added to the commit file in a way that only the checksumming notices it? [Doesn’t that seem a bit fragile?…]
Let’s look at Git’s raw object files on disk!
jamie@Jamies-MacBook-Air ~/D/SwitchControllerAdapter (main)> cd .git
jamie@Jamies-MacBook-Air ~/D/S/.git (GIT_DIR!)> ls
COMMIT_EDITMSG ORIG_HEAD description index logs/ refs/
HEAD config hooks/ info/ objects/
jamie@Jamies-MacBook-Air ~/D/S/.git (GIT_DIR!)> cd objects/
jamie@Jamies-MacBook-Air ~/D/S/.g/objects (GIT_DIR!)> ls
00/ 4b/ 66/ info/ pack/
jamie@Jamies-MacBook-Air ~/D/S/.g/objects (GIT_DIR!)> cd 00/
jamie@Jamies-MacBook-Air ~/D/S/.g/o/00 (GIT_DIR!)> ls
000000e3cfe735ae4dc8efa511b58016010a98
There it is! At last! What’s really in it?
jamie@Jamies-MacBook-Air ~/D/S/.g/o/00 (GIT_DIR!)> cat 000000e3cfe735ae4dc8efa511b58016010a98
x??A
?0E?6??
(1??D?*x?$?j?i??
oom
??????ü 9?Ƞ?^?H???
?['QR?}gZ?ʘ?i+?pO???.E?K???S????f >?ѢmL???4R?0?f?
"NC???@??/(L?E?,???dU?b
Uh, I guess that’s binary. Really I should’ve expected that - I knew that Git compresses objects before writing them to disk. (The SHA is calculated from the uncompressed contents, not from the compressed file.)
jamie@Jamies-MacBook-Air ~/D/S/.git (GIT_DIR!)> gunzip < 000000e3cfe735ae4dc8efa511b58016010a98
gunzip: unknown compression format
Not that easy.
[Research into Git’s compression system and how to uncompress object files]
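The short version of that research: loose objects are stored as raw zlib streams, which use the same deflate compression as gzip but a different wrapper, so gunzip doesn't recognise them. The 'x' that opens the cat output above is the usual first byte (0x78) of a zlib stream. A small sketch of telling the two apart from their leading bytes follows. The function name sniff_compression is made up for this example.

```python
import gzip
import zlib


def sniff_compression(data: bytes) -> str:
    """Guess the container format from the first two bytes."""
    # gzip files always begin with the magic bytes 1f 8b.
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    # zlib (RFC 1950): the low nibble of the first byte is 8 (deflate), and
    # the first two bytes, read as a big-endian number, are a multiple of 31.
    if (
        len(data) >= 2
        and data[0] & 0x0F == 8
        and int.from_bytes(data[:2], "big") % 31 == 0
    ):
        return "zlib"
    return "unknown"


payload = b"commit 0\x00"
print(sniff_compression(gzip.compress(payload)))  # gzip
print(sniff_compression(zlib.compress(payload)))  # zlib
```

That is why Python's zlib module can read the object file directly.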
jamie@Jamies-MacBook-Air ~/D/S/.git (GIT_DIR!)> python3
Python 3.9.6 (default, Sep 26 2022, 11:37:49)
[Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import zlib
>>> compressed_contents = open("objects/00/00000b2305dfff0893b366ff01b49b54b8e167", 'rb').read()
>>> decompressed_contents = zlib.decompress(compressed_contents)
>>> decompressed_contents
b'commit 294\x00tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\nauthor Jamie Montgomerie <jamie@montgomerie.net> 1669697873 -0800\ncommitter Jamie Montgomerie <jamie@montgomerie.net> 1669697873 -0800\n\nInitial empty commit \t \t \t \t\t\t \t \t\t \t\t \n'
There it is! The whitespace is at the end! There’s lots of it!
Let’s compare with the original commit:
>>> compressed_contents = open("objects/66/5bfb39d568457d3515a9eaf6b6c735e7756c53", 'rb').read()
>>> decompressed_contents = zlib.decompress(compressed_contents)
>>> decompressed_contents
b'commit 203\x00tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\nauthor Jamie Montgomerie <jamie@montgomerie.net> 1669697873 -0800\ncommitter Jamie Montgomerie <jamie@montgomerie.net> 1669697873 -0800\n\nInitial empty commit\n'
Yup, definitely different. Some more reading tells me that in the commit XXX header at the start, the XXX is the byte count of the object’s contents (everything after the NUL byte), and the commit message is the last thing in the object. So the whitespace is indeed in the commit message - the linearized version is 294 bytes vs 203, and, as we saw, there is a run of spaces and tabs tacked on the end.
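The header layout can be checked mechanically. The sketch below decompresses a loose object, splits the "type size" header from the contents at the first NUL byte, confirms that the declared size matches, and recomputes the object ID as the SHA-1 of the whole uncompressed stream. It assumes a repository using the default SHA-1 object format. The name parse_loose_object is made up for this example.

```python
import hashlib
import zlib


def parse_loose_object(compressed: bytes):
    """Return (type, declared_size, body, object_id) for a loose Git object."""
    raw = zlib.decompress(compressed)
    # Header and contents are separated by the first NUL byte.
    header, _, body = raw.partition(b"\x00")
    obj_type, size_text = header.decode("ascii").split(" ")
    declared_size = int(size_text)
    if declared_size != len(body):
        raise ValueError(f"header says {declared_size} bytes, found {len(body)}")
    # The object ID covers header + NUL + body, all uncompressed.
    object_id = hashlib.sha1(raw).hexdigest()
    return obj_type, declared_size, body, object_id


# Demo on a synthetic object: the empty tree, which this repo also contains.
demo = zlib.compress(b"tree 0\x00")
print(parse_loose_object(demo))
```

Run from inside .git, parse_loose_object(open("objects/66/5bfb39d568457d3515a9eaf6b6c735e7756c53", "rb").read()) should report a commit with a declared size of 203, matching the header seen above, and the returned object ID should equal the directory name plus the file name.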
Conclusion
I guess the command-line Git tools like git log and git show just don’t show trailing whitespace in commit messages, even if you ask for raw output.
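A shorter route to the same answer, skipping the manual decompression, is git cat-file, which prints an object's stored contents without the pretty-printing that git log and git show apply. The sketch below wraps it in Python and uses repr so spaces and tabs at the end of a line become visible. Both function names are made up for this example, and raw_commit assumes git is on the PATH and that it is run inside a repository.

```python
import subprocess


def raw_commit(rev: str = "HEAD") -> bytes:
    """Return a commit object's contents as stored (minus the 'commit <size>' header)."""
    return subprocess.run(
        ["git", "cat-file", "commit", rev],
        check=True,
        capture_output=True,
    ).stdout


def show_lines(data: bytes) -> list:
    """repr() each line so trailing spaces, tabs and newlines can be seen."""
    return [repr(line) for line in data.splitlines(keepends=True)]


# Demo on a hand-written message line with padding before the newline.
for line in show_lines(b"Initial empty commit \t \n"):
    print(line)
```

Calling show_lines(raw_commit("HEAD")) in the linearized repo should end with a message line that has the padding spelled out as spaces and \t escapes.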
Anticlimax?