Challenge Summary

A committee was formed last year to decide the highly-sensitive contents of the flag for this challenge. Our informant managed to leak some data, but he was arrested within weeks of the committee's operation. All we have are the logs of the committee's meetings.

Note: file fixed, please redownload

Author: Aurel300

We are given log.txt that contains the full commit history of the flag-containing repository.

...

commit dca4ca5150b82e541e2f5c42d00493ba8d4aa84a
Author: Christopher L. Hatch <crisscross.the.hatch@legal.committee>
Date:   Mon Mar 23 12:30:00 2020 +0000

    Proceedings of the flag-deciding committee: 8, 31, 36

commit c3e6c8ea777d50595a8b288cbbbd7a675c43b5df
Author: Pamela W. Mathews <pammy.emm@legal.committee>
Date:   Fri Mar 13 12:30:00 2020 +0000

    Proceedings of the flag-deciding committee: 18

commit 08e1f0dd3b9d710b1eea81f6b8f76c455f634e87
Author: Robert J. Lawful <boblaw@legal.committee>
Date:   Wed Mar 4 12:00:00 2020 +0000

    Initial formation of the flag-deciding committee.

We are also given the content of the only file, namely flag.txt, for the first three commits (08e1f0d, c3e6c8e and dca4ca5). They are respectively:

union{**********************************************}
union{*****************_****************************}
union{*******3*********_************r****d**********}

Also, three characters are filled in each subsequent commit. The objective is to recover the flag.

Solution

There123 are useful pointers those available online, and it greatly helped me when I was working on the challenge.

📖 Why documenting something that already exists? It is very exciting to learn how it works. Coincidentally I am working on node trees at work. Since I have never look at how node trees for git works, this is the perfect chance.

Moreover, I'll be working on examples instead of laying out the structure again. The objective of this blog post is to compute the hashes for the first three commits correctly. Without loss of generality, the remaining commits work in the same way.

Part I: Computing the tree hash

To begin, let's look at an example which is irrelevant to the challenge. Each tree (folder) node is shown in green, while each blob (file) node is shown in yellow.

digraph {
  rankdir=BT

  "/-1"[shape=box, style="dashed,filled", fillcolor="#80ff8080", label=<root<BR /><font point-size="10">Version 1<BR />Hash 5b7539b</font>>]
  "/README.md-1"[shape=box, style="dashed,filled", fillcolor="#ffff8080", label=<README.md<BR /><font point-size="10">Version 1<BR />Hash 95d09f2</font>>]

  "/README.md-1"->"/-1"[style=dashed]


  "/-2"[shape=box, style="dashed,filled", fillcolor="#80ff8080", label=<root<BR /><font point-size="10">Version 2<BR />Hash 794567b</font>>]
  "/README.md-2"[shape=box, style="filled", fillcolor="#ffff8080", label=<README.md<BR /><font point-size="10">Version 2<BR />Hash bc7774a</font>>]

  "/README.md-2"->"/-2"[style=dashed]


  "/-3"[shape=box, style="dashed,filled", fillcolor="#80ff8080", label=<root<BR /><font point-size="10">Version 3<BR />Hash 0347b68</font>>]
  "/src-3"[shape=box, style="dashed,filled", fillcolor="#80ff8080", label=<src<BR /><font point-size="10">Version 3<BR />Hash bf27a6d</font>>]
  "/src/index.py-3"[shape=box, style="filled", fillcolor="#ffff8080", label=<index.py<BR /><font point-size="10">Version 3<BR />Hash 5d3086d</font>>]

  "/README.md-2"->"/-3"[style=dashed]
  "/src-3"->"/-3"[style=dashed]
  "/src/index.py-3"->"/src-3"[style=dashed]


  "/-4"[shape=box, style="filled", fillcolor="#80ff8080", label=<root<BR /><font point-size="10">Version 4<BR />Hash b14d3a6</font>>]
  "/src-4"[shape=box, style="filled", fillcolor="#80ff8080", label=<src<BR /><font point-size="10">Version 4<BR />Hash f152f55</font>>]
  "/src/README.md-4"[shape=box, style="filled", fillcolor="#ffff8080", label=<README.md<BR /><font point-size="10">Version 4<BR />Hash cd12bea</font>>]

  "/README.md-2"->"/-4"
  "/src-4"->"/-4"
  "/src/README.md-4"->"/src-4"
  "/src/index.py-3"->"/src-4"

  { rank = same; "/-1"; "/-2"; "/-3"; "/-4"; }

  "/-1"->"/-2"[color="#00000000"]
  "/-2"->"/-3"[color="#00000000"]
  "/-3"->"/-4"[color="#00000000"]
}

There are four commits from the above example. In the latest version, there are three files, namely, /README.md, /src/README.md and /src/index.py. Also, there are two folders: / and /src. Let's first look at how the hashes for file nodes are computed. Suppose that /README.md is an one-liner saying hello world!. Then the file hash starts with bc7774a - this is the SHA-1 digest of the file content, prefixed with blob [LENGTH]\x00. Let's look at the following code snippet on how a file hash is computed:

file_hash = hashlib.sha1(b'blob 12\x00hello world!').hexdigest()
# bc7774a7b18deb1d7bd0212d34246a9b1260ae17

That is everything for computing file hashes. Before we are computing tree hashes, let's assume that the file hashes for /src/index.py and /src/README.md are:

/src/index.py  5d3086defce3c9327c894ed8c005e15d7f75c906
/src/README.md cd12bea6b052e9b4c1efd973ae4708b57f0e1c36

For git, the hash of a tree depends on its child nodes. Let's adopt a bottom-up approach to compute the hash of the root node (i.e., the entire tree). In this way, we will first retrieve the hash of /src, then the hash for /.

To compute tree hashes, we will be first sorting the child nodes in lexicographic order. We will be adding a six-digit number indicating the mode4 of the nodes. The first three digits are usually 100 or 040, which represent a regular file and a directory respectively. The next three digits would indicate the Unix permissions. For example, what we need to compute the hash for /src are:

100644 README.md cd12bea6b052e9b4c1efd973ae4708b57f0e1c36
100644 index.py  5d3086defce3c9327c894ed8c005e15d7f75c906

Referencing from various sources online5, I have come up with the code snippet to compute tree hash for /src (and /).

# Tree hash for /src
content  = b'100644 README.md\x00' + bytes.fromhex('cd12bea6b052e9b4c1efd973ae4708b57f0e1c36')
content += b'100644 index.py\x00' + bytes.fromhex('5d3086defce3c9327c894ed8c005e15d7f75c906')

content  = f'tree {len(content)}\x00'.encode() + content
tree_hash = hashlib.sha1(content).hexdigest()
# f152f55e77dd7eea0ea1fca25a59b9b2793ef2c2
# Tree hash for /
content  = b'100644 README.md\x00' + bytes.fromhex('bc7774a7b18deb1d7bd0212d34246a9b1260ae17')
content += b'40000 src\x00' + bytes.fromhex('f152f55e77dd7eea0ea1fca25a59b9b2793ef2c2')

content  = f'tree {len(content)}\x00'.encode() + content
tree_hash = hashlib.sha1(content).hexdigest()
# b14d3a6e7ec4a95ba6f9bdc01af86dd85ab537eb

It is possible to verify via git cat-file whether the hash is correct. It would print its preimage if the hash checks out. After all, the tree hash for version four would be b14d3a6e7ec4a95ba6f9bdc01af86dd85ab537eb, which is defined by the tree hash of the root directory.

git cat-file tree b14d3a6e7ec4a95ba6f9bdc01af86dd85ab537eb | hd

Back to the challenge. Since it is the only file in the repository, it is straight forward to compute the tree hash if we know its content. These are the node trees in the first versions and how the hashes are computed.

digraph {
  rankdir=BT

  "root-1"[shape=box, style="dashed,filled", fillcolor="#80ff8080", label=<root<BR /><font point-size="10">Version 1<BR />Hash 661c9eb</font>>]
  "root-2"[shape=box, style="dashed,filled", fillcolor="#80ff8080", label=<root<BR /><font point-size="10">Version 2<BR />Hash 754603f</font>>]
  "root-3"[shape=box, style="filled", fillcolor="#80ff8080", label=<root<BR /><font point-size="10">Version 3<BR />Hash efc6902</font>>]
  "flag-1"[shape=box, style="dashed,filled", fillcolor="#ffff8080", label=<flag.txt<BR /><font point-size="10">Version 1<BR />Hash 3f51471</font>>]
  "flag-2"[shape=box, style="dashed,filled", fillcolor="#ffff8080", label=<flag.txt<BR /><font point-size="10">Version 2<BR />Hash f0a9b30</font>>]
  "flag-3"[shape=box, style="filled", fillcolor="#ffff8080",label=<flag.txt<BR /><font point-size="10">Version 3<BR />Hash b3f1331</font>>]

  "flag-1"->"root-1"[style=dashed]
  "flag-2"->"root-2"[style=dashed]
  "flag-3"->"root-3"
}
# Version 1

# blob hash for "/flag.txt"
content = b'union{**********************************************}\n'
content = f'blob {len(content)}\x00'.encode() + content
file_hash = hashlib.sha1(content).hexdigest()
# 3f51471860e4e0364a2e7a79c4bc14267add8d39

# tree hash for "/"
content = b'100644 flag.txt\x00' + bytes.fromhex(file_hash)
content = f'tree {len(content)}\x00'.encode() + content
tree_hash = hashlib.sha1(content).hexdigest()
# 661c9ebfb47590de9c094b3c56f7700050731b36
# Version 2

# blob hash for "/flag.txt"
content = b'union{*****************_****************************}\n'
content = f'blob {len(content)}\x00'.encode() + content
file_hash = hashlib.sha1(content).hexdigest()
# f0a9b306519600289bb5d7ed5984babae61cf3b8

# tree hash for "/"
content = b'100644 flag.txt\x00' + bytes.fromhex(file_hash)
content = f'tree {len(content)}\x00'.encode() + content
tree_hash = hashlib.sha1(content).hexdigest()
# 754603fb1051aecc022315196611374f65791906
# Version 3

# blob hash for "/flag.txt"
content = b'union{*******3*********_************r****d**********}\n'
content = f'blob {len(content)}\x00'.encode() + content
file_hash = hashlib.sha1(content).hexdigest()
# b3f1331849f82c64f3d164d65891ab44e3662858

# tree hash for "/"
content = b'100644 flag.txt\x00' + bytes.fromhex(file_hash)
content = f'tree {len(content)}\x00'.encode() + content
tree_hash = hashlib.sha1(content).hexdigest()
# efc6902a4e82e09282b6a3b71b61bf8cccfe6689

Part II: Finding the commit hash

The commit hashes are relatively easier to compute. Although the commits generally form a tree, the commits in the challenge form a chain. This is because there are no merge commits within, where there would be multiple parents for a commit.

digraph {
  rankdir=BT

  "commit-1"[style="dashed,filled", fillcolor="#8080ff80", label=<Version 1<BR /><font point-size="10">Hash 08e1f0d</font>>]
  "commit-2"[style="dashed,filled", fillcolor="#8080ff80", label=<Version 2<BR /><font point-size="10">Hash c3e6c8e</font>>]
  "commit-3"[style="filled", fillcolor="#8080ff80", label=<Version 3<BR /><font point-size="10">Hash dca4ca5</font>>]

  "root-1"[shape=box, style="dashed,filled", fillcolor="#80ff8080", label=<root<BR /><font point-size="10">Version 1<BR />Hash 661c9eb</font>>]
  "root-2"[shape=box, style="dashed,filled", fillcolor="#80ff8080", label=<root<BR /><font point-size="10">Version 2<BR />Hash 754603f</font>>]
  "root-3"[shape=box, style="filled", fillcolor="#80ff8080", label=<root<BR /><font point-size="10">Version 3<BR />Hash efc6902</font>>]
  "flag-1"[shape=box, style="dashed,filled", fillcolor="#ffff8080", label=<flag.txt<BR /><font point-size="10">Version 1<BR />Hash 3f51471</font>>]
  "flag-2"[shape=box, style="dashed,filled", fillcolor="#ffff8080", label=<flag.txt<BR /><font point-size="10">Version 2<BR />Hash f0a9b30</font>>]
  "flag-3"[shape=box, style="filled", fillcolor="#ffff8080",label=<flag.txt<BR /><font point-size="10">Version 3<BR />Hash b3f1331</font>>]

  "flag-1"->"root-1"[style=dashed]
  "flag-2"->"root-2"[style=dashed]
  "flag-3"->"root-3"

  { rank = same; "commit-1"; "commit-2"; "commit-3"; }
  "commit-1"->"commit-2"
  "commit-2"->"commit-3"

  "root-1"->"commit-1"
  "root-2"->"commit-2"
  "root-3"->"commit-3"
}

There are five fields to be filled to compute the commit hash: tree hash, parent hashes (if exists), author, committer and the commit message. With further ado, let's see how the commit hashes are calculated.

# Version 1

content  = b'tree 661c9ebfb47590de9c094b3c56f7700050731b36\n'
content += b'author Robert J. Lawful <boblaw@legal.committee> 1583323200 +0000\n'
content += b'committer Flag-deciding Committee <committee@legal.committee> 1583323200 +0000\n'
content += b'\n'
content += b'Initial formation of the flag-deciding committee.\n' # The commit message

content = f'commit {len(content)}\x00'.encode() + content
commit_hash = hashlib.sha1(content).hexdigest()
# 08e1f0dd3b9d710b1eea81f6b8f76c455f634e87
# Version 2

content  = b'tree 754603fb1051aecc022315196611374f65791906\n'
content += b'parent 08e1f0dd3b9d710b1eea81f6b8f76c455f634e87\n'
content += b'author Pamela W. Mathews <pammy.emm@legal.committee> 1584102600 +0000\n'
content += b'committer Flag-deciding Committee <committee@legal.committee> 1584102600 +0000\n'
content += b'\n'
content += b'Proceedings of the flag-deciding committee: 18\n' # The commit message

content = f'commit {len(content)}\x00'.encode() + content
commit_hash = hashlib.sha1(content).hexdigest()
# c3e6c8ea777d50595a8b288cbbbd7a675c43b5df
# Version 3

content  = b'tree efc6902a4e82e09282b6a3b71b61bf8cccfe6689\n'
content += b'parent c3e6c8ea777d50595a8b288cbbbd7a675c43b5df\n'
content += b'author Christopher L. Hatch <crisscross.the.hatch@legal.committee> 1584966600 +0000\n'
content += b'committer Flag-deciding Committee <committee@legal.committee> 1584966600 +0000\n'
content += b'\n'
content += b'Proceedings of the flag-deciding committee: 8, 31, 36\n' # The commit message

content = f'commit {len(content)}\x00'.encode() + content
commit_hash = hashlib.sha1(content).hexdigest()
# dca4ca5150b82e541e2f5c42d00493ba8d4aa84a

Part III: Back to the challenge

From the piece above, it is pretty evident to solve the challenge. What we need to do is to guess the committed bytes (the indices are given in the commit message). We are able to compute the commit hash and check if it is correct. This is my final solve script. Running the program for a minute or so, we have the flag 🎉!

union{c0mm1tt33_d3c1deD_bu7_SHA_d3t3rm1n3d_6a7c2619a}