Union CTF 2021 (II): committee
Challenge Summary⌗
A committee was formed last year to decide the highly-sensitive contents of the flag for this challenge. Our informant managed to leak some data, but he was arrested within weeks of the committee's operation. All we have are the logs of the committee's meetings.
Note: file fixed, please redownload
Author: Aurel300
We are given log.txt
that contains the full commit history of the flag-containing repository.
...
commit dca4ca5150b82e541e2f5c42d00493ba8d4aa84a
Author: Christopher L. Hatch <crisscross.the.hatch@legal.committee>
Date: Mon Mar 23 12:30:00 2020 +0000
Proceedings of the flag-deciding committee: 8, 31, 36
commit c3e6c8ea777d50595a8b288cbbbd7a675c43b5df
Author: Pamela W. Mathews <pammy.emm@legal.committee>
Date: Fri Mar 13 12:30:00 2020 +0000
Proceedings of the flag-deciding committee: 18
commit 08e1f0dd3b9d710b1eea81f6b8f76c455f634e87
Author: Robert J. Lawful <boblaw@legal.committee>
Date: Wed Mar 4 12:00:00 2020 +0000
Initial formation of the flag-deciding committee.
We are also given the content of the only file, namely flag.txt
, for the first three commits (08e1f0d
, c3e6c8e
and dca4ca5
). They are respectively:
union{**********************************************}
union{*****************_****************************}
union{*******3*********_************r****d**********}
Also, three characters are filled in each subsequent commit. The objective is to recover the flag.
Solution⌗
There123 are useful pointers those available online, and it greatly helped me when I was working on the challenge.
Moreover, I'll be working on examples instead of laying out the structure again. The objective of this blog post is to compute the hashes for the first three commits correctly. Without loss of generality, the remaining commits work in the same way.
Part I: Computing the tree hash⌗
To begin, let's look at an example which is irrelevant to the challenge. Each tree (folder) node is shown in green, while each blob (file) node is shown in yellow.
There are four commits from the above example. In the latest version, there are three files, namely, /README.md
, /src/README.md
and /src/index.py
. Also, there are two folders: /
and /src
. Let's first look at how the hashes for file nodes are computed. Suppose that /README.md
is an one-liner saying hello world!
. Then the file hash starts with bc7774a
- this is the SHA-1 digest of the file content, prefixed with blob [LENGTH]\x00
. Let's look at the following code snippet on how a file hash is computed:
file_hash = hashlib.sha1(b'blob 12\x00hello world!').hexdigest()
# bc7774a7b18deb1d7bd0212d34246a9b1260ae17
That is everything for computing file hashes. Before we are computing tree hashes, let's assume that the file hashes for /src/index.py
and /src/README.md
are:
/src/index.py 5d3086defce3c9327c894ed8c005e15d7f75c906
/src/README.md cd12bea6b052e9b4c1efd973ae4708b57f0e1c36
For git, the hash of a tree depends on its child nodes. Let's adopt a bottom-up approach to compute the hash of the root node (i.e., the entire tree). In this way, we will first retrieve the hash of /src
, then the hash for /
.
To compute tree hashes, we will be first sorting the child nodes in lexicographic order. We will be adding a six-digit number indicating the mode4 of the nodes. The first three digits are usually 100
or 040
, which represent a regular file and a directory respectively. The next three digits would indicate the Unix permissions. For example, what we need to compute the hash for /src
are:
100644 README.md cd12bea6b052e9b4c1efd973ae4708b57f0e1c36
100644 index.py 5d3086defce3c9327c894ed8c005e15d7f75c906
Referencing from various sources online5, I have come up with the code snippet to compute tree hash for /src
(and /
).
# Tree hash for /src
content = b'100644 README.md\x00' + bytes.fromhex('cd12bea6b052e9b4c1efd973ae4708b57f0e1c36')
content += b'100644 index.py\x00' + bytes.fromhex('5d3086defce3c9327c894ed8c005e15d7f75c906')
content = f'tree {len(content)}\x00'.encode() + content
tree_hash = hashlib.sha1(content).hexdigest()
# f152f55e77dd7eea0ea1fca25a59b9b2793ef2c2
# Tree hash for /
content = b'100644 README.md\x00' + bytes.fromhex('bc7774a7b18deb1d7bd0212d34246a9b1260ae17')
content += b'40000 src\x00' + bytes.fromhex('f152f55e77dd7eea0ea1fca25a59b9b2793ef2c2')
content = f'tree {len(content)}\x00'.encode() + content
tree_hash = hashlib.sha1(content).hexdigest()
# b14d3a6e7ec4a95ba6f9bdc01af86dd85ab537eb
It is possible to verify via git cat-file
whether the hash is correct. It would print its preimage if the hash checks out. After all, the tree hash for version four would be b14d3a6e7ec4a95ba6f9bdc01af86dd85ab537eb
, which is defined by the tree hash of the root directory.
git cat-file tree b14d3a6e7ec4a95ba6f9bdc01af86dd85ab537eb | hd
Back to the challenge. Since it is the only file in the repository, it is straight forward to compute the tree hash if we know its content. These are the node trees in the first versions and how the hashes are computed.
# Version 1
# blob hash for "/flag.txt"
content = b'union{**********************************************}\n'
content = f'blob {len(content)}\x00'.encode() + content
file_hash = hashlib.sha1(content).hexdigest()
# 3f51471860e4e0364a2e7a79c4bc14267add8d39
# tree hash for "/"
content = b'100644 flag.txt\x00' + bytes.fromhex(file_hash)
content = f'tree {len(content)}\x00'.encode() + content
tree_hash = hashlib.sha1(content).hexdigest()
# 661c9ebfb47590de9c094b3c56f7700050731b36
# Version 2
# blob hash for "/flag.txt"
content = b'union{*****************_****************************}\n'
content = f'blob {len(content)}\x00'.encode() + content
file_hash = hashlib.sha1(content).hexdigest()
# f0a9b306519600289bb5d7ed5984babae61cf3b8
# tree hash for "/"
content = b'100644 flag.txt\x00' + bytes.fromhex(file_hash)
content = f'tree {len(content)}\x00'.encode() + content
tree_hash = hashlib.sha1(content).hexdigest()
# 754603fb1051aecc022315196611374f65791906
# Version 3
# blob hash for "/flag.txt"
content = b'union{*******3*********_************r****d**********}\n'
content = f'blob {len(content)}\x00'.encode() + content
file_hash = hashlib.sha1(content).hexdigest()
# b3f1331849f82c64f3d164d65891ab44e3662858
# tree hash for "/"
content = b'100644 flag.txt\x00' + bytes.fromhex(file_hash)
content = f'tree {len(content)}\x00'.encode() + content
tree_hash = hashlib.sha1(content).hexdigest()
# efc6902a4e82e09282b6a3b71b61bf8cccfe6689
Part II: Finding the commit hash⌗
The commit hashes are relatively easier to compute. Although the commits generally form a tree, the commits in the challenge form a chain. This is because there are no merge commits within, where there would be multiple parents for a commit.
There are five fields to be filled to compute the commit hash: tree hash, parent hashes (if exists), author, committer and the commit message. With further ado, let's see how the commit hashes are calculated.
# Version 1
content = b'tree 661c9ebfb47590de9c094b3c56f7700050731b36\n'
content += b'author Robert J. Lawful <boblaw@legal.committee> 1583323200 +0000\n'
content += b'committer Flag-deciding Committee <committee@legal.committee> 1583323200 +0000\n'
content += b'\n'
content += b'Initial formation of the flag-deciding committee.\n' # The commit message
content = f'commit {len(content)}\x00'.encode() + content
commit_hash = hashlib.sha1(content).hexdigest()
# 08e1f0dd3b9d710b1eea81f6b8f76c455f634e87
# Version 2
content = b'tree 754603fb1051aecc022315196611374f65791906\n'
content += b'parent 08e1f0dd3b9d710b1eea81f6b8f76c455f634e87\n'
content += b'author Pamela W. Mathews <pammy.emm@legal.committee> 1584102600 +0000\n'
content += b'committer Flag-deciding Committee <committee@legal.committee> 1584102600 +0000\n'
content += b'\n'
content += b'Proceedings of the flag-deciding committee: 18\n' # The commit message
content = f'commit {len(content)}\x00'.encode() + content
commit_hash = hashlib.sha1(content).hexdigest()
# c3e6c8ea777d50595a8b288cbbbd7a675c43b5df
# Version 3
content = b'tree efc6902a4e82e09282b6a3b71b61bf8cccfe6689\n'
content += b'parent c3e6c8ea777d50595a8b288cbbbd7a675c43b5df\n'
content += b'author Christopher L. Hatch <crisscross.the.hatch@legal.committee> 1584966600 +0000\n'
content += b'committer Flag-deciding Committee <committee@legal.committee> 1584966600 +0000\n'
content += b'\n'
content += b'Proceedings of the flag-deciding committee: 8, 31, 36\n' # The commit message
content = f'commit {len(content)}\x00'.encode() + content
commit_hash = hashlib.sha1(content).hexdigest()
# dca4ca5150b82e541e2f5c42d00493ba8d4aa84a
Part III: Back to the challenge⌗
From the piece above, it is pretty evident to solve the challenge. What we need to do is to guess the committed bytes (the indices are given in the commit message). We are able to compute the commit hash and check if it is correct. This is my final solve script. Running the program for a minute or so, we have the flag 🎉!
union{c0mm1tt33_d3c1deD_bu7_SHA_d3t3rm1n3d_6a7c2619a}