I played ångstromCTF 2021 for @blackb6a to spend my Easter holiday. I solved most of the reverse and cryptography challenges alone. In particular, Cache Money is one of the harder crypto challenges that I spent more than one day dealing with. It is very rewarding, and eventually four teams ended up solving it.

## Challenge Summary⌗

This challenge reimplements the Advanced Encryption Standard (AES) with 128-, 192- and 256-bit keys. The encryptor is equipped with caches, and we are given a service to encrypt (or decrypt) our messages. In short, there are four oracles provided by the service ($k_0$ is the fixed secret key and $b \in \{128, 192, 256\}$):

• $\mathcal{E}(m, k)$ encrypts a message $m$ with $k$,
• $\mathcal{D}(c, k)$ decrypts a ciphertext $c$ with $k$,
• $\mathcal{E}_0(m, b)$ encrypts a message $m$ with the first $b$ bits of $k_0$, and
• $\mathcal{D}_0(c, b)$ decrypts a ciphertext $c$ with the first $b$ bits of $k_0$.

Apart from the ciphertext (resp. plaintext), each oracle also returns the amount of time, in nanoseconds, spent on encryption (resp. decryption).

Additionally, there is a flag $m_0$ encrypted with $k_0$ under AES-256. The objective is, of course, to decrypt and retrieve the flag. Needless to say, there are measures to prevent the flag from being retrieved by calling $\mathcal{D}_0(c_0, 256)$ directly. At least, I am not aware of a way to solve this without recovering the key.

## Solution⌗

### Part I: What is that cache?⌗

#### The caching strategy⌗

In the challenge source, there is a Rijndael class that encrypts and decrypts messages. All of the methods inside the class are decorated with @instance_method_cache, which wraps them with the caching logic defined below:

```python
def instance_method_cache(f):
    # lru_cache uses the same cache for every instance which kinda sucks
    # so i made my own cache implementation
    def wrapper(self, *args):
        key = (f.__name__, *args)
        try:
            return self.cache[key][0]
        except TypeError:
            # deal with unhashable arguments
            return f(self, *args)
        except KeyError:
            # remove the least recently used element
            # dont want the cache blowing up
            while len(self.cache.keys()) >= 300:
                mintimestamp = timestamp()
                keys = list(self.cache.keys())
                lowest = -1
                for i in range(len(keys)):
                    if self.cache[keys[i]][1] < mintimestamp:
                        mintimestamp = self.cache[keys[i]][1]
                        lowest = i
                    else:
                        continue
                del self.cache[keys[lowest]]
            ret = f(self, *args)
            self.cache[key] = (ret, timestamp())
            return ret

    return wrapper
```

Suppose we want to call $\mathcal{F}(x)$. This is how the caching works:

1. If $(\mathcal{F}, x)$ is cached, return the cached value of $\mathcal{F}(x)$. (If $x$ is unhashable, the cache is bypassed entirely and $\mathcal{F}(x)$ is computed and returned without caching.)
2. Otherwise, evict the entry with the oldest timestamp until there are fewer than 300 entries.
3. Compute $y := \mathcal{F}(x)$, cache $y$ as the value for $(\mathcal{F}, x)$ and return it.
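The three steps above can be reproduced with a stripped-down sketch of the decorator (the names and the tiny cache size of 3 are my own choices for brevity; the real code uses 300):

```python
import time

def instance_method_cache_demo(f):
    # Stripped-down version of the challenge's decorator; the cache size
    # is shrunk from 300 to 3 so eviction is easy to observe.
    def wrapper(self, *args):
        key = (f.__name__, *args)
        try:
            return self.cache[key][0]               # step 1: cache hit
        except TypeError:
            return f(self, *args)                   # unhashable args: bypass the cache
        except KeyError:
            while len(self.cache) >= 3:             # step 2: evict the oldest entry
                oldest = min(self.cache, key=lambda k: self.cache[k][1])
                del self.cache[oldest]
            ret = f(self, *args)                    # step 3: compute, cache, return
            self.cache[key] = (ret, time.monotonic_ns())
            return ret
    return wrapper

class Demo:
    def __init__(self):
        self.cache = {}
        self.calls = 0

    @instance_method_cache_demo
    def Add(self, a, b):
        self.calls += 1
        return a ^ b

d = Demo()
assert d.Add(1, 2) == 3 and d.Add(1, 2) == 3
assert d.calls == 1                # second call was served from the cache
try:
    d.Add([], 2)                   # unhashable argument: bypasses the cache,
except TypeError:                  # then [] ^ 2 itself raises TypeError
    pass
assert d.calls == 2                # the underlying function was still invoked
```

Note the TypeError path: unhashable arguments skip the cache entirely. This detail becomes the crux of the attack later on.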

#### Caching in action: Encrypting a dummy flag with a dummy key⌗

I added a hook to log the entries and see how the function calls are triggered:

```diff
-   self.cache[key] = (ret, timestamp())
+   t2 = timestamp()
+   val = (ret, t2)
+   self.cache[key] = val
+   print(f'[+] self.cache[{key}] = {val} (time spent: {t2 - t1})')
```

This is a transcript of encrypting actf{this_is_a_test_flag} with the key Tes3UeYGqoq6vatPmlM6uQrTA7OTLNP8. There are two parts:

1. (Lines 1-369) Initializes the Rijndael class with the key.
2. (Lines 375-5690) Encrypts the message.

For the oracles, only the second part is timed.

### Part II: Obviously a timing attack, but how?⌗

It takes around 0.7 seconds to encrypt a random block of message under AES-256. A costly operation is MixColumns - it is called 13 times, and each call contributes ~6% to the total encryption time (hence around 85% in total). Unfortunately, it is not feasible to exploit cached MixColumns calls for the timing attack. Here's why:

1. Suppose that $m_1, m_2, ..., m_{13}$ are the inputs to MixColumns. It is unlikely that $m_i = m_j$ for any distinct pair $(i, j)$.
2. The time fluctuates a lot, so measuring a 6% difference is unreliable.

Note. Don't trust that 0.7 s figure, by the way. It was measured with verbose logging enabled, and the I/O takes a considerable amount of time. Without logging it takes only 0.2 s (which is still pretty slow).

Well, I think the second point alone makes a timing attack difficult. There are many function calls triggered by a single encrypt_block, including:

• ~1.8K Add calls:
  • 15 of them add two 4×4 matrices (~5ms each),
  • 60 of them add two arrays of 4 entries (~1ms each),
  • the remainder add two numbers (~0.2ms each),
• ~380 Multiply calls (~1ms each),
• ~350 Double calls (~0.2ms each),
• 56 SubWord calls (~0.2ms each),
• 14 ShiftRows calls (~0.2ms each), and
• 13 MixColumns calls (~40ms each).

That is a lot of uncertainty. It would be much better if we were able to determine whether a single Add operation is cached... Is it even possible?

I had no idea. The challenge author released a hint a few hours later:

Hint 1: This is also partially a web challenge...look for something you can exploit to make things more consistent.

There must be something fishy about it being a web challenge. Let's look at the source code for the encryption endpoint:

```python
@instance_method_cache
def encrypt(self, m):
    try:
        return b"".join(
            self.encrypt_block(m[i:][:16]) for i in range(0, len(m), 16)
        )
    except:
        return b"Error"
```

```python
@app.route("/api/enc", methods=["POST"])
def encrypt():
    m = tuple(request.json["a"])
    # Does not pad if the length of message is a multiple of 16
    if len(m) % 16 != 0:
        m = m + (0,) * (16 - (len(m) % 16))
    if request.json["secret"] not in ["secret128", "secret192", "secret256"]:
        k = tuple(request.json["k"])
    else:
        # Takes the first n bytes of the secret if we are using k0 for encryption.
        k = key[
            : {"secret128": 16, "secret192": 24, "secret256": 32}[
                request.json["secret"]
            ]
        ]
    try:
        cipher = Rijndael(k)
    except ValueError:
        return jsonify({"result": "Invalid key", "time": 0})
    start = timer()
    enc = cipher.encrypt(m)
    end = timer()
    del cipher
    return jsonify({"result": enc.hex(), "time": end - start})
```

Note that if we are encrypting the message block actf{this_is_a_t with $k_0$ under AES-256, this is the JSON object we send:

```json
// Request
{
    "a": [97, 99, 116, 102, 123, 116, 104, 105, 115, 95, 105, 115, 95, 97, 95, 116],
    "secret": "secret256"
}

// Response
{
    "result": "83696d00a4e4def3a9ca1c3e61b899cb",
    "time": 144329411
}
```
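Before experimenting with malformed inputs, it helps to see what the endpoint's padding does to them. This is a standalone sketch mirroring the padding logic above (the function name pad_message is mine):

```python
def pad_message(a):
    # Mirrors the /api/enc endpoint: pad with zero bytes up to a
    # multiple of 16, unless the length is already a multiple of 16.
    m = tuple(a)
    if len(m) % 16 != 0:
        m = m + (0,) * (16 - (len(m) % 16))
    return m

# A 26-byte message is padded to two 16-byte blocks:
assert len(pad_message(b"actf{this_is_a_test_flag}")) == 32

# Non-integer entries survive the padding untouched and only blow up
# later, inside encrypt_block:
assert pad_message([1337, []]) == (1337, []) + (0,) * 14
```

In other words, nothing stops us from smuggling non-byte values into the block; they reach the cipher's internals before anything complains.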

What if we send a malformed a? The exception is raised in a lower-level function and is handled by encrypt(self, m). Conveniently, the elapsed time is still measured and returned.

```json
{"a": [[]], "secret": "secret256"}         // Request
{"result": "4572726f72", "time": 45599}    // Response ("4572726f72" is "Error" hexed)
```

Currently we are measuring nothing, since the exception is raised before the first function call completes. Let's throw the error after the first function call instead:

```json
{"a": [1337, []], "secret": "secret256"}   // Request
{"result": "4572726f72", "time": 189082}   // Response
```

By throwing errors at an early, controlled stage, the time differences now stand out from the fluctuations. Nice!

### Part III: Leaking one eighth of the key⌗

#### A drop in response time⌗

As mentioned above, the first 369 lines of my transcript are where the Rijndael instance is initialized. Lines 1-55 are the log for computing the round constants rcon, while lines 56-369 show how the round keys are computed. This is part of the transcript:

```
56:[+] self.cache[('RotWord', (76, 78, 80, 56))] = ((78, 80, 56, 76), 505349375647612, 8617) (time spent: 8617)
57:[+] self.cache[('SubWord', (78, 80, 56, 76))] = ((47, 83, 7, 41), 505349375732640, 14356) (time spent: 14356)
58:[+] self.cache[('Add', 47, 1)] = (46, 505349375825251, 9545) (time spent: 9545)
59:[+] self.cache[('Add', 83, 0)] = (83, 505349375902264, 9931) (time spent: 9931)
60:[+] self.cache[('Add', 7, 0)] = (7, 505349375978725, 9218) (time spent: 9218)
61:[+] self.cache[('Add', 41, 0)] = (41, 505349376055553, 9776) (time spent: 9776)
62:[+] self.cache[('Add', (47, 83, 7, 41), (1, 0, 0, 0))] = ((46, 83, 7, 41), 505349376123267, 320106) (time spent: 320106)
63:[+] self.cache[('Add', 84, 46)] = (122, 505349376211788, 7671) (time spent: 7671)
64:[+] self.cache[('Add', 101, 83)] = (54, 505349376306808, 11562) (time spent: 11562)
65:[+] self.cache[('Add', 115, 7)] = (116, 505349376404211, 12700) (time spent: 12700)
66:[+] self.cache[('Add', 51, 41)] = (26, 505349376486636, 9646) (time spent: 9646)
67:[+] self.cache[('Add', (84, 101, 115, 51), (46, 83, 7, 41))] = ((122, 54, 116, 26), 505349376555581, 361347) (time spent: 361347)
68:[+] self.cache[('Add', 85, 122)] = (47, 505349376655242, 16317) (time spent: 16317)
69:[+] self.cache[('Add', 101, 54)] = (83, 505349376735919, 10524) (time spent: 10524)
...
```

AES-128 and AES-192 take fewer than 300 function calls to compute the round keys, so every single step of their key schedules is cached. This is not the case for AES-256. Therefore, we should attempt to leak something about $k_0$ via $\mathcal{E}_0(m, 128)$ or $\mathcal{E}_0(m, 192)$. To achieve this, let's make use of the error-throwing strategy we discussed in the last part.

```json
{"a": [1337, []], "secret": "secret128"}   // Request
{"result": "4572726f72", "time": 159767}   // Response
```

Iterating over 0 <= a[0] < 256, we observe that the request with a[0] = 40 is noticeably faster than the others.

```json
{"a": [40, []], "secret": "secret128"}     // Request
{"result": "4572726f72", "time": 40721}    // Response
```

#### What is this? Why is it faster?⌗

🤨 Confusion alert. I know it is getting more confusing. Please bear with me…

In short, Add(???, 40) is cached... but what is that? To understand why, look at the constructor and the encrypt_block method:

```python
def encrypt_block(self, block):
    m = tuple(tuple(block[4 * i :][:4]) for i in range(4))
    m = self.Add(self.roundkeys[:4], m)
    # ... omitted ...
```

These are the methods called during encrypt_block, in order:

```
encrypt_block(block)
->  [...  Computes m <- block  ...]
->  Add(roundkeys[:4], m)
->  Add(roundkeys[0], m[0])
->  Add(roundkeys[0][0], m[0][0]) [... 💰 Uses the cache ...]
->  Add(roundkeys[0][1], m[0][1]) [... 🔥 Throws an error ...]

[... The subsequent functions are not called ...]

->  Add(roundkeys[0][2], m[0][2])
->  Add(roundkeys[0][3], m[0][3])
->  Add(roundkeys[1], m[1])
->  ...
```
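To see why the computation stops right after the first element-wise Add, here is a toy model of the recursive Add (the structure is assumed from the call diagram above, not copied from the challenge source):

```python
calls = []

def Add(a, b):
    # Toy model: tuples are added element-wise, recursively;
    # plain integers are XORed.
    calls.append((a, b))
    if isinstance(a, tuple):
        return tuple(Add(x, y) for x, y in zip(a, b))
    return a ^ b

try:
    # The [] in the second position aborts the computation right
    # after the first element-wise Add.
    Add((118, 76, 100, 72), (40, [], 0, 0))
except TypeError:
    pass

# Only the row-level call and the first two element-wise calls happened:
assert calls == [((118, 76, 100, 72), (40, [], 0, 0)), (118, 40), (76, [])]
```

So the response time is dominated by whether Add(roundkeys[0][0], m[0][0]) hits the cache; everything after the poisoned element never runs.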

Therefore, the cached Add(???, 40) is actually Add(roundkeys[0][0], m[0][0]). With the same strategy, we can find values of a[1], a[2] and a[3] for which the API call has a smaller response time: 67, 82 and 56.

```json
{"a": [40, 67, 82, 56, []], "secret": "secret128"}   // Request
{"result": "4572726f72", "time": 40665}              // Response
```

What does (40, 67, 82, 56) mean? From the cache's perspective, the function call Add(roundkeys[0], (40, 67, 82, 56)) is cached. But why? Looking back at the constructor, this is executed when the round keys are computed:

```python
Add(w[0], Add(SubWord(RotWord(w[3])), rcon[1]))
```

Here, w[0] and w[3] are roundkeys[0] and roundkeys[3] respectively, while rcon[1] == (1, 0, 0, 0). That said, it is very likely that (40, 67, 82, 56) is the second argument, i.e.:

```python
Add(SubWord(RotWord(roundkeys[3])), rcon[1]) == (40, 67, 82, 56)

# equivalently...

sbox(roundkeys[3][1]) ^ 1 == 40
sbox(roundkeys[3][2])     == 67
sbox(roundkeys[3][3])     == 82
sbox(roundkeys[3][0])     == 56

# and equivalently...

roundkeys[3][1] ==  76
roundkeys[3][2] == 100
roundkeys[3][3] ==  72
roundkeys[3][0] == 118
```

On top of that, roundkeys[3] == key[12:16], so we have recovered four bytes of $k_0$. We can further recover the first eight bytes of $k_0$ by testing a[4], a[5], ..., a[11]. With 12 bytes of the key known, we can exhaust the remaining four bytes to recover the secret key used for AES-128 (i.e., key[0:16]).
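We can double-check this arithmetic by generating the AES S-box from scratch and inverting it. This is a self-contained sketch; the only input taken from the attack is the observed tuple (40, 67, 82, 56):

```python
def gf_mul(a, b):
    # Multiplication in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1.
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return p

def sbox(x):
    # Multiplicative inverse as x^254 (sends 0 to 0), then the affine transform.
    inv = 1
    for _ in range(254):
        inv = gf_mul(inv, x)
    s = 0x63
    for i in range(5):
        s ^= ((inv << i) | (inv >> (8 - i))) & 0xFF
    return s

inv_sbox = {sbox(x): x for x in range(256)}

observed = (40, 67, 82, 56)                              # the leaked tuple
rcon1 = (1, 0, 0, 0)
subbed = tuple(o ^ r for o, r in zip(observed, rcon1))   # SubWord(RotWord(w3))
rotated = tuple(inv_sbox[v] for v in subbed)             # RotWord(w3)
w3 = (rotated[-1],) + rotated[:-1]                       # undo the rotation

assert w3 == (118, 76, 100, 72)   # roundkeys[3], i.e. key[12:16] == b'vLdH'
```

Indeed, b'vLdH' matches bytes 12-16 of the key recovered at the end of the writeup.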

The same idea applies to AES-192. We are able to recover key[20:24] by finding appropriate inputs for a[0], a[1], a[2] and a[3], and eventually key[0:24] by brute-forcing the remaining bytes.

### Part IV: The cache is not in reach for AES-256!⌗

Unfortunately, we cannot generalize the idea to recover the key under AES-256, because computing its round keys takes more than 300 function calls. Hence, the cache entries for Add(roundkeys[0][0], m[0][0]) and friends have already been evicted before they are used; only the round keys for the later rounds remain cached. With that said, we can use $\mathcal{D}_0(c, 256)$ instead of $\mathcal{E}_0(m, 256)$ and recover the round keys generated near the end. Specifically, they are roundkeys[49] == a[0:4], roundkeys[50] == a[4:8] and roundkeys[51] == a[8:12].

Why? The proof is left as an exercise for the reader (I apologize for being lazy - I had already lost my patience after spending five hours compiling this writeup during the CTF).

Since we now have roundkeys[0], roundkeys[1], ..., roundkeys[5] (i.e., key[0:24]), we can retrieve key[24:32] using Z3. Eventually we find key == b'fGsOY9Xb4Eq5vLdHzDGbJ4QvQqOCQcYm' and can decrypt the flag: actf{if_you_cache_my_drift}. First 🩸!

🔑 Solution script! The full solution script is available here (and its output). Hope this explains how my attack works.

### Part V: Attacks without time-reading⌗

Note. This is not my idea. I found this pretty mind-blowing so I shamelessly document this as well. Kudos to UnblvR!

In short, one can send a float to determine whether an item is cached. For example, one can check whether Add(roundkeys[0][0], 1337) is cached by sending the JSON object below:

```json
{"a": [1337.0], "secret": "secret128"}   // Request
```

It happens that an error is thrown if and only if Add(roundkeys[0][0], 1337) is not cached. If it is cached, the cached output is returned, since ("Add", roundkeys[0][0], 1337) == ("Add", roundkeys[0][0], 1337.0) as dictionary keys. Otherwise, the service attempts to compute roundkeys[0][0] ^ 1337.0, which throws an error since XOR between an integer and a float is not defined.
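The trick boils down to two Python facts, which we can verify directly (a standalone sketch; the cache entry is fabricated for illustration):

```python
cache = {}
cache[("Add", 118, 1337)] = (118 ^ 1337, 0)   # pretend this call was cached earlier

# Fact 1: int and float keys collide, because 1337 == 1337.0
# and hash(1337) == hash(1337.0).
assert ("Add", 118, 1337.0) in cache
assert cache[("Add", 118, 1337.0)][0] == 118 ^ 1337

# Fact 2: if the entry is *not* cached, the recomputation fails,
# since ^ is not defined between int and float.
try:
    118 ^ 1337.0
    raised = False
except TypeError:
    raised = True
assert raised
```

So the response itself ("Error" vs. a ciphertext) becomes the oracle, and no timing measurement is needed at all.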