Dragon CTF 2021: CRC Recursive Challenge (Warmup)

@blackb6a played Dragon CTF 2021 last weekend and I spent most of the time solving the CRC duo. They are very fun but unfortunately that we were close enough while unable to get the second flag.

Warmup edition only. If you are interested to learn the harder edition that involves a 128-bit CRC, you may want to look at hellman’s writeup. I would only cover a solution of the easier edition with some unsuccessful attempts of the harder one.

Challenge Summary⌗

The objective is to find a message which correctly depicts its CRC-64 digest.

def crc64(buf, crc=0xffffffffffffffff):
    for val in buf:
        crc ^= val << 56
        for _ in range(8):
            crc <<= 1
            if crc & 2**64:
                crc ^= 0x1ad93d23594c935a9
    return crc

inp = input().strip().encode()
crc = crc64(inp)
if inp == f'My crc64 is 0x{crc:016x}! Cool, isn\'t it?'.encode():
    with open('flag.txt', 'r') as f:
        print(f.read().strip())
else:
    print('Nope!')

Solution⌗

The crypto challenges in Dragon CTF are often easy to understand but turns out hard to solve. The Bit Flip trio last year is one of them. During the contest, we thought that we could easily write an algorithm solving the harder problem. Turns out we did not solve the harder challenge in time.

Part I: Pay to win with an $\mathcal{O}(2^{n/2})$ algorithm⌗

To start with, I will include the affine property of CRC. That is the most important property that I used to attack the challenge.

CRC’s affine property. For any messages $m_1, m_2, m_3$,

$$\text{CRC}(m_1 \oplus m_2 \oplus m_3) = \text{CRC}(m_1) \oplus \text{CRC}(m_2) \oplus \text{CRC}(m_3).$$

Let's define some bytestrings in Python and rewrite the objective mathematically. Note that we will use _ to represent a null byte simply because it looks better.

# Note: This is not the final value
ks = [0 for _ in range(16)]


m0         = b"My crc64 is 0x________________! Cool, isn't it?"

dm[ 0][ 0] = b"_____________________________0_________________"
dm[ 0][ 1] = b"_____________________________1_________________"
# ...
dm[ 0][15] = b"_____________________________f_________________"
dm[ 1][ 0] = b"____________________________0__________________"
# ...
dm[ 1][15] = b"____________________________f__________________"
# ...
dm[15][ 0] = b"______________0________________________________"
# ...
dm[15][15] = b"______________f________________________________"

Let $m_0$ be m0, $k_i$ be ks[i], $\Delta m_{i, j}$ be dm[i][j]. Our goal is to find $k_i$'s with

\[\text{CRC}(m_0 \oplus \Delta m_{0, k_0} \oplus \Delta m_{1, k_1} \oplus ... \oplus \Delta m_{15, k_{15}}) = k_0 \oplus 16 k_1 \oplus ... \oplus 16^{15} k_{15}.\]

With the affine property, we can show that

\[\begin{aligned} & k_0 \oplus 16 k_1 \oplus ... \oplus 16^{15} k_{15} \\ & \qquad = \text{CRC}(m_0 \oplus \Delta m_{0, k_0} \oplus \Delta m_{1, k_1} \oplus ... \oplus \Delta m_{15, k_{15}}) \\ & \qquad = \text{CRC}(m_0) \oplus \text{CRC}(\Delta m_{0, k_0}) \oplus \text{CRC}(\Delta m_{1, k_1}) \oplus ... \oplus \text{CRC}(\Delta m_{15, k_{15}}) \end{aligned}\]

We can rearrange the terms on the both sides to see my intention:

\[\begin{aligned} & \text{CRC}(m_0) \oplus \left(k_0 \oplus \text{CRC}(\Delta m_{0, k_0})\right) \oplus ... \oplus \left(16^7 k_7 \oplus \text{CRC}(\Delta m_{7, k_7})\right) \\ & \qquad = \left(16^8 k_8 \oplus \text{CRC}(\Delta m_{8, k_8})\right) \oplus ... \oplus \left(16^{15} k_{15} \oplus \text{CRC}(\Delta m_{15, k_{15}})\right) \end{aligned}\]

In that way, $k_0, ..., k_7$ are on the left hand side and the rest are on the right. We can perform meet-in-the-middle attack. In that way we need to precompute and store every output from the left, hence maintaining $16^{8}$ entries. After precomputing, we perform $16^{8}$ queries to check if an entry from the right has its corresponding entries from the left... Well, we need around 256GB of memory to store the results in a map. Is this even feasible? Anyway, if we do not have too much memory, we can sacrifice time for space:

\[\begin{aligned} & \text{CRC}(m_0) \oplus \left(k_0 \oplus \text{CRC}(\Delta m_{0, k_0})\right) \oplus ... \oplus \left(16^4 k_4 \oplus \text{CRC}(\Delta m_{4, k_4})\right) \\ & \qquad = \left(16^5 k_5 \oplus \text{CRC}(\Delta m_{5, k_5})\right) \oplus ... \oplus \left(16^{15} k_{15} \oplus \text{CRC}(\Delta m_{15, k_{15}})\right). \end{aligned}\]

We only have to precompute and store $16^5$ entries. It is now a matter of time as we need $16^{11}$ queries. Let's use $u + v$ to represent the setting that precomputes and stores the entries using $k_1, k_2, ..., k_u$ and generates queries with the remaining $v$ variables. The two settings we mentioned are $8 + 8$ and $5 + 11$.

I estimated that $u + v$ requires $16^{u-6}$ GB of memory and takes $3 \cdot (16^{u-8} + 16^{v-8})$ hours to finish. The following table summarizes the memory and time:

Setting	Estimated Memory	Estimated Time
$5 + 11$	64MB	1 year
$6 + 10$	1GB	1 month
$7 + 9$	16GB	2 days
$8 + 8$	256GB	6 hours
$9 + 7$	4TB	2 days
$10 + 6$	64TB	1 month
$11 + 5$	too much	1 year

My laptop can share 1GB memory but I don't have a month running the program for the flag. It seemed impossible, right? @harrier_lcc has the best laptop among us and can deploy $7 + 9$ on his laptop - but it still require two days to complete. Here comes the two things we learned:

One can use multiple threads to speed up⌗

To speed up the whole process, we used multithreading during the lookup phase. If we have $t$ threads, then the expected running time is $3 \cdot (16^{u-8} + 16^{v-8}/t)$ hours. Running a $7 + 9$ with 16 threads would take around 3 hours. Regrettably it is not a good idea to freeze @harrier_lcc's laptop for a couple hours.

Why not multithreading for precompute as well? Map insert is not thread-safe and some entries may not be successfully inserted to the map.

One can do many things with money⌗

Eventually, we spun up a 48-core machine (r6g.12xlarge) on Amazon EC2. We streamed the process on our Discord server at 6AM and everyone is watching.

Burn money burn. Screenshot taken by @grhkm2023.

It took around 30 minutes for the program to print a candidate 0x7a8abd9a85eed3b9. After ten hours we finally get ourselves warmed up, yay! The solve script is at the end of the post.

nc crc.hackable.software 1337
My crc64 is 0x7a8abd9a85eed3b9! Cool, isn't it?
DrgnS{0x5d9f900f6564617b}

Part II: When in doubt use LLL (and led to more doubts)⌗

Ideas only. We will be discussing unsuccessful attempts to tickle the harder challenge from this part onwards.

With a similar idea adopted in Tenet: The Plagarism in HKCERT CTF 2021, I converted the 16 unknowns $k_0, k_1, ..., k_{15} \in \mathbb{Z}_{16}$ to 256 unknowns $x_{0,0}, x_{0,1}, ..., x_{0,15}, x_{1,0}, ..., x_{15,15} \in \{0, 1\}$. We can see that the below equation

\[k_0 \oplus 16 k_1 \oplus ... \oplus 16^{15} k_{15} = \text{CRC}(m_0) \oplus \text{CRC}(\Delta m_{0, k_0}) \oplus \text{CRC}(\Delta m_{1, k_1}) \oplus ... \oplus \text{CRC}(\Delta m_{15, k_{15}})\]

is equivalent to the below equation when $x_{i, j} = 1$ if $k_i = j$ ($x_{i, j} = 0$ otherwise):

\[\bigoplus_{i=0}^{15} \bigoplus_{j=0}^{15}\ 16^i j \cdot x_{i,j} = \text{CRC}(m_0) \oplus \bigoplus_{i=0}^{15} \bigoplus_{j=0}^{15}\ x_{i,j} \text{CRC}(\Delta m_{i,j})\]

Rearranging the terms:

\[\bigoplus_{i=0}^{15} \bigoplus_{j=0}^{15}\ \left(16^i j \oplus \text{CRC}(\Delta m_{i,j})\right) \cdot x_{i,j} = \text{CRC}(m_0)\]

We now have 64 linear equations under $\mathbb{Z}_2$ when looking at the equation bit-by-bit. Since the system is largely underdetermined, it is not feasible to solve the linear system. We tried LLL instead. Define the $257 \times 64$ matrix $B$ (zero-indexed):

\[B_{p, q} = \begin{cases} q^\text{th}\ \text{bit of}\ \left(16^i j \oplus \text{CRC}(\Delta m_{i,j})\right) & \text{for}\ p = 16i + j\ \text{and}\ 0 \leq i, j < 16 \\ q^\text{th}\ \text{bit of}\ \text{CRC}(m_0) & \text{for}\ p = 256 \end{cases}.\]

We further define the matrix $A$ which we will use for LLL:

\[A = \left[\begin{matrix} I_{257} & B \\ O & 2 \cdot I_{64} \end{matrix}\right].\]

It is evident that $(x_{0,0}, x_{0,1}, ..., x_{15,15}, 1, 0, ..., 0)$ is spanned by the lattice. Sadly, the vector did not appear. Since we did not have high expectations on this approach, we did not investigate this any further.

Part III: An approach with time complexity $\mathcal{O}(2^{n/4})$⌗

TL;DR. We will discuss a solution that works on a modest computer… Only for the CRC64 challenge though.

The largest hurdle of this challenge is the non-affinity between ASCII characters and numbers. Although there are chances that the XOR value matches, such as:

b"1337" xor b"2345" xor b"2021" == b"1053" and 0x1337 ^ 0x2345 ^ 0x2021 == 0x1053.

There are counter examples like:

b"9123" xor b"1204" xor b"3054" == b";373" and 0x9123 ^ 0x1204 ^ 0x3054 == 0xb373.

To handle this, we are going to redefine almost everything in the first path. Let's forget them and look (_ still represents the null byte, however):

# Note: This is not the final value
ks = [0 for _ in range(16)]


m0         = b"My crc64 is 0x                ! Cool, isn't it?"

# 1, 2, 4, 8, G and @ respectively represent
#     b'\x01', b'\x02', b'\x04', b'\x08', b'\x10' and b'\x40'.
dm[ 0][ 0] = b"_____________________________1_________________"
dm[ 0][ 1] = b"_____________________________2_________________"
dm[ 0][ 2] = b"_____________________________4_________________"
dm[ 0][ 3] = b"_____________________________8_________________"
dm[ 0][ 4] = b"_____________________________G_________________"
dm[ 0][ 5] = b"_____________________________@_________________"
dm[ 1][ 0] = b"____________________________1__________________"
# ...
dm[15][ 5] = b"______________@________________________________"

We are going to introduce 160 unknowns: $x_{ip}, y_{iq} \in \{0, 1\}$ for $0 \leq i \lt 16$, $0 \leq p \leq 5$ and $0 \leq q \leq 3$. We have the first equation (the right hand side is shown in its binary representation):

\[\begin{aligned} & \text{CRC}(m_0 \oplus x_{0,0} \Delta m_{0,0} \oplus x_{0,1} \Delta m_{0,1} \oplus ... \oplus x_{15,5} \Delta m_{15,5}) \\ & \qquad = \overline{y_{15,3}y_{15,2}y_{15,1}y_{15,0}y_{14,4}...y_{0,1}y_{0,0}} \qquad\qquad\qquad\qquad[\dagger] \end{aligned}\]

An intuitive meaning for $x_{ip}$ and $y_{iq}$ are respectly the bits of the ASCII value and the bits of the number itself. Two questions, however:

Why we are skipping the fifth bit and the seventh bit when defining $\Delta m_{i, j}$?
Why did we change $m_0$ from 0x________________ to 0x ?

We are using the set 0123456789abcdef to represent the hexadecimal characters. The following table shows their ASCII values in binary and we can see that bits 5 and 7 are respectively always 1 and 0. That said, we can put bit 5, \x20, into $m_0$ and drop those bits from unknown.

       | Digits     Letters
       | 0123456789 abcdef
-------+-------------------
 Bit 0 | 0101010101 101010
 Bit 1 | 0011001100 011001
 Bit 2 | 0000111100 000111
 Bit 3 | 0000000011 000000
 Bit 4 | 1111111111 000000
 Bit 5 | 1111111111 111111  👈 All of the entries are 1
 Bit 6 | 0000000000 111111
 Bit 7 | 0000000000 000000  👈 All of the entries are 0

Anyway, we can recover 64 equations in $\mathbb{Z}_2$ if we consider $[\dagger]$ bit-by-bit. Since there are 160 unknowns, there are still 96 free variables (possibly $2^{96}$ solutions). How to eliminate them? Let's take a look at the $\spadesuit$-th character and relate $x_{\spadesuit p}$'s and $y_{\spadesuit q}$'s. There are two cases:

If the $\spadesuit$-th hex character is a digit...
- $x_{\spadesuit 0} \oplus y_{\spadesuit 0} = 0$
- $x_{\spadesuit 1} \oplus y_{\spadesuit 1} = 0$
- $x_{\spadesuit 2} \oplus y_{\spadesuit 2} = 0$
- $x_{\spadesuit 3} \oplus y_{\spadesuit 3} = 0$
- $x_{\spadesuit 4} = 1$
- $x_{\spadesuit 5} = 0$ (note that this represents the 6-th bit)
If the $\spadesuit$-th hex character is a letter...
- $x_{\spadesuit 0} \oplus y_{\spadesuit 0} = 1$
- $x_{\spadesuit 1} \oplus y_{\spadesuit 0} \oplus y_{\spadesuit 1} = 1$
- $x_{\spadesuit 3} = 0$
- $x_{\spadesuit 4} = 0$
- $x_{\spadesuit 5} = 1$
- $y_{\spadesuit 3} = 1$

From above, we can obtain 6 equations if we know whether a hex character is a digit or a letter. After all, assuming that we know if each of the hex characters are a digit or a letter, we have 160 equations and 160 unknowns. Therefore we are able to determine $x_{ip}$'s and $y_{iq}$'s by solving a linear system.

We don't know if the characters are digits or letters. We can simply guess them - there are 16 hex characters and we can exhaust all of the $2^{16}$ cases. I have written the solve script in Sagemath. It takes only 6 minutes to solve the easier version.

Now we have devised a nice algorithm that takes only $2^{32}$ Gaussian elimination for the harder part. Maybe we can try running that?

Just optimize it. After the contest, we learned that the author’s solution is to solve $2^{32}$ linear systems (with optimized C++ implementation and smaller matrices though). So I am not that far from the flag…

Solve script⌗

The meet-in-the-middle approach⌗

#include <cstdio>
#include <cstring>
#include <map>
#include <pthread.h>
#define ull unsigned long long
using namespace std;

ull crc64(char *buf, size_t size) {
    ull res = -1;
    for (int i = 0; i < size; i++) {
        res ^= (ull) buf[i] << 56;
        for (int j = 0; j < 8; j++) {
            if (res & 0x8000000000000000) {
                res <<= 1;
                res ^= 0xad93d23594c935a9LL;
            } else {
                res <<= 1;
            }
        }
    }
    return res;
}


ull h0;
ull progress = 0;
ull res[16][16];
map<ull, ull> lhs;
void fill_lhs(int i, ull state, ull r) {
    if (i == 7) {
        // progress
        progress += 1;
        if (!(progress & 0xfffff)) printf("[ ] Filling left hand side... %07llx\n", progress);

        lhs[state] = r;
        return;
    }

    for (int j = 0; j < 16; j++) {
        fill_lhs(i+1, state ^ res[i][j], 16*r+j);
    }
}

void exhaust_rhs(int i, ull state, ull r) {
    if (i == 9) {
        if (lhs.find(state) != lhs.end()) {
            ull l = lhs.find(state)->second;
            printf("[!] Found! %016llx %016llx\n", l, r);
        }
        return;
    }

    for (int j = 0; j < 16; j++) {
        exhaust_rhs(i+1, state ^ res[i+7][j], 16*r+j);
    }
    return;
}

void *rhs_entry(void *arg) {
    ull *id = (ull *) arg;
    exhaust_rhs(2, res[7][*id / 16] ^ res[8][*id % 16], *id);
    return NULL;
}

int main () {
    pthread_t threads[256];
    ull args[256];
    for (ull i = 0; i < 256; i++) args[i] = i;

    setvbuf(stdout, NULL, _IONBF, 0);
    
    char u[] = "My crc64 is 0x0000000000000000! Cool, isn't it?";
    for (int i = 0; i < 16; i++) u[i+14] = 0;
    
    h0 = crc64(u, 47);

    memset(res, 0, sizeof res);

    char hexchars[] = "0123456789abcdef";
    for (int i = 0; i < 16; i++) {
        for (int j = 0; j < 16; j++) {
            memset(u, 0, 47);
            u[14+i] = hexchars[j];
            res[i][j] = crc64(u, 47) ^ ((ull) j << (4 * (15-i)));
        }
    }

    printf("[*] Precomputing...\n");
    fill_lhs(0, h0, 0);

    printf("[*] Looking up...\n");
    for (int i = 0; i < 256; i++) pthread_create(&threads[i], NULL, rhs_entry, &args[i]);
    for (int i = 0; i < 256; i++) pthread_join(threads[i], NULL);

    return 0;
}

The solving-the-linear-system approach⌗

import itertools
from tqdm import tqdm

def _crc64(buf, crc=0xffffffffffffffff):
    for val in buf:
        crc ^^= val << 56
        for _ in range(8):
            crc <<= 1
            if crc & 2**64:
                crc ^^= 0x1ad93d23594c935a9
    return crc

def _crc128(buf, crc=0xffffffffffffffffffffffffffffffff):
    for val in buf:
        crc ^^= val << 120
        for _ in range(8):
            crc <<= 1
            if crc & 2**128:
                crc ^^= 0x14caa61b0d7fe5fa54189d46709eaba2d
    return crc

# SOLUTION

n = 64 # Change this to 128 for the harder version.

m_prefix = f'My crc{n} is 0x'.encode()
m_suffix = b'! Cool, isn\'t it?'
m_hidden = b' ' * int(n//4)

crc = {64: _crc64, 128: _crc128}[n]

h0 = crc(m_prefix + m_hidden + m_suffix)
h1 = crc(b'\0' * len(m_prefix + m_hidden + m_suffix)) # A parity for affinity

dms, dhs = [], []
# Consider each hex char each
for i in range(n//4-1, -1, -1):
    # Bits 0 and 2 are skipped because they are always 0 and 1, respectively.
    for j in [0, 1, 2, 3, 4, 6]:
        block = b'\0'*i + bytes([1<<j]) + b'\0'*(n//4-1-i)
        dm = b'\0'*len(m_prefix) + block + b'\0'*len(m_suffix)
        dh = crc(dm)
        
        dms.append(dm)
        dhs.append(dh)

A = Matrix(GF(2), int(5*n//2+1), int(5*n//2+1))
b = vector(GF(2), int(5*n//2+1))

for i in range(n):
    A[i, i]       = 1
    for j in range(3*n//2):
        A[i, j+n] = dhs[j]>>i
    A[i, 5*n//2]  = h1>>i
    b[i]          = h0>>i

# Add more known equations
m = n
for i in range(n//4):
    # x4 + x5 = 1
    A[m, n + 6*i + 4] = 1
    A[m, n + 6*i + 5] = 1
    b[m]              = 1

    m += 1

for i in range(n//4):
    # x0 + x4 + y0 = 1
    A[m, n + 6*i + 0] = 1
    A[m, n + 6*i + 4] = 1
    A[m, 4*i + 0]     = 1
    b[m]              = 1

    m += 1

if True:
    # Parity for CRC
    for j in range(3 * n//2):
        A[m, n + j] = 1
    A[m, 5*n//2]    = 1
    b[m]            = 0

    m += 1

m0 = m

for t in tqdm(itertools.product([0, 1], repeat=int(n//4)), total=int(2**int(n//4))):
    m = m0
    
    A[m:m+4, :] = 0

    for i in range(n//4):    
        if t[i] == 0:
            # x1 + y1 = 0
            A[m + 0, n + 6*i + 1] = 1
            A[m + 0,     4*i + 1] = 1
            b[m + 0]              = 0
            # x2 + y2 = 0
            A[m + 1, n + 6*i + 2] = 1
            A[m + 1,     4*i + 2] = 1
            b[m + 1]              = 0
            # x3 + y3 = 0
            A[m + 2, n + 6*i + 3] = 1
            A[m + 2,     4*i + 3] = 1
            b[m + 2]              = 0
            # x4 = 1
            A[m + 3, n + 6*i + 4] = 1
            b[m + 3]              = 1
        else:
            # x1 + y0 + y1 = 1
            A[m + 0, n + 6*i + 1] = 1
            A[m + 0,     4*i + 0] = 1
            A[m + 0,     4*i + 1] = 1
            b[m + 0]              = 1
            # y3 = 1
            A[m + 1,     4*i + 3] = 1
            b[m + 1]              = 1
            # x3 = 0
            A[m + 2, n + 6*i + 3] = 1
            b[m + 2]              = 0
            # x4 = 0
            A[m + 3, n + 6*i + 4] = 1
            b[m + 3]              = 0
        
        m += 4
    
    try:
        x0 = A.solve_right(b)
    except:
        continue

    for dx in A.right_kernel():
        x = x0 + dx
        res = sum([int(b)<<i for i, b in enumerate(x[:n])])
        _m = m_prefix + format(res, f'0{n//4}x').encode() + m_suffix
        _h = crc(_m)
        
        if _h == res:
            print(f'Found - {_m}')