# CONFidence 2020 CTF: Team Trees

This week, we have teamed up as @blackb6a to play CONFidence 2020 CTF. We end up ranked 15, but we are more proud of ourselves able to solve a reversing challenge called Team Trees (395 points, 5 solves).

In particular, we are the first-to-solve to the challenge. It took us around two hours to win the flag. This writeup is written by @harrier_lcc and @mystiz613.

## Challenge Summary⌗

We wanted to plant a lot of trees, but it's going kinda slow...

We are given a binary that executes a slow algorithm that takes up a lot of memory. The objective is to wait the program until it is done - the flag would be shown as the output.

**Never.** Our PCs don't even have enough memory to play with that. Even so, we gotta wait forever for the flag. Our objective is to optimize and rewrite the algorithm used. By optimize we mean reduce *both* the time and space complexities.

## Solution⌗

### Part I: Baby steps from baby cases⌗

From the binary, we knew that we are going to find $f(1337)$ for a given $f$. Although we do not know what $f$ is, we could still have an insight on it. For example, we can patch `1337`

by smaller values: `0`

, `1`

, etc.:

```
n = 0: p4{32b9b6bca55548ed88ec405c5c7cf3a1}
n = 1: p4{ee5fadd5a727857f32b9b6bca55548ed}
n = 2: p4{ee5fadd5a727857f982d2435efffdac7}
n = 3: p4{c8b5922a9f156985b4f8094372145c13}
...
n = 4: Out of memory ;_;
```

In particular, `32b9b6bca55548ed`

in the output for $n = 0$ is the `v1`

in `sub_400816`

:

```
void __noreturn sub_400816()
{
unsigned __int64 v0; // rdx
signed __int64 v1; // rcx
signed __int64 v2; // rt0
v0 = 0x82F96AC97429A68BLL;
v1 = 0x32B9B6BCA55548EDLL;
__debugbreak();
while ( 1 )
{
v2 = v1;
v1 = 3 * v0 + 2 * v1 + 4;
v0 = v2;
}
}
```

But what is `88ec405c5c7cf3a1`

? It must be related to `v0`

and `v1`

. We tried `3*v0 + 2*v1 + 4`

and it is `ee5fadd5a727857f`

. It does not check out.

We have spot out that the output shares the same prefix when $n=1$ and $n=2$. Maybe we shall see what is going on with `sub_400816`

- it is simply generating numbers for a sequence, namely $s_n$, indefinitely:

- $s_0 = \text{82F96AC97429A68B}_{16}$,
- $s_1 = \text{32B9B6BCA55548ED}_{16}$, and
- $s_k = 3 s_{k-2} + 2 s_{k-1} + 4\mod2^{64}$ for $k \geq 2$.

Let's generate a bunch of $a_k$'s to see if there is something interesting:

```
s_2 = 0xee5fadd5a727857f
s_3 = 0x74ec7fe13e4ee5c9
s_4 = 0xb4f8094372145c13
s_5 = 0xc8b5922a9f156985
...
```

If we rewrite the program output in terms of $s_k$, we have:

```
n = 0: p4{s_1 88ec405c5c7cf3a1}
n = 1: p4{s_2 32b9b6bca55548ed}
n = 2: p4{s_2 982d2435efffdac7}
n = 3: p4{s_5 s_4}
```

The output when $n = 3$ is actually composed by two elements in the sequence. It makes us think: is it the case that `88ec...`

(also `32b9...`

and `982d...`

) is an *intermediate state*? To verify our thoughts, we are going to dig deeper into the assembly operations.

There are *four* operations. Here are the values of `v0`

and `v1`

(as defined in `sub_400816`

) after each of the steps.

```
/* rcx = c0, rdx = d0 */
lea rdx, [rdx+rdx*2] /* rcx = c0, rdx = 3*d0 */
lea rdx, [rdx+rcx*2+4] /* rcx = c0, rdx = 3*d0 + 2*c0 + 4 */
xchg rcx, rdx /* rcx = 3*d0 + 2*c0 + 4, rdx = c0 */
jmp short loc_40082F /* rcx = 3*d0 + 2*c0 + 4, rdx = c0 */
```

With that said, it takes four instructions for a complete cycle inside `while`

. From the smaller values of $n$, we can see that the output format would be `p4{[rdx][rcx]}`

. Let's see some starting values of `rcx`

and `rdx`

:

```
step | rdx | rcx
------+------------------+------------------
0 | 32b9b6bca55548ed | 82f96ac97429a68b
1 | 32b9b6bca55548ed | 88ec405c5c7cf3a1 <- output when n=0
2 | 32b9b6bca55548ed | ee5fadd5a727857f
3 | ee5fadd5a727857f | 32b9b6bca55548ed <- output when n=1
4 | ee5fadd5a727857f | 32b9b6bca55548ed ...or this?
5 | ee5fadd5a727857f | 982d2435efffdac7 <- output when n=2
6 | ee5fadd5a727857f | 74ec7fe13e4ee5c9
7 | 74ec7fe13e4ee5c9 | ee5fadd5a727857f
8 | 74ec7fe13e4ee5c9 | ee5fadd5a727857f
9 | 74ec7fe13e4ee5c9 | cb1f0980f576907d
10 | 74ec7fe13e4ee5c9 | b4f8094372145c13
11 | b4f8094372145c13 | 74ec7fe13e4ee5c9
12 | b4f8094372145c13 | 74ec7fe13e4ee5c9
13 | b4f8094372145c13 | 5ec57fa3baecb15b
14 | b4f8094372145c13 | c8b5922a9f156985
15 | c8b5922a9f156985 | b4f8094372145c13 <- output when n=3
16 | c8b5922a9f156985 | b4f8094372145c13 ...or this?
17 | c8b5922a9f156985 | 1ee81bca563d1439
18 | c8b5922a9f156985 | b053401f9467e747
19 | b053401f9467e747 | c8b5922a9f156985
```

**Imagination:**You have opened a process with

`gdb`

that has a breakpoint at the beginning of the function. You are given a set of something (*defined in*) and checks if it has a given attribute (

`sub_40090D`

*defined in*). If so, you call

`sub_4009C4`

`ni`

to move to the next step. Finally, you print the registers, extract `edx`

and `ecx`

, and print them as the flag.
So... what is the set of something? And what is the attribute it is checked against?

### Part II: Collecting the pieces for the puzzle⌗

There are four functions that worth investigating: `sub_40090D`

(the enumerator), `sub_4009C4`

(the checker), `sub_400A59`

(the constructor) and `sub_400840`

(called from `sub_40090D`

). We will first look into `sub_400840`

:

```
_DWORD *__fastcall sub_400840(_DWORD *a1)
{
int i; // [rsp+14h] [rbp-Ch]
_DWORD *v3; // [rsp+18h] [rbp-8h]
v3 = malloc(0x20uLL);
if ( !v3 )
{
puts("Out of memory ;_;");
abort();
}
*v3 = *a1;
for ( i = 0; *a1 > i; ++i )
*(_QWORD *)&v3[2 * i + 2] = sub_400840(*(_QWORD *)&a1[2 * i + 2]);
return v3;
}
```

Since it is allocating 32 bytes, we can define a dummy struct:

```
struct Dummy {
char x[32];
};
```

Loading the struct into IDA, by redefining the types of `a1`

and `v3`

as `Dummy*`

, we have:

```
Dummy *__fastcall sub_400840(Dummy *src)
{
int i; // [rsp+14h] [rbp-Ch]
Dummy *dest; // [rsp+18h] [rbp-8h]
dest = (Dummy *)malloc(0x20uLL);
if ( !dest )
{
puts("Out of memory ;_;");
abort();
}
*(_DWORD *)dest->x = *(_DWORD *)src->x;
for ( i = 0; *(_DWORD *)src->x > i; ++i )
*(_QWORD *)&dest->x[8 * i + 8] = sub_400840(*(Dummy **)&src->x[8 * i + 8]);
return dest;
}
```

It is calling itself recursively. Would it be a *deep clone*? Moreover, we can further see that the first four bytes should be `size`

and it would not be greater than 3. Otherwise `8*i+8 >= 32`

, overflowing the struct. After all, we think it is a node of the tree - and this is the final struct we have:

```
struct Node {
int size;
char x[4]; // unknown
Node *child[3];
};
```

More importantly, we can finally claim that `sub_400840`

is fully reversed.

```
Node *__fastcall clone(Node *src)
{
int i; // [rsp+14h] [rbp-Ch]
Node *dest; // [rsp+18h] [rbp-8h]
dest = (Node *)malloc(0x20uLL);
if ( !dest )
{
puts("Out of memory ;_;");
abort();
}
dest->size = src->size;
for ( i = 0; src->size > i; ++i )
dest->child[i] = clone(src->child[i]);
return dest;
}
```

Then we will be reversing `sub_400A59`

. This function is simple, it is defining a path with length $n$. After the tree is constructed, it will be used by `sub_400BED`

for enumeration. How? Let's see a baby example when $n = 2$:

Well... it is just enumerating all the ternary trees with depth $n$, where each of the leaf node is on level $n$.

After that, we check the number of *good* trees. By *good* it is defined by (former) `sub_4009C4`

:

```
signed __int64 __fastcall is_good_tree(Node *node, int k)
{
int ka; // [rsp+4h] [rbp-1Ch]
int kb; // [rsp+4h] [rbp-1Ch]
int i; // [rsp+1Ch] [rbp-4h]
ka = k;
if ( node->size > 1 && k )
return 0LL;
if ( node->size > k )
ka = node->size;
kb = ka - 1;
if ( kb < 0 )
kb = 0;
for ( i = 0; node->size > i; ++i )
{
if ( (unsigned __int8)is_good_tree(node->child[i], kb) != 1 )
return 0LL;
}
return 1LL;
}
```

harrier has implemented a *good* tree checker ~~(himself)~~ in Discord for me to test with:

After all, a tree is said to be *good* if both of the condition are satisfied:

- for each node with two children, every children should have at most one children, and
- for each node with three children, every children and their grandchildren should have at most one children.

Cool. We have the pieces of the puzzle gathered. Define $g(n)$ to be the number of good trees of depth $n$. We are going to find $g(1337)$ and move the sequence forward by $g(1337)$ instructions.

### Part III: Verifying this for the baby cases⌗

We are double checking if our observation checks out for smaller $n$'s. From above, $g(0) = 1$, $g(1) = 3\ \text{or}\ 4$, $g(2) = 5$ and $g(3) = 15\ \text{or}\ 16$.

For $n = 0$, the only tree would be:

Yeah. It is a good tree. Thus $g(0) = 1$. Also, $g(1) = 3$ since the three trees are all good:

For $n = 2$, we start rejecting trees. There are 39 trees, but there are only five being good. Hence $g(2) = 5$.

And $g(3) = 15$. They are:

Cool! Everything checks out! Luckily the memory overflows when $n = 4$, otherwise our OCD would be forcing us to find $g(4)$ and the writeup will be flooded by a large forest.

### Part IV: Algorithms, algorithms everywhere⌗

In this part, harrier and I work on a different topic:

- harrier's task: Find $g(1337)$.
- Mystiz's task: Find the flag if harrier has $g(1337)$ computed.

#### 4.1. What is $g(1337)$?⌗

To find the number of *good* trees of height $h$, we define an auxiliary variable $k$ (interpret it as *cooldown*). Here we redefine *good* trees again:

For a ternary tree, it is said to be *good* if for every node,

**[Recovery]**if $k > 0$: there should be at most one child, and**[Fertility]**if $k = 0$: if there are $c$ children, then each of them has cooldown $k = c-1$.

Simply put, you need to *recover* to be *fertile*. For instance, the following is a good tree since each of the node with non-zero cooldown has only a child.

However, this is not a *good* tree since the red node is too fertile:

Luckily, we don't need to count the *good* trees one by one. It can be solved by dynamic programming rather easily. Let's define the state `dp[h][k]`

to be the number of *good* trees if the tree is of depth `h`

and cooldown `k`

. Then:

**[Base case]**`dp[0][k] == 1`

for every`k`

,**[Transition]**`dp[h][k] == dp[h-1][k-1]`

for`h, k > 0`

, and**[Transition]**`dp[h][0] == dp[h-1][0] + dp[h-1][1]**2 + dp[h-1][2]**3`

for`h > 0`

.

The first condition is obvious - the only tree with depth 0 is the tree with the only root node. It is *good* no matter what the cooldown is.

For the second condition, it is not difficult to imagine it as an tree that must cooldown. Hence, if $\text{GT}_{hk}$ is a *good* tree with height $h>0$ and cooldown $k>0$, then:

It is a little bit tricky for the third condition. Since it is in the *fertile* mode again, we can decide if there are one, two or three children. Then each of the following trees work:

Hence, if there are `k`

children, then there are `dp[h-1][k-1]**k`

choices. Since we are able to pick `k = 1, 2, 3`

, we can simply sum them up for `dp[h][k]`

. After all, this is the Python script to compute:

```
def g(k: int) -> int:
# Base case
dp = [[1, 1, 1]]
# Transition
for _ in range(k):
dp.append([
sum([dp[-1][i]**(i+1) for i in range(3)]), dp[-1][0], dp[-1][1]
])
return dp[-1][0]
```

But the numbers is growing exponentially and finding $g(1337)$ takes eternally. What can we do? Nevermind, we will get back to this shortly.

#### 4.2. Fast sequence generation⌗

To compute the values of `ecx`

and `edx`

, we can find the number of executed instructions (denote it as $q$). Since the loop is completed every four steps, we can find $s_{\mathbf{floor}(q/4)}$ and $s_{\mathbf{floor}(q/4)+1}$ and perform the remainder of the instructions.

Let's get back to the sequence $s_n$:

- $s_0 = \text{82F96AC97429A68B}_{16}$,
- $s_1 = \text{32B9B6BCA55548ED}_{16}$, and
- $s_k = 3 s_{k-2} + 2 s_{k-1} + 4\mod2^{64}$ for $k \geq 2$.

To compute it efficiently, we can rewrite it as a matrix notation:

\[ \begin{bmatrix} s_{k+1} \\ s_k \\ 1 \end{bmatrix} = \begin{bmatrix} 2 & 3 & 4 \\ 1 & 0 & 0 \\ 0 & 0 & 1\end{bmatrix} \begin{bmatrix} s_k \\ s_{k-1} \\ 1 \end{bmatrix} \mod 2^{64}. \]

Why? Try the multiplication by yourselves! Anyway, then we are able to compute $s_m$ and $s_{m+1}$ efficiently, since

\[ \begin{bmatrix} s_{m+1} \\ s_m \\ 1 \end{bmatrix} = \begin{bmatrix} 2 & 3 & 4 \\ 1 & 0 & 0 \\ 0 & 0 & 1\end{bmatrix}^m \begin{bmatrix} s_1 \\ s_0 \\ 1 \end{bmatrix} \mod 2^{64}. \]

Moreover, since

\[\begin{bmatrix} 2 & 3 & 4 \\ 1 & 0 & 0 \\ 0 & 0 & 1\end{bmatrix}^{2^{64}} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{bmatrix}, \]

the square matrix has order $2^{64}$. That said, we don't need to find the exact value for $g(1337)$. Instead, we can compute $g(1337)\mod (2^{64}\times4)$. Hereby we need to multiple the number by four, since there are 4 instructions per loop).

### Part V: The flagggggggggg!⌗

After all, we can modify the Python function we had to compute $g(k)\mod 2^{66}$:

```
def g(k: int) -> int:
# Base case
dp = [[1, 1, 1]]
# Transition
for _ in range(k):
dp.append([
sum([dp[-1][i]**(i+1) for i in range(3)]) % 2**66, dp[-1][0], dp[-1][1]
])
return dp[-1][0]
```

And `g(1337) = 59988074356265869957`

pops out at no time. Therefore, we can find $s_{14997018589066467489}$ and $s_{14997018589066467490}$ and proceed one instruction further. We have `rdx`

being `0x62246322232ceabf`

and `rcx`

being `0xbf1d9826c054007`

!

Thus `p4{62246322232ceabfbf1d9826c054007}`

would be the flag, right? NO!

It was 4:40 am and we were very sleepy. It took us few minutes to figure out that `rcx`

isn't composed of 16 hexchars. The correct flag should be `p4{62246322232ceabf0bf1d9826c054007}`

(note that there is an *zero*). Yeah, 🏁 - and we are the first to capture this!