As the third part of the series, three reversing challenges will be included: The Hardest Path, A Junior Mathematician and Let's Chill.

最難行的路 / The Hardest Path (Reverse)

Challenge Summary

寧願不揀最易的路 行極還未到
寧願你最後未傾慕 但信念安好
在意的 不再是 愛的煩惱
是哪樣做人 更清高
餘生那段旅途
與哪類人共舞

When you think reverse engineering is hard, try working on reverse engineering challenges those need your algorithmic thinking skills!

nc HOST PORT

We are given two Python files excerpted below:

# lost.py

import os

mystery = 'NEWS'

def _29aa7a86665899ed(_050ca071ab51aece): raise Exception('😵‍💫💫🧱')

def _f42fb5e137443877(_a78810bb76cc7d70, *_ab1bbf35017f4f42):
    def _e6aea2db2242b19f(_41d28eb8c27952c3):
        if len(_41d28eb8c27952c3) == 0:
            if not _a78810bb76cc7d70: raise Exception('🤷🏁😕')
            return
        _03d38fa3a589db14, _41d28eb8c27952c3 = _41d28eb8c27952c3[0], _41d28eb8c27952c3[1:]
        return globals()[_ab1bbf35017f4f42[mystery.index(_03d38fa3a589db14)]](_41d28eb8c27952c3) if _03d38fa3a589db14 in mystery else _e6aea2db2242b19f(_41d28eb8c27952c3)

    return _e6aea2db2242b19f

_328518a497015157 = _f42fb5e137443877(False, "_ef5d07da6a407ff3", "_3f49b6a9053121fb", "_60de30732830eab8", "_3ff69bef8add0e90")
_5551904aa8bcdf8f = _29aa7a86665899ed
_3ff69bef8add0e90 = _f42fb5e137443877(False, "_328518a497015157", "_4403b7c481dc4765", "_be69bd3c24b2b668", "_45b7590f73dac4d3")
# ...40398 lines omitted (Lines 17 - 40417)...
# chall.py

import os
import base64
import hashlib


def work():
    return True
    
    challenge = os.urandom(8)
    print(f'🔧 {base64.b64encode(challenge).decode()}')
    response = base64.b64decode(input('🔩 '))
    h = hashlib.sha256(challenge + response).digest()
    return h.startswith(b'\x00\x00\x00')


def attempt(data):
    from lost import _328518a497015157

    _328518a497015157(data)
    flag = os.environ.get('FLAG', 'hkcert21{***REDACTED***}')
    print(flag)


if __name__ == '__main__':
    if work():
        data = input('🥺 ')
        attempt(data)
    else:
        print('😡')

The objective is to send a valid input to print the flag.

Solution

Understanding the script

lost.py is really bulky (3.4 MB) but repetitive. Before we look into lines 17 up to 40417, let's look at the two methods _29aa7a86665899ed and _f42fb5e137443877:

def _29aa7a86665899ed(_050ca071ab51aece): raise Exception('😵‍💫💫🧱')

The first method simply raise an exception. We should not call this if we want the flag.

def _f42fb5e137443877(_a78810bb76cc7d70, *_ab1bbf35017f4f42):
    def _e6aea2db2242b19f(_41d28eb8c27952c3):
        if len(_41d28eb8c27952c3) == 0:
            if not _a78810bb76cc7d70: raise Exception('🤷🏁😕')
            return
        _03d38fa3a589db14, _41d28eb8c27952c3 = _41d28eb8c27952c3[0], _41d28eb8c27952c3[1:]
        return globals()[_ab1bbf35017f4f42[mystery.index(_03d38fa3a589db14)]](_41d28eb8c27952c3) if _03d38fa3a589db14 in mystery else _e6aea2db2242b19f(_41d28eb8c27952c3)

    return _e6aea2db2242b19f

Looks unreadable. Let's rewrite the method:

def outer(done, *next_steps):
    def inner(payload):
        if len(payload) == 0:
            if not done: raise Exception('🤷🏁😕')
            return
        current, payload = payload[0], payload[1:]
        return globals()[next_steps['NEWS'.index(current)]](payload) if current in 'NEWS' else inner(payload)

    return inner

For the inner function, it reads the first character from payload. If current is either N, E, W or S, then it would call some function with the remaining characters from payload. Otherwise, it would call itself with the first character dropped.

What does the some function I meant above? It depends on outer. As an example, we will look at the function _328518a497015157, the first one we call with our input.

_328518a497015157 = outer(False, "_ef5d07da6a407ff3", "_3f49b6a9053121fb", "_60de30732830eab8", "_3ff69bef8add0e90")

The following are six example inputs. It would either call the second, third, fourth or fifth parameter depending on the first character of the input. In rare cases, it would call itself (if the input does not start with N, E, W nor S) or even raise an exception (if the input is empty).

_328518a497015157("NEVERLAND") # _ef5d07da6a407ff3("EVERLAND") 👈 Starts with N
_328518a497015157("ELSEWHERE") # _3f49b6a9053121fb("LSEWHERE") 👈 Starts with E
_328518a497015157("WATERFALL") # _60de30732830eab8("ATERFALL") 👈 Starts with W
_328518a497015157("SPECTATOR") # _3ff69bef8add0e90("PECTATOR") 👈 Starts with S
_328518a497015157("OTHERWISE") # _328518a497015157("THERWISE") 👈 Does not start with N, E, W nor S
_328518a497015157("")          # Exception: 🤷🏁😕

Notably, there is exactly one outer call with the first parameter being True:

_8b0eb6f195ae182a = outer(True, "_92ccf583b1b065ea", "_d3084505a4a12123", "_e335a503c47e5243", "_ebff1548ca6e8dbd")

Let's use the same set of example inputs - there is a subtle difference. When the input is empty, it no longer raises an exception.

_8b0eb6f195ae182a("NEVERLAND") # _92ccf583b1b065ea("EVERLAND") 👈 Starts with N
_8b0eb6f195ae182a("ELSEWHERE") # _d3084505a4a12123("LSEWHERE") 👈 Starts with E
_8b0eb6f195ae182a("WATERFALL") # _e335a503c47e5243("ATERFALL") 👈 Starts with W
_8b0eb6f195ae182a("SPECTATOR") # _ebff1548ca6e8dbd("PECTATOR") 👈 Starts with S
_8b0eb6f195ae182a("OTHERWISE") # _8b0eb6f195ae182a("THERWISE") 👈 Does not start with N, E, W nor S
_8b0eb6f195ae182a("")          # return nothing

Since we do not want to raise errors, we would like to find an input such that it would lead to a chain of function calls that finishes with _8b0eb6f195ae182a("").

_328518a497015157("INPUT...")
# that calls _XXXXXXXXXXXXXXXX("NPUT...")
#  that calls _XXXXXXXXXXXXXXXX("PUT...")
#   ...
#    that calls _XXXXXXXXXXXXXXXX(".")
#     that calls _8b0eb6f195ae182a("")
# does not throw error!

Visualizing the function calls

At this stage, maybe we could draw a directed graph mentioning how the function calls are triggered. For example, we will use the following subgraph to describe the functions _328518a497015157 can reach:

digraph { graph [bgcolor="transparent"] node [color="#ffe4e1", fontcolor="#ffe4e1", fillcolor="#33333c", style="filled"] edge [color="#ffe4e1", fontcolor="#ffe4e1"] node[shape="box", style="rounded"] _328518a497015157->_ef5d07da6a407ff3[label="N"] _328518a497015157->_3f49b6a9053121fb[label="E"] _328518a497015157->_60de30732830eab8[label="W"] _328518a497015157->_3ff69bef8add0e90[label="S"] _328518a497015157->_328518a497015157[style=dashed] }

We can see that _ef5d07da6a407ff3 raises the 😵‍💫💫🧱 exception. Let's mark it with a dashed border because it would lead us to nowhere.

digraph { graph [bgcolor="transparent"] node [color="#ffe4e1", fontcolor="#ffe4e1", fillcolor="#33333c", style="filled"] edge [color="#ffe4e1", fontcolor="#ffe4e1"] node[shape="box", style="dashed,rounded"] _ef5d07da6a407ff3 }

Since the graph could go very large (there are 40401 vertices), I will only include part of the graph below:

digraph { graph [bgcolor="transparent"] node [color="#ffe4e1", fontcolor="#ffe4e1", fillcolor="#33333c", style="filled"] edge [color="#ffe4e1", fontcolor="#ffe4e1"] node[shape="box", style="rounded"] _328518a497015157->_ef5d07da6a407ff3[label="N"] _328518a497015157->_3f49b6a9053121fb[label="E"] _328518a497015157->_3ff69bef8add0e90[label="S"] _328518a497015157->_60de30732830eab8[label="W"] _328518a497015157->_328518a497015157[style=dashed] _3ff69bef8add0e90->_328518a497015157[label="N"] _3ff69bef8add0e90->_4403b7c481dc4765[label="E"] _3ff69bef8add0e90->_45b7590f73dac4d3[label="S"] _3ff69bef8add0e90->_be69bd3c24b2b668[label="W"] _3ff69bef8add0e90->_3ff69bef8add0e90[style=dashed] _45b7590f73dac4d3->_3ff69bef8add0e90[label="N"] _45b7590f73dac4d3->_e18a56066b9fad85[label="E"] _45b7590f73dac4d3->_fd5aa08fd7e3bbf8[label="S"] _45b7590f73dac4d3->_7a9d4be175a7d351[label="W"] _45b7590f73dac4d3->_45b7590f73dac4d3[style=dashed] _fd5aa08fd7e3bbf8->_45b7590f73dac4d3[label="N"] _fd5aa08fd7e3bbf8->_cdc4ac909cc2968d[label="E"] _fd5aa08fd7e3bbf8->_7dbe20d51cdf3195[label="S"] _fd5aa08fd7e3bbf8->_65f5c5d291c9d575[label="W"] _fd5aa08fd7e3bbf8->_fd5aa08fd7e3bbf8[style=dashed] _7dbe20d51cdf3195->_fd5aa08fd7e3bbf8[label="N"] _7dbe20d51cdf3195->_805c0aea921e78c0[label="E"] _7dbe20d51cdf3195->_f2a00ce6abf90e26[label="W"] _7dbe20d51cdf3195->_69b6af99d9ec1424[label="S"] _7dbe20d51cdf3195->_7dbe20d51cdf3195[style=dashed] _805c0aea921e78c0->_cdc4ac909cc2968d[label="N"] _805c0aea921e78c0->_8792d51c0c90d618[label="E"] _805c0aea921e78c0->_7dbe20d51cdf3195[label="W"] _805c0aea921e78c0->_0f3b311bbfff72dd[label="S"] _805c0aea921e78c0->_805c0aea921e78c0[style=dashed] _8792d51c0c90d618->_d1211b2b3302974c[label="E"] _8792d51c0c90d618->_01c8e59f068ec3f3[label="N"] _8792d51c0c90d618->_805c0aea921e78c0[label="W"] _8792d51c0c90d618->_db538ba7c5acf45d[label="S"] _01c8e59f068ec3f3->more1 more1[label="...", shape="plaintext"] _328518a497015157[label="_328518...",style="filled,rounded"] _ef5d07da6a407ff3[label="_ef5d07...",style="dashed,rounded"] _3f49b6a9053121fb[label="_3f49b6...",style="dashed,rounded"] _3ff69bef8add0e90[label="_3ff69b..."] _60de30732830eab8[label="_60de30...",style="dashed,rounded"] _4403b7c481dc4765[label="_4403b7...",style="dashed,rounded"] _be69bd3c24b2b668[label="_be69bd...",style="dashed,rounded"] _e18a56066b9fad85[label="_e18a56...",style="dashed,rounded"] _7a9d4be175a7d351[label="_7a9d4b...",style="dashed,rounded"] _cdc4ac909cc2968d[label="_cdc4ac...",style="dashed,rounded"] _65f5c5d291c9d575[label="_65f5c5...",style="dashed,rounded"] _f2a00ce6abf90e26[label="_f2a00c...",style="dashed,rounded"] _69b6af99d9ec1424[label="_69b6af...",style="dashed,rounded"] _0f3b311bbfff72dd[label="_0f3b31...",style="dashed,rounded"] _45b7590f73dac4d3[label="_45b759..."] _fd5aa08fd7e3bbf8[label="_fd5aa0..."] _7dbe20d51cdf3195[label="_7dbe20..."] _805c0aea921e78c0[label="_805c0a..."] _8792d51c0c90d618[label="_8792d5..."] _01c8e59f068ec3f3[label="_01c8e5..."] _d1211b2b3302974c[label="_d1211b...",style="dashed,rounded"] _db538ba7c5acf45d[label="_db538b...",style="dashed,rounded"] }

Up to now, some may know that the nodes may form a two-dimensional grid so far. For example, walking N followed by S would always return to itself (unless you hit the deadend). Of course, I am the challenge author so I know it is a 2D grid. You can verify it by drawing the grid too. Below is the grid I made. Note that the red square on the top-left is the function _328518a497015157 (the starting function), and the red square on the bottom-right is the function _8b0eb6f195ae182a (the finishing function). After all, it is a maze. Amazing!

Dodging a built-in exception

It seems that we are required to find an arbitrary path between two ends. However, for most of the path it will still print no good!. Why? Let's patch chall.py to see what error it actually shows:

- try:
-     _328518a497015157(data)
-     flag = os.environ.get('FLAG', 'hkcert21{***REDACTED***}')
-     print(flag)
- except:
-     print('no good!')
+ _328518a497015157(data)
+ flag = os.environ.get('FLAG', 'hkcert21{***REDACTED***}')
+ print(flag)

Sending an arbitrary (incorrect) path that traverses from the beginning to the end, we obtain the below error:

RecursionError: maximum recursion depth exceeded while calling a Python object

Hmm, stack overflow. That means that we need to find a short path from the ends. We can use breadth-first search (BFS) to find a shortest path. We actually can craft an input that does not raise an exception when passed to _328518a497015157. Below is a corresponding input:

SSSSEENNEESSSSEENNEEEEEEEESSSSEESSSSWWSSSSEESSEESSSSWWSSSSWWWWWWSSWWSSSSSSEEEESSSSEESSEESSSSSSSSEENNEESSEEEESSEESSSSEESSEEEEEEEESSEEEESSSSEEEEEENNNNNNEENNWWNNEEEENNNNEEEENNEEEEEESSSSWWSSWWSSSSEEEESSWWSSEESSEEEESSSSEENNNNNNEEEESSEESSEESSEEEESSEENNEENNWWNNEEEEEESSEEEEEENNNNWWWWNNWWNNNNWWNNEEEEEENNEENNWWWWNNWWNNEENNEEEEEEEESSEESSEESSSSEESSEEEEEENNEESSSSSSEEEEEENNNNNNNNNNNNNNNNEEEEEEEEEEEESSEESSSSWWWWSSSSSSWWSSEESSEEEEEENNNNEESSEESSEEEENNEENNNNEEEEEENNEENNWWWWWWNNNNWWSSSSWWSSSSWWWWNNNNNNNNNNEENNEENNEENNEEEESSEESSSSEEEEEESSEEEEEEEENNNNEENNNNWWNNEEEENNNNEEEEEESSSSSSSSSSSSSSSSSSSSSSEESSSSEEEEEESSEESSWWSSSSEESSSSEEEENNEEEESSWWSSSSSSSSWWSSWWWWWWSSWWWWSSWWNNNNWWWWWWSSEESSWWSSWWSSEESSSSSSSSSSWWSSEEEESSSSSSEEEESSSSEESSEESSSSEEEESSEENNEEEESSSSWWSSEESSWWSSSSWWWWNNWWSSSSEESSEEEESSWWSSSSWWWWNNWWNNWWSSSSWWWWWWWWWWWWWWSSSSSSWWWWSSEESSWWSSWWWWWWSSWWWWWWWWSSSSEESSEEEEEESSSSSSEEEESSSSEEEEEESSWWWWSSSSSSSSSSSSEEEEEESSEEEENNEEEENNEEEENNEEEENNEEEESSEENNEEEESSSSSSSSWWSSWWSSSSWWSSSSSSEESSSSEEEESSEENNEESSEEEE

I have carefully generated the maze to accept only the shortest paths. Even those paths those are slightly longer will be rejected.

Credit: Ja5on in team AVADA KEDAVRA
There were two players asking me to increase the recursion limit. Of course I did not update that because this is part of the challenge.

中二病人 / A Junior Mathematician (Reverse)

Challenge Summary

平凡人笑吧 望着你於地上爬
靈魂和熱血 如砂礫風化
你嘲笑的 是那時你都一樣
離奇神化
這自由密碼 卻伴隨成長退化

Do you know that you can write programs with numbers? Does it mean that everyone is born to be programmers?

./chall.py prog.frac
hkcert21{***REDACTED***}
(:

We are given a FRACTRAN interpreter implemented in Python chall.py and a FRACTRAN program prog.frac. The initial value is set to $2 \cdot 3^{m_1} \cdot 5^{m_2} \cdot ...$ where $m_1m_2...$ is the flag. Also, $\text{chr}(k\ \text{mod}\ 128)$ will be printed whenever the state is $2^{k}$.

It is known that the program will be printing (: if given the correct flag. The goal is, of course, to find the flag.

Motivation

When I was in the secondary school, I learned about the existence of FRACTRAN on Project Euler1 and was very excited about it.

Solution

We can let $n$ be the state and its prime factorization being $n = {p_1}^{k_1} {p_2}^{k_2} ... {p_m}^{k_m}$. Suppose that we have a fraction $u/v$ with

\[\begin{cases} u = {p_1}^{s_1} {p_2}^{s_2} ... {p_m}^{s_m} \\ v = {p_1}^{t_1} {p_2}^{t_2} ... {p_m}^{t_m} \end{cases}.\]

Then it is equivalent to the below lines of pseudo-Python code:

# "if u/v * n is an integer"
if k_1 >= s_1 and k_2 >= s_2 and ... and k_m >= s_m:
    # update "n = u/v * n"
    k_1 -= s_1
    k_2 -= s_2
    ...
    k_m -= s_m

    k_1 += t_1
    k_2 += t_2
    ...
    k_m += t_m

With that idea in mind, I wrote a program in Sagemath that could convert FRACTRAN programs into Python scripts:

primes = [2]
# If p is the n-th prime, return n
def get_n(p):
    while primes[-1] < p:
        primes.append(next_prime(primes[-1]))
    return primes.index(p)

with open('prog.frac') as f:
    fractions = f.read().strip().split('\n')
fractions = [tuple(map(int, fraction.split('/'))) for fraction in fractions]

print('while True:')

for u, v in fractions:
    condition = []
    action    = []
    for p, k in factor(v):
        condition.append((get_n(p), k))
        action.append((get_n(p), -k))
    for p, k in factor(u):
        action.append((get_n(p), k))

    action_string = '; '.join([f'p_{p} += {k}' for p, k in action]) + '; continue'

    if len(condition) == 0:
        print(f'  if True: {action_string}')
    else:
        condition_string = ' and '.join([f'p_{p} >= {k}' for p, k in condition])
        print(f'  if {condition_string}: {action_string}')

print(f'  break')

I also added extra lines to the generated code to align with the functionalities defined in chall.py.

+ p = [0 for _ in range(2000)]
+ 
+ flag = b'...'
+ p[1] = 1
+ for i, c in enumerate(flag):
+   p[i+2] = c
+ 
  while True:
+   if sum(p) == p[0]: print(chr(p[0] & 0x7f), end='')
    if p[2] >= 2: p[2] += -2; p[102] += 1; continue
    if p[3] >= 2: p[3] += -2; p[103] += 1; continue
    if p[4] >= 2: p[4] += -2; p[104] += 1; continue
    if p[5] >= 2: p[5] += -2; p[105] += 1; continue
    if p[6] >= 2: p[6] += -2; p[106] += 1; continue
    ...

The full generated code is available here. Even though we cannot fully understand what the .frac program does for now, it runs faster because we do not need to compute large numbers anymore.

From the generated code, we can see that there are a number of segments:

  • [L10-L709] Two units of $p_{k}$ is consumed for one unit of $p_{k+100}$ ($2 \leq k \leq 701$).
  • [L710] $p_1$ will be converted to $p_{802}$.
  • [L711-L1510] One unit of $p_i$ (and sometimes one unit of $p_k$) is consumed for one unit of $p_{i+1}$ ($2 \leq k \leq 801$ and $802 \leq i \leq 1601$).
  • [L1511-L2310] One unit of $p_i$ is consumed for one unit of $p_{1603}$ ($802 \leq i \leq 1601)$.
  • [L2311-L5510] One unit of $p_i$ and $p_{1602}$ is consumed for one unit of $p_{1603}$, and the remaining $p_i$'s will be disposed ($2 \leq i \leq 1601$).
  • [L5511-L5517] Prints (: if $p_{1602} = 1$, and ): if $p_{1603} = 1$.

In this program when a new segment is reached, the previous segments would not be called anymore. This is because the constraints those satisfies the former segments will not be regenerated.

digraph { graph [bgcolor="transparent"] node [color="#ffe4e1", fontcolor="#ffe4e1", fillcolor="#33333c", style="filled"] edge [color="#ffe4e1", fontcolor="#ffe4e1"] rankdir=LR node[shape=&#34;box&#34;, style=&#34;rounded&#34;] &#34;L10-L709&#34; -&gt; &#34;L710&#34; -&gt; &#34;L711-L1510&#34; &#34;L711-L1510&#34; -&gt; &#34;L1511-L2310&#34; [constraint=false] &#34;L1511-L2310&#34; -&gt; &#34;L2311-L5510&#34; -&gt; &#34;L5511-L5517&#34; }

We understand that the goal is to get $p_{1602} = 1$ to hit the last segment of code, which we then can print (: as the result. To achieve this, we need to generate a unit of $p_{1602}$ during lines 711 to 1510 (this is the only place that makes $p_{1602}$). Also, no actions should be taken during lines 2311 to 5510. Therefore we need to correctly generate $p_2, p_3, ..., p_{801}$ in lines 10 to 709.

Looking closely to the first segment (lines 10 to 709) of the program. To reiterate, every two units of $p_{k}$ is consumed for one unit of $p_{k+100}$. Also, based on the input, we know that $0 \leq p_2, p_3, ..., p_{101} < 256$. If $p'_2, p'_3, ..., p'_{801}$ are the final values after the segment is executed, then

\[p_k = p'_k + 2 \cdot p'_{k+100} + 2^2 \cdot p'_{k+200} + ... + 2^7 \cdot p'_{k+700}\]

for every $k = 2, 3, ..., 101$, and $0 \leq p'_2, p'_3, ..., p'_{801} \leq 1$. That said $\overline{p'_{k+700}p'_{k+600}...p'_k}$ is the binary representation of $p_k$.

We then can look at lines 711 to 1510 in order to identify which $p'_k$'s should be set to 1. In the first ten lines, we knew that $p'_{262}, p'_{449}, p'_{316}$ and $p'_{616}$ should be all 1s.

if p[802] >= 1: p[802] += -1; p[803] += 1; continue
if p[262] >= 1 and p[803] >= 1: p[262] += -1; p[803] += -1; p[804] += 1; continue
if p[449] >= 1 and p[804] >= 1: p[449] += -1; p[804] += -1; p[805] += 1; continue
if p[805] >= 1: p[805] += -1; p[806] += 1; continue
if p[806] >= 1: p[806] += -1; p[807] += 1; continue
if p[807] >= 1: p[807] += -1; p[808] += 1; continue
if p[808] >= 1: p[808] += -1; p[809] += 1; continue
if p[316] >= 1 and p[809] >= 1: p[316] += -1; p[809] += -1; p[810] += 1; continue
if p[810] >= 1: p[810] += -1; p[811] += 1; continue
if p[616] >= 1 and p[811] >= 1: p[616] += -1; p[811] += -1; p[812] += 1; continue
...

Although there is a list of variables those should be 1s, it does not mean that the remaining variables are zero. This is yet enforced in lines 2311 to 5310. After all, we can solely recover $p'_2, p'_3, ..., p'_{801}$ which prints (: from lines 711 to 1510. We then can recover $p_2, p_3, ..., p_{101}$ (the flag) from $p'_2, p'_3, ..., p'_{801}$, its binary representation.

hkcert21{c0nw4y_1nv3n7ed_th1s_pr0gr4mm1n9_l4ngu4g3_w1th_m4th}

今天我不想做嘢 / Let's Chill (Reverse)

Challenge Summary

今天我不想做嘢 只想懶
可不可輕鬆陣呢 嘆一嘆
讓時間 慢慢變慢

I knew you just want to spend your time chilling instead of spending time CTFs. Let's visit the page, have a cup of coffee and wait for the flag.

We are given a HTML file with a large proportion of JavaScript code. It is executed so that the player could eventually wait for the flag.

Solution

Maybe we should wait...?

The challenge description said that we can get flag by simply opening the webpage and wait. I suppose that is the intended solution for a 450 point challenge, right?

Most of the Hongkongers do not believe that waiting is meaningful. Credit: Stand News

Beautifying the code

Without doubt, the JavaScript code is the most important thing in the HTML file. We can use online tools to prettify the code. For example, we can use Online JavaScript Beautifier. This is my attempt to further prettify the code after using the tool.

Phase 1: Replace the function calls relating to _0x1f23

This involves few steps:

  • Replace _0x363e23(0x1c3) to "hkcert21{this_is_definitely_not_the_flag!}" and _0x363e23(0x1cd) to "split" in the scope of _0x363e23 = _0x1f23.
  • Replace -parseInt(_0x435581(0x1c7)) / 0x1 * ... + -parseInt(_0x435581(0x1bd)) / 0xa to 176668 in the scope of _0x435581 = _0x1f23.
  • Replace _0x2bf188(0x1cb) to "fromCharCode", _0x2bf188(0x1be) to "length", _0x2bf188(0x1ca) to "getElementById", _0x2bf188(0x1c8) to "abs" and so on in the scope of _0x2bf188 = _0x1f23.

The below code snippets explains how the expressions are replaced. They are executed on the developers' console of an browser when the given index.html is opened.

_0x363e23(0x1c3)
// "hkcert21{this_is_definitely_not_the_flag!}"

_0x363e23(0x1cd)
// "split"

_0x435581 = _0x1f23; -parseInt(_0x435581(0x1c7)) / 0x1 * (parseInt(_0x435581(0x1bc)) / 0x2) + parseInt(_0x435581(0x1c5)) / 0x3 + -parseInt(_0x435581(0x1c4)) / 0x4 * (-parseInt(_0x435581(0x1c0)) / 0x5) + parseInt(_0x435581(0x1c9)) / 0x6 * (parseInt(_0x435581(0x1c6)) / 0x7) + -parseInt(_0x435581(0x1c2)) / 0x8 + -parseInt(_0x435581(0x1cc)) / 0x9 + -parseInt(_0x435581(0x1bd)) / 0xa
// 176668

_0x2bf188 = _0x1f23; _0x2bf188(0x1cb)
// "fromCharCode"

// ...

The below code is an intermediate result after the first phase of prettifying:

(function(_0x1294fd, _0x22b230) {
    const _0x123741 = _0x1294fd();
    while (!![]) {
        try {
            const _0x40545f = 176668;
            if (_0x40545f === _0x22b230) break;
            else _0x123741['push'](_0x123741['shift']());
        } catch (_0x4a0568) {
            _0x123741['push'](_0x123741['shift']());
        }
    }
}(_0x3592, 0x2b21c));
let flag = "hkcert21{this_is_definitely_not_the_flag!}"["split"]('');

function _0x3592() {
    const _0xdbd692 = ['11166WHNYgN', 'getElementById', 'fromCharCode', '553698JoiHCo', 'split', '2SOiWzA', '1323560vDNHlU', 'length', 'reduce', '757205AqJtbg', 'join', '2460768ecYZOk', 'hkcert21{this_is_definitely_not_the_flag!}', '8wdMNpV', '990753HsBKBA', '399soVHoe', '61068BicfOM', 'abs'];
    _0x3592 = function() {
        return _0xdbd692;
    };
    return _0x3592();
}

function exec() {
    const _0x291954 = [
        [
            [-0x1a95b158398, 0x163d12e52747, -0x490a9659fc2, 0x16b6ddb08c44, 0x1f9573043e67], 0x8f9e9388f2b
        ],
       // ...
    ];
    let _0x5e98ba = 0x0,
        _0x589c1b = 0x1,
        _0xd9ed48 = Infinity,
        _0x3b7a05 = undefined;
    setTimeout(() => setInterval(_0x3f26fe, 0xa), 0x1f4);

    function _0x3f26fe() {
        const _0x400608 = _0x291954[_0x5e98ba][0x0]['length'];
        if (_0x589c1b == 0x1 << _0x400608) {
            _0x5e98ba < _0x291954['length'] && (_0x5e98ba += 0x1, _0x589c1b = 0x0, _0xd9ed48 = Infinity);
            _0x5e98ba === _0x291954["length"] && clearInterval(_0x3b7a05);
            return;
        }
        const _0x581f7b = _0x291954[_0x5e98ba][0x0]['filter']((_0x18f251, _0x1089a2) => 0x1 << _0x1089a2 & _0x589c1b)["reduce"]((_0x36542b, _0x391261) => _0x36542b + _0x391261, 0x0);
        if (Math["abs"](_0x581f7b - _0x291954[_0x5e98ba][0x1]) < Math["abs"](_0xd9ed48 - _0x291954[_0x5e98ba][0x1])) _0xd9ed48 = _0x581f7b;
        _0x589c1b += 0x1, flag[_0x5e98ba] = String["fromCharCode"](0x21 + (_0xd9ed48 % 0x5e + 0x5e) % 0x5e);
        for (let _0x26d021 = _0x5e98ba + 0x1; _0x26d021 < flag["length"]; _0x26d021++) {
            flag[_0x26d021] = String["fromCharCode"](0x21 + (_0x26d021 + _0x589c1b) % 0x5e);
        }
        document["getElementById"]('flag')['innerHTML'] = flag["join"]('');
    }
}
Phase 2: Understand the anonymous function
(function(_0x1294fd, _0x22b230) {
    const _0x123741 = _0x1294fd();
    while (!![]) {
        try {
            const _0x40545f = 176668;
            if (_0x40545f === _0x22b230) break; // 👈
            else _0x123741['push'](_0x123741['shift']());
        } catch (_0x4a0568) {
            _0x123741['push'](_0x123741['shift']());
        }
    }
}(_0x3592, 0x2b21c));

Since 0x2b21c === 176668, the function actually ends up at the highlighted line. That said the function is actually meaningless and we can safely remove the whole part. As _0x3592 is not called anywhere else, it is considered as a dead code and can be deleted.

Part 3: Pretty the exec function

In this phase, we can do the following things to improve the code readability (some of them are subjective):

  • Rename the variables like _0x291954 to values, _0x5e98ba to i.
  • Replace ["length"] to .length, ["filter"] to .filter and so on.
  • Convert the numbers from base 16 to base 10 (except those in values, there are too much numbers to be converted).
  • Format the code.

After all, this is the final prettified code:

let flag = "hkcert21{this_is_definitely_not_the_flag!}".split('');

function exec() {
    const values = [
        [
            [-0x1a95b158398, 0x163d12e52747, -0x490a9659fc2, 0x16b6ddb08c44, 0x1f9573043e67], 0x8f9e9388f2b
        ],
        // ...
    ];
    let i = 0,
        mask = 1,
        answer = Infinity,
        interval = undefined;
    setTimeout(() => setInterval(_0x3f26fe, 10), 500);

    function _0x3f26fe() {
        const n = values[i][0].length;
        if (mask == 1 << n) {
            if (i < values.length) {
                i += 1;
                mask = 0;
                answer = Infinity;
            }
            if (i === values.length) {
                clearInterval(interval);
            }
            return;
        }
        const sum = values[i][0].filter((_, k) => 1 << k & mask).reduce((a, b) => a + b, 0);
        if (Math.abs(sum - values[i][1]) < Math.abs(answer - values[i][1])) {
            answer = sum;
        }
        mask += 1;
        flag[i] = String.fromCharCode(33 + (answer % 94 + 94) % 94);
        for (let i2 = i + 1; i2 < flag.length; i2++) {
            flag[i2] = String.fromCharCode(33 + (i2 + mask) % 94);
        }
        document.getElementById('flag').innerHTML = flag.join('');
    }
}

We can see from _0x3f26fe the underlying algorithm challenge we gotta deal with.

Solving an algorithmic puzzle

We can see that the program is actually solving a problem in a slow way. Let's write down the challenge statement and discuss an algorithm solving it:

Given an array $a_1, a_2, ..., a_n$ and a target $b$, find $x_1, x_2, ..., x_n \in \{0, 1\}$ such that

\[\left|a_1 x_1 + a_2 x_2 + ... + a_n x_n - b\right|\qquad[\dagger]\]

is minimized. Moreover, $5 \leq n \leq 46$, $-2^{45} \leq a_1, a_2, ..., a_n \leq 2^{45}$ and $-2^{50} \leq b \leq 2^{50}$.

Intuitive meaning. We need to pick some numbers from $a_1, a_2, …, a_n$ such that the sum is closest to $b$.

It is a standard meet-in-the-middle challenge. We can exhaust half of the $x_i$'s each time to attain an time complexity $\mathcal{O}(2^{n/2})$. This is the outline of the algorithm (let $m = \lfloor n/2 \rfloor$):

  1. Exhaust $x_1, ..., x_m$ and let $L \leftarrow \{a_1 x_1 + ... + a_m x_m: x_1, ..., x_m \in \{0, 1\} \}$.
  2. Exhaust $x_{m+1}, ..., x_n$ and let $R \leftarrow \{-a_{m+1} x_{m+1} - ... - a_n x_n + b: x_{m+1}, ..., x_n \in \{0, 1\} \}$.
  3. Sort $L$ and $R$ in ascending order.
  4. Let $i \leftarrow 0$, $j \leftarrow 0$ and $u \leftarrow \infty$.
  5. While $i+1 < 2^m$ and $L_{i+1} < R_j$, set $i \leftarrow i + 1$.
  6. If $\left|L_i - R_j\right| < \left|u - b\right|$ set $u \leftarrow L_i - R_j + b$.
  7. If $i+1 < 2^m$ and $\left|L_{i+1} - R_j\right| < \left|u - b\right|$ set $u \leftarrow L_{i+1} - R_j + b$.
  8. Set $j \leftarrow j + 1$. If $j < 2^{n-m}$ go back to step 5.
  9. Return $u$.

What does the algorithm do? In short, we are rewriting the expression $[\dagger]$ into two quantities:

$$a_1 x_1 + a_2 x_2 + … + a_m x_m, \qquad - a_{m+1} x_{m+1} - a_{m+2} x_{m+2} + … - a_n x_n + b.$$

The goal is to find $x_1, x_2, …, x_m$ (for the left) and $x_{m+1}, x_{m+2}, …, x_n$ (for the right) such that the difference between the quantities is the smallest.

Having the algorithm implemented and running it, we have the flag:

hkcert21{l3t5_ch1ll_4nd_w41t_f0r_th3_f14g}

Ironically, you will not be able to obtain the flag if you wait for the original algorithm to finish.

Solution script

from typing import List
from tqdm import tqdm

candidates = [...] # plug that in yourselves!

def get_subarray_sums(nums, base=0):
    if len(nums) == 0:
        yield base
        return

    for k in get_subarray_sums(nums[1:], base):
        yield k
        yield k + nums[0]

def min_abs_distance(a: List[int], b: int) -> int:
    n = len(nums)
    m = n//2

    lhs = sorted(get_subarray_sums([+k for k in a[:m]]))
    rhs = sorted(get_subarray_sums([-k for k in a[m:]], b))
            
    i, j, u = 0, 0, 10**100
    # We assumed that 10**100 is infinity.
    
    for j in range(2**(n-m)):
        while i+1 < 2**m and lhs[i+1] <= rhs[j]:
            i += 1
        
        if                abs(lhs[i  ] - rhs[j]) < abs(u - b): u = lhs[i  ] - rhs[j] + b
        if i+1 < 2**m and abs(lhs[i+1] - rhs[j]) < abs(u - b): u = lhs[i+1] - rhs[j] + b

    return u

flag = b''
for nums, goal in tqdm(candidates):
    flag += bytes([0x21 + min_abs_distance(nums, goal) % 0x5e])

print(flag)

  1. Project Euler (2010). "An amazing Prime-generating Automaton"
    https://projecteuler.net/problem=308