I was playing Union CTF with @blackb6a last week. Since this CTF is prepared by cr0wn where Jack and hyperreality were in (they made CryptoHack), I am convinced that the CTF will be fun. Turns out I wasn't disappointed at all. I will be writing three or four posts on the challenges, and the first of the series covers on exah, the reversing challenge I spent most of my time (yet not solving it).

Challenge Summary

Have you tried reversing Haxe macros?

Note: file fixed, please redownload if you are using Haxe <4.2.0

Author: Aurel300

We are given Flag.hx which is written in Haxe. The code is small enough and I'll simply include it here.

/**union{paste the flag here}*/
#if macro import haxe.macro.Context as C; /***********/ import haxe.Int64 as B;
import haxe.macro.Expr; using StringTools; #else @:build(Flag.a("CrimsonCarp"))
#end class Flag { #if macro static final A = "abcdefghijklmnopqrstuvwxyzABCDEF"
+ "GHIJKLMNOPQRSTUVWXYZ0123456789_-"; static final K = [ "0000000008258240093",
/** this is not actually a CTF challenge, this is an ****/ "00000032059146620",
/*** ad for the programming language Haxe. ***************/ "0000059424680571",
/**** (all opinions our own) ****************************/ "00000016501626909",
/*******************************************************/ "000000010391056299",
/** Haxe: ********************************************/ "00000000039819521706",
/*** * powerful macro engine! * compiles to many targets! */ "000063129310438",
/*** * GADTs! * null safety! * fast compile times! * cross-platform stdlib! **/
/*** * interpreter mode! */ ]; static function a(F) { /* * pattern matching! */
var d = C.getLocalClass().get().doc; ~/^union\{([a-zA-Z0-9_-]{28})\}$/.match(d)
|| throw "invalid flag"; /*** * arrow functions! */ var L:B = 0x00000; function
F(i):String { var u:B = 0; /*** * game engines! */ var e = [A.indexOf][313337 -
0x4C7F9]; /*** * cool ASCII logo! ********* v[lv)r^^.              .^^rj>4TA */
u += e(d.charAt((i << 2) + 6)); u <<= 6; /* \77777[lv)\          ~j>4TCggggm */
u += e(d.charAt((i << 2) + 7)); u <<= 6; /* "777777777j[l'    "4TCgSgggggggN */
u += e(d.charAt((i << 2) + 8)); u <<= 6; /* :[7777777777777{5gggggggggggggPD */
u += e(d.charAt((i << 2) + 9)); u <<= 6; /* ,^77777777777{4TkpggggggggggggZD */
u += e(d.charAt(i)); u <<= 6; u ^= B /***** ,"7777777777>2TTkkwAggggggggggND */
.parseString(K[i]) ^ C.getPosInfos((macro aa /v77777777yTTTTTkkkCggggggggPD4 !)
.pos).min; u >= L || throw "bad flag"; /***  `^777777{uTTTTTTTTTkhgSggggSZO  */
L = u; var R = [ for (i in 0...7) { var /**   "77777{5TTTTTTTTTTTkpgSgggS%.  */
A = A.charAt((u % 26).low); u /= 26; A; /**    [777>2TTTTTTTTTTTTTkwASggPk   */
} ]; R.reverse(); return R.join(""); } /***    \773TTTTTTTTTTTTTTTTTkhgSZ    */
return (macro class { static function /****     {4TTTTTTTTTTTTTTTTTTkkpg     */
main() { var w:Array<haxe.DynamicAccess< /*     3TTTTTTTTTTTTTTTTTTTTkkp     */
Dynamic>> = ([ {"cdjholca": _ -> "s"}, /***     42kkTTTTTTTTTTTTTTTTkkCm     */
{"rarfzray": _ -> "o"}, /******************    r335TkTTTTTTTTTTTTTTkkgmmm    */
{"0atpgyte": _ -> "w"}, /**              **    ouu35TTTTTTTTTTTTTTkpZmmmmw   */
{"0gzqjili": _ -> "r"}, /**              **   ~3uuu342TkTTTTTTTTTkCZmmmmmX.  */
{"0tpeqmwy": _ -> "o"}, /**              **  'v3uuuu335TkTTTTTTTTAmmmmmmmNO  */
{"wledebwo": _ -> "n"}, /**              **  :>3uuuuuu35TTTTTTkkPmmmmmmmm%D4 */
{"nijgbmst": _ -> "g"}, /**              ** :~3uuuuuuuu35TTTTkpZmmmmmmmmmmXD */
]:Array<Dynamic>); /*********************** ,v3uuuuuuuuuu42TkAmmmmmmmmmmmmND */
$b{[ for (i in 0...0x7) macro w[$v{i}] /*** :o3uuuuuuuuu333upmmmmmmmmmmmmm%D */
.set($v{F(i)}, _ -> $v{"CORRECT" /********* ~3uuuuu3334Tp)    lSZZmmmmmmmmmX */
.charAt(i)})]}; Sys.println('flag is ' + /* )uu334TpgZP.         yhASZZmmmmO */
w.join("")); } }).fields; } #end } /******* oTpgZmmm>              \kkkpASZm */

Solution

Part I: Jumping into the rabbit hole

🧠 I seriously underestimated the challenge. I have suffered enough from mrii and decided to look into another one. This challenge looked pretty simple when I first look at it. Well, both challenges are written by Aurel300 and I should have aware of this.

I am still in trauma from mrii that the expected behaviour may not be reproducible when the code is changed (in particular, when they are prettified or annotated). However, making the code readable is always the priority. The correctness is not important during static analysis. So let's format it!

/**union{paste the flag here}*/

#if macro
import haxe.macro.Context as C;
import haxe.Int64 as B;
import haxe.macro.Expr;

using StringTools;
#else
@:build(Flag.a("CrimsonCarp"))
#end
class Flag {
  #if macro
  static final A = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-";
  static final K = [
    "0000000008258240093",
    "00000032059146620",
    "0000059424680571",
    "00000016501626909",
    "000000010391056299",
    "00000000039819521706",
    "000063129310438",
  ];

  static function a(F) {
    var d = C.getLocalClass().get().doc;
    ~/^union\{([a-zA-Z0-9_-]{28})\}$/.match(d) || throw "invalid flag";
    var L: B = 0x00000;

    function F(i): String {
      var u: B = 0;
      var e = [A.indexOf][313337 - 0x4C7F9];

      u += e(d.charAt((i << 2) + 6));

      u <<= 6;
      u += e(d.charAt((i << 2) + 7));
      u <<= 6;
      u += e(d.charAt((i << 2) + 8));
      u <<= 6;
      u += e(d.charAt((i << 2) + 9));
      u <<= 6;
      u += e(d.charAt(i));
      u <<= 6;
      u ^= B.parseString(K[i]) ^ C.getPosInfos((macro aa / v77777777yTTTTTkkkCggggggggPD4!).pos).min;
      u >= L || throw "bad flag";

      L = u;

      var R = [
        for (i in 0...7) {
          var A = A.charAt((u % 26).low);
          u /= 26;
          A;
        }
      ];
      R.reverse();
      return R.join("");
    }
    return (macro class {
      static function main() {
        var w: Array<haxe.DynamicAccess<Dynamic>> = ([
          {"cdjholca": _ -> "s"},
          {"rarfzray": _ -> "o"},
          {"0atpgyte": _ -> "w"},
          {"0gzqjili": _ -> "r"},
          {"0tpeqmwy": _ -> "o"},
          {"wledebwo": _ -> "n"},
          {"nijgbmst": _ -> "g"},
        ] : Array<Dynamic>);
        $b{
          [
            for (i in 0...0x7)
              macro w[$v{i}].set($v{F(i)}, _ -> $v{
                "CORRECT".charAt(i)
              })
          ]
        };
        Sys.println('flag is ' + w.join(""));
      }
    }).fields;
  }
  #end
}

The entire logic looked simple to someone who doesn't know Haxe at all. For example, the flag has the format /^union\{[a-zA-Z0-9_-]{28}\}$/. Also, the function F(i) is being called for seven times, i.e., F(0), F(1), ..., F(6) and seven numbers $u_0, u_1, ..., u_6$ are derived from the flag. In particular, the following line has hinted that $u_0 \leq u_1 \leq ... \leq u_6$.

u >= L || throw "bad flag";

Each of the $u_i$'s will then be used to compute a 7-character string as the function result. What's that for? Let's skip it for now.

Part II: Weird stuff are happening...

When I understand the code more, I found that there is something weird. Let's look at the following Haxe snippet.

class Test {
  static function main() {
    var w: Array<haxe.DynamicAccess<Dynamic>> = ([
      {"cdjholca": _ -> "s"},
      {"rarfzray": _ -> "o"},
      {"0atpgyte": _ -> "w"},
      {"0gzqjili": _ -> "r"},
      {"0tpeqmwy": _ -> "o"},
      {"wledebwo": _ -> "n"},
      {"nijgbmst": _ -> "g"},
    ] : Array<Dynamic>);
    
    Sys.println('flag is ' + w.join(""));
    // Prints "flag is sowrong"
  }
}

From what I understand, the type Array<haxe.DynamicAccess<Dynamic>> is equivalent to an array of maps. This is a Python-equivalent of the above snippet:

w = [
  {"cdjholca": lambda _: "s"},
  {"rarfzray": lambda _: "o"},
  {"0atpgyte": lambda _: "w"},
  {"0gzqjili": lambda _: "r"},
  {"0tpeqmwy": lambda _: "o"},
  {"wledebwo": lambda _: "n"},
  {"nijgbmst": lambda _: "g"}
]
print(f'flag is {w}')
# Prints "flag is [{'cdjholca': <function <lambda> at 0x7fe0b5cb4af0>}, ...]"

Why are w[i]'s one-character long in their representations? I wrote some snippets to test with the behaviour. I was expecting when there are multiple entries inside DynamicAccess, then the character representing the entry will the output of the lexicographically smallest key.

class Test {
  static function main() {
    var w: Array<haxe.DynamicAccess<Dynamic>> = ([
      {"cdjholca": _ -> "s"},
      {"rarfzray": _ -> "o"},
      {"0atpgyte": _ -> "w"},
      {"0gzqjili": _ -> "r"},
      {"0tpeqmwy": _ -> "o"},
      {"wledebwo": _ -> "n"},
      {"nijgbmst": _ -> "g"},
    ] : Array<Dynamic>);
    
    w[0].set("cdjholca", _ -> "S");
    w[1].set("rarfzrax", _ -> "O");
    w[2].set("0atpgytf", _ -> "W");
    w[3].set("0tpeqmwy", _ -> "R");
    w[4].set("0gzqjili", _ -> "O");
    w[5].set("0gzqjili", _ -> "N");

    Sys.println('flag is ' + w.join(""));
    // Prints "flag is SowRONg"
  }
}

From what I guess, the snippet should be printing flag is SOwrONg... My guess is sOwRoNg. It now seemed to me that the representation will be replaced only if the new key is one of the strings: cdjholca, rarfzray, ... and nijgbmst. They are the initial keys defined in each of the w[i]'s. The following snippets demystify a bit while bringing another puzzle to me. Let's guess what will the below snippet prints:

class Test {
  static function main() {
    var w: Array<haxe.DynamicAccess<Dynamic>> = ([
      {"cdjholcb": _ -> "s"},
      {}
    ] : Array<Dynamic>);
    
    w[1].set("rarfzray", _ -> "O");

    Sys.println('flag is ' + w.join(""));
  }
}

Well, I bet you cannot predict the results. It is flag is {cdjholcb: #fun}O... Why? What I knew is the function got executed when the key matches one of the cdjholca, etc. Otherwise, it matches my expectation of how haxe.DynamicAccess<Dynamic> should be represented. In our case, printing w[0] alone is {cdjholcb: #fun}. So there are two questions left:

  1. Why are the functions got executed when the key is cdjholca?
  2. Are there any other keys like cdjholca that will have the functions executed?

Part III: Why is it got executed?

I have further simplified the code. Let's have a look on the below snippet:

class Test {
  static function main() {
    var jsonData = '{"foo":"bar", "0gzqjili": "444"}';
    var json: haxe.DynamicAccess<String> = haxe.Json.parse(jsonData);
    trace(json);
  }
}
// Uncaught exception Cannot call 444
// Test.hx:5: characters 5-10 : Called from here

To understand why is the function got executed, we may need to look at how strings are printed. These are some code snippets I found during the process.

// https://github.com/HaxeFoundation/haxe/blob/4.2.0/std/Sys.hx#L39
	/**
		Prints any value to the standard output, followed by a newline.
		On Windows, this function outputs a CRLF newline.
		LF newlines are printed on all other platforms.
	**/
	static function println(v:Dynamic):Void;
// https://github.com/HaxeFoundation/haxe/blob/4.2.0/std/php/_std/Sys.hx#L32
	public static inline function print(v:Dynamic):Void {
		Global.echo(Std.string(v));
	}
// https://github.com/HaxeFoundation/haxe/blob/4.2.0/std/Std.hx#L86
	/**
		Converts any value to a String.
		If `s` is of `String`, `Int`, `Float` or `Bool`, its value is returned.
		If `s` is an instance of a class and that class or one of its parent classes has
		a `toString` method, that method is called. If no such method is present, the result
		is unspecified.
		If `s` is an enum constructor without argument, the constructor's name is returned. If
		arguments exists, the constructor's name followed by the String representations of
		the arguments is returned.
		If `s` is a structure, the field names along with their values are returned. The field order
		and the operator separating field names and values are unspecified.
		If s is null, "null" is returned.
	**/
	static function string(s:Dynamic):String;

From above, we can see that Sys.println calls Std.string (at least it happens for PHP, Lua and some other languages). Also, Std.string(s) calls s.toString() if it is defined.

class Test {
  static function main() {
    var foo: haxe.DynamicAccess<Dynamic> = {};
    foo.set("test", () -> "test");
    Sys.println(foo); // {test: #fun}    
    Sys.println(Std.string(foo).length); // 12
    
    var foo: haxe.DynamicAccess<Dynamic> = {};
    foo.set("cdjholca", () -> "cdjholca");
    Sys.println(foo); // cdjholca
    Sys.println(Std.string(foo).length); // 8
  }
}

From checking the length of its string representation, we can say that Std.string has already triggered () -> "cdjholca". Looking further, I found the method that serializes objects - haxe.Serializer.run.

Why serialization? Although it may seemed out of context, serializing the object was giving me insights on why cdjholca is called.
class Test {
  static function main() {
    var foo: Dynamic = {"test": "foo"};
    Sys.println(haxe.Serializer.run(foo));
    // oy4:testy3:foog
        
    var foo: Dynamic = {"cdjholca": "cdjholca"};
    Sys.println(haxe.Serializer.run(foo));
    // oy8:toStringy8:cdjholcag
  }
}

It makes sense when serializing {"test": "foo"}. It can be broken into a few parts:

 o       y       4       :          test  y       3       :          foo   g
[object][string][length][separator][data][string][length][separator][data][suffix]
        [key                            ][value                          ]

Why is cdjholca changed to toString? I don't know, but this explains why the w[i]'s are not printing its entire map but a value from the function call. In short, the method toString in the Dynamic is overwritten. When Sys.println is called, the replaced toString is called, returning the new value. Anyway, this is the dead end I reached during the CTF...

Part IV: Jumping into another rabbit hole

After the CTF ends, there is a discussion in the Discord server and this is what the challenge author said:

Uh oh. I have actually read one of the files during the CTF and though it wasn't relevant. Since haxe --run is running on an interpreter called Eval, we may need to read how are strings printed. While interpreting with Eval, objects in Haxe are implemented with HashTbl in OCaml.

(* https://github.com/HaxeFoundation/haxe/blob/4.2.0/src/macro/eval/evalPrinting.ml#L52 *)
  let rec s_object depth o =
    let fields = object_fields o in
    let buf = Buffer.create 0 in
    Buffer.add_string buf "{";
    List.iteri (fun i (k,v) ->
      if i > 0 then Buffer.add_string buf ", ";
      Buffer.add_string buf (rev_hash k);
      Buffer.add_string buf ": ";
      Buffer.add_string buf (s_value depth v).sstring;
    ) fields;
    Buffer.add_string buf "}";
    let s = Buffer.contents buf in
    create_with_length s (try UTF8.length s with _ -> String.length s)
(* https://github.com/HaxeFoundation/haxe/blob/4.2.0/src/macro/eval/evalPrinting.ml#L105 *)
  and s_value depth v =
    let call_to_string () =
      let vf = field_raise v EvalHash.key_toString in
      s_value (depth + 1) (call_value_on v vf [])
    in
    if depth > 5 then rstop
    else match v with
    | VNull -> rnull
(* https://github.com/HaxeFoundation/haxe/blob/development/src/macro/eval/evalHash.ml *)
  let reverse_map = Hashtbl.create 0

  let rev_hash i = Hashtbl.find reverse_map i

  let hash f =
    let i = Hashtbl.hash f in
    Hashtbl.replace reverse_map i f;
    i

  let path_hash path = hash (Globals.s_type_path path)

  let key_length = hash "length"
  let key_toString = hash "toString"
  let key_OutsideBounds = hash "OutsideBounds"
  let key_low = hash "low"
    (* omitted *)
Hash tables. Copied from Wikipedia.

OCaml HashTbl is implementing the hash table and they are keyed by 32-bit integers. This is vulnerable to preimage attacks owing to the small keyspace. In particular, when Haxe is printing a string, call_to_string is called.

(* https://github.com/lucasaiu/ocaml/blob/4.00/stdlib/hashtbl.mli#L303 *)
  val hash : 'a -> int
  (** [Hashtbl.hash x] associates a nonnegative integer to any value of
    any type. It is guaranteed that
    if [x = y] or [Pervasives.compare x y = 0], then [hash x = hash y].
    Moreover, [hash] always terminates, even on cyclic structures. *)
(* https://github.com/lucasaiu/ocaml/blob/4.00/stdlib/hashtbl.ml#L18 *)
external seeded_hash_param : int -> int -> int -> 'a -> int = "caml_hash" "noalloc"
external old_hash_param : int -> int -> 'a -> int = "caml_hash_univ_param" "noalloc"

let hash x = seeded_hash_param 10 100 0 x
let hash_param n1 n2 x = seeded_hash_param n1 n2 0 x
let seeded_hash seed x = seeded_hash_param 10 100 seed x

Finally, a implementation of seeded_hash_param can be referred here. Anyway, since we are hashing strings, what we actually care is the following code snippets:

// https://github.com/lucasaiu/ocaml/blob/4.00/byterun/hash.c#L151
  CAMLexport uint32 caml_hash_mix_string(uint32 h, value s)
  {
    mlsize_t len = caml_string_length(s);
    mlsize_t i;
    uint32 w;

    /* Mix by 32-bit blocks (little-endian) */
    for (i = 0; i + 4 <= len; i += 4) {
  #ifdef ARCH_BIG_ENDIAN
      w = Byte_u(s, i)
          | (Byte_u(s, i+1) << 8)
          | (Byte_u(s, i+2) << 16)
          | (Byte_u(s, i+3) << 24);
  #else
      w = *((uint32 *) &Byte_u(s, i));
  #endif
      MIX(h, w);
    }
    /* Finish with up to 3 bytes */
    w = 0;
    switch (len & 3) {
    case 3: w  = Byte_u(s, i+2) << 16;   /* fallthrough */
    case 2: w |= Byte_u(s, i+1) << 8;    /* fallthrough */
    case 1: w |= Byte_u(s, i);
            MIX(h, w);
    default: /*skip*/;     /* len & 3 == 0, no extra bytes, do nothing */
    }
    /* Finally, mix in the length.  Ignore the upper 32 bits, generally 0. */
    h ^= (uint32) len;
    return h;
  }
// https://github.com/lucasaiu/ocaml/blob/4.00/byterun/hash.c#L188
  CAMLprim value caml_hash(value count, value limit, value seed, value obj)
  {
    // omitted
    h = Int_val(seed);
      switch (Tag_val(v)) {
      case String_tag:
        h = caml_hash_mix_string(h, v);
        num--;
        break;
      // omitted
      }
    FINAL_MIX(h);
    /* Fold result to the range [0, 2^30-1] so that it is a nonnegative
      OCaml integer both on 32 and 64-bit platforms. */
    return Val_int(h & 0x3FFFFFFFU);
  }

When we are hashing strings with caml_hash, the arguments count and limit aren't used. In this implementation, toString, cdjholca, rarfzray, ... all hashes into 781809894. I have implemented a Python-equivalent script and a complete search on ^[a-z]{7}$ took me around 45 minutes with pypy3. I think the list of preimages can be retrieved via a meet-in-the-middle approach and can be solved immediately, but I am not time-constrained after the game. These are the candidates who have the same digest with hash "toString":

bcwtdes dqoagnv hqxmvkq jfntbvq nanqyku pqeytpx pruxdbh sipootq

Part V: The final touches

Now we are finally back to the haxe script! There are ... possible $0 \leq u < 2^{36}$ for F(i) to return one of the above 7-byte strings. Let's recall how u is derived inside F(i):

var u:B = 0;
var e = [A.indexOf][313337 - 0x4C7F9];
u += e(d.charAt((i << 2) + 6));

u <<= 6;
u += e(d.charAt((i << 2) + 7));
u <<= 6;
u += e(d.charAt((i << 2) + 8));
u <<= 6;
u += e(d.charAt((i << 2) + 9));
u <<= 6;
u += e(d.charAt(i));
u <<= 6;
u ^= B.parseString(K[i]) ^ C.getPosInfos((macro aa / v77777777yTTTTTkkkCggggggggPD4!).pos).min;

Hereby d is the flag, K is a given array of constants. Also, e(d.charAt(j)) is a simple mapping from the $j$-th character of the flag. The one that remained mystery is C.getPostInfos((macro aa / v77...!).pos).min. Luckily, it is not hard know what the corresponding value is. I can simply print and found that it equals 1639. Well, remember the trauma I had? I tried messing around with the code and found the value changes when the code is updated...

Well, let's cut the crap. The correct value would be 1763, and it is equal to the number of characters before the macro. Now everything is clear. We are going to find $u_0, u_1, ..., u_6$ those are increasing. I have written a solve script that recovers the flag: union{h4shC0ll1zion_m4ke5_HAXE_sad}. Flag is CORRECT! 🎉

Warning. The solve script is highly unannotated and very likely unreadable.

So that's it! I know it must be tough going through the entire blog post. Thanks for bearing with me and hope you enjoyed it.