The Mad Max DGA

This post describes a domain generation algorithm (DGA) used by the “Mad Max” malware family. Mad Max is a targeted trojan, and we plan to post a follow-up article that documents our findings regarding the features of the Mad Max malware itself. But for now we will focus on the reversing of its DGA, since we were unable to find any other published research on this topic.

The reference sample we focus on has MD5 hash c7d1357f4c4acceb1780db12ad1b4de1. It first came to our attention because it triggered an ETPro signature alert for “APT.MADMAX” while passing through our sandboxing automation. We could find very little published research on this threat, other than one analysis report from Sophos [1]. This was perhaps due to the preponderance of web search hits related to the famous Mel Gibson movie of the same name.


The sample has pretty generic detections on VirusTotal. We intend to post further details on the malware’s features, installation life cycle, etc. in a follow-up article, but for now suffice it to say that the original sample drops several DLLs onto the infectee, which are then executed via rundll32.exe. During the reversing of Mad Max’s DGA, the dropped DLL that we spent the most time with weighed 1,561,600 bytes and had an MD5 hash of 43538f5fb75003cbea84c9216e12c94a. It was dropped into C:\Users\Admin\AppData\Local\Temp as c_375EF.tmp.

Code De-Obfuscation

One of the obstacles presented by this malware is that its code is heavily obfuscated; small sequences of one or more “real” malcode instructions are buried amidst a much larger amount of dummy instructions. Figure 1 shows a representative example – only the five instructions colored yellow are substantive (“real”); the surrounding instructions exist purely for obfuscation purposes.

This form of obfuscation is fairly effective; it makes life difficult for both IDA Pro and the human reverse engineer. IDA gets itself quite befuddled and for the most part is unable to parse the instructions into distinct functions, and without the functional structure, many of IDA’s other automated annotations (detection and labeling of function arguments, local stack variables, parsing into code blocks, “graph mode”, etc.) fail as well. I can also report that the obfuscation is effective against the human reverser: it becomes much more painstaking to gain an understanding of the code when every few real instructions are separated by dozens of junk instructions. Sadly, this sort of obfuscation seems to be a growing trend and is being encountered more frequently these days.

Figure 1. Mad Max’s Code Obfuscation

Fortunately, there is some rhyme and reason to Mad Max’s code obfuscator; it generates a very reliable pattern, which allows one to write a de-obfuscator in order to separate the wheat from the chaff. Following each short sequence of “real” instructions is a triplet of PUSHF, TEST, and conditional jump (either JNZ or JZ) instructions. The TEST instruction will always set the Z flag such that the subsequent conditional jump is taken. This jump will skip forward over a random number of dummy instructions to a POPF instruction. Following the POPF will be the next short sequence of “real” instructions. The basic repeating pattern is illustrated as follows, where each line is labeled as a “real” instruction or a random dummy instruction, and the remaining PUSHF, TEST, JZ/JNZ, and POPF lines are the “glue” instructions for hopping EIP over the dummy opcodes, from real cluster to real cluster:

0x10001080 Random Dummy Instruction
0x10001082 Random Dummy Instruction
0x10001084 POPF
0x10001085 Real Instruction
0x10001089 Real Instruction
0x1000108B PUSHF
0x1000108C TEST some_global_constant, some_immediate_value
0x10001093 JZ / JNZ 0x1000112E
0x10001099 Random Dummy Instruction
0x1000109B Random Dummy Instruction

0x10001128 Random Dummy Instruction
0x1000112A Random Dummy Instruction
0x1000112E POPF
0x1000112F Real Instruction
0x10001132 Real Instruction
0x10001134 Real Instruction
0x10001139 PUSHF
0x1000113A TEST some_global_constant, some_immediate_value
0x10001141 JZ / JNZ 0x1000112E
0x10001147 Random Dummy Instruction
0x10001149 Random Dummy Instruction

The “glue” instructions are (presumably) auto-generated such that execution never reaches any of the dummy instructions – which are probably just randomly generated bytes that the IDA disassembler interprets as opcodes.

Since Mad Max’s code obfuscator follows the above pattern religiously, it is straightforward to write a poor man’s finite state machine utility in Python that runs through the entire obfuscated malcode and classifies each instruction as real, dummy, or glue. Once the classification is completed, our utility performs an editing process in which it makes the following changes:

  • Real instructions are left unmodified;
  • Random dummy instructions are changed to NOP (0x90) opcodes;
  • The PUSHF glue instructions are changed to unconditional JMP instructions to the next real instruction (the instruction immediately following the next POPF glue instruction, which can be unambiguously calculated by inspecting the operand of the following JZ/JNZ glue instruction);
  • The POPF, TEST, and JZ/JNZ glue instructions are also changed to NOPs.
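The classification pass described above can be sketched roughly as follows. This is a minimal illustration rather than the original utility: it assumes the code has already been disassembled into (address, mnemonic, jump_target) tuples, that the listing starts inside a run of real instructions, and that every PUSHF seen in real code is glue. The names classify, REAL, DUMMY, and GLUE, as well as the placeholder mnemonics in the sample listing, are hypothetical.

```python
REAL, DUMMY, GLUE = "real", "dummy", "glue"

def classify(instructions):
    """Label each (address, mnemonic, jump_target) tuple as REAL, DUMMY, or GLUE.

    jump_target is None for everything except the JZ/JNZ glue instruction.
    """
    labels = []
    state = "in_real"
    skip_to = None  # address of the POPF that ends the current dummy run
    for address, mnemonic, jump_target in instructions:
        if state == "in_real":
            if mnemonic == "pushf":
                labels.append(GLUE)
                state = "expect_test"
            else:
                labels.append(REAL)
        elif state == "expect_test":
            if mnemonic != "test":
                raise ValueError("expected TEST at 0x%x" % address)
            labels.append(GLUE)
            state = "expect_jcc"
        elif state == "expect_jcc":
            if mnemonic not in ("jz", "jnz"):
                raise ValueError("expected JZ/JNZ at 0x%x" % address)
            labels.append(GLUE)
            skip_to = jump_target
            state = "in_dummy"
        else:  # in_dummy: everything before the jump target is junk
            if address == skip_to:
                if mnemonic != "popf":
                    raise ValueError("expected POPF at 0x%x" % address)
                labels.append(GLUE)
                state = "in_real"
            elif address > skip_to:
                raise ValueError("overshot jump target 0x%x" % skip_to)
            else:
                labels.append(DUMMY)
    return labels

# Sample listing modeled on the pattern above; "mov", "add", and "db" are
# placeholder mnemonics, not instructions taken from the sample.
listing = [
    (0x10001085, "mov", None),
    (0x10001089, "add", None),
    (0x1000108B, "pushf", None),
    (0x1000108C, "test", None),
    (0x10001093, "jz", 0x1000112E),
    (0x10001099, "db", None),
    (0x1000109B, "db", None),
    (0x1000112E, "popf", None),
    (0x1000112F, "mov", None),
]
labels = classify(listing)
for (address, mnemonic, _), label in zip(listing, labels):
    print("0x%08X %-6s %s" % (address, mnemonic, label))
```

A real tool would also need to resume disassembly at each jump target, since the random dummy bytes do not necessarily decode into instructions that line up with the next POPF.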

Running this simple algorithm over the Mad Max malcode yields a de-obfuscated binary which can then be reloaded into IDA Pro; IDA is much happier this time:

Figure 2. IDA screenshot of general code region from Figure 1 after de-obfuscation.

Now that’s more like it! IDA can now recognize functions, stack variables, code blocks, cross references, etc. The garbage that we NOPed out hardly shows up at all – especially in graph view. In addition, in the original obfuscated code, the random dummy bytes generated in between the glue instructions often yield some very “unusual” instructions when interpreted as opcodes – and these opcodes are so arcane that in many cases IDA even misinterprets large blocks as being data instead of code. After the de-obfuscation, this annoyance goes away as well.

Overall DGA Algorithm

OK, now that we have some reasonable code with which to work, it is time to actually reverse the DGA. We initially suspected that Mad Max might be using a DGA because multiple sandbox runs spaced several days apart yielded phone home attempts to different hostnames. The vast majority of DGAs take some type of timestamp as their variable “seed” input to the algorithm. It turns out Mad Max is no exception, and we find that a call to GetSystemTime() leads us straight to the domain generation function, which we shall refer to as DoDga_sub_10045851().

The DoDga_sub_10045851 function performs the following high-level steps:

  1. Generates an ASCII seed string using the current year, month, and “week of the month” as its variable input, as well as a hard-coded constant; this string will be either 42 or 43 bytes in length, depending on the current month;
  2. Computes the MD5 hash of this seed string, yielding 128 bits that will be interpreted as a 16-element “indexing table”;
  3. Performs a post-processing operation on the first 8 bytes of this “indexing table”; this simply consists of byte-wise ADDing the last 8 bytes of the table to the first 8 bytes;
  4. Uses the first 10 bytes of the final 16-byte “indexing table” as indices into a 62-byte “domain character lookup table” to yield a 10-character second-level domain (SLD);
  5. Appends a top-level domain (TLD) chosen based on the current “week of the month”;
  6. Prepends “www.” to yield a full hostname.

In the following sub-sections, we will document these steps in more detail.

Step 1. Generating the Seed String

Mad Max’s DoDga function starts by constructing a seed string in a local stack frame buffer var_270. Although the seed string will only be 42 or 43 characters in length, depending on the current month, this local buffer is sized to hold 64 bytes of data. This is because the MD5 hashing operation in the following step operates on 64-byte chunks of input data, and so the local buffer contains enough room for the appropriate padding.

The ASCII seed string is fabricated using a hard-coded UUID parameter and components of the current date. Here is an example for July 14, 2016, consisting of the UUID component followed by the current year (2016), the current month (7), and the current “week of the month” (2):

9135d48a-6103-4894-856c-b29897742e52201672


The UUID component of this string is obtained by passing a 16-byte binary buffer to the Win32 API UuidToStringA(), which interprets these sixteen bytes as a UUID struct and returns the string representation of that UUID in Microsoft’s format. The binary UUID source bytes are passed to DoDga on the stack from the calling function, occupying the positions typically used for the first four arguments (i.e., arg_0, arg_4, arg_8, and arg_C, using IDA’s nomenclature.) The UUID is copied onto the stack from the global data buffer DGA_UUID_unk_10172D44, as shown in Figure 3. The fifth argument to DoDga(), arg_10, is a pointer to a local buffer in the caller’s stack frame into which DoDga() will write the generated domain output.

Sadly, but not surprisingly, Mad Max does not store this hard-coded UUID embedded in plaintext in the .data section of its binary for our convenience; in fact DGA_UUID_unk_10172D44 is uninitialized in the static .DLL file dropped by Mad Max. Fortunately this is not a major impediment because a sandbox memory dump of the running rundll32 process hosting the Mad Max DLL will easily capture the plaintext value of the hard-coded UUID, as shown in Figure 4. After invoking UuidToStringA() on this UUID, DoDga() will strcpy() the resulting ASCII string representation to the beginning of its 64-byte seed string buffer var_270.

Figure 3. Passing 16-byte binary UUID to DoDga()

Figure 4. Hard-coded UUID struct in plaintext captured from sandbox memdump

Next, Mad Max invokes the GetSystemTime() API to obtain a SYSTEMTIME struct containing the current UTC time stamp. It will then extract the wYear and wMonth fields from this struct and convert both to ASCII string representations via the standard ultoa() C function. This will yield the strings “2016” and “7”, for example, in the case of a July 14, 2016 time stamp. These two components are appended to the seed string buffer via strcat().

Finally, Mad Max will consult the wDay field and compute the “week of the month”. The algorithm is trivial:

  def _get_week_of_month(self, wDay):
    if wDay <= 7:
      week_of_month = 1 # (1 thru 7)
    elif wDay <= 14:
      week_of_month = 2 # (8 thru 14)
    elif wDay <= 21:
      week_of_month = 3 # (15 thru 21)
    else:
      week_of_month = 4 # (22 thru 31)
    return week_of_month

The current week of the month is also converted from integer to string and appended to the seed string via strcat(). At this point, the 64-byte seed string buffer will hold a 42 or 43 character (depending on the month) ASCII string such as the following:

9135d48a-6103-4894-856c-b29897742e52201672


This represents the base material used for generating variability in the resulting domains. Since this base material will change over time at the granularity of a single week, Mad Max will end up switching to a new domain on a weekly basis.
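The seed construction described in this step can be sketched as follows. This is a minimal sketch, assuming the UUID string from the reference sample (listed with Appendix I); the helper names week_of_month and build_seed_string are hypothetical, and the week computation is a compact equivalent of the if/elif chain above.

```python
import datetime

# UUID recovered from the reference sample (see Appendix I)
UUID_STRING = "9135d48a-6103-4894-856c-b29897742e52"

def week_of_month(day):
    # Days 1-7 -> 1, 8-14 -> 2, 15-21 -> 3, 22-31 -> 4
    return min((day - 1) // 7 + 1, 4)

def build_seed_string(date):
    # UUID string, then year, month, and week of month as plain decimal
    return "%s%d%d%d" % (UUID_STRING, date.year, date.month,
                         week_of_month(date.day))

seed = build_seed_string(datetime.date(2016, 7, 14))
print(seed, len(seed))
```

The seed is 42 characters for single-digit months and 43 characters for October through December, which matches the two lengths noted above.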

Step 2. Generating a 16-byte Indexing Table

The purpose of the next step is to perform operations on the seed string to produce a 16-byte “indexing table” which will be used to look up individual characters to produce a weekly domain. The sequences of instructions to perform this operation are long and algorithmically complicated; Figure 5 gives a rough idea of what we’re dealing with (note the formidable graph overview) – very long series of blocks that perform various mathematical and bitwise operations on 32-bit portions of the indexing table and seed string.

The disassembly contains a number of clues as to what is going on here. For one thing, the 16-byte indexing table is initialized with some distinctive hard-coded values, as shown in Figure 6.

In memory, the indexing table will be initialized to look like this:

           00 01 02 03 04 05 06 07   08 09 0a 0b 0c 0d 0e 0f   ----- ASCII ----- 
0x7effed88  01 23 45 67 89 ab cd ef   fe dc ba 98 76 54 32 10   .#Eg.... ....vT2.

These hard-coded constants are associated with various well-known hashing algorithms, including MD5, MD4, SHA-1, etc.

Figure 5. Sample instructions for generating indexing table from seed string

Figure 6. Initializing the Indexing Table

Secondly, prior to performing the types of complicated operations shown in Figure 5, Mad Max will append a single 1 bit (0x80 byte) to the seed string, followed by enough additional 0 bits (0x00 bytes) to pad the seed string buffer to 56 bytes. It will then append 8 more bytes containing the 64-bit integer representation of the original length (in bits) of the seed string (i.e., either 0x150 or 0x158, depending on whether the original seed string was 42 or 43 characters, respectively.) This also is consistent with the padding mechanism used by MD5.
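The padding just described can be reproduced in a few lines. This is a sketch of standard MD5 padding for a message shorter than 56 bytes, not a transcription of the malware's code; the function name md5_pad is hypothetical.

```python
import struct

def md5_pad(seed):
    """Pad a seed shorter than 56 bytes into a single 64-byte MD5 block."""
    if len(seed) >= 56:
        raise ValueError("this sketch only handles single-block messages")
    block = seed + b"\x80"                      # the single 1 bit
    block += b"\x00" * (56 - len(block))        # zero-fill up to 56 bytes
    block += struct.pack("<Q", len(seed) * 8)   # 64-bit little-endian bit length
    return block

block = md5_pad(b"9135d48a-6103-4894-856c-b29897742e52201672")
print(len(block), hex(struct.unpack("<Q", block[56:])[0]))
```

For the 42-character example seed, the trailing length field holds 0x150 (336 bits); a 43-character seed would give 0x158.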

The third clue is that some of the hard-coded 32-bit constants found in the actual bit manipulation instruction sequences show up in various known implementations of MD5. For example, 0x28955B88 and 0x173848AA are the 32-bit two's-complement negations of the first two MD5 round constants (0xD76AA478 and 0xE8C7B756), and the constants shown in Figure 7 are likewise pre-computable values associated with MD5.

Figure 7. Two of many tell-tale constants: 0xA83F051, 0x4787C62A, etc.
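One way to check whether a constant spotted in the disassembly is MD5-related is to compare it against the MD5 round-constant table, which is defined as floor(2^32 × |sin(i)|) for i = 1..64. The sketch below also tests the 32-bit negation of each value, on the assumption (ours, not confirmed from the sample) that some constants appear negated because the code subtracts rather than adds them. The function name md5_constant_match is hypothetical.

```python
import math

# MD5 round constants: T[i] = floor(2**32 * abs(sin(i + 1))) for i = 0..63
T = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]

def md5_constant_match(value):
    """Return (table index, form) if value is an MD5 constant or its negation."""
    negated = (-value) & 0xFFFFFFFF
    if value in T:
        return (T.index(value), "direct")
    if negated in T:
        return (T.index(negated), "negated")
    return None

for constant in (0x28955B88, 0x173848AA, 0x0A83F051, 0x4787C62A):
    print("0x%08X -> %s" % (constant, md5_constant_match(constant)))
```

All four constants mentioned in the text and in the Figure 7 caption map onto entries of this table, either directly or as negations.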

Emulation of the 800+ instructions associated with the generation of the indexing table confirms that it is indeed an implementation of a straight, unmodified MD5 hash. For our example date of July 14, 2016, the MD5 hash of the seed string produces the following indexing table:

            00 01 02 03 04 05 06 07   08 09 0a 0b 0c 0d 0e 0f   ----- ASCII ----- 
0x7effed88  f1 a6 80 dd 24 ed 31 a5   ab 2b a3 da 61 c7 d1 af   ....$.1. .+..a... 

Step 3. Post-processing the Indexing Table

Once the MD5 hash of the seed string has been calculated, Mad Max performs a final post-processing step on the resulting 16-byte indexing table. Specifically, it updates the first eight bytes by performing byte-wise addition of the second eight bytes (see Figure 8).

Figure 8. Byte-wise ADD post-processing operation on indexing table

Again, using July 14, 2016 as our sample date, the final indexing table is:

            00 01 02 03 04 05 06 07   08 09 0a 0b 0c 0d 0e 0f   ----- ASCII ----- 
0x7effede0  9c d1 23 b7 85 b4 02 54   ab 2b a3 da 61 c7 d1 af   ..#....T .+..a... 
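The post-processing step is small enough to verify by hand. The following sketch applies it to the example MD5 output shown in Step 2; the function name post_process is hypothetical.

```python
def post_process(index_table):
    """Add bytes 8..15 into bytes 0..7, byte-wise, discarding any carry."""
    table = bytearray(index_table)
    for i in range(8):
        table[i] = (table[i] + table[i + 8]) & 0xFF
    return bytes(table)

md5_digest = bytes.fromhex("f1a680dd24ed31a5ab2ba3da61c7d1af")
final_table = post_process(md5_digest)
print(final_table.hex())
```

For example, the first byte is 0xf1 + 0xab = 0x19c, which truncates to 0x9c; the last eight bytes are left unchanged.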

Steps 4-6. Producing the weekly domain

At last, it is time to use the final post-processed 16-byte indexing table to produce a weekly domain. Each byte of the indexing table will generate a single domain character; thus, Mad Max could produce domain names up to 16 characters in length. However, it chooses to limit the domains to 10 characters.

Each of the first ten byte values in the final indexing table is reduced modulo 62 and used as an index into the following hard-coded 62-character DGA_LUT lookup table (see Figure 9):

jfyicbya26h5hvepgq07zfsqmdk4xcet9annmwuw8rok3lzsxlvjpdubog1rit


Figure 9. Indexing into DGA lookup table

So for our example date of July 14, 2016, we get the following domain:

Position   Indexing Table Byte   Index mod 62   Domain Character
0          0x9c                  0x20           9
1          0xd1                  0x17           q
2          0x23                  0x23           n
3          0xb7                  0x3b           r
4          0x85                  0x09           6
5          0xb4                  0x38           o
6          0x02                  0x02           y
7          0x54                  0x16           s
8          0xab                  0x2f           s
9          0x2b                  0x2b           k

The resulting second level domain (SLD) is 9qnr6oyssk.
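The lookup can be reproduced with the table from Appendix I. This is a minimal sketch; the function name index_table_to_sld is hypothetical.

```python
# 62-character lookup table, as listed in Appendix I
DGA_LUT = "jfyicbya26h5hvepgq07zfsqmdk4xcet9annmwuw8rok3lzsxlvjpdubog1rit"

def index_table_to_sld(index_table, length=10):
    """Map the first `length` index bytes to characters, modulo the table size."""
    return "".join(DGA_LUT[b % len(DGA_LUT)]
                   for b in bytearray(index_table[:length]))

final_table = bytes.fromhex("9cd123b785b40254ab2ba3da61c7d1af")
sld = index_table_to_sld(final_table)
print(sld)
```

Because 256 is not a multiple of 62, the modulo reduction makes the first eight table characters slightly more likely than the rest.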

The final step is to prepend “www.” to the SLD and then choose a top level domain (TLD). This operation is trivial and involves using the current “week of the month” as a key into a hard-coded lookup table (see Table 1).

The final result:

www.9qnr6oyssk.org

Appendix I contains a Python re-implementation of the above Mad Max DGA. It uses the hard-coded UUID 9135d48a-6103-4894-856c-b29897742e52 found in the Mad Max reference sample c7d1357f4c4acceb1780db12ad1b4de1.

Table 1. Determining TLD from “week of the month”

Week of Month   TLD
1               .com
2               .org
3               .info
4               .net
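Putting the prefix, SLD, and Table 1 together takes only a dictionary lookup. A minimal sketch follows; the names TLD_BY_WEEK and build_hostname are hypothetical.

```python
# Week of month -> TLD, per Table 1
TLD_BY_WEEK = {1: ".com", 2: ".org", 3: ".info", 4: ".net"}

def build_hostname(sld, week_of_month):
    """Prepend "www." and append the TLD for the given week of the month."""
    return "www." + sld + TLD_BY_WEEK[week_of_month]

# July 14, 2016 falls in week 2 of the month
hostname = build_hostname("9qnr6oyssk", 2)
print(hostname)
```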

Past and Future Mad Max Domains

Appendix II contains the complete set of past and future Mad Max domains for the years 2015, 2016, and 2017, as generated by our Python reimplementation. For the purpose of validating our DGA findings (among other reasons), we have registered and sink-holed some of these domains and have observed live Mad Max bot checkins; at the time of writing (July 15, 2016), the bots are checking in to the domain.

Over the course of the first three days of monitoring, we observed bots checking in to the sinkhole from the following sixteen countries: Brazil, Canada, China, Finland, France, Germany, India, Italy, Japan, South Korea, Norway, Taiwan, Thailand, Ukraine, United Kingdom, and United States.

In addition, we have also established additional confidence in our DGA reimplementation from historical observations of Mad Max attempting to check in to the following historical domains, all of which were back-generated by our implementation:


We could find little to no published information on this Mad Max family, although it is quite possible that it has been previously documented under another name. Based on our sinkholing results to date, it certainly appears to be an active botnet at this time. Although we wanted to initially focus on cracking the DGA in order to obtain domain indicators and deploy monitoring infrastructure, we are continuing our analysis and plan to follow up this blog post with another that provides a deeper dive into the features and internal workings of Mad Max.



Appendix I – Python re-implementation of Mad Max DGA

This code is based on the Mad Max sample c7d1357f4c4acceb1780db12ad1b4de1.

# Standard imports
from __future__ import print_function  # keeps the script runnable on Python 2 and 3
import sys
import datetime
import hashlib

_TLD_LUT = {
    1: ".com",      # 0x6f725b48 in 21020-child.dump
    2: ".org",      # 0x6f725b52
    3: ".info",     # 0x6f725b5c
    4: ".net",      # 0x6f725b66
}
_DGA_PREFIX = "www."
_DGA_LUT = "jfyicbya26h5hvepgq07zfsqmdk4xcet9annmwuw8rok3lzsxlvjpdubog1rit"
_LEN_DGA_LUT = len(_DGA_LUT)        # 62-character lookup table
_LEN_DOMAINS = 10                   # only the first 10 index bytes are used
_DATESTAMP_FORMAT = "%Y-%m-%d"      # inferred from the usage string below

def _gen_system_time(date):
    # Mimic the fields of a Windows SYSTEMTIME struct
    return {
        "wYear": date.year,
        "wMonth": date.month,
        # Python datetime uses Monday=0; Windows SYSTEMTIME uses Sunday=0....
        "wDayOfWeek": (date.weekday() + 1) % 7,
        "wDay": date.day,
        "wHour": 0,
        "wMinute": 0,
        "wSecond": 0,
        "wMilliseconds": 0,
    }

def _get_week_of_month(wDay):
    if wDay <= 7:
        # (1 - 7)
        week_of_month = 1
    elif wDay <= 14:
        # (8 - 14)
        week_of_month = 2
    elif wDay <= 21:
        # (15 - 21)
        week_of_month = 3
    else:
        # (22 - 31)
        week_of_month = 4
    return week_of_month

def _get_week(date):
    system_time = _gen_system_time(date)
    return (system_time["wYear"], system_time["wMonth"],
            _get_week_of_month(system_time["wDay"]))

def _compute_dga_domain(week):
    year, month, week_of_month = week
    uuid_string = "9135d48a-6103-4894-856c-b29897742e52"
    seed_string = "%s%d%d%d" % ((uuid_string,) + week)

    # bytearray yields integer byte values on both Python 2 and 3
    md5 = bytearray(hashlib.md5(seed_string.encode("ascii")).digest())
    # Byte-wise ADD of the last eight bytes into the first eight
    mod_first_eight = [0xff & (first + last) for first, last
                       in zip(md5[:8], md5[8:])]
    index_table = mod_first_eight + list(md5[8:])

    base_domain = "".join([_DGA_LUT[index % _LEN_DGA_LUT]
                           for index in index_table[:_LEN_DOMAINS]])
    return _DGA_PREFIX + base_domain + _TLD_LUT[week_of_month]

def compute_dga_domains(start_date, stop_date):
    one_day = datetime.timedelta(days=1)
    current_date = start_date
    domains = []
    last_week = None
    while current_date <= stop_date:
        week = _get_week(current_date)
        if week != last_week:
            # Emit one hostname each time the (year, month, week) tuple changes
            domain = _compute_dga_domain(week)
            domains += [(current_date, domain)]
            last_week = week
            print("DGA %s: %s" %
                  (current_date.strftime(_DATESTAMP_FORMAT), domain))
        current_date += one_day
    return domains

def main():
    num_args = len(sys.argv)
    if num_args not in (2, 3):
        print("usage: %s YYYY-MM-DD [YYYY-MM-DD]" % sys.argv[0])
        sys.exit(1)
    start_date = datetime.datetime.strptime(sys.argv[1], _DATESTAMP_FORMAT)
    stop_date = datetime.datetime.strptime(sys.argv[2], _DATESTAMP_FORMAT) \
                if num_args == 3 else start_date
    compute_dga_domains(start_date, stop_date)

if __name__ == "__main__":
    main()

Appendix II – Mad Max Domains for 2015 through 2017

The following table contains the weekly Mad Max hostnames generated by our Python re-implementation. The Date column indicates the date upon which the corresponding hostname first becomes “active”. Note that Mad Max uses UTC time for determining the current date.

Date              Mad Max hostname

2 Responses to “The Mad Max DGA”

July 28, 2016 at 10:20 pm, darklord said:

is the analysis based on the malware or the DLL it drops.

August 04, 2016 at 7:11 pm, Brian said:

A small mistake which isn’t a technical failing, most likely a result of editing in one place but the other. In the text you refer to fixed values in Figure 7. “For example, both 0x28955B88 and 0x173848AA , shown in Figure 7,” But the values shown in Figure 7 (and mentioned in the caption for Figure 7) are different. “Figure 7. Two of many tell-tale constants: 0xA83F051, 0x4787C62A, etc.”
