Extracting the config from BlackMatter ESXi ELF Encryptors

Sample: f864922f947a6bb7d894245b53795b54b9378c0f7633c521240488e86f60c2c5

MalwareBazaar Database (warning: this is a database of live malware, visit with caution)

I did most of what you see in JupyterLab, a web-based notebook that lets you run code inline and surround it with markdown notes, and I contributed my work to an awesome community project by the guys at OALabs. You can pull down my code yourself from https://github.com/OALabs/Lab-Notes/blob/main/Blackmatter/ESXi/

In recent months, the BlackMatter ransomware-as-a-service group (recently disbanded) has been targeting VMware ESXi servers using a custom ELF encryption payload. It is designed to encrypt virtual hard drives and memory, causing huge damage in a short space of time, before cleaning up after itself by deleting logs and its own binary, and leaving behind a friendly ransom note.

In this article, I'll be talking through the steps I took to identify, extract, and decode a configuration file from one of these payloads.

Getting Started

Now the term 'getting started' may be a bit misleading here because the truth is that before I even thought about a configuration file, I started where I tend to start for these sorts of things, and that's by loading my payload into IDA and spending a good hour marking up my sample.

I slowly but surely marked up my IDA pseudocode, renaming functions and variables, and adding comments for clarity. Truthfully, in the middle of an incident I would not bother with any of this, but outside of work it's an absolutely crucial part of my learning process.

I won't bore you with the details of how I marked up my code, and how I worked out what all this stuff does (maybe another day...) and in this case, I don't actually have to.

If in doubt, cheat

Whilst malware authors tend to make it as difficult as possible to dissect their creations, the BlackMatter authors in this case made a small mistake by leaving a selection of debug strings in their code. (Based on where some of the parameters sit in memory, it's possible this mistake was introduced by their use of the cryptopp library, which is statically compiled into their payload.)

I named this function after exactly what it seems to do... return the time of day

This meant that whatever function I was looking at, if I was lucky, I'd find a string like the one pictured above listing its name, being passed into our strange time-telling function shown above.

This made it very easy to eventually find the function responsible for loading the config, aptly named app::setup_impl::init_cfg()

What does it do?

image

The first thing this function does is load a string from a set place in memory. We can simply double-click to view the location being referenced here. Following this through takes us to the following:

Note: when I refer to 'memory', what I actually mean is a specific offset in our binary. However, disassemblers like IDA and Ghidra will show you the memory offsets where things would exist once the binary is executed and loaded into memory.

Looks like we're getting somewhere. We've been taken to the beginning of a section named '_cfgETD' and we can see below that there is clearly some data stored here, and we can probably put money on this being our config data.

Now that we know where our config data resides, we can step out of IDA and begin work on extracting and decoding the configuration for this ransomware payload.

Examining our ELF file

eliben/pyelftools: Parsing ELF and DWARF in Python (github.com)

We can utilise the amazing library 'pyelftools', from Eli Bendersky, to examine our ELF file. First, we can create a function to list out our file's section names.
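A minimal sketch of such a function, assuming pyelftools is installed (`pip install pyelftools`). The helper names `section_names` and `list_sections` are my own for illustration; `ELFFile` and `iter_sections()` are the same pyelftools calls used further down.

```python
def section_names(elffile):
    """Return the name of every section in an opened pyelftools ELFFile."""
    return [section.name for section in elffile.iter_sections()]


def list_sections(filename):
    """Print and return the section names of the ELF file at `filename`."""
    # Third-party dependency, imported here so section_names() works without it.
    from elftools.elf.elffile import ELFFile

    with open(filename, "rb") as f:
        names = section_names(ELFFile(f))
    print("Sections in ELF file:", filename)
    for name in names:
        print(name)
    return names
```

Calling `list_sections("blackmatter_elf")` on the sample should produce a listing along the lines of the one below.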

find more useful elftools examples at https://github.com/eliben/pyelftools/tree/master/examples

Sections in ELF file: blackmatter_elf .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .app.version .cfgETD .eh_frame_hdr .eh_frame .gcc_except_table .tbss .init_array .fini_array .jcr .data.rel.ro .dynamic .got .data nocommon .bss .comment .shstrtab

We can see our section from before named '.cfgETD'. This is where the config data resides in current BlackMatter ELF payloads (something I wish that I knew when I started but it's all part of the process)

We can make a small function to grab the data from that config section by name

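from elftools.elf.elffile import ELFFile

def get_config_section_data(filename):
    # Return the raw bytes of the '.cfgETD' section, or None if it is missing
    with open(filename, 'rb') as f:
        elffile = ELFFile(f)

        for section in elffile.iter_sections():
            if section.name.startswith('.cfgETD'):
                return section.data()
    return None

config_data = get_config_section_data("blackmatter_elf")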

To print out the contents of this section, we need to decode the raw bytes into UTF-8 text. This can be done using Python's built-in bytes.decode() method.

config_string = config_data.decode("utf-8")
print(config_string) 
Read more about character encoding at https://www.w3.org/International/questions/qa-what-is-encoding

Decoding base64

We can see from the output that our config file consists of base64 encoded text.

For those not familiar with this kind of encoding, Google is your friend, but for the purposes of this work you just need to know that we can recognise it by the seemingly random mixture of letters and numbers, as well as the '=' or '==' padding that often appears on the end of the string.
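If you'd rather not eyeball it, here is a rough heuristic sketch using only the standard library. The function name `looks_like_base64` is hypothetical, and the check only covers the standard alphabet (not URL-safe variants). Note that the '=' padding only appears when the original data length is not a multiple of 3, so its absence doesn't rule base64 out.

```python
import base64
import binascii
import re

# Standard base64 alphabet, optionally padded with one or two '=' characters.
_B64_RE = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")


def looks_like_base64(text):
    """Heuristic: right alphabet, length a multiple of 4, and decodes cleanly."""
    candidate = "".join(text.split())  # drop whitespace and line breaks
    if not candidate or len(candidate) % 4 != 0 or not _B64_RE.match(candidate):
        return False
    try:
        base64.b64decode(candidate, validate=True)
    except binascii.Error:
        return False
    return True


print(looks_like_base64("aGVsbG8="))     # padded base64 for b"hello"
print(looks_like_base64("not base64!"))  # contains characters outside the alphabet
```

A plain English word whose length happens to be a multiple of 4 will also pass, so treat a positive result as a hint rather than proof.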

Luckily for us python also includes a built in library for decoding base64 text so we can use that to decode our string.

import base64

config_b64_decoded = base64.b64decode(config_string)  # decode our base64 string into raw bytes
print(config_b64_decoded)
base64 decodes our string into a byte array, which I won't bore you with including on this page as it's rather large

However, if we try to repeat our trick from before and decode these bytes into text, we get the following garbled mess:

u'x\u068dV_j\x13\x18\x04H\x08$\x04H\x00"(\x16AQ\x08"eqC_~z?\u031c93tUl=tF\u06d3]#v~\x1dy\x18\rga:E]\x0fAAyoJ\'DL&\x19jE9\x06@Jy\x15No{Y\u015d\u05d1l}\x11#{P\u06c1\u059csSee&8\u05ad|\x0b\x0c\u03ee\x0e+\u0759a\u06dbr}yj;ak:3m~&5\x7f\x0c\x17^8T?d\u06bc?\x03~\x0f!)8C\x05F.\x03\rO\ub362!!=\x19\x03M\ub7ac)Y~5I)\x19\x1dW\x7fq\x10g1A9\x036\x17O\x1b\u05de%yy%|\\w.S\x19\x07%Y07$Z\x16Iy\x05i;giHh.\t\u0450+\x18xri\rY\x14V_}7eD\x0bsGf\x13\x16"S\x05MdIM\x7f6=\x14\x04\x1d8\x16\x11n\x05\x7fg\x12jA{\x1c6"l9o2Z9o9Qw\x19O6DOOgh\x0f?l\x01R{nEQ \r;Hl\x16-\x7fo\u0242\u07a0xxeL2q\x9c0\x1d\x0f*\x04\u77e8}M\xbbuQ\x01\u062eEl^z\x08i\x060\x04OdMe:dk\x12>UA\x145S^[1\x1bamzh\x1aW1\u333cPp\x1b1\x1d.6,5f_dTdc\x10<u\x10\u0532H;PX\x08iy\tabD\u0169\u0198vkrM)u\x05S\u058a0B1t>V|I\x15L\x008&y\x0fy/71\x01\x0e!>\x1b\x12T\x1c>\x1c;]\x13\u0668bB\u06f6qW\x1bUy7[RS\x03N\x01\u0395 tJ\u036f9>@0MW\u057fVZE;JyP\x18\x1e\u03d7~\ttE\t3,=\x18N\u3df9]y\\u^@UG,\u4a33\u93b2_\x0cZ\\\x05I\x1e.o\x13Y\x08;.\u0506\x7f<br\x0c/Vw{3\x19.\x10\x11D\n{;\x05@\x1b\u05a1\x06w%\r9<\x12ItQ@2X\x14W\x05qz\\x1ca^\\'T\\x08~;i\\x01<3U\\x10\\x02WB\\x7f!qno0\\x0fFWjTO{\\x07t{\\x04W{Z\\x0cO?PvpGBy\\r6\\r~n\\x02:U\\x19>-\\x11_\\U00060a08!-\\t5h\\x0cS\\x01|1\\x0f\\re?To;\\x0ee\\u04403E3\x07\x05~J\x16\x7f=\u0327#\xdf\x11q\x13\x08\\w6\x1awz]\x7f/|\x0f\x1a\x15~/a\x1e\x7f\u06a7~\u069c\x00GTW~|Cs=^\x7f\u07cd%o\rc wd\x13\\"\x16DruM8aYZGe\x1b\\\u077b\u06c7\x12#\x0c\x15\x04^G(<\x17\u0209\u071c"K\x16\x01\x14\u0600"^k\x14\x01\x19!L\x1b$Y\x00\nW'

💡
It's always worth attempting to decode and print your data, even if you know it's not a string. You might spot useful hints like magic bytes at the start of a file (MZ for Windows executables, PK for zip files), or recognisable strings within the data.
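As a small illustration of that tip, here is a sketch that checks the first few bytes against a handful of well-known signatures. The `MAGIC` table and the `sniff` name are my own and far from exhaustive. zlib streams have no fixed magic string, but per RFC 1950 the low nibble of the first byte is 8 for deflate and the first two bytes, read as a big-endian number, are a multiple of 31. In practice the first byte is usually 0x78, which prints as 'x', and that is consistent with the 'x' at the very start of the garbled output above.

```python
import zlib

# A few well-known leading signatures ("magic bytes").
MAGIC = {
    b"MZ": "Windows executable (PE)",
    b"PK\x03\x04": "ZIP archive",
    b"\x7fELF": "ELF binary",
    b"\x1f\x8b": "gzip stream",
}


def sniff(data):
    """Guess a format from the first bytes of `data`; None if nothing matches."""
    for magic, label in MAGIC.items():
        if data.startswith(magic):
            return label
    # zlib header check: compression method 8 (deflate) and a valid check value.
    if len(data) >= 2 and (data[0] & 0x0F) == 8 and ((data[0] << 8) | data[1]) % 31 == 0:
        return "zlib stream (probable)"
    return None


print(sniff(zlib.compress(b"example config")))  # zlib stream (probable)
```

The zlib check can produce false positives on arbitrary data, so a match is only a reason to try decompressing, not a guarantee.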

Identifying Compression

If you're anything like me, for the sake of learning, you like to know how to do things the hard way, but for the sake of work, you want to know how to do things the easy way.

So for this next task, I'll show you both.

The easy way

Disassemblers are a great tool for working out how your binary operates. However, sometimes when you have a lump of garbled data and you want to know what the deal is, the best thing you can do is throw it into CyberChef (https://gchq.github.io/CyberChef/), an entirely browser-based data transformation tool written in JavaScript.

In CyberChef we can decode our base64 just like we did in Python, but after that is where the magic happens (yes, it's aptly named the Magic tool).

The Magic function within CyberChef uses a combination of signature matching and brute force to work out the series of operations required to decode your data

The magic detection in CyberChef immediately detects "Zlib Deflate", which is a compression format from the zlib library. Whilst we can reverse this compression easily in CyberChef by just clicking the magic wand, we want to continue creating our config extractor, so back into Python we go.

💡
If CyberChef does not automatically detect the algorithm used, we can use the Entropy operation to see whether our data is likely to be encrypted or compressed.
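The same entropy check is easy to sketch in a notebook. This computes Shannon entropy in bits per byte; the function name `shannon_entropy` is my own. Values close to 8 suggest compressed or encrypted data (entropy alone can't tell you which), while plain text and structured data usually score noticeably lower.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how many times each byte value occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(shannon_entropy(b"A" * 64))          # a single repeated byte carries no information
print(shannon_entropy(bytes(range(256))))  # every byte value equally likely: the maximum
```

In this workflow you could run it on `config_b64_decoded` to get a quick feel for whether another decoding layer is likely.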

The hard(ish) way

Sometimes tools like CyberChef won't be able to tell us what our data is, or how to decode it, as you'll see later in this article. However, we can always turn to the one tool that we know can decode our config file, and that's the malware itself.

I had already marked up a lot of my functions and variables whilst working through the sample in IDA beforehand, but even if I hadn't, debug messages and strings like these can make it a lot easier to follow the code

We can go back into IDA and find our config loading function again, but this time, work our way through the pseudocode until we get to our base64 decoding functionality.

While it may seem like I did some complicated reverse engineering to identify my base64 decoding function(s), I'll point out now that because this payload uses cryptopp to perform the decoding, IDA completely failed to decompile the functions (possibly due to cryptopp objects that must first be loaded into memory at runtime). I just went off debug strings and printed messages like DecodingLookupArray and Log2Base to get my bearings.

CAPA Explorer

However, if these hints had not been available to me, I'd have turned to a useful tool by FLARE called CAPA Explorer: mandiant/capa (github.com)

If you're familiar with the CAPA tool, you may know that it can identify capabilities of a binary based on signature matching from a large community sourced ruleset. It can detect things like anti-debugging features, memory injection, and encryption.

What you may not know, however, is that FLARE also released a plugin for IDA and Ghidra, that allows you to not only scan for those capabilities, but also locate them in decompiled code.

Here I use the capa explorer plugin to locate references to base64 functionality
The reason CAPA detects this as possible base64 related data, is that base64 encoded text has a fixed set of letters, numbers, and special characters that can be used, and this is often stored in a variable for reference by decoding functions

By using CAPA to locate a possible base64 function, and then viewing cross references to that variable, I identified the base64 decoder function shown in the section above, without having to look too deeply into the code.
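To get a feel for the kind of signal this relies on, here is a rough sketch that searches a blob of bytes for the standard base64 alphabet. This is my own simplification and not how CAPA is implemented; the names `B64_ALPHABET` and `find_base64_alphabet` are hypothetical. It also assumes the alphabet is stored as one contiguous string, which won't hold for every implementation.

```python
import string

# The standard base64 alphabet (RFC 4648), as it is commonly stored in a lookup table.
B64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
).encode()


def find_base64_alphabet(blob):
    """Return every offset in `blob` where the standard base64 alphabet appears."""
    offsets = []
    start = blob.find(B64_ALPHABET)
    while start != -1:
        offsets.append(start)
        start = blob.find(B64_ALPHABET, start + 1)
    return offsets


sample = b"\x00" * 16 + B64_ALPHABET + b"\xff" * 4
print(find_base64_alphabet(sample))  # [16]
```

Run against the raw bytes of a binary, the results are file offsets, not the virtual addresses IDA displays, so they need translating before you go looking for cross references.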

💡
Other quick wins here can be finding functions that contain if variable1 == '=' statements, as base64 strings often end with '=' or '=='

Following the trail

Once we have identified our base64 decoder function, we can simply follow the output, from variable to variable, all the way to the next stage of decoding.

In this case we track our base64 output from a variable I named 'string_buffer'
image
to the variable v31
image
to the variable 24
image
and finally to the variable (which I renamed myself) compressed_data
image
💡
Following code through can be a long-winded process, but you can make it easier by copying it out into your favourite code editor. I used Visual Studio Code for a large portion of this.

The next time we see this variable, it's being passed into an uncompress() function call as one of the parameters.

I renamed these parameters myself for clarity, based off of documentation of the zlib uncompress function found at uncompress (linuxbase.org)

A quick Google will tell us that the uncompress function in Linux binaries is likely to be zlib's uncompress(), a convenience wrapper around the library's inflate routines. We now know that this is the next step in decoding our config, and we will recreate it in our Python notebook.

ZLib Decompression

Now that we've identified our compression algorithm, we can recreate the decompression in our python script

To decompress this data with the zlib 'inflate' algorithm, we use the accompanying function in the zlib python library, decompress()

import zlib

decompressed = zlib.decompress(config_b64_decoded)
print(decompressed)
Again, I've cropped the output, as nobody needs to scroll through a page full of raw bytes

Reversing custom encryption

Once again, we've decoded our data but are still met with a mangled output. However, those with a keen eye might spot that the very beginning of our data looks a bit more readable than the rest.

mfBFDBtWeKgGajpP3hjuuK1tedsCdMl9 almost looks like a convincing key for some kind of encryption

I'd love to tell you that at this point I activated my remaining 6 brain cells and immediately identified the algorithm used, but instead I went into CyberChef, added an XOR operation, and entered our key to see if it might work.

image

To my frustration, this did not work... entirely. However, we can see that the first couple of lines do appear to have decoded successfully into some convincing json data. This tells us a few things:

  1. This is not a basic XOR algorithm
  2. XOR does play some role in the decryption
  3. mfBFDBtWeKgGajpP3hjuuK1tedsCdMl9 is definitely our base key of sorts
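For reference, this is roughly what that first attempt looks like in Python: a plain repeating-key XOR, where the key simply wraps around when it runs out. The function name `repeating_key_xor` is my own, and the ciphertext here is made up for the demo, not taken from the sample.

```python
from itertools import cycle


def repeating_key_xor(data, key):
    """XOR every byte of `data` with the key, repeating the key as needed."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


key = b"mfBFDBtWeKgGajpP3hjuuK1tedsCdMl9"  # the 32 readable bytes from the decompressed data
ciphertext = repeating_key_xor(b'{"demo": true}', key)
print(repeating_key_xor(ciphertext, key))  # b'{"demo": true}'
```

Plain XOR is its own inverse, which is why the same call both encrypts and decrypts. The fact that it only partly decodes the real config is the clue that the payload does something slightly different.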

Back into our decompiler

To work out exactly how this final stage of decryption works, we're going to have to head back into our decompiler view and scroll down until we see our decompressed data used again in the code.

Earlier on, I renamed the parameters being passed into my uncompress() function so that I could easily identify the next time the decompression output was referenced. I found it in the following loop:

image

Immediately we can see a loop, references to our uncompressed data, and the number 32 used a lot, which just so happens to be the length of our key from earlier.

This loop is followed by a string::string constructor, which is a pretty good hint that our decoded json string is the output of this loop

Working out what this code does

By once again following the variables through, and working out what each conditional statement is actually checking (the if and while lines) we can slowly rename variables to what they actually are, and add comments to add clarity.

Once we do this enough, we will end up with a pretty good idea of how the code works, and a guide for recreating it in python.

I ended up with the following:

image

As you can see, I did not actually figure out what every part of this code does. I never worked out how v21 is set, but we can assume it's the length of our encoded data, as our do while statement stops when we get v21 bytes into the data.

However it's clear from our marked up code that the following is true:

  1. This is a rolling XOR algorithm, where the 1st byte of the key is used to encrypt the 1st byte of the data, the 2nd byte of the key is used to encrypt the 2nd byte of the data, and so on, with the 32-byte key repeating once it runs out
  2. If the data byte is equal to the key byte, that data byte is skipped over
  3. If the data byte is equal to 0, that data byte is skipped over

Recreating our algorithm in python

Now that we know how the final stage of decoding works, we can recreate it in python, applying the same logical steps as our payload does, but in a much simpler form.

Combining our two if statements and xor encryption inside of a loop that cycles through each byte of our data, results in the following code:
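A minimal sketch of that loop, reconstructed from the three rules above rather than copied from the notebook. It rests on a few assumptions: the key is the first 32 bytes of the decompressed blob, the encrypted data is everything after it, the key index starts at 0 on the first data byte, and "skipped over" means the byte is copied through unchanged. The name `decrypt_config` is my own.

```python
KEY_LENGTH = 32  # assumption: matches the 32 seen throughout the decompiled loop


def decrypt_config(blob, key_length=KEY_LENGTH):
    """Apply the rolling XOR described above to a decompressed config blob.

    Assumes the key is the first `key_length` bytes of `blob` and the
    encrypted data is everything after it.
    """
    key, data = blob[:key_length], blob[key_length:]
    out = bytearray()
    for i, byte in enumerate(data):
        k = key[i % key_length]   # rule 1: the key position follows the data position
        if byte == 0 or byte == k:
            out.append(byte)      # rules 2 and 3: copied through unchanged
        else:
            out.append(byte ^ k)  # everything else is XORed with its key byte
    return bytes(out)
```

With the variables from earlier, usage would be along the lines of `print(decrypt_config(decompressed).decode("utf-8"))`. A nice side effect of the two skip rules is that the transform is its own inverse and never turns a non-zero byte into a null byte, which is a plausible reason for them.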

Hooray! Looks like we've decoded our json config

Final thoughts...

Now, during incident response, this is the last thing I would think to do with my time; fast-paced triage, lightweight forensic collection and open source intelligence have become the de facto methods for finding your bearings in high-stress situations.

However, threat research like this feeds into the articles we read, the tools we use, and the future of the industry. And we can see from our end result that even a config file can yield some valuable insight into how adversaries operate, how they design their tools, and where their efforts are focused.

Sometimes, a bit of extra work can go a long way, and reverse engineering has, and always will, hold some value during, or after, an investigation.