Mani Bharathi

Malware Analysis

Disclaimer: These notes are my personal summaries and interpretations of various online courses. They are not direct transcriptions and reflect my own understanding.

1. Manual Malware Analysis

1.1 Static Analysis

We are usually searching for:

How it works.
How to identify it.
How to defeat it or eliminate it.
This also includes finding Indicators of Compromise (IoC); we want to look for network or host-based indicators, including:

Anything unique about a file like hash, size, names.
Binary characteristics: strings, PDB paths.
Changes made to the OS.

1.1.1 Host-Based Indicators

Malware usually wants to persist on the system, whether by changing registry keys or by downloading additional components. Filenames and paths can be good host-based indicators, for example a file dropped under %APPDATA% (a Windows path variable). A good indicator is something unique; if it appears in many unrelated samples, it is not a good indicator. Malware often uses registry keys to establish persistence so that the program runs every time the computer boots.

HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Run and HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services are very often used to establish persistence. Mutexes are used to lock shared resources; malware typically creates one so that a second copy of itself does not run and interfere with the first, which makes a unique mutex name a useful indicator.
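As a rough illustration of how such indicators can be checked, the sketch below scans a list of already-extracted strings for the two registry locations mentioned above. The sample strings and the service name EvilSvc are made up for the example; a real workflow would feed in the output of a strings tool.

```python
# Registry locations commonly abused for persistence (taken from the notes above).
PERSISTENCE_KEYS = [
    r"Software\Microsoft\Windows\CurrentVersion\Run",
    r"SYSTEM\CurrentControlSet\Services",
]

def find_persistence_strings(strings):
    """Return the strings that mention a known persistence key (case-insensitive)."""
    hits = []
    for text in strings:
        # Accept both slash styles, since notes and tools are not always consistent.
        normalized = text.replace("/", "\\").lower()
        if any(key.lower() in normalized for key in PERSISTENCE_KEYS):
            hits.append(text)
    return hits

# Hypothetical strings, standing in for the output of a strings tool.
sample_strings = [
    r"Software\Microsoft\Windows\CurrentVersion\Run",
    "kernel32.dll",
    r"SYSTEM\CurrentControlSet\Services\EvilSvc",
]

for hit in find_persistence_strings(sample_strings):
    print(hit)
```

A hit only shows that the path is present in the binary; whether the sample actually writes to it still has to be confirmed during dynamic analysis.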

1.1.2 Network-Based Indicators

The main thing here is the C2 server that the malware is probably connecting to. Typical indicators are IPs, protocols and ports, HTTP headers (cookies or user-agents), or even some signatures. It is important to be able to distinguish the parts of a URL (scheme, domain, path, and query). The user-agent usually carries useful details such as browser type, version, OS, and architecture.
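To make the URL parts concrete, here is a small sketch using Python's standard urllib.parse module. The URL itself is a made-up example, not taken from any real sample.

```python
from urllib.parse import urlsplit, parse_qs

def split_url(url):
    """Break a URL into the parts that are useful as network indicators."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "domain": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parse_qs(parts.query),
    }

# Hypothetical C2 URL used only for illustration.
indicator = split_url("http://c2.example.com:8080/gate.php?id=42&os=win10")
for name, value in indicator.items():
    print(name, "=", value)
```

Splitting the URL this way makes it easier to decide which part is unique enough to be an indicator: a domain may be shared hosting, while a path such as /gate.php combined with specific query keys can be more distinctive.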

1.1.3 Basic Analysis

We need to extract information from the sample without executing it:

Hashing: This is very useful because even the smallest change to the file produces a completely different hash. A hash function is designed to be extremely difficult to reverse. Of the common choices, SHA-256 is the most secure; SHA-1 and MD5 are still widely used for lookups but are no longer collision-resistant. There are many hashing tools.
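A minimal sketch of the hashing step, using Python's standard hashlib. The byte strings are stand-ins for a real sample, and hash_file is a hypothetical helper name for hashing a file on disk.

```python
import hashlib
import io

def hash_stream(stream, chunk_size=65536):
    """Hash a binary stream in chunks so large samples need not fit in memory."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for digest in digests.values():
            digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}

def hash_file(path):
    """Convenience wrapper for a sample stored on disk."""
    with open(path, "rb") as handle:
        return hash_stream(handle)

# Two in-memory "samples" that differ in a single byte.
original = hash_stream(io.BytesIO(b"MZ demo payload"))
modified = hash_stream(io.BytesIO(b"MZ demo payloae"))

print(original["sha256"])
print(modified["sha256"])
print("hashes differ:", original["sha256"] != modified["sha256"])
```

Computing all three digests in one pass is convenient because different intelligence sources index samples by different hash types.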
Strings: When a program is compiled, hard-coded strings survive in the binary. From these we can find filenames, registry paths or keys, HTTP user-agents, and PDB strings. They can also be viewed in a hex/ASCII view. C-style strings are terminated with a 0x00 byte. Unicode strings follow a different convention: the Microsoft standard is UTF-16, where most characters take two bytes in little-endian order, so 0x0048 ('H') is stored as 48 00. One of the main tools is strings, which is available for both Windows and Linux. We need to distinguish between compiler-generated strings and strings that actually matter.
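The difference between ASCII and UTF-16 strings can be sketched with two regular expressions. This is a simplified imitation of what a strings tool does, not its actual implementation; the minimum length of 4 and the demo bytes are assumptions for the example.

```python
import re

def extract_strings(data, min_len=4):
    """Return (ascii_strings, wide_strings) found in a bytes buffer."""
    # Runs of printable ASCII bytes.
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Runs of printable ASCII characters, each followed by 0x00 (UTF-16 little-endian).
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    ascii_hits = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    wide_hits = [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return ascii_hits, wide_hits

# 'H' is stored as the two bytes 48 00 in UTF-16 little-endian.
print("H".encode("utf-16-le").hex(" "))

# Made-up buffer: an ASCII command line, some junk, then a wide string.
demo = (
    b"\x01\x02cmd.exe /c calc\x00\xff"
    + "Software\\Run".encode("utf-16-le")
    + b"\x00\x00\x03ab\x00"
)
ascii_found, wide_found = extract_strings(demo)
print(ascii_found)
print(wide_found)
```

Running only the ASCII pattern would miss the wide string entirely, which is why it is worth checking that the strings tool in use is also scanning for Unicode strings.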
Encoding: Converting the representation of the data. Malware can be encrypted, obfuscated, or encoded; the usual schemes are hexadecimal, XOR, or Base64.
- Hexadecimal: This is useful to get data in readable format.
- Base64: Data is represented using 64 printable characters. Every 3 input bytes become 4 output characters; '=' or '==' is appended as padding when the input length is not a multiple of 3, so the output can always be decoded in 4-character chunks. JavaScript and PowerShell often use this; CyberChef can be used to decode it.
- XOR: A bitwise binary operation, used here to modify the data. The truth table is basically "every place the bits disagree is a 1"; this can also be thought of as addition mod 2, e.g. (1+1) % 2 = 0. We use a key to encode the data and the same key to decode it; some properties are: X^00 = X and X^X = 00.
- Tools:
  - CyberChef
  - FLOSS (FLARE Obfuscated String Solver): recovers obfuscated strings that the strings tool does not find.
  - 010 Editor: view and edit raw hex/ASCII.
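The Base64 padding rule and the XOR properties listed above can be checked with a few lines of Python. The key and plaintext are arbitrary example values, and xor_bytes is a hypothetical helper name.

```python
import base64
from itertools import cycle

def xor_bytes(data, key):
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

# Base64: 3 input bytes map to 4 output characters, with '=' as padding.
for raw in (b"h", b"hi", b"hey"):
    print(raw, "->", base64.b64encode(raw).decode("ascii"))

# XOR: applying the same key twice returns the original data.
plaintext = b"http://example.com/payload"   # made-up string
key = b"\x5a\x13"                           # made-up two-byte key
encoded = xor_bytes(plaintext, key)
decoded = xor_bytes(encoded, key)
print(encoded.hex())
print(decoded)

# The two properties from the notes: X ^ 00 = X and X ^ X = 00.
print(xor_bytes(b"A", b"\x00"), xor_bytes(b"A", b"A"))
```

Because X ^ 00 = X, null bytes in the plaintext reveal the key in the encoded output, which is one reason single-byte XOR is easy to break during analysis.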
Open-Source Intelligence: We should be careful not to upload malware samples to VirusTotal (VT), since threat actors can learn which of their samples have been found; instead, always search by MD5 or another hash. Google is also a very good source: using Google Dorks we can find important info about source code, unique strings, hashes, and malware families. Use this with care because the results can be very misleading.
PE File Format: Portable Executable (PE) is the standard Windows executable format. Windows has kept the format backwards compatible; the main file types are:
- .exe: An executable program that, once executed, becomes its own process with its own separate chunk of memory. Most malware runs in user mode, not kernel mode. A PE file has:
  - Headers: Tell the OS how to deal with the file: where the entry point is, which DLL dependencies are needed, how the sections should be arranged in memory (section headers), and how the functionality of the program is exposed to other programs (exports). There is also the DOS Header and the Rich Header, which is added automatically by the Microsoft build tools.
  - Sections
- .dll: Dynamic-link library; provides functionality that other programs can use. DLLs can be loaded and unloaded, which gives malware authors a lot of flexibility in how they deploy code. There are three types of linking: static (the file contains its dependencies), load-time, and run-time.
- .sys: Kernel drivers, which execute in kernel mode rather than user mode.
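To show what "headers tell the OS how to deal with the file" looks like in bytes, the sketch below reads the first PE header fields with Python's struct module. It only covers the DOS signature, the e_lfanew pointer at offset 0x3C, and the 20-byte COFF file header; the demo buffer is synthetic and built by the hypothetical helper build_demo_header, so it is not a loadable executable.

```python
import struct

MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64"}
IMAGE_FILE_DLL = 0x2000  # Characteristics flag set for DLLs

def parse_pe_header(data):
    """Read the machine type, section count, and DLL flag from a PE buffer."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing DOS header (MZ)")
    # e_lfanew: offset of the PE signature, stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    if len(data) < pe_offset + 24:
        raise ValueError("truncated COFF header")
    machine, sections, _time, _symptr, _nsyms, _optsize, flags = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4
    )
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "sections": sections,
        "is_dll": bool(flags & IMAGE_FILE_DLL),
    }

def build_demo_header():
    """Build a minimal synthetic header: x64, 3 sections, DLL flag set."""
    dos = bytearray(0x40)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)
    coff = struct.pack("<HHIIIHH", 0x8664, 3, 0, 0, 0, 0xF0, 0x2022)
    return bytes(dos) + b"PE\x00\x00" + coff

print(parse_pe_header(build_demo_header()))
```

For real samples a dedicated parser is the safer choice, since malformed headers are common in malware; this sketch is only meant to show where the fields live.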
Packing: Mainly used by malware to hinder static analysis. One sign that a sample is packed is that strings returns nothing useful or only encrypted-looking data. Tools: PEiD, DIE (one of the best ones), CFF Explorer.
Unpacking: Recover the original code from the packed file. Tools: CFF Explorer, the upx command-line tool, and capa (it disassembles the code and performs an automatic analysis).
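Besides an empty strings output, a common heuristic for spotting packed or encrypted data is byte entropy: compressed or encrypted content looks close to random. The sketch below computes Shannon entropy in bits per byte. The idea that values near 8 suggest packing is a rule of thumb assumed here, not something the tools above are claimed to implement in this exact way.

```python
import math
from collections import Counter

def shannon_entropy(data):
    """Return the Shannon entropy of a bytes object, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Made-up buffers: repetitive data, plain text, and uniformly distributed bytes.
samples = {
    "repeated byte": b"A" * 1024,
    "english text": b"This program cannot be run in DOS mode. " * 20,
    "all byte values": bytes(range(256)) * 4,
}
for name, blob in samples.items():
    print(f"{name}: {shannon_entropy(blob):.2f}")
```

In practice the measurement is usually taken per section rather than over the whole file, since a small packed section can hide inside an otherwise ordinary binary.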

1.2 Dynamic Analysis

1.2.1 Malware Sandboxes

We take the malware and execute it in an environment that simulates what it needs, such as Joe Sandbox, Cuckoo, VMRay, or Hybrid Analysis. However, this only exercises a subset of the available code paths, some malware detects when it is running in a sandbox, not all file types are supported, and we do not get the complete picture of what is happening. So we can instead create our own environment to run the malware, like FlareVM.

1.2.2 Control Environment

Disable shared folders; if we really need them, set them to read-only.
Set the network adapters to Host-Only.
Disable any Unity integration features.
Reset the VM to a clean snapshot before analyzing the file or executing it again.

1.2.3 Tips

Avoid storing raw malware on the host.
Use password-protected compression to store the malware, for example as an encrypted zip file.
Avoid having the .exe extension in the name of the malware.

1.2.4 Tools

Sysinternals Monitoring:
- Process Explorer (procexp.exe): a versatile task manager; it also lets us see the strings in a process's memory.
- Process Monitor (procmon.exe): monitors file system, process, and some network events in real time; we can set filters to reduce the output and hide the noise. Good operations to filter on are Process Create, WriteFile, RegSetValue, and SetDispositionInformationFile; we can also save a custom set of filters to keep on hand.
Network Monitoring:
- FakeNet-NG: simulates network services, with per-process handling and filtering; it is very configurable and also generates a .pcap file of the captured traffic.
- Wireshark
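The same filtering idea can be applied after the fact to a Process Monitor log saved as CSV. The sketch below assumes the export uses the column names "Process Name" and "Operation"; the rows, paths, and the process name sample.exe are invented for the example.

```python
import csv
import io

# Operations worth keeping (the filter list from the notes above).
INTERESTING_OPERATIONS = {
    "Process Create",
    "WriteFile",
    "RegSetValue",
    "SetDispositionInformationFile",
}

def filter_events(csv_text, process_name):
    """Keep only interesting operations performed by the given process."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if row["Process Name"].lower() == process_name.lower()
        and row["Operation"] in INTERESTING_OPERATIONS
    ]

# Hypothetical export contents; the column layout is an assumption.
DEMO_EXPORT = r'''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"10:00:01","sample.exe","4242","RegSetValue","HKCU\Software\Microsoft\Windows\CurrentVersion\Run\Updater","SUCCESS",""
"10:00:02","explorer.exe","1200","WriteFile","C:\Users\demo\ntuser.dat","SUCCESS",""
"10:00:03","sample.exe","4242","ReadFile","C:\Windows\System32\kernel32.dll","SUCCESS",""
"10:00:04","sample.exe","4242","WriteFile","C:\Users\demo\AppData\Roaming\drop.bin","SUCCESS",""
'''

for event in filter_events(DEMO_EXPORT, "sample.exe"):
    print(event["Operation"], event["Path"])
```

When reading a real export from disk, opening the file with encoding="utf-8-sig" avoids problems if the file starts with a byte-order mark.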

1.2.5 Launching Binaries

EXEs: We run these from an admin command prompt so the sample can do everything it wants, and so we can see anything it prints to the console, such as debugging messages.
DLLs: A DLL is shared code that is meant to be loaded by another program, so we cannot execute it directly; we can force execution with rundll32.exe <DLL_Name>,<DLL_Export>.
Service DLL: FlareVM does not have built-in tools to avoid the malware VM detection techniques.

1.2.6 Dumping Memory

Let the malware do the work (unpack the code or decode the strings) and then dump the data from memory. We can use Process Dump, which takes a process in memory and dumps it to disk so we can do static analysis: run a packed sample, suspend the process, dump the memory, and analyze the unpacked sample. Usage: <pd32.exe | pd64.exe> -pid <pid> or <pd32.exe | pd64.exe> -p <process name>.

Some advanced tricks:

Dump a process as it exits (useful for programs that exit immediately): pd64.exe -closemon.
Dump any unrecognized module:
- pd64.exe -db genquick: generates a whitelist of the modules currently running.
- pd64.exe -system: dumps all modules that do not match the whitelist generated in the previous step.

1.3 Windows Management Technologies

1.3.1 .NET Framework

The Common Language Runtime (CLR) is what allows C# code to run on multiple systems. Some of the static tools we can use are CFF Explorer, dnSpy (the most important), and de4dot; P/Invoke and Reflection are .NET features worth recognizing during analysis. The framework has two main components: the execution engine and a large class library. The class library is a huge body of already-written code that malware developers can reuse, which is one of the main reasons attackers use C#. These programs carry metadata that we can see in CFF Explorer.

Here we have the metadata streams. User Strings holds the strings defined by the programmer, while Strings holds names related to compilation, such as method names, but not the user-defined strings. The .NET header also has an entry point token, which identifies where the code starts: the first method to be executed. Metadata tokens reference information about the code's functions; attackers know this, so they often try to obfuscate the code and its names.
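A metadata token is a 32-bit value whose top byte selects a metadata table (or heap) and whose lower three bytes give the row index. The sketch below splits a token into those two parts; the table list is only a small subset, and the token values are illustrative.

```python
# A few token types from the .NET metadata format (subset only).
TABLE_NAMES = {
    0x02: "TypeDef",
    0x04: "Field",
    0x06: "MethodDef",
    0x0A: "MemberRef",
    0x70: "UserString",
}

def split_token(token):
    """Split a metadata token into (table name, row index)."""
    table = token >> 24          # top byte: which table or heap
    row = token & 0x00FFFFFF     # lower three bytes: row or offset
    return TABLE_NAMES.get(table, hex(table)), row

# Illustrative tokens: an entry point is typically a MethodDef token.
for token in (0x06000001, 0x0A00002F, 0x70000001):
    name, row = split_token(token)
    print(f"0x{token:08X} -> {name} #{row}")
```

This is why an entry point token usually starts with 06: it points at a row in the method definition table, which decompilers such as dnSpy can jump to directly.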

dnSpy can take compiled code and reverse it back to something close to the source code; it is a debugger, decompiler, and disassembler. We can set breakpoints, single-step, inspect and modify variables, and set raw values. It can be very helpful, and the tool is open-source.

Obfuscation is very common in .NET malware, so we can use tools like de4dot, which handles member renaming, string decryption, control flow deobfuscation, and dead code removal. It takes the encrypted part, decrypts it, and puts the result back in place of the obfuscated code so we can analyze it. Usage: de4dot.exe <programName> -o <outputFileName> --strtyp <optionalDecrypterType> --strtok <optionalTokenOfDecrypterMethod>, where delegate is one possible decrypter type.

1.3.2 In-Memory Loading

Allows code to load libraries and modules at runtime rather than at compile time; this is one reason why C# is used for malware development.

1.3.3 Windows Management Instrumentation

It is used for local and remote system administration; the problem is that these functions are very easy to access from C#. WMI appears under many different names and acronyms, so we must be careful not to overlook it. Since it is very large, the best reference is the Windows developer documentation online. WMI supports a SQL-like query language called WQL.

1.3.4 PowerShell

We can use powershell_ise.exe to debug and work with obfuscated scripts on Windows. Execution policies include Unrestricted, Restricted, and AllSigned; PowerShell is very powerful and the execution policy can be bypassed from PowerShell itself. Scripts can also be downloaded and run directly in memory without being written to disk. With '|' we send the output of one command to the following command. PowerShell is closer to a programming language than to a plain command interpreter: we can define variables, treat functions as objects, pass parameters, and much more.

PowerShell and .NET work very well together, since PowerShell goes to great lengths to locate the .NET modules, libraries, and code we are trying to use. There are many ways to download and execute malware from PowerShell; it can even be done in a single line of code.
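One common form of obfuscated one-liner passes the script to PowerShell as Base64 through the -EncodedCommand parameter, which expects Base64 over UTF-16 little-endian text. The sketch below decodes such a blob outside of PowerShell; the commands are harmless examples, and encode_command exists only to produce test input.

```python
import base64

def decode_encoded_command(encoded):
    """Decode a Base64 blob of UTF-16LE text, the layout -EncodedCommand uses."""
    return base64.b64decode(encoded).decode("utf-16-le")

def encode_command(script):
    """Inverse helper, used here only to build example input."""
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")

# Harmless example command standing in for an obfuscated script.
blob = encode_command("Write-Output 'hello'")
print(blob)
print(decode_encoded_command(blob))
```

Decoding the blob with a plain UTF-8 assumption leaves a null byte after every character, which is a quick hint that the text is UTF-16.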

1.4 Advanced Static Analysis

We have different levels of analysis, and this part mainly deals with programs written in C/C++. Decompilers are very useful because they get us halfway back to source code rather than only assembly, but they make mistakes from time to time.

1.4.1 GHIDRA

GHIDRA is both a decompiler and a disassembler, and it is open-source; an alternative is IDA, but that is a paid application. Two clues for finding main are that it is usually the last function called before the exit call, and that just before it is called its three parameters are pushed (argc, argv, envp). One of the most important things we can and should do in GHIDRA is rename parts of the code: once we find main, we rename it "main". When we find something interesting, we can also see where it is referenced in the code. We can also apply operations to the data shown from memory, and much more. As soon as we find out what something does or what something is, rename it. We can also try to find matches for registry keys (using 'e').

1.4.2 API Analysis

The API here is not a network API but the OS API that programs use to make system calls, such as writing a file.

1.4.3 Network Analysis

ws2_32.dll is Windows Sockets (Winsock), the basis of networking on Windows; wininet.dll provides higher-level networking by handling HTTP requests and responses, among other things.
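As a small illustration, the imported DLL names of a sample can be mapped to the kind of networking they suggest. The import list is made up, and the mapping only covers the two libraries named above.

```python
# What each networking DLL suggests about a sample (from the notes above).
NETWORK_DLLS = {
    "ws2_32.dll": "low-level sockets (Winsock)",
    "wininet.dll": "high-level HTTP requests and responses",
}

def network_capabilities(imported_dlls):
    """Return the networking-related DLLs found in an import list."""
    found = {}
    for name in imported_dlls:
        key = name.lower()  # import names are not case-sensitive on Windows
        if key in NETWORK_DLLS:
            found[key] = NETWORK_DLLS[key]
    return found

# Hypothetical import list, as it might appear in a PE viewer.
imports = ["KERNEL32.dll", "WS2_32.dll", "ADVAPI32.dll", "WININET.dll"]
for dll, meaning in network_capabilities(imports).items():
    print(dll, "->", meaning)
```

An empty result does not prove the sample has no network activity, since libraries can also be loaded at run-time instead of appearing in the import table.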