Binaries in bytes

bigot · 04-02-2025, 07:45 PM

Binaries in bytes: cut shellcode and load into memory

In the last article, I discussed how stealer developers disguise their creations with crypters, leaving antiviruses empty-handed. One of their favorite tricks is a stub loader, which runs the stealer directly from RAM without leaving any traces on disk. It takes encrypted code, unpacks it in RAM and starts execution, while static analysis beats its head against the wall without finding anything. But where does the loader get this code? And how do you launch not just a scrap of instructions but a full-fledged program without touching the hard disk for a second? Today I'll show you how shellcode is created in assembly language from scratch, cut from ready-made files with utilities like Donut or manually in IDA Pro, and how the RunPE technique brings a binary to life in memory without a single hint of its presence. Let's walk through each method step by step, from the basics to practice.

The role of the loader and its tasks

A loader (stub) is a small but clever middleman that delivers payloads and runs them directly in the system's memory. On disk it looks like a modest file, often pretending to be an innocuous application so as not to attract attention. Its main task is to decrypt the hidden code (shellcode or an entire binary) and pass control to it in RAM, never storing it on the hard disk. In last week's article, I mentioned how such loaders arm themselves with encryption like AES or derive keys from system data, such as the disk serial number, to confuse analysis and make life difficult for defenders. But without the code it executes, the loader is an empty shell. That code can be shellcode (a compact set of processor instructions) or a full-fledged .exe run through RunPE. Let's go through the available options.

1. Creating shellcode in assembly language

Shellcode can be written by hand in assembly language. It's like a jigsaw puzzle where every byte must fit perfectly, otherwise the whole structure crumbles. The process requires surgical precision, because the developer writes instructions for the processor to perform one specific task, for example calling a system utility. The result is a byte array that the loader places in a specially allocated memory region with execute permissions and then calls as a function.

This method gives you full control over the contents, but it has its pitfalls. The addresses of system functions (say, from kernel32.dll) have to be written in manually, and they change from one Windows version to another. Complex tasks such as networking, data encryption or password stealing turn into a mountain of code where any mistake is fatal. On top of that, system updates and mechanisms like ASLR (address space layout randomization) can turn such shellcode into useless garbage overnight. That's why writing it by hand is only worthwhile for simple operations or rare cases where every byte counts.

2. Generating shellcode with Msfvenom

If you don't have the time or desire to mess around with assembler, the msfvenom utility from the Metasploit Framework generates shellcode on the fly. You set the parameters (the payload type, for example a reverse shell that connects back to the attacker, plus the IP address and port) and get a ready byte array. The loader takes it, allocates memory, copies the bytes and runs it.

The pros are obvious: speed and convenience. In seconds you have optimized code that can even get past basic antivirus checks. The downside is that it's a template, not something built for your needs. If you need a stealer with special functions, for example stealing data from a specific application, msfvenom can't help. Besides, popular payloads eventually end up in antivirus databases and lose their invisibility.

3. Extracting shellcode with Donut

To work with a ready-made .exe, there is the Donut utility. It parses the file structure (code sections, import tables, relocation tables) and produces an array containing the program itself plus a mini-loader that handles its dependencies (for example, calls into DLLs like user32.dll). The loader places this array in memory and runs it once everything is ready.

The process is fully automated and works with almost any .exe without manually digging into its innards. But there is a nuance: because of the added mini-loader and dependencies, the shellcode swells like dough on yeast, which may make antiviruses wary. Nevertheless, Donut remains an effective way to turn an existing program into code ready to execute in memory.

4. Manual extraction of shellcode using IDA

When automation like Donut doesn't give you the precision you need, you can cut shellcode out of the .exe yourself, armed with the IDA Pro disassembler and a couple of other tools. It's like archaeology, where you dig a valuable artifact out of a pile of bytes. You open the .exe in IDA, find the entry point or a key function (for example, the start of the payload logic such as encryption or a network request), and trace the execution flow step by step to understand what the code does. Then you select the desired stretch of machine code, from the first instruction to the exit point, and copy it as raw bytes.

But that's just the beginning. You need to make sure the shellcode is self-sufficient: if it pulls in dependencies (for example, calls to LoadLibrary or GetProcAddress to work with DLLs), their addresses must be replaced by dynamic lookups, otherwise ASLR will turn them into garbage. To do this, you edit bytes in a hex editor or rebuild the piece in assembler, adding a wrapper that finds the necessary functions in the target's memory by itself. The result is a byte array that the loader drops into memory and launches.

The upside of this method is that you are the master of your shellcode: you cut exactly what you need, without unnecessary garbage or wrappers. The downside is a huge amount of work that requires time and attention: parsing the .exe, fixing relocations, handling dependencies and making sure the logic doesn't break halfway through. This method is for those who are ready to sweat for an ideal result and don't trust ready-made utilities like Donut.

5. Running the binary via RunPE

When shellcode is too cramped for the task at hand, for example when you need an .exe with a graphical interface or complex logic, RunPE comes into play. This technique runs an entire executable in memory without touching the disk for a moment. RunPE stands for "Run Portable Executable". The binary is embedded in the program and comes to life inside a legitimate process, hiding like a ghost in the system.

What is RunPE and how does it work?

RunPE emulates the standard Windows PE loading process, but with a dodgy twist. Normally, the OS reads a PE file from disk, allocates memory, lays out the code and data sections, resolves the imports, and starts at the entry point. RunPE does the same thing manually, in the memory of another process, leaving the file system no chance to interfere.

The process includes the following steps:

- Embedding a binary into a program: The executable file is converted into a byte array and included in the program code as a variable. This bypasses the need to store the file on disk.

- Creating a suspended process: Starts a legitimate application (e.g. notepad.exe) in a suspended state. This process serves as a shell to execute the target binary.

- Memory allocation: In the address space of a suspended process, an area is allocated that is sufficient to hold the entire .exe image, including its headers and sections (code, data, resources).

- Copying the PE structure: Headers and sections of the binary are transferred to the allocated memory, taking into account their virtual addresses as specified in the PE structure. This ensures that the program will work correctly even if it uses relative references.

- Redirecting the entry point: The main thread context of the suspended process is changed to point to the entry point of the target binary. This redirects execution from the original program (e.g. notepad.exe) to the embedded .exe.

- Process resumption: The thread is started and the binary starts running, masquerading as a legitimate application.

Why do I need RunPE?

RunPE is a way to run a complex .exe, with an interface, dynamic libraries (DLLs) and any other stuffing, without the slightest trace on disk. Unlike shellcode, which is limited to a compact set of instructions, RunPE takes the whole program without breaking its functionality. This is ideal for delivering full-blown stealers or malware with input windows, tricky network logic, or heavy dependencies that must run unchanged.

Benefits:

Runs the .exe without any modification, with all its dependencies and resources.

It does not require parsing or converting code to shellcode, which makes it easier to work with ready-made programs.

Hides behind a legitimate process, reducing suspicion on cursory analysis.

Disadvantages:

The implementation is more complex than in the case of shellcode because of memory and process manipulations that require precision.

Uses more system calls (memory allocation, writing into another process), which can attract the attention of antiviruses with behavioral analysis.

The array grows with the size of the .exe, making it harder to mask and more noticeable.

6. Increasing the complexity of detection through polymorphism

Polymorphism can be used to reduce the probability of detection. For shellcode, utilities like msfvenom offer encoders that change the structure of the code each time it is generated while preserving its functionality. For RunPE, the binary is encrypted with a simple algorithm (e.g. XOR) and decrypted before launch, so each instance looks different. This way signature analysis gets stuck: every build has a new look and not a single stable clue.

Conclusion

The loader is the starting point for executing code in memory. Shellcode can be written in assembler for full control, generated with msfvenom for speed, extracted with Donut from a custom .exe, or cut manually in IDA for precision; alternatively, a whole binary can be run via RunPE for complex tasks. All methods lead to the same goal: executing the malware in RAM without a single file on disk. RunPE stands out by running full-fledged .exe files while preserving their features and masking them, which makes it especially useful for delivering complex malware. The choice depends on the task: simplicity, compactness, or functionality. Antiviruses stay a step behind while the developer tailors the approach to their needs.

argue · 04-07-2025, 12:59 AM

Is it possible to start posting your content on Geocities, it would be more beneficial.