Yesterday I released the first version of node-capstone, a node module that provides bindings for the Capstone disassembler library. It's the first piece of a debugger project that I recently restarted. It's a long story, but the project has been in development for about 5 years. These are its latest developments.
Hacking like a Samurai

First, I should probably establish my credentials in this area. I've written debuggers for five different architectures. Two of those were released publicly: FCEUd and GCNrd. The other three targeted the Nintendo DS, Nintendo 64, and Nintendo Virtual Boy.
There are a few patterns to be spotted here! Obviously, they are all for Nintendo game systems. But more importantly, the debuggers interact with and analyze assembly code. Games for the most recent of these machines (NDS, GCN, and N64) were written in C and C++, but of course we don't have the luxury of inspecting any of this source code. The next best thing is analysis of the machine code itself; transform the machine code into human-readable assembly language, and begin studying!
Source Level vs Machine Level

So let's talk about source level debuggers and how they differ from machine level debuggers. A source level debugger expects the original source code, plus debug info from the compiler that maps machine code addresses back to that source. Some source level debuggers (gdb, et al.) will disassemble machine code if there is no source (or if you ask them to), but these tools are not designed for static analysis. At best, they give you a murky picture of the machine code.
A machine level debugger is the ultimate super king of static analysis. There is no "source of truth" to make sense of the blob of binary being analyzed. Some executable formats provide hints about code structure, like which segments contain code and which contain only data. Then there is the issue of determining which bytes in the code segments are actually instructions (complicated on architectures with variable-length instructions, such as x86). Finally, the debugger must determine the overall structure of the code: which groups of instructions comprise a complete function, how control flows, how functions refer to one another and to data, and which of them match common library functions via pattern matching and heuristics.
With static analysis you can go even one step further, and determine how sets of instructions can be "decompiled" into a high-level language. A simple example is a load instruction followed by an arithmetic instruction that adds 5, and a store instruction back to the same address; this set could be equivalent to a variable assignment like the following in C:
This method of analysis is not necessary in a source level debugger. But it becomes important with fairly complex code in a machine level debugger. The above snippet of C is immediately recognizable, compared to the sample assembly below:
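Something like this, in hypothetical x86-64 (registers, addresses, and Intel syntax chosen purely for illustration):

```asm
mov  eax, dword ptr [rip + 0x2e44]    ; load the variable
add  eax, 5                           ; add 5
mov  dword ptr [rip + 0x2e44], eax    ; store back to the same address
```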
It is this sophisticated analysis that I want my debugger to do. I have a long, long way to go before it gets to that point.
The Project

I've explained enough to describe my intention: building a machine level debugger with awesome analysis tools. But I haven't really touched on internal design, or even specific architecture targets or languages. And with good reason! This debugger takes an architecture-independent approach, meaning it should be able to do a decent job debugging anything it can disassemble, and the best job with architectures it can analyze automatically.
To achieve this abstraction, the debugger needs a solid core, with all architectures supported by plugins or modules. Even the tracing of running processes will be written behind an abstraction layer: a protocol I'm designing called the Scalable Remote Debugger Protocol. SRDP provides a generic interface for transporting raw data and issuing primitive debugger commands like "add breakpoint". The protocol also supports a notification mechanism for event handling (like a breakpoint being triggered).
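To make the "primitive commands plus notifications" idea concrete, here is a purely illustrative tagged-message codec in node.js. None of the opcodes, field layout, or names below are the real SRDP wire format (which is still being designed); this only shows the general shape such a protocol might take.

```javascript
// Hypothetical message layout: [1-byte opcode][8-byte big-endian address].
const OPS = { ADD_BREAKPOINT: 1, EVENT_BREAKPOINT_HIT: 2 };

function encode(op, addr) {
  const buf = Buffer.alloc(9);
  buf.writeUInt8(OPS[op], 0);             // command or event tag
  buf.writeBigUInt64BE(BigInt(addr), 1);  // address the command applies to
  return buf;
}

function decode(buf) {
  const names = Object.keys(OPS);
  const op = names.find((n) => OPS[n] === buf.readUInt8(0));
  return { op, addr: Number(buf.readBigUInt64BE(1)) };
}

console.log(decode(encode("ADD_BREAKPOINT", 0x1000)));
// { op: 'ADD_BREAKPOINT', addr: 4096 }
```

Commands flow one way down the stream; event notifications (like `EVENT_BREAKPOINT_HIT`) come back the other way over the same transport.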
SRDP

The protocol is still in the design phase, having already undergone many revisions. The latest development in protocol land has been separating the data blocks from the data streams. CPU memory is an example of a block of data; things like commands and event notifications are data streams. Conceptually, all three can use the same transport, but the endpoints will be a bit different:
- Data block endpoints are encapsulated as a mountable block storage device, similar to a USB flash drive or SD card. This allows interacting with a machine using native OS utilities; want to read raw memory from a specific address? Open and read the file, just like you are already used to doing!
- Commands and events are encapsulated in a data stream interface using sockets, which itself uses the same open/read/write/close API as the file system.
Putting the data blocks into a mountable storage interface was an idea from qeed, who is working on something similar for his GameBoy emulator.
SRDP gives us a way to connect debugger targets to debugger UIs. A UI could provide a text mode interface like gdb, or a graphical interface like IDA Pro.
As early as '10, the only real web stack that could be used outside of a browser was Mozilla's framework, XULRunner. I've written a few samples with it, and was immediately impressed. But at the time, the technology was not mature enough to support development of a full-featured application by just one man.
XULRunner is still alive and kicking today, in the same capacity as always; it powers Firefox and similar applications. There are three reasons I did not stick with it as my platform of choice:
- Debugging. Native debugger UIs in XULRunner were non-existent at the time. It was possible to install a version of Firebug which worked with XULRunner, but it just wasn't the same. This situation may also have been alleviated, now that Firefox Browser Debugger is available.
The problem with node.js is that native modules must be written in C++ (if you can't tell, I have a major aversion to this language). But then I came across the ffi module, which can be compared to js-ctypes. Now it's perfect!
I have not yet decided on how I want to render the disassembler or hex editor (two widgets that are required in any machine level debugger), but I know exactly how I'm going to implement the disassembler!
Capstone is written in C, based on LLVM, and supports five architectures: x86 (including x86_64), ARM, ARM64, MIPS, and PowerPC, with two more coming in the next version (SystemZ and Sparc). These are all interesting to me (especially x86_64, whose disassembler I do NOT want to write myself), and the library should be straightforward to extend with other architectures that I want.
Welp! That was easy! :D
In all seriousness, I spent a weekend getting this working. Here's the actual output:
```
0x1000: push rbp
0x1001: mov rax, qword ptr [rip + 0x13b8]
```
It is written as a node module using ffi, and has been released on GitHub under the terms of the MIT license. This will provide the "solid core" that I mentioned earlier. Hook the disassembler up with some static analysis tools, an awesome renderer/editor, and SRDP, and you have a modular debugger foundation that runs everywhere and debugs everything!
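Producing a listing like the one above is just a matter of iterating the instruction objects the disassembler returns. A sketch of that formatting step, using plain objects in place of real node-capstone instructions so it runs without the native library (the `address`/`mnemonic`/`op_str` field names mirror Capstone's C API and are my assumption about the binding):

```javascript
// Format disassembled instructions as "0xADDR: mnemonic operands".
// The mock objects below stand in for what the disassembler returns.
function formatListing(insns) {
  return insns
    .map((i) => "0x" + i.address.toString(16) + ": " + i.mnemonic +
                (i.op_str ? " " + i.op_str : ""))
    .join("\n");
}

const mock = [
  { address: 0x1000, mnemonic: "push", op_str: "rbp" },
  { address: 0x1001, mnemonic: "mov", op_str: "rax, qword ptr [rip + 0x13b8]" },
];

console.log(formatListing(mock));
// 0x1000: push rbp
// 0x1001: mov rax, qword ptr [rip + 0x13b8]
```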
It's still just a tool
While that sounds quite glorified and romantic, it must be said that even the one-debugger-to-rule-them-all won't make you the best hacker in the world. It takes years of practice to use such a tool effectively. And it takes many more years just to create the tool, and to refine it to perfection.
The debugger is my katana.