Exploring Object File Formats

xvilka · on Jan 16, 2024

Always quality content on that blog. We used MaskRay's article[1] on stack unwinding to improve our debuginfo (DWARF) support[2] in the past. If someone wants a more hands-on approach to executable file formats, e.g., XCOFF or GOFF, they can check Rizin's ideas for new formats to support[3].

[1] https://maskray.me/blog/2020-11-08-stack-unwinding

[2] https://rizin.re/posts/gsoc-2023-dwarf/

[3] https://github.com/rizinorg/ideas/issues?q=is%3Aissue+is%3Ao...

khaledh · on Jan 16, 2024

Low-level programming is one of my favourite subjects. I've written a simple ELF parser in Nim (with the help of an amazing binary parsing library) among other things: https://github.com/khaledh/elfdump

WalterBright · on Jan 16, 2024

In order to learn an object file format, the first thing I'd do is write a dumper for it.
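A dumper usually starts life as a plain hex dump and only later learns the format's structure. Here is a minimal sketch in C of the classic offset / hex / ASCII layout. `hexdump_row`, `hexdump` and `HEXDUMP_ROW_MAX` are hypothetical names, not taken from any existing tool:

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

#define HEXDUMP_ROW_MAX 96 /* room for a 16-digit offset, 16 hex bytes and 16 ASCII chars */

/* Format up to 16 bytes starting at p as one row:
   "OOOOOOOO  xx xx ... |ascii|". Missing hex columns are padded with spaces
   so the ASCII column lines up. Returns the number of chars written. */
size_t hexdump_row(char *out, size_t offset, const unsigned char *p, size_t n)
{
    char *w = out;
    if (n > 16)
        n = 16;
    w += sprintf(w, "%08zx  ", offset);
    for (size_t i = 0; i < 16; i++)
        w += (i < n) ? sprintf(w, "%02x ", p[i]) : sprintf(w, "   ");
    *w++ = '|';
    for (size_t i = 0; i < n; i++)
        *w++ = isprint(p[i]) ? (char)p[i] : '.'; /* non-printables shown as '.' */
    *w++ = '|';
    *w = '\0';
    return (size_t)(w - out);
}

/* Print a whole buffer to stdout, 16 bytes per row. */
void hexdump(const unsigned char *p, size_t n)
{
    char row[HEXDUMP_ROW_MAX];
    for (size_t off = 0; off < n; off += 16) {
        hexdump_row(row, off, p + off, n - off);
        puts(row);
    }
}
```

Once this works on a real file, the next step is decoding the header fields instead of just printing bytes.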

mhh__ · on Jan 17, 2024

Writing a program that can dump itself is a great learning project.

Good bit-twiddling practice, too.

eddd-ddde · on Jan 16, 2024

That reminds me: a couple of years ago, when I was learning to program, one of my first projects was an ELF parser, so I could understand what binaries were doing and how they were built.

a2code · on Jan 16, 2024

How did you learn to do this?

khaledh · on Jan 17, 2024

My reference was the "System V Application Binary Interface" itself (chapter 4). It's not that difficult to grok.

https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf

hnthrowaway0328 · on Jan 16, 2024

Just read the ELF specification. There is a header, and everything else follows from it. There are plenty of explanations online; here is one: https://www.cs.cmu.edu/afs/cs/academic/class/15213-f00/docs/...
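Taking the "header first" advice literally, here is a minimal sketch that pulls the main fields out of a 64-bit little-endian ELF header, using the ELF64 header offsets from the System V ABI. `struct elf64_summary` and `parse_elf64_le` are hypothetical names. On Linux you could include <elf.h> and use `Elf64_Ehdr` directly, but reading byte by byte keeps the parser independent of host endianness and alignment (and is where the bit twiddling comes in):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary struct: the fields a dumper usually prints first. */
struct elf64_summary {
    uint16_t type;     /* e_type: 1 = ET_REL, 2 = ET_EXEC, 3 = ET_DYN */
    uint16_t machine;  /* e_machine: 62 = EM_X86_64 */
    uint64_t entry;    /* e_entry */
    uint64_t phoff;    /* e_phoff: program header table offset */
    uint64_t shoff;    /* e_shoff: section header table offset */
    uint16_t phnum, shnum, shstrndx;
};

/* Little-endian reads done byte by byte: no alignment or host-order assumptions. */
static uint16_t rd16le(const unsigned char *p) { return (uint16_t)(p[0] | p[1] << 8); }

static uint64_t rd64le(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Parse the 64-byte ELF64 header. Only ELFCLASS64 + little-endian is handled;
   returns 0 on success, -1 if the buffer is not such a file. */
int parse_elf64_le(const unsigned char *buf, size_t len, struct elf64_summary *out)
{
    if (len < 64 || buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;
    if (buf[4] != 2 /* EI_CLASS = ELFCLASS64 */ || buf[5] != 1 /* EI_DATA = ELFDATA2LSB */)
        return -1;
    out->type     = rd16le(buf + 16);
    out->machine  = rd16le(buf + 18);
    out->entry    = rd64le(buf + 24);
    out->phoff    = rd64le(buf + 32);
    out->shoff    = rd64le(buf + 40);
    out->phnum    = rd16le(buf + 56);
    out->shnum    = rd16le(buf + 60);
    out->shstrndx = rd16le(buf + 62);
    return 0;
}
```

With `shoff`, `shnum` and `shstrndx` in hand, the section headers and their names are the natural next thing to dump.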

eterps · on Jan 16, 2024

I have a soft spot for the brutal simplicity of the .COM format.

WalterBright · on Jan 16, 2024

What's interesting about that is every tool I've seen to generate .com files would set SS=DS=ES=CS to the same value.

But it turns out this is not necessary. The only real constraint on a COM file is that it must be smaller than 64K. That leaves room for another memory model, where the code and data segments are different, enabling substantially larger programs that were still COM programs. This was the Zortech "small" memory model.
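The trick rests on real-mode addressing: a segment register selects a 16-byte paragraph, so the physical address is segment * 16 + offset, and CS and DS can point at different 64K windows even though the file image itself stays under 64K. A minimal sketch of the arithmetic follows. The helper names and the segment values in the test are made up for illustration and are not Zortech's actual startup code:

```c
#include <stdint.h>

/* 8086 real mode: a segment value counts 16-byte paragraphs, so
   physical = segment * 16 + offset. The 8086 has 20 address lines,
   so the sum wraps at 1 MB. */
uint32_t real_mode_addr(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

/* Number of paragraphs to advance a segment register past `bytes` bytes. */
uint16_t paragraphs_for(uint32_t bytes)
{
    return (uint16_t)((bytes + 15u) / 16u);
}
```

For example, with a hypothetical load segment CS = 0x1000, setting DS = CS + paragraphs_for(0x10000) = 0x2000 gives code and data separate 64K ranges, while execution still starts at CS:0100 as it does for any COM program.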

actionfromafar · on Jan 16, 2024

Also see the pure brutality that is the early Microsoft Office DOC file. :) Just dumping the memory IIRC.

fullspectrumdev · on Jan 16, 2024

I vaguely recall reading that on some versions of Office way back when, memory from other programs would sometimes leak into saved document files.

(Pauses commenting to go check) yes it’s referenced in Chapter 3 of “Silence on the Wire”, apparently Office on Windows 95/98 would dump memory from “other programs” into word docs, with some anecdotal sightings in later versions.

actionfromafar · on Jan 16, 2024

Maybe a case of reused allocated memory and just writing chunks of memory to disk. Simplistic example:

struct blah_block { int sz; char buf[510]; }; /* only the first sz bytes of buf are meaningful; the rest may be stale */
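To make the reuse theory concrete, here is a small sketch. It is an illustration of the general bug class, not a claim about how Office actually saved files. `save_block` and `reuse_block` are hypothetical: a block is recycled without being cleared, and the save routine writes the whole struct instead of just the first `sz` bytes, so stale bytes from earlier contents end up in the output:

```c
#include <stddef.h>
#include <string.h>

struct blah_block { int sz; char buf[510]; };

/* Buggy save: copies the whole struct into the file image instead of only
   the first sz bytes of buf, so whatever sits past sz goes along with it. */
size_t save_block(const struct blah_block *b, unsigned char *file)
{
    memcpy(file, b, sizeof *b);
    return sizeof *b;
}

/* Reuse a block for new text without clearing it, as a recycled allocation
   would be. No memset and no terminator: old bytes past sz survive. */
void reuse_block(struct blah_block *b, const char *text)
{
    size_t n = strlen(text);
    if (n > sizeof b->buf)
        n = sizeof b->buf;
    memcpy(b->buf, text, n);
    b->sz = (int)n;
}
```

If a block first holds "password=hunter2" and is then reused for "Hi", the saved image still contains "ssword=hunter2" right after the two real bytes. Writing only `sz` bytes, or zeroing the block before reuse, closes the leak.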

aebtebeten · on Jan 16, 2024

Both ELF and Mach-O (and presumably other recent formats) are amenable to a pseudo-COM approach: https://news.ycombinator.com/item?id=38593896

boricj · on Jan 16, 2024

I've written an ELF object file exporter as part of a Ghidra extension [1]. It's a bit finicky to get it right (toolchains assume that object files are valid and don't have much in the way of diagnostics), but these are fairly simple under the hood. Section bytes, symbols and relocations, with some headers and metadata to wrap these up...
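The core records really are small. This sketch redeclares the ELF64 symbol and relocation-with-addend entries with the same field layout as `Elf64_Sym` and `Elf64_Rela` (glibc ships the real definitions in <elf.h>). It shows the relocation an assembler typically emits for an x86-64 `call` to an undefined function. `call_reloc`, `make_st_info` and `make_r_info` are hypothetical helpers:

```c
#include <stdint.h>

/* Same field layout as Elf64_Sym / Elf64_Rela in the System V ABI;
   redeclared so the sketch compiles without <elf.h>. */
struct elf64_sym {
    uint32_t st_name;  /* offset of the name in .strtab */
    uint8_t  st_info;  /* binding << 4 | type */
    uint8_t  st_other; /* visibility */
    uint16_t st_shndx; /* section index; 0 = SHN_UNDEF (undefined symbol) */
    uint64_t st_value;
    uint64_t st_size;
};

struct elf64_rela {
    uint64_t r_offset; /* where in the section to patch */
    uint64_t r_info;   /* symbol index << 32 | relocation type */
    int64_t  r_addend;
};

#define STB_GLOBAL     1
#define STT_FUNC       2
#define R_X86_64_PLT32 4

static inline uint8_t make_st_info(uint8_t bind, uint8_t type)
{
    return (uint8_t)(bind << 4 | (type & 0xf));
}

static inline uint64_t make_r_info(uint32_t sym, uint32_t type)
{
    return (uint64_t)sym << 32 | type;
}

/* A `call foo` (opcode e8 + rel32) at offset `at` in .text needs one relocation
   against symbol index `sym`. The rel32 field starts one byte after the opcode
   and is relative to the end of the instruction, hence the addend of -4. */
struct elf64_rela call_reloc(uint64_t at, uint32_t sym)
{
    struct elf64_rela r = { at + 1, make_r_info(sym, R_X86_64_PLT32), -4 };
    return r;
}
```

An object file is then mostly arrays of these records plus the section bytes they point into, with the section header table tying it all together.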

It's a bit of a shame that object files aren't more of a lingua franca of toolchains in practice. Embedding binary blobs inside a program in a portable way is still a mess today.
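One workaround that every C toolchain accepts is to turn the blob into source code, in the spirit of `xxd -i`. C23 also standardizes `#embed` for this, but that needs a recent compiler. A minimal sketch, where `emit_c_array` is a hypothetical helper:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write C source that defines the blob as an array plus its length, similar to
   what `xxd -i` prints. Returns 0 on success, -1 for an empty name or blob
   (a zero-length array is not valid C) or on a write error. */
int emit_c_array(FILE *f, const char *name, const unsigned char *data, size_t n)
{
    if (n == 0 || strlen(name) == 0)
        return -1;
    fprintf(f, "const unsigned char %s[] = {", name);
    for (size_t i = 0; i < n; i++)
        /* start a new indented line every 12 bytes; trailing commas are legal C */
        fprintf(f, "%s0x%02x,", (i % 12 == 0) ? "\n    " : " ", data[i]);
    fprintf(f, "\n};\nconst unsigned long %s_len = %zu;\n", name, n);
    return ferror(f) ? -1 : 0;
}
```

The generated file compiles into an ordinary object on any platform, which sidesteps the per-toolchain differences of `objcopy`- or `incbin`-style tricks at the cost of slower builds for large blobs.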

[1] https://github.com/boricj/ghidra-delinker-extension/tree/mas...

norir · on Jan 16, 2024

Nice overview. They didn't get to the hideous, self-referential (and largely undocumented) trie that Mach-O uses for symbol name mappings. Not fun to implement. And frustrating, because typical Mach-O binaries waste so much space elsewhere that the compression hardly seems worth the effort, at least not by, I don't know, 2005?
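For anyone who does want to implement it, here is the layout as far as I can tell from the comments in Apple's <mach-o/loader.h>. Each node starts with a ULEB128 "terminal size". A nonzero size is followed by the export info (flags, then the address for an ordinary export). After that comes a one-byte child count, and each child is a NUL-terminated edge label plus a ULEB128 offset to the child node. A lookup follows the edges that prefix the symbol name. This is a sketch under those assumptions that ignores re-export and stub/resolver entries. `read_uleb128` and `trie_lookup` are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ULEB128: 7 bits per byte, least-significant group first, high bit set on
   every byte except the last. Advances *p. Returns 0 on success, -1 if the
   input ends mid-value or the value needs more than 64 bits of shift. */
int read_uleb128(const unsigned char **p, const unsigned char *end, uint64_t *out)
{
    uint64_t v = 0;
    unsigned shift = 0;
    while (*p < end) {
        unsigned char b = *(*p)++;
        if (shift >= 64)
            return -1;
        v |= (uint64_t)(b & 0x7f) << shift;
        if (!(b & 0x80)) {
            *out = v;
            return 0;
        }
        shift += 7;
    }
    return -1;
}

/* Look up `sym` in an export trie blob. Assumes an ordinary export, whose
   info is ULEB128 flags then ULEB128 address. Returns 0 and sets *addr if
   found, else -1. */
int trie_lookup(const unsigned char *trie, size_t len, const char *sym, uint64_t *addr)
{
    const unsigned char *end = trie + len;
    size_t off = 0;
    for (;;) {
        if (off >= len)
            return -1;
        const unsigned char *p = trie + off;
        uint64_t term, flags, child, next = 0;
        if (read_uleb128(&p, end, &term) != 0)
            return -1;
        if (*sym == '\0')            /* name fully consumed: need a terminal node */
            return (term != 0 && read_uleb128(&p, end, &flags) == 0 &&
                    read_uleb128(&p, end, addr) == 0) ? 0 : -1;
        if (term >= (uint64_t)(end - p))
            return -1;               /* export info plus child count must fit */
        p += term;                   /* skip this node's own export info */
        unsigned nchildren = *p++;
        for (unsigned i = 0; i < nchildren && next == 0; i++) {
            const unsigned char *edge = p;
            while (p < end && *p != '\0')
                p++;
            if (p == end)
                return -1;           /* unterminated edge label */
            size_t elen = (size_t)(p - edge);
            p++;                     /* skip the NUL */
            if (read_uleb128(&p, end, &child) != 0)
                return -1;
            if (elen > 0 && strncmp(sym, (const char *)edge, elen) == 0) {
                sym += elen;         /* edge matches a prefix of the name */
                next = child;
            }
        }
        if (next == 0)
            return -1;               /* no edge continues the name */
        off = (size_t)next;
    }
}
```

Because every followed edge consumes at least one character of the name, the walk terminates even on a malformed trie that points back at earlier nodes.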

theresistor · on Jan 18, 2024

In the early days of Swift, binary size bloat due to huge symbol tables was a serious problem. We had some real binaries where the symbol table was more than half of the entire file.

snvzz · on Jan 16, 2024

Notably missing is HUNK, AmigaOS's object file format.

pjmlp · on Jan 16, 2024

Nice to see an article that remembers AIX isn't about ELF.

Symbian also used a COFF variant.
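The non-ELF formats mentioned in this thread are easy to tell apart by their leading bytes, which is usually the first thing a multi-format dumper checks. This sketch uses the magic numbers as I understand them: ELF's \x7fELF; Mach-O's 0xfeedface/0xfeedfacf, stored in the target's byte order; AIX XCOFF's big-endian 0x01df (32-bit) and 0x01f7 (64-bit); and AmigaOS hunk files' big-endian 0x3f3 (HUNK_HEADER, load files) or 0x3e7 (HUNK_UNIT, object files). `sniff_format` and `enum objfmt` are hypothetical names, and Symbian's format is not covered:

```c
#include <stddef.h>

enum objfmt { FMT_UNKNOWN, FMT_ELF, FMT_MACHO, FMT_XCOFF32, FMT_XCOFF64,
              FMT_AMIGA_HUNK_EXE, FMT_AMIGA_HUNK_OBJ, FMT_MZ };

/* Guess the container format from the first bytes of a file. */
enum objfmt sniff_format(const unsigned char *b, size_t n)
{
    if (n >= 4 && b[0] == 0x7f && b[1] == 'E' && b[2] == 'L' && b[3] == 'F')
        return FMT_ELF;
    /* Mach-O magic is written in the target's byte order: check both orders. */
    if (n >= 4 &&
        ((b[0] == 0xfe && b[1] == 0xed && b[2] == 0xfa && (b[3] == 0xce || b[3] == 0xcf)) ||
         ((b[0] == 0xce || b[0] == 0xcf) && b[1] == 0xfa && b[2] == 0xed && b[3] == 0xfe)))
        return FMT_MACHO;
    /* Amiga hunk files are sequences of big-endian longwords. */
    if (n >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0x03) {
        if (b[3] == 0xf3)
            return FMT_AMIGA_HUNK_EXE;   /* HUNK_HEADER */
        if (b[3] == 0xe7)
            return FMT_AMIGA_HUNK_OBJ;   /* HUNK_UNIT */
    }
    /* XCOFF: big-endian 16-bit f_magic at the start of the file header. */
    if (n >= 2 && b[0] == 0x01 && b[1] == 0xdf)
        return FMT_XCOFF32;
    if (n >= 2 && b[0] == 0x01 && b[1] == 0xf7)
        return FMT_XCOFF64;
    /* MS-DOS stub: DOS .EXE files and PE images both start with "MZ". */
    if (n >= 2 && b[0] == 'M' && b[1] == 'Z')
        return FMT_MZ;
    return FMT_UNKNOWN;
}
```

A .COM file has no header at all, so it can only ever fall through to FMT_UNKNOWN.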