Exploring Object File Formats (maskray.me)
85 points by ingve on Jan 16, 2024 | 19 comments


Always quality content in that blog. We used MaskRay's article[1] on stack unwinding to improve our debuginfo (DWARF) support[2] in the past. If someone wants to have a more hands-on approach to executable file formats, e.g., XCOFF or GOFF, they can check Rizin's ideas for new formats[3] to support.

[1] https://maskray.me/blog/2020-11-08-stack-unwinding

[2] https://rizin.re/posts/gsoc-2023-dwarf/

[3] https://github.com/rizinorg/ideas/issues?q=is%3Aissue+is%3Ao...


Low-level programming is one of my favourite subjects. I've written a simple ELF parser in Nim (with the help of an amazing binary parsing library) among other things: https://github.com/khaledh/elfdump


In order to learn an object file format, the first thing I'd do is write a dumper for it.


Writing a program that can dump itself is a great learning project.

Good bit-twiddling practice, too.


That reminds me: a couple of years ago, when I was learning to program, one of my first projects was an ELF parser, so I could understand what binaries were doing and how they were built.


How did you learn to do this?


My reference was the "System V Application Binary Interface" itself (chapter 4). It's not that difficult to grok.

https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf


Just read the ELF specification. There is a header and everything else follows. There are a lot of explanations online, here is one: https://www.cs.cmu.edu/afs/cs/academic/class/15213-f00/docs/...
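To make "there is a header and everything else follows" concrete, here is a minimal sketch of decoding the ELF file header with Python's struct module. The function name parse_elf_header and the returned dictionary layout are illustrative assumptions; the field names and sizes follow the ELF specification (52 bytes for ELF32, 64 bytes for ELF64).

```python
import struct

# Fields of the ELF file header that follow the 16-byte e_ident array.
_FIELDS = ("e_type", "e_machine", "e_version", "e_entry", "e_phoff",
           "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
           "e_shentsize", "e_shnum", "e_shstrndx")


def parse_elf_header(data: bytes) -> dict:
    """Decode the ELF file header from the first bytes of a file.

    Hypothetical helper for illustration; it handles only the file header,
    not program headers or section headers.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported EI_CLASS or EI_DATA")

    endian = "<" if ei_data == 1 else ">"   # 1 = little-endian, 2 = big-endian
    word = "I" if ei_class == 1 else "Q"    # 1 = 32-bit, 2 = 64-bit addresses
    # e_type, e_machine, e_version, then e_entry/e_phoff/e_shoff (word-sized),
    # e_flags, and six 16-bit size/count fields.
    fmt = endian + "HHI" + word * 3 + "I" + "H" * 6

    header = dict(zip(_FIELDS, struct.unpack_from(fmt, data, 16)))
    header["bits"] = 32 if ei_class == 1 else 64
    header["little_endian"] = ei_data == 1
    return header
```

Feeding it the first 64 bytes of any ELF binary is enough; e_shoff and e_phoff then say where the section and program header tables start, which is the natural next step for a dumper.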


I have a soft spot for the brutal simplicity of the .COM format.
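As an illustration of that simplicity, here is a small sketch that assembles a complete .COM image by hand. A .COM file has no header at all: DOS loads the raw bytes at offset 100h of a segment and jumps to the first one. The function name build_com and the output file name mentioned below are made up for this example; the opcodes are the classic "print a $-terminated string via int 21h, function 09h" sequence.

```python
LOAD_OFFSET = 0x100  # DOS loads the image right after the 256-byte PSP


def build_com(message: bytes) -> bytes:
    """Return a .COM image that prints `message` and exits.

    Hypothetical helper for illustration only.
    """
    if b"$" in message:
        raise ValueError("'$' terminates the string for int 21h / AH=09h")

    code_len = 8                        # size of the instructions below
    msg_addr = LOAD_OFFSET + code_len   # the string sits right after the code
    code = bytes([
        0xB4, 0x09,                            # mov ah, 09h  (print string)
        0xBA, msg_addr & 0xFF, msg_addr >> 8,  # mov dx, msg_addr
        0xCD, 0x21,                            # int 21h
        0xC3,                                  # ret  (to the int 20h at PSP:0000)
    ])
    assert len(code) == code_len

    image = code + message + b"$"
    # Code, data and stack all share one 64 KiB segment in the usual setup.
    if LOAD_OFFSET + len(image) > 0x10000:
        raise ValueError("image does not fit in a single segment")
    return image
```

Writing build_com(b"Hello") to a file such as hello.com gives a 14-byte program that should run under DOS or an emulator like DOSBox. The whole "format" is just those bytes.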


What's interesting about that is every tool I've seen to generate .com files would set SS=DS=ES=CS to the same value.

But, it turns out, this is not necessary. The only thing about COM is that the file size has to be less than 64K. This enables another memory model, where the code seg and data seg are different, enabling substantially larger programs that were still COM programs. This was the Zortech "small" memory model.


Also see the pure brutality that is the early Microsoft Office DOC file. :) Just dumping the memory IIRC.


I vaguely recall reading that on some versions of Office way back when, memory from other programs would sometimes leak into saved document files.

(Pauses commenting to go check) Yes, it’s referenced in Chapter 3 of “Silence on the Wire”: apparently Office on Windows 95/98 would dump memory from “other programs” into Word docs, with some anecdotal sightings in later versions.


Maybe a case of reused allocated memory and just writing chunks of memory to disk. Simplistic example:

struct blah_block { int sz; char buf[510]; };
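A rough simulation of that guess, in Python rather than C: a fixed-size block is reused without being cleared, and the whole block is written out instead of only the sz bytes in use. The layout mirrors the struct above with a 4-byte int and no padding; the helper names and the sample strings are invented for illustration, and nothing here is claimed to be how Office actually worked.

```python
import struct

BLOCK_FMT = "<i510s"                     # { int sz; char buf[510]; }, unpadded
BLOCK_SIZE = struct.calcsize(BLOCK_FMT)  # 514 bytes


def fill_block(block: bytearray, payload: bytes) -> None:
    """Store sz and payload; the rest of buf keeps whatever was there (the bug)."""
    if len(payload) > 510:
        raise ValueError("payload too large for buf")
    struct.pack_into("<i", block, 0, len(payload))
    block[4:4 + len(payload)] = payload


def dump_block(block: bytearray) -> bytes:
    """Naive save: write the entire block, stale bytes included."""
    return bytes(block)


def dump_used_only(block: bytearray) -> bytes:
    """Safer save: write only the header plus the sz bytes actually in use."""
    (sz,) = struct.unpack_from("<i", block, 0)
    return bytes(block[:4 + sz])
```

If a block previously held some other data and is then reused for a short payload, dump_block still emits the old tail after the new text, while dump_used_only does not.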


Both ELF and Mach-O (and presumably other recent formats) are amenable to a pseudo-COM approach: https://news.ycombinator.com/item?id=38593896


I've written an ELF object file exporter as part of a Ghidra extension [1]. It's a bit finicky to get it right (toolchains assume that object files are valid and don't have much in the way of diagnostics), but these are fairly simple under the hood. Section bytes, symbols and relocations, with some headers and metadata to wrap these up...

It's a bit of a shame that object files aren't more of a lingua franca of toolchains in practice. Embedding binary blobs inside a program in a portable way is still a mess today.

[1] https://github.com/boricj/ghidra-delinker-extension/tree/mas...


Nice overview. They didn't get to the hideous self-referential (and largely undocumented) trie that is used for the symbol name mappings in Mach-O. Not fun to implement. And frustrating, because there is so much wasted space in typical Mach-O binaries that the compression effort seems very much not worth it, at least since, I don't know, 2005?
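For readers who have not met that trie, here is a minimal walker sketch. It assumes the commonly described export trie layout: each node starts with a ULEB128 terminal-info size, followed by that many bytes of export info (flags, then an address for regular exports), a one-byte child count, and for each child a NUL-terminated edge label plus a ULEB128 offset measured from the start of the trie. Re-exports and stub-and-resolver entries carry different payloads and are not interpreted here. The function names and the hand-built EXAMPLE_TRIE blob are made up for illustration.

```python
def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one ULEB128 value at `pos`; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def walk_export_trie(trie: bytes, offset: int = 0, prefix: bytes = b"", path=()):
    """Yield (symbol, flags, value) for every terminal node under `offset`."""
    if offset in path:  # child offsets are arbitrary, so guard against loops
        raise ValueError("cycle in export trie")
    terminal_size, pos = read_uleb128(trie, offset)
    if terminal_size:
        flags, p = read_uleb128(trie, pos)
        value, _ = read_uleb128(trie, p)  # an address for regular exports
        yield prefix.decode(), flags, value
    pos += terminal_size                  # skip the export info
    child_count = trie[pos]
    pos += 1
    for _ in range(child_count):
        end = trie.index(b"\x00", pos)    # edge labels are NUL-terminated
        label = trie[pos:end]
        child_offset, pos = read_uleb128(trie, end + 1)
        yield from walk_export_trie(trie, child_offset, prefix + label,
                                    path + (offset,))


# Hand-built trie holding "_main" and "_malloc", which share the edge "_ma".
EXAMPLE_TRIE = (
    bytes([0x00, 0x01]) + b"_ma\x00" + bytes([7])        # root, offset 0
    + bytes([0x00, 0x02]) + b"in\x00" + bytes([19])      # "_ma" node, offset 7
    + b"lloc\x00" + bytes([23])
    + bytes([0x02, 0x00, 0x10, 0x00])                    # "_main", offset 19
    + bytes([0x03, 0x00, 0x80, 0x20, 0x00])              # "_malloc", offset 23
)
```

Calling list(walk_export_trie(EXAMPLE_TRIE)) on the sample blob walks the shared "_ma" prefix once and then each suffix. The offsets pointing back into the same buffer are what make the structure self-referential and awkward to emit.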


In the early days of Swift, binary size bloat due to huge symbol tables was a serious problem. We had some real binaries where the symbol table was more than half of the entire file.


Notably missing is HUNK, AmigaOS's object file format.


Nice to see an article that remembers AIX isn't about ELF.

Symbian also used a COFF variant.



