Always quality content on that blog. We used MaskRay's article[1] on stack unwinding to improve our debuginfo (DWARF) support[2] in the past. If someone wants a more hands-on approach to executable file formats, e.g., XCOFF or GOFF, they can check Rizin's ideas for new formats[3] to support.
Low-level programming is one of my favourite subjects. I've written a simple ELF parser in Nim (with the help of an amazing binary parsing library) among other things: https://github.com/khaledh/elfdump
That reminds me, a couple of years ago when I was learning to program, one of my first projects was an ELF parser so I could understand what binaries were doing and how they were built.
What's interesting about that is every tool I've seen to generate .com files would set SS=DS=ES=CS to the same value.
But, it turns out, this is not necessary. The only thing about COM is that the file size has to be less than 64K. This enables another memory model, where the code seg and data seg are different, enabling substantially larger programs that were still COM programs. This was the Zortech "small" memory model.
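To make the arithmetic concrete: in real mode a linear address is segment * 16 + offset, and each segment register gives you its own 64K window. A rough sketch in Python (the segment values below are hypothetical, picked only for illustration):

```python
def linear_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + (offset & 0xFFFF)) & 0xFFFFF


LOAD_SEG = 0x1000  # hypothetical segment where DOS loaded the program

# Usual .com setup: CS = DS = ES = SS, so code, data and stack
# all share the same 64K window.
code_start = linear_address(LOAD_SEG, 0x0100)  # .com code begins at offset 100h

# Separate data segment: point DS at a different paragraph, here 64K
# (0x1000 paragraphs of 16 bytes) above CS, so data gets its own window.
DATA_SEG = LOAD_SEG + 0x1000
data_start = linear_address(DATA_SEG, 0x0000)
```

With DS pointed somewhere else, the code window and the data window no longer overlap, which is where the extra room comes from.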
I vaguely recall reading that on some versions of Office way back when, memory from other programs would sometimes leak into saved document files.
(Pauses commenting to go check) yes, it’s referenced in Chapter 3 of “Silence on the Wire”: apparently Office on Windows 95/98 would dump memory from “other programs” into Word docs, with some anecdotal sightings in later versions.
I've written an ELF object file exporter as part of a Ghidra extension [1]. It's a bit finicky to get it right (toolchains assume that object files are valid and don't have much in the way of diagnostics), but these are fairly simple under the hood. Section bytes, symbols and relocations, with some headers and metadata to wrap these up...
It's a bit of a shame that object files aren't more of a lingua franca of toolchains in practice. Embedding binary blobs inside a program in a portable way is still a mess today.
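To give a feel for how simple the symbol and relocation records are, here is a rough sketch of packing one of each for 64-bit little-endian ELF. This is not code from the Ghidra extension; the helper names `pack_symbol` and `pack_rela` are invented for illustration, and only the record layouts and constants come from the ELF and x86-64 psABI specs.

```python
import struct

# Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size (24 bytes)
ELF64_SYM = struct.Struct("<IBBHQQ")
# Elf64_Rela: r_offset, r_info, r_addend (24 bytes)
ELF64_RELA = struct.Struct("<QQq")

STB_GLOBAL = 1       # symbol binding: global
STT_FUNC = 2         # symbol type: function
R_X86_64_PC32 = 2    # x86-64 relocation: 32-bit PC-relative


def pack_symbol(name_offset: int, binding: int, sym_type: int,
                section_index: int, value: int, size: int) -> bytes:
    """Pack one symbol table entry; binding and type share the st_info byte."""
    st_info = (binding << 4) | (sym_type & 0xF)
    return ELF64_SYM.pack(name_offset, st_info, 0, section_index, value, size)


def pack_rela(offset: int, symbol_index: int, reloc_type: int,
              addend: int) -> bytes:
    """Pack one RELA entry; r_info holds the symbol index in its upper
    32 bits and the relocation type in its lower 32 bits."""
    r_info = (symbol_index << 32) | (reloc_type & 0xFFFFFFFF)
    return ELF64_RELA.pack(offset, r_info, addend)


# Example: a global function symbol in section 1, and a call site at
# offset 0x10 that refers to symbol number 3.
sym = pack_symbol(1, STB_GLOBAL, STT_FUNC, 1, 0x0, 0x20)
rel = pack_rela(0x10, 3, R_X86_64_PC32, -4)
```

The finicky part is everything around these records: string table offsets, section indices and the ordering rules that linkers silently assume.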
Nice overview. They didn't get to the hideous self-referential (and largely undocumented) trie that is used for the symbol name mappings in Mach-O. Not fun to implement. And frustrating, because there is so much wasted space in typical Mach-O binaries that it seems very much not worth the compression effort, at least since, I don't know, 2005?
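For the curious, here is a rough sketch of walking that trie in Python. The layout is my understanding from reading Apple's open-source linker and loader code, so treat it as an assumption rather than a spec: each node is a ULEB128 terminal-info size, that many payload bytes, a one-byte child count, then per child a NUL-terminated edge string and a ULEB128 offset of the child node measured from the start of the trie. The function names are made up, and the terminal payload (flags, address and so on) is returned undecoded.

```python
from typing import Iterator, Tuple


def read_uleb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one ULEB128 value at `pos`; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def walk_export_trie(trie: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (symbol name, raw terminal payload) for every exported symbol."""
    stack = [(0, b"")]  # (node offset, name prefix accumulated so far)
    visited = set()
    while stack:
        offset, prefix = stack.pop()
        if offset in visited:
            raise ValueError("node reached twice, trie is malformed")
        visited.add(offset)
        terminal_size, pos = read_uleb128(trie, offset)
        if terminal_size:
            # A non-empty payload means the prefix is a complete symbol name.
            yield prefix.decode(), trie[pos:pos + terminal_size]
        pos += terminal_size
        child_count = trie[pos]
        pos += 1
        for _ in range(child_count):
            end = trie.index(b"\0", pos)  # edge label is NUL-terminated
            edge = trie[pos:end]
            child_offset, pos = read_uleb128(trie, end + 1)
            stack.append((child_offset, prefix + edge))
```

Reading is the easy direction. When writing, the child offsets are themselves variable-length, so a node's size depends on offsets that depend on node sizes, which I assume is the self-referential part being complained about.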
In the early days of Swift, binary size bloat due to huge symbol tables was a serious problem. We had some real binaries where the symbol table was more than half of the entire file.
[1] https://maskray.me/blog/2020-11-08-stack-unwinding
[2] https://rizin.re/posts/gsoc-2023-dwarf/
[3] https://github.com/rizinorg/ideas/issues?q=is%3Aissue+is%3Ao...