One thing about this particular benchmark is that the file it produces is a little bit unusual even for large text files, because it only contains one (massive) line. Most huge text files you open in text editors are things like multi-gigabyte logs and database dumps, which are composed of many relatively short lines. Text editors often bake that assumption into their data structures. A more representative test would probably be to use hexdump instead of base64 there.
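For reference, this is roughly what the two variants look like (GNU coreutils assumed; the size and file names here are just placeholders, not the benchmark's actual invocation):

    # one massive line: base64 with wrapping disabled
    head -c 500M /dev/urandom | base64 -w 0 > one-line.txt

    # many short lines: hexdump emits fixed-width rows instead
    head -c 500M /dev/urandom | hexdump -C > many-lines.txt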
Though I take the point: if a text editor advertises itself as being "fast", this is the kind of stuff you expect it to handle.
EDIT: OK, so I just tried the hexdump version. Vim and Emacs took a couple of seconds each (Emacs being slightly slower and chunkier when you navigate; Vim handles it without issue and is perfectly responsive). less opened it instantly (obviously, it's less!). ox has been going for a few minutes now, nothing happening.
It's not the benchmark's fault, ox is just slow for large files.
> the file it produces is a little bit unusual even for large text files, because it only contains one (massive) line.
XML and JSON without pretty printing. Minified HTML with embedded resources. While not often gigabyte-sized, those tend to be enough to make most editors hang and to turn editing the files into a miserable experience.
Yes, true enough, it does happen occasionally, but it's pretty rare (and usually there are other ways to deal with it: using less or similar tools, or just running your massive single-line JSON thing through jq first). And my point was exactly what you're saying: many text editors choke on files like this, so it's not such a huge indictment of ox that it chokes on it as well.
However, ox clearly performs poorly even when your text files aren't massive single lines, so it doesn't really matter: ox basically can't handle files like this regardless of line distribution.
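To be concrete about the workarounds I mean, something like this (file names hypothetical, not a prescription):

    # pretty-print the single-line JSON into something editors cope with
    jq . huge-single-line.json > huge-pretty.json

    # or just inspect it without an editor at all;
    # -S makes less chop long lines instead of wrapping them
    less -S huge-single-line.json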
> and usually there are other ways to deal with it: using less or similar tools, or just running your massive single-line JSON thing through jq first
Older Unixen, at least SVR3-based ones, had a tool called bfs - the big file scanner. I used it some, e.g. when, as a system engineer, I helped the IT staff of one of our customers, a university processing exam results for tens of thousands of students from dozens of affiliated colleges. IIRC its UI was something like a read-only ed (google "unix bfs command"). You used it to scan really large (for the time) files, e.g. input/output data files in large data-processing environments, to check that they looked okay at a glance, with no major noticeable garbage in them.
Haven't checked whether it is present in modern Linuxes. I also don't remember if it could handle large files without newlines. Likely not, if it was based on ed. I didn't have such files to work on back then.
As an emacs user, one of the things I really wish it had was robust support for arbitrary file sizes and shapes.
(It'd be fine if not all features worked on these files.)
Sure, most files are reasonably sized and shaped, but sometimes you do find yourself needing to e.g. dig into that single-line 1GB JSON file to try to debug something, and it's inefficient to have to think, before opening any file, "is my normal tool sufficient for this, or should I use something else?"
There's no fundamental reason we can't have one tool that does it all!
emacs -q or -Q is often quite fast. It's the things modes add, like syntax highlighting, word wrapping, etc., that usually slow it down. Anything trying to run regexes over the buffer isn't going to scale well.
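Concretely, something like this (file name hypothetical); find-file-literally is an old trick that additionally skips encoding conversion and major-mode setup:

    # skip the init file and site config entirely
    emacs -Q big.log

    # open the file with no conversion or mode machinery at all
    emacs -Q --eval '(find-file-literally "big.log")'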
I think long lines are (were? [1] seems to be a new feature) a well-known problem in Emacs even without a custom config. There have been ways to work around this for a while, I think, but as far as I know emacs -Q isn't one of them.
Oh, that's neat, I didn't know about that! Could help.
My ideal editor would never block, and would use file-size-based approximations to allow scrolling to arbitrary locations without having to actually read the contents of the entire file. Things like syntax highlighting &c. would either be strictly time-limited so as not to drop frames, or done in the background.
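Roughly the idea, sketched with shell tools (GNU stat assumed, file name hypothetical); a real editor would do the same seek internally and then refine once the nearby bytes have been read:

    f=big.json
    size=$(stat -c %s "$f")            # file size in bytes (GNU stat)

    # "scroll" to roughly the 50% mark by seeking, without reading the rest
    tail -c +$((size / 2)) "$f" | head -c 4096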
Whether there's a wrapping feature depends on the implementation. coreutils base64, yes. Other implementations, maybe not. macOS's /usr/bin/base64 for instance doesn't have the concept of wrapping. FreeBSD base64, judging from the manpage, doesn't do that either. https://www.freebsd.org/cgi/man.cgi?query=base64
Edit: Oops, posted some misinformation. The mistaken part, reproduced below:
> First, wrapping by default only occurs when writing to a tty. When stdout is redirected to a file, wrapping base64 output usually doesn't make sense, so wrapping doesn't happen.
Correction: coreutils base64 wraps regardless of whether stdout is a tty.
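Easy to verify with coreutils base64 (paths arbitrary): the output is wrapped at 76 columns even when redirected, unless you pass -w 0:

    head -c 1000 /dev/urandom | base64 > /tmp/wrapped.txt
    wc -lL /tmp/wrapped.txt       # several lines, longest is 76 chars (GNU wc -L)

    head -c 1000 /dev/urandom | base64 -w 0 > /tmp/oneline.txt
    wc -lL /tmp/oneline.txt       # a single long line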