
On my machine, when I last tried the various accelerated terminal emulators, I wasn't convinced. At least under plain X, GL context creation adds extra latency when creating new windows (might be different if you use a compositor all the time, I guess). In addition, on terminals such as kitty, the startup time of a full new process was really non-negligible, I suspect due to the Python support.

With a tiling window manager, the built-in notebook/tiling functionality is not really useful (the window manager is more flexible and has universal keybindings), so when looking at the time required to pop a full new window, in either single-instance or shared-instance mode, they were actually behind regular xterm. Resource usage wasn't stellar either (xterm was still better than most lightweight libvt-based terminals). I couldn't feel much of a latency improvement (again, X without a compositor).

I'm sure at full throughput the difference is there, but who is looking at pages of output they can't read? I do keep terminals open for days, but my most common use case is mostly open a window -> run a small session -> close, and I got annoyed fast.



A GPU-accelerated terminal emulator sounds like a nuclear-powered kitchen mixer to me.

Like, why? In over 20 years of using terminal emulators, not even a single time was I like, "Man, I wish my terminal was faster".

Is this just a fun project to do, like, "yay, I wrote a GPU-accelerated terminal emulator!"?


It depends on your workflow and on your resolution too. For example, I do most things exclusively inside the terminal. If you are using vim and making use of splits on a 4K 60Hz (or 1440p 144Hz) screen, and want to scroll in one split and not the other, you will notice how slow and laggy redraws are. This was especially noticeable on macOS (yay work computers) for me, which led me down the GPU-accelerated terminal rabbit hole. iTerm2 had its Metal renderer, which (at the time) only worked with ligatures disabled, whereas kitty/wez/etc did not have that limitation.

The litmus test I use is how smoothly the terminal emulator can run `cmatrix` at fullscreen.


cmatrix is the benchmark I didn't know I needed. Konsole seems to handle it fine, even in multiple tmux panes at once, or maybe I just can't see.


I've only had the issue on macOS; Konsole on my Linux box works fine. I've stuck with kitty though, cuz it works great on both Linux and macOS and I love the URL-opening feature as mentioned here: https://news.ycombinator.com/item?id=35140206


Probably inspired by the performance problems with the Windows Terminal [1] and the accelerated terminal [2] developed by Molly Rocket as an 'answer'? There's a series of videos presenting the PoC [3].

[1] https://news.ycombinator.com/item?id=28743687 (It takes a PhD to develop that) [2] https://github.com/cmuratori/refterm [3] https://www.youtube.com/watch?v=hxM8QmyZXtg, https://www.youtube.com/watch?v=pgoetgxecw8


GPU-accelerated terminals have been a thing for a long time.


I've been doing a lot of my non-work computing lately on an actual VT420, which tops out processing bytes coming from the serial line (the computer you're logged in to) at 19.2kbps. I could stand for it to be faster, especially with the screen at 132x48. But never in 30+ years have I ever thought a terminal emulator connected to a session running over a pty on the same machine was slow.

I have started to see "terminal" apps that won't run on a real terminal, though. Using UTF-8 regardless of your locale, using 256-color xterm escapes regardless of your TERM setting, being unreadable without 256 colors, etc, and in general not using termcap/terminfo.


because rendering on the CPU is CPU-intensive when there's a lot of stuff scrolling by.

even on an integrated GPU, text rendering is far faster when you use the GPU to render glyphs to a texture then display the texture instead of just displaying the glyphs individually with the CPU.


Only if the terminal's rendering is extremely naive. That is, not using methods first used in the '80s.


It's comical being downvoted for this without comment. I have actually analyzed terminal performance and optimized terminal code, so this is based on first-hand experience. The vast performance difference between terminals is almost entirely unrelated to rendering the final glyphs.


I'd love to read your blog post about your experiences in this matter. We need more of these here on HN.


I'll add it to my (unfortunately far too long) backlog (he says and goes on to write an essay; oh well - for a blog post I'd feel compelled to be more thorough). But the quick and dirty summary:

1. The naive way is to render each change as it occurs. This is fine when the unbottlenecked output changes less than once a frame. This is the normal case for terminals and why people rarely care. It falls apart when you e.g. accidentally cat a huge file to the terminal.

Some numbers with the terminals I have on my system (I ignored a whole bunch of xterm/rxvt aliases, e.g. aterm, lxterm, etc.): cat of a 10MB file, on a terminal filling half of a 1920x1024 screen on a Linux box running X, takes (assume an error margin of at least 10% on these; I saw a lot of variability on repeat runs):

     * rxvt-unicode: 0.140s
     * kterm: 0.2s
     * kitty: 0.28s (GPU accelerated)
     * xterm: 0.51s
     * wezterm: 0.71s (GPU accelerated)
     * gnome-terminal: 0.86s
     * mlterm: 0.97s
     * pterm: 1.11s
     * st (suckless): 3.4s

Take this with a big grain of salt - they're a handful of runs on my laptop with other things running, but as a rough indicator of relative speed they're ok.

Sorted in ascending order. These basically fall in two groups in terms of the raw "push glyphs to the display" bit, namely using DrawText or CompositeGlyphs calls or similar, or using GPU libs directly.

Put another way: Everything can be as fast as rxvt(-unicode); everything else is inefficiencies or additional features. That's fine - throughput is very rarely the issue people make it out to be (rendering latency might matter, and I haven't tried measuring that)

Note that calling the terminals other than kitty and wezterm "not GPU-accelerated" is not necessarily entirely true, which confuses the issue further. Some of these likely would be slower if run with an X backend with no acceleration support. I've not tried to verify what gets accelerated on mine. But this is more of a comparison between "written to depend on GL or similar" vs "written to use only the OS/display server's native primitives, which may or may not use a GPU if available".
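For anyone wanting to reproduce the rough methodology above, something like this hypothetical snippet, run inside the terminal under test, is all it takes (the file path is a placeholder; expect the same large run-to-run variability):

    import subprocess, time

    start = time.monotonic()
    subprocess.run(["cat", "some_10mb_file.txt"])      # output goes straight to the terminal
    print(f"elapsed: {time.monotonic() - start:.2f}s")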

2. The first obvious fix is to decouple the reading of the app output from the rendering to screen. Rendering to screen more than once per frame achieves nothing since the content will be overwritten before it is displayed. As such you want one thread processing the app output, and one thread putting what actually changed within a frame to screen (EDIT: you don't have to multi-thread this; in fact it can be simpler to multiplex "manually" as it saves you locking whatever buffer you use as an intermediary; the important part is the temporal decoupling - reading from the application should happen as fast as possible while rendering faster than once per frame is pointless).

That involves one big blit to scroll the buffer unless the old content has scrolled entirely out of view (with the "cat" example it typically will if the rest of the processing is fast), and one loop over a buffer of what should be visible right now on lines that have changed.

The decoupling will achieve more for throughput than any optimisation of the actual rendering, because it means that when you try to maximise throughput most glyphs never make it onto screen. It's valid to not want this, but if you want every character to be visible for at least one frame, then that is a design choice that will inherently bottleneck the terminal far more than CPU rendering. Note that guaranteeing that is also not achieved just through the naive option of rendering as fast as possible, so most of the slow terminals do not achieve this reliably.

Note that this also tends to "fix" one of the big reasons why people take issue with terminal performance anyway: it's rarely that people expect to be able to see it all, because there's no way they could read it. The issue tends to be that, because of buffering, the terminal fails to pass on ctrl-c fast enough, or to stop output fast enough once the program terminates. Decouple these loops and skip rendering that can't be seen, and this tends to go away.
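To make the decoupling concrete, here's a minimal Python sketch of the idea - the names (ScreenModel, paint) and the 60fps cap are illustrative stand-ins, not any particular terminal's code. One loop drains application output as fast as it arrives, while the render loop paints at most once per frame, and only what is visible right now:

    import threading, time, collections

    class ScreenModel:
        """Shared model of what should be on screen right now."""
        def __init__(self, rows=48):
            self.lock = threading.Lock()
            self.lines = collections.deque(maxlen=rows)  # only the visible tail
            self.dirty = False

        def feed(self, line):                 # called by the reader, as fast as data arrives
            with self.lock:
                self.lines.append(line)
                self.dirty = True

        def snapshot(self):                   # called by the renderer, at most once per frame
            with self.lock:
                if not self.dirty:
                    return None
                self.dirty = False
                return list(self.lines)

    def reader(model, source):
        for line in source:                   # in real code: an os.read() loop on the pty
            model.feed(line)

    def render_loop(model, paint, frame_time=1 / 60):
        while True:
            frame = model.snapshot()
            if frame is not None:
                paint(frame)                  # one batched repaint of the changed lines
            time.sleep(frame_time)            # never render more than once per frame

Everything the reader appends between two frames simply never gets painted, which is exactly the point.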

3. Second obvious fix is to ensure you cache glyphs. Server side if letting the display server render; on the GPU if you let the GPU render. Terminals are usually monospaced; at most you will need to deal with ligatures if you're being fancy. Some OS/display server provided primitives will always be server-side cached (e.g. DrawText/DrawText16 on X renders server-side fonts). Almost all terminals do this properly on X at least because it's the easiest alternative (DrawText/DrawText16) and when people "upgrade" to fancier rendering they rarely neglect ensuring the glyphs are cached.
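Conceptually the cache is just a map keyed on codepoint and style; a hypothetical sketch, with `rasterize` standing in for the expensive font-rendering step (whether the result ends up as a server-side GlyphSet or a GPU texture atlas):

    class GlyphCache:
        def __init__(self, rasterize):
            self._rasterize = rasterize   # the expensive part: rendering a glyph bitmap
            self._cache = {}

        def get(self, codepoint, bold=False, italic=False):
            key = (codepoint, bold, italic)
            glyph = self._cache.get(key)
            if glyph is None:             # rasterize each (codepoint, style) at most once
                glyph = self._cache[key] = self._rasterize(codepoint, bold, italic)
            return glyph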

4. The third fix is to batch operations. E.g. the faster X terminals all render whole strips of glyphs in one go. There are several ways of doing that, but on X11 the most "modern" (which may be GPU-accelerated on the server side) is to use XRender and CreateGlyphSet etc. followed by one of the CompositeGlyphs requests, but there are other ways (e.g. DrawText/DrawText16) which can also be accelerated (CompositeGlyphs is more flexible for the client in that the client can pre-render the glyphs as it pleases instead of relying on server-side font support). Pretty much every OS will have abstractions to let you draw a sequence of glyphs that may or may not correspond to fonts.
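The batching boils down to coalescing consecutive cells that share attributes into runs, so the backend (CompositeGlyphs, DrawText16, a GL draw, ...) is called once per run instead of once per cell. A rough Python sketch with made-up Cell/draw_run names:

    from itertools import groupby
    from typing import NamedTuple

    class Cell(NamedTuple):
        char: str
        fg: int
        bg: int

    def draw_line(cells, row, draw_run):
        col = 0
        for (fg, bg), run in groupby(cells, key=lambda c: (c.fg, c.bg)):
            text = "".join(c.char for c in run)
            draw_run(text, col, row, fg, bg)   # one backend call per run of same-attribute cells
            col += len(text)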

There is a valid reason why using e.g. OpenGL directly might be preferable here, and that is that if used conservatively enough it's potentially more portable. That's a perfectly fine reason to use it, albeit at the cost of network transparency for those of us still using X.

So to be clear, I don't object to people using GPUs to render text. I only object to the rationale that it will result in so much faster terminals, because as you can see from the span of throughput numbers, while Kitty and Wezterm don't do too badly, they're also nowhere near the fastest. But that's fine - it doesn't matter, because almost nobody cares about the maximum throughput of a terminal emulator anyway.


Bookmarked! Will definitely come in handy when I finally write my own thing. Thank you for this!


You're welcome. It's a bit of a pet peeve of mine that people seem to be optimising the wrong things.

That said, to add another reason why doing the GPU ones may well be worthwhile on modern systems anyway, whether or not one addresses the other performance bits: being able to use shaders to add effects is fun. E.g. I hacked an (embarrassingly bad) shader into Kitty at one point to add a crude glow effect around characters to make it usable with more translucent backgrounds. Doing that with a CPU-based renderer at a modern resolution would definitely be too slow. I wish these terminals would focus more on exploring what new things GPU-based rendering would allow.


In the '80s, 'glyph rendering' was usually done right in hardware when generating the video signal though (e.g. the CPU work to render a character was reduced to writing a single byte to memory).


I was specifically thinking of bitmapped machines like the Amiga. Granted, a modern 4K display with 32-bit colour requires roughly three orders of magnitude more memory moves to re-render the whole screen with text than an Amiga (a typical NTSC Workbench display would be 640x200 in 2-bit colour), but the ability of the CPU to shuffle memory has gone up by substantially more than that - raw memory bandwidth alone has; already most DDR2 would beat the Amiga by a factor of 1000 in memory bandwidth. The 68k also had no instruction or data cache, so the amount of memory you could shuffle was substantially curtailed by instruction fetching. For larger blocks you could make use of the blitter, but for text glyph rendering the setup costs would be higher than letting the CPU do the job.
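Rough numbers behind that comparison (my arithmetic, not the parent's):

    amiga  = 640 * 200 * 2 // 8      # 2-bit NTSC Workbench screen: 32,000 bytes per repaint
    modern = 3840 * 2160 * 4         # 4K at 32bpp: 33,177,600 bytes per repaint
    print(modern / amiga)            # ~1037x, i.e. roughly three orders of magnitude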


> but for text glyph rendering the setup costs would be higher than letting the CPU do the job

Depends on how the glyph rendering is done. Modern GPU glyph/vector renderers like Pathfinder [1] or Slug [2] keep all the data on the GPU side (although I must admit that I haven't looked too deeply into their implementation details).

[1] https://github.com/pcwalton/pathfinder

[2] https://sluglibrary.com/


That part was about the Amiga blitter specifically. The setup cost for small blits and the relatively low speed of the blitter made it pointless for that specific use.


Then what about the existence of Konsole?


Its existence is irrelevant. How does it perform? This conversation is about performance.


It's better in latency and throughput without GPU-accelerated rendering, while being feature-rich - that's the point.


Forgive me if I don't take your word for it. I remember K-everything being slow as molasses and have avoided K-anything since.


That hasn't been the case since around KDE 5.10. Better to test first before making assumptions.


if it took 5 major revisions to address obvious and appalling performance problems then I'm quite sure my desire to stay away is justified.


It didn't take 5 major revisions. There was a big regression with KDE 4 compared to previous versions, and that was remedied in KDE 5.


Command line things that are noisy can legitimately run faster, being bound by the rate at which the terminal can dump characters

With high refresh rate displays it really helps avoid a blurry mess too

I can read the stream almost like The Matrix


A faster terminal is a great idea, but GPU acceleration is not the way to do this.

GPUs aren't really meant for bit blitting sprite graphics. (Which is what a terminal really does.)


Isn't that literally what GPUs were designed to do?


The term GPU is primarily associated with 3D graphics, and most of what GPUs do is designed for that. Hardware acceleration of 2D graphics existed long before 3D hardware acceleration became common for PCs, but wasn’t called GPU, instead it was simply referred to as a graphics card.


Texture blitting is a very important part of 3D graphics, and is essentially what is required here.


The difference is that applying textures to a 3D object is almost never a pixel-perfect operation, in the sense of texture pixels mapping 1:1 to final screen pixels, whereas for text rendering that’s exactly what you want. Either those are different APIs, or you have to take extra care to ensure the 1:1 mapping is achieved.


There are ways to configure the texture blitter to be precisely 1:1. This is written into the GL/Vulkan standards for exactly this reason, and all hardware supports/special cases it. It is how pretty much every GUI subsystem out there handles windowing.
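For instance, with an orthographic setup sized to the framebuffer and nearest-neighbour sampling, mapping integer pixel coordinates to normalized device coordinates is just this (illustrative sketch, not tied to any particular API):

    def pixel_to_ndc(x, y, fb_width, fb_height):
        # Place quad corners so each texel lands exactly on one screen pixel.
        return (2.0 * x / fb_width - 1.0,
                1.0 - 2.0 * y / fb_height)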


Yes, my point is that this is a special case separate from normal 3D graphics.


…so why are GPUs not the way to do this, when GPUs are in fact the way it is commonly done?


The transforms are specified so you can position things perfectly these days, when aligned with screen pixels.

Think of the compositing of layers of translucent windows used in modern 2d window managers, while dragging them around. Or even scrolling in a browser. Those rely on the GPU for fast compositing.

Even for 3d, think of the screen-space techniques used in games, where it's necessary to draw a scene in layers combined with each other in various interesting logical ways (for shadows, lighting, surface texture, etc), with the pixels of each layer matching up in a reliable way.


Most of what a GPU does is drawing pixels though (even in 3D games), and that's as 2D as it gets.


If you can do the number crunching for 3D graphics, you sure as hell can do it for 2D graphics.


It’s a different set of operations for the most part, when you look into it. Drawing a 2D line or blitting a 2D sprite is quite different from texture-shading a 3D polygon. It’s not generic “number crunching”.


It's tensor ops all the way down


OK, but just because the operations aren't perfectly identical doesn't mean you can't do it, and it certainly doesn't mean it will be slow. I have had great success with SDL_gpu.


Actually they were - before 3D acceleration became a thing, 2D acceleration was a thing.


The 2D acceleration wasn’t called GPU at the time though. That denomination only started with 3D acceleration.


The name may have changed, but the task of blitting remained with GPUs. They are good at it.


You are thinking about Trident cards, which had 2D acceleration.

Nowadays, everything is done with the same pipeline as the 3D graphics; there's no need for two pipelines.

GPUs are meant for this. Modern ones. Your knowledge is incomplete, outdated, or both.


In (realtime) rendering the saying goes "bandwidth is everything", and that's exactly what GPUs do really well, moving incredible amounts of data in a very short time.


I agree with you, but I've stuck with wezterm for some time now for its non-GPU-related features. Specifically, the font configuration with fallbacks and configurable font features such as ligatures and glyph variations is nice. I use a tiling window manager and a terminal multiplexer, so I have no use for terminal tabs/splits/panes. I wish there was something as "simple" as alacritty, but with nicer font rendering.


Kitty (the Linux one) has ligatures support, and it is GPU accelerated.


I love wezterm due to its ligature and colourscheme support, and the fact it's very clean and simple compared to, say, Konsole (I also generally use i3 leading to KDE apps not being the prettiest).


> xterm was still better than most lightweight libvt-based terminals

Even worse: although many terminal emulators claim to emulate some "ANSI" terminal or be "VT100 compatible" and so on, most of them aren't at all. Simply run vttest in your terminal of choice and be surprised, especially by how many of them fail at very basic cursor movement tests. One of the few terminal emulators which gets most things right is xterm. It's also one of the very few terminal emulators which even supports exotic graphics capabilities like Sixel/ReGIS/Tek4014. Nobody should underestimate xterm …


The author of Zutty has a pretty comprehensive writeup around that: https://tomscii.sig7.se/2020/12/A-totally-biased-comparison-...


Xterm is like that sleeper car. Looks basic but beats everyone if it comes down to it.


> I'm sure at full throughput the difference is there

I am not. It makes next to no sense to me. Maybe if you have a highres screen and dedicated VRAM. Otherwise going through the GPU interfacing ceremony just adds overhead.


Yeah, as I keep saying in these threads, the performance needed to do "fast enough" terminals was reached no later than the 1980s, and while bits per pixel and resolution have increased since then, they have increased more slowly than CPU speed. It's not the CPU cost of getting pixels on the screen that bottlenecks most terminals.


In my experience, there are two archetypes of terminal users:

* Open one window and leave it open forever. Reuse that one for all commands.

* Open a window, run a couple commands, and close it.

For the second group, startup perf is everything, because users hit that multiple times a day. For the first group, not so much.

Some of the other tiling functionality is also more helpful for folks on platforms whose window managers aren't as powerful (macOS, Windows).


I am in the second group, kinda - I hit Win+Shift+X (my global key for opening a new terminal) pretty much all the time to enter a few commands. I basically open terminals in a "train of thought"-like fashion: when I think of something that isn't about what I'm doing in one terminal, I open another to run/check/etc. it out. Sometimes I even close those terminals too :-P (when I work on something there might be several terminal windows scattered all over the place in different virtual desktops).

Also, I'm using xterm and I've always found it very fast; I never thought that I'd like a faster terminal.


I think a very effective workflow is missing from this list: open a long-running terminal window but have many tmux panes.

Many modern WMs and terminals have multi-tab and multi-window features, but I invested time only into learning tmux and I can use it anywhere. And of course nohup functionality is built in by definition.

I have said it before and I can say it again: terminals come and go, tmux stays.



