UCS-32 is at least directly indexable, even though it's ludicrously space-ineffi...

natanbc · on Jan 6, 2023

Only in codepoints. It still has the problem GP mentions: e + U+0300 COMBINING GRAVE ACCENT = è is two codepoints (so two elements in UTF-32) but logically one character.

https://manishearth.github.io/blog/2017/01/14/stop-ascribing...

qalmakka · on Jan 8, 2023

This. It's pointless to have char32_t if you still need to pull in several megabytes of ICU to normalize the string first, just to get rid of characters that span multiple codepoints (and normalization can only compose the ones that actually have a precomposed form). UTF-32 is arguably dangerous because of this: it's yet another attempt to replicate ASCII, but with Unicode. The only sane encoding out there is UTF-8, period. If you always have to assume your string can't really be split without a library, you won't do dangerous stuff such as assuming `wcslen(L"menù") == 4`, which only holds if the ù happens to be stored precomposed.