> In some cases, you might also want to avoid characters that sound similar when spoken. For example, b and p can sound similar when spoken out loud. This can be especially important in situations where IDs are communicated verbally.
In many cases these kinds of IDs are just an encoding of a ground truth that is a big integer or a sequence of bytes, and that means we don't have to stick to ASCII-character granularity; we can also use words.
True, that creates a certain cultural bias for wherever you get the words from, but it opens up new possibilities for error correction and detection, both by the computer and also by the humans transcribing things.
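To make that concrete, here's a minimal sketch of treating a word list as just another alphabet: the underlying ID stays a big integer, and the words are its base-N digits. The WORDS list here is a tiny hypothetical stand-in for a properly curated vocabulary.

```python
# Minimal sketch: treat a word list as an N-symbol alphabet and render the
# underlying big integer as base-N "digits". WORDS is a hypothetical stand-in.
WORDS = ["apple", "brick", "cloud", "delta", "ember", "flint", "grove", "harbor"]
N = len(WORDS)
INDEX = {w: i for i, w in enumerate(WORDS)}

def encode_words(value: int) -> str:
    """Render a non-negative integer as space-separated words."""
    if value == 0:
        return WORDS[0]
    digits = []
    while value > 0:
        value, rem = divmod(value, N)
        digits.append(WORDS[rem])
    return " ".join(reversed(digits))

def decode_words(text: str) -> int:
    """Recover the integer from its word representation."""
    value = 0
    for word in text.split():
        value = value * N + INDEX[word]
    return value

uid = 0xDEADBEEF
assert decode_words(encode_words(uid)) == uid
print(encode_words(uid))
```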
I'll happily boycott that for-profit company, which is masquerading as a public utility while charging money and going after anyone who reverse engineers which words map to which locations.
This is exactly the sort of thing that shouldn't be a private company. Just as lat/lon coordinates and street addresses are effectively public domain, any suitable replacement for lat/lon should also be public domain.
Yeah, ideally the dictionary would first undergo rather rigorous pruning based on things like phonetic similarity or how easily a typo could turn one valid word into another.
That scoring/clustering process makes for an interesting problem in its own right, especially once you throw accents into the mix.
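Roughly, the pruning could look like a greedy pass that rejects any candidate too close to an already-accepted word. The similarity measure and threshold below are placeholders; a real version would use phonetic keys (Soundex/Metaphone-style) and per-accent confusion data rather than raw character similarity.

```python
# Greedy pruning sketch: reject any candidate word that is too similar to one
# already accepted. SequenceMatcher's ratio is a crude stand-in for a real
# phonetic/typo distance; the 0.75 threshold is an arbitrary assumption.
from difflib import SequenceMatcher

def too_similar(a: str, b: str, threshold: float = 0.75) -> bool:
    return SequenceMatcher(None, a, b).ratio() >= threshold

def prune(candidates: list[str]) -> list[str]:
    accepted: list[str] = []
    for word in candidates:
        if not any(too_similar(word, kept) for kept in accepted):
            accepted.append(word)
    return accepted

# "glove" and "arbor" get dropped as confusable with earlier entries.
print(prune(["grove", "glove", "clove", "harbor", "arbor", "delta"]))
```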
The problem with words is that their encoding density is much lower, so they require more space to store. Suppose you create an alphabet A that consists of the N most common English words. Then what might be Q characters in base 58 would instead require Q*ln(58)/ln(N)*((avg word length in A)+1)-1 characters. For N=1000 and an average word length of 5, this gives a factor of ~3.5x increase in storage space required (e.g. a 20-character base-58 ID would map to a ~70-character string of words).
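As a quick sanity check of that arithmetic, under the same assumptions (N = 1000 words, average word length 5, one separator between words):

```python
# Characters needed when Q base-58 characters are re-encoded over an N-word
# vocabulary, with one separator character after each word except the last.
from math import log

def word_encoded_length(q: int, n: int = 1000, avg_len: int = 5) -> float:
    words_needed = q * log(58) / log(n)      # same information content
    return words_needed * (avg_len + 1) - 1  # word + separator, minus trailing one

print(word_encoded_length(20))  # ~70, i.e. roughly a 3.5x blow-up over 20 chars
```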
That is true. But is it really a storage problem? Could you not store it in whatever base-N representation has high encoding density, and "just" use the words for display/printing and such?
Probably it is more a problem of restricting the range of representable numbers, because users are unable to handle page after page of random words...
You then have to curate a list of words that also don't sound similar to each other, aren't composed of other words on the list, aren't offensive, and don't have other gotchas.
I don't think words work well for codes that aren't meant to be memorized. They make it harder to curate an unambiguous list, since that list needs to be several orders of magnitude larger and the ambiguity can be accent-dependent. Of course, if memorization may be needed, then that effort may be worthwhile.
Error detection with codes isn't hard; that's why checksums exist.
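For instance, a crude check word in the same spirit as a numeric check digit could look like the sketch below (the word list is a hypothetical stand-in again; a real scheme would use a CRC or a Luhn-style weighted sum to also catch swapped words):

```python
# Append one extra word whose index is the sum of the payload indices modulo
# the vocabulary size. Any single mistranscribed (but still valid) word changes
# the sum and is detected; swapping two words is not, since the sum is
# order-independent.
WORDS = ["apple", "brick", "cloud", "delta", "ember", "flint", "grove", "harbor"]
INDEX = {w: i for i, w in enumerate(WORDS)}

def add_check_word(payload: list[str]) -> list[str]:
    check = sum(INDEX[w] for w in payload) % len(WORDS)
    return payload + [WORDS[check]]

def verify(words: list[str]) -> bool:
    *payload, check = words
    return add_check_word(payload)[-1] == check

code = add_check_word(["delta", "grove", "apple"])
print(code, verify(code))                              # True
print(verify(["delta", "cloud", "apple", code[-1]]))   # one word wrong -> False
```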
Thanks, that's a neat resource for making hexadecimal numbers more memorizable and easier to transmit phonetically, with some built-in error checking from the odd/even list alternation.
However, for the core purpose of phonetic transmission, it seems needlessly verbose and cumbersome. The short word list, combined with some fairly long component words, makes the phonetic representation unnecessarily long. Additionally, I'm not super into some of the fairly obscure names and words included on that list. If I don't need memorability and hexadecimal atomicity, it doesn't seem worth using.
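For anyone curious how the odd/even alternation gives error checking, here's a rough sketch. The two lists below are made-up four-entry stand-ins rather than the real 256-entry lists (so the byte mapping here is truncated and purely illustrative), but the structural check is the same: a dropped or duplicated word breaks the even/odd pattern.

```python
# Sketch of odd/even word-list alternation: bytes at even offsets draw from
# one list, odd offsets from the other. EVEN/ODD are hypothetical 4-word
# stand-ins; the real scheme maps each byte to exactly one word per list.
EVEN = ["adroit", "basin", "cactus", "dune"]
ODD = ["amber", "bristle", "copper", "delphi"]

def to_words(data: bytes) -> list[str]:
    return [(EVEN if i % 2 == 0 else ODD)[b % 4] for i, b in enumerate(data)]

def alternation_ok(words: list[str]) -> bool:
    """A dropped or duplicated word shows two neighbours from the same list."""
    return all(w in (EVEN if i % 2 == 0 else ODD) for i, w in enumerate(words))

words = to_words(bytes([0, 1, 2, 3]))
print(words, alternation_ok(words))           # alternates correctly -> True
print(alternation_ok(words[:1] + words[2:]))  # a dropped word breaks it -> False
```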