Personally, I found the Unicode spec very messy. Critical information is all over the place. You can see the direct effect of this when you compare various packages across different languages and discover that every library disagrees in multiple places. Even JS String.normalize() isn't consistent in the latest version of most browsers: https://adraffy.github.io/ens-normalize.js/test/report-nf.ht... (fails in Chrome, Safari)
The major difference between ENS and DNS is emoji are front and center. ENS resolves by computing a hash of a name in a canonicalized form. Since resolution must happen decentralized, simply punting to punycode and relying custom logic for Unicode-handling isn't possible. On-chain records are 1:1, so there's no fuzzy matching either. Additionally, ENS is actively registering names, so any improvement to the system must preserve as many names as possible.
At the moment, I'm attempting to improve upon the confusables in the Common/Greek/Latin/Cyrillic scripts, and will combine these new grouping with the mixed-script limitations similar to IDN handling in Chromium.
I'm trying to find the best combination of UTS-46, UTS-51, UTS-39, and prior work on IDN resolution w/r/t confusables: https://adraffy.github.io/ens-normalize.js/test/report-confu...
Personally, I found the Unicode spec very messy. Critical information is all over the place. You can see the direct effect of this when you compare various packages across different languages and discover that every library disagrees in multiple places. Even JS String.normalize() isn't consistent in the latest version of most browsers: https://adraffy.github.io/ens-normalize.js/test/report-nf.ht... (fails in Chrome, Safari)
The major difference between ENS and DNS is emoji are front and center. ENS resolves by computing a hash of a name in a canonicalized form. Since resolution must happen decentralized, simply punting to punycode and relying custom logic for Unicode-handling isn't possible. On-chain records are 1:1, so there's no fuzzy matching either. Additionally, ENS is actively registering names, so any improvement to the system must preserve as many names as possible.
At the moment, I'm attempting to improve upon the confusables in the Common/Greek/Latin/Cyrillic scripts, and will combine these new grouping with the mixed-script limitations similar to IDN handling in Chromium.
Interactive Demo: https://adraffy.github.io/ens-normalize.js/test/resolver.htm...
Also this emoji report is pretty cool: https://adraffy.github.io/ens-normalize.js/test/report-emoji...