I think it's a fine example, since the decomposed version clearly exists and can arrive in your program as data in a number of ways.
That said, the grapheme cluster note will have examples of extended notions of characters that can't be represented by an equivalent single codepoint. There are some Korean and Indic examples, and also emoji: http://unicode.org/reports/tr29/
Oh, and something that stirs the hearts of us hackers: \r\n is a single grapheme!
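A quick Python illustration of both points (the grapheme check uses the third-party regex module, since the stdlib re has no \X support):

    import unicodedata
    import regex  # third-party; stdlib re can't match \X (extended grapheme cluster)

    precomposed = "\u00e9"   # 'é' as a single code point
    decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
    print(len(precomposed), len(decomposed))                        # 1 2
    print(precomposed == decomposed)                                # False
    print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

    # CRLF really is one extended grapheme cluster per UAX #29:
    print(regex.findall(r"\X", "a\r\nb"))   # ['a', '\r\n', 'b']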
IMO, len("") should raise OperationHasNoCommonSenseTodayException, just to not confuse unicode newbies. Modern text is not an array, it is a format, a complex format. Zero-width, double-width monotypes, combining, direction marks, normalization, etc. Almost no one wants to know about codepoint details when working with "just input strings", and those who want may use special namespaces for that. There is no point in making len() an alias for unicode.volatile_number_of_distinct_code_points().
visually_empty(s) is okay, len(s) is probably not.
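For what it's worth, a rough, hypothetical visually_empty in Python could look something like this; the category check is a naive stand-in for "renders as nothing", not a real definition:

    import unicodedata

    def visually_empty(s: str) -> bool:
        # Naive approximation: format chars (Cf), controls (Cc), and
        # combining marks with nothing to attach to (Mn, Me). A real
        # implementation would consult Default_Ignorable_Code_Point,
        # the font, etc.
        return all(unicodedata.category(c) in ("Cf", "Cc", "Mn", "Me") for c in s)

    print(visually_empty(""))        # True
    print(visually_empty("\u200b"))  # True  -- ZERO WIDTH SPACE, yet len() == 1
    print(visually_empty("a"))       # False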
They're all valid measurements. The length of a string could be measured in bytes (which doesn't require you to know the encoding), code points (which doesn't require huge Unicode grapheme clustering tables), grapheme clusters (which doesn't require a font) or even pixels (which does require a font).
The best thing a language (or library) can do is to not bless any single one of these as the default. Strings shouldn't have a length at all. They should provide properties/methods/accessors like byte_count, codepoint_count, grapheme_count etc. Make the user of the API think every time they're asking for a length of the string - which one do they actually need? Which one is the best for whatever they're trying to do?
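A minimal Python sketch of that kind of API, reusing the byte_count / codepoint_count / grapheme_count names from above (Text is a made-up wrapper; grapheme counting leans on the third-party regex module, and pixel width is left out since it needs a font):

    import regex  # third-party, for \X (extended grapheme clusters)

    class Text:
        def __init__(self, s: str):
            self._s = s

        @property
        def byte_count(self) -> int:
            # Needs an encoding to be meaningful; UTF-8 assumed here.
            return len(self._s.encode("utf-8"))

        @property
        def codepoint_count(self) -> int:
            return len(self._s)  # Python strings are code point sequences

        @property
        def grapheme_count(self) -> int:
            return len(regex.findall(r"\X", self._s))

    t = Text("e\u0301")       # decomposed 'é'
    print(t.byte_count)       # 3
    print(t.codepoint_count)  # 2
    print(t.grapheme_count)   # 1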
None of the mainstream ones that I can think of. Which is really unfortunate, not least because the defaults are all over the place - usually it's either bytes (when strings are UTF-8) or code units (when strings are UTF-16 - note: code units, not code points, so surrogate pairs count as 2!). Occasionally it's genuine code points, as in Python. Which, I think, goes to show why it's such a mess.
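To make the code unit vs. code point distinction concrete, a small Python demo (the UTF-16 count divides byte length by two, since every UTF-16 code unit is two bytes):

    s = "\U0001F926"  # FACE PALM emoji, one code point outside the BMP

    print(len(s))                           # 1 code point (Python counts these)
    print(len(s.encode("utf-8")))           # 4 UTF-8 bytes
    print(len(s.encode("utf-16-le")) // 2)  # 2 UTF-16 code units (a surrogate pair)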
I think that if you treat strings as just lists[0] of UTF-8 code units, with code points, grapheme clusters, etc. as mere views/adapters over those bytes, you're probably going to benefit the most (rough sketch after the footnote).
[0]: When I say 'lists', I mean whatever the standard idea of a sequence of things is in the language. For C that's the array, or maybe the pointer+length pair. For Go it's a slice. For Rust, an iterator perhaps? For Python, it's a list.
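Something like this rough Python sketch (Utf8String is hypothetical; the bytes are the one canonical representation, everything else is a derived view, with graphemes again via the third-party regex module):

    import regex  # third-party, for \X

    class Utf8String:
        """Owns raw UTF-8 bytes; code points and graphemes are derived views."""

        def __init__(self, data: bytes):
            self._data = data  # the single canonical representation

        def bytes_view(self) -> bytes:
            return self._data

        def codepoints(self):
            # Decoding is the adapter from code units to code points.
            return iter(self._data.decode("utf-8"))

        def graphemes(self):
            return iter(regex.findall(r"\X", self._data.decode("utf-8")))

    s = Utf8String("e\u0301".encode("utf-8"))
    print(len(s.bytes_view()))        # 3 bytes
    print(len(list(s.codepoints())))  # 2 code points
    print(len(list(s.graphemes())))   # 1 grapheme cluster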