
> Or there's no Java language, it's all just UTF-8 characters.

Some programming languages explicitly require UTF-8 in the language specification. Java is not one of them.



As of Java 18, it is: https://openjdk.org/jeps/400


That's not what your link says:

    The Java language allows source code to express Unicode
    characters in a UTF-16 encoding, and this is unaffected
    by the choice of UTF-8 for the default charset.
Perhaps you pasted the wrong one by mistake?

Not that such a change to the language for future code, even if only hypothetical, would make any difference to this discussion anyway, as legacy Java code encoded in other charsets is still Java code.
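For context, the "UTF-16 encoding" in the quoted passage presumably refers to Java's Unicode escapes (\uXXXX), which let pure-ASCII source denote any UTF-16 code unit regardless of how the file itself is encoded. A minimal sketch (class and field names are illustrative only):

```java
public class UnicodeEscapes {
    // Unicode escapes are translated before tokenizing, so this line is pure
    // ASCII on disk yet the literal ends with U+00E9 (e-acute).
    static final String CAFE = "caf\u00e9";

    // A character outside the Basic Multilingual Plane must be written as a
    // UTF-16 surrogate pair; this is U+1F600.
    static final String SMILE = "\uD83D\uDE00";

    public static void main(String[] args) {
        System.out.println(CAFE.length());                            // 4 chars
        System.out.println(SMILE.length());                           // 2 UTF-16 code units
        System.out.println(SMILE.codePointCount(0, SMILE.length()));  // 1 code point
    }
}
```

Because the escapes are expanded before lexing, this file compiles to the same strings whatever charset javac assumes for it.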


Just one sentence after the one you quoted:

> However, the javac compiler is affected because it assumes that .java source files are encoded with the default charset, unless configured otherwise by the -encoding option.

Interestingly, on Windows, Java programs were supposedly encoded in CP-1252 before this...?

> In JDK 17 and earlier, the default charset is determined when the Java runtime starts. On macOS, it is UTF-8 except in the POSIX C locale. On other operating systems, it depends upon the user's locale and the default encoding, e.g., on Windows, it is a codepage-based charset such as windows-1252 or windows-31j.
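The javac point can be made concrete: the same bytes on disk decode to different text depending on the charset assumed. A minimal sketch, assuming a hypothetical legacy file saved in windows-1252 that contains the word "café" (é is the single byte 0xE9 in that codepage); class and method names are illustrative. The native.encoding property exists only on JDK 17 and later and prints null on older runtimes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SourceDecoding {
    // The word "cafe" with e-acute, as a windows-1252 editor would save it:
    // the accented letter is the single byte 0xE9.
    static final byte[] CAFE_CP1252 = {0x63, 0x61, 0x66, (byte) 0xE9};

    // Decode raw file bytes under an assumed charset, roughly what a compiler
    // has to do before it can tokenize a source file.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // UTF-8 on JDK 18+; locale-dependent on JDK 17 and earlier.
        System.out.println("default charset: " + Charset.defaultCharset());
        // The platform encoding the default used to follow (property added in JDK 17).
        System.out.println("native.encoding: " + System.getProperty("native.encoding"));

        // Matching charset: the intended text comes back.
        System.out.println(decode(CAFE_CP1252, Charset.forName("windows-1252")));
        // Mismatched charset: 0xE9 is not valid UTF-8 on its own and is
        // replaced with U+FFFD, so the text silently changes.
        System.out.println(decode(CAFE_CP1252, StandardCharsets.UTF_8));
    }
}
```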


1. javac is an implementation, not a specification.

2. As noted in the very quote provided, that particular implementation accepts various encodings; naturally so, since the language allows various encodings.

That is quite unlike the languages that specify that anything other than UTF-8 is invalid code.
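Point 2 can also be seen from the tooling side: the standard javax.tools API accepts the same -encoding option as the javac command line. A minimal sketch (class and file names are hypothetical) that writes a small source file in windows-1252 and compiles it with a matching -encoding flag; it needs a full JDK, since ToolProvider.getSystemJavaCompiler() returns null when no compiler is provided.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileWithEncoding {
    // Compile one source file, passing -encoding explicitly; returns javac's
    // exit code (0 on success), or -1 when no system compiler is available.
    static int compile(Path source, Path outDir, String encoding) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            return -1;
        }
        // null streams mean System.in / System.out / System.err
        return javac.run(null, null, null,
                "-encoding", encoding,
                "-d", outDir.toString(),
                source.toString());
    }

    // Write a hypothetical legacy source file in windows-1252 and compile it
    // with a matching -encoding flag.
    static int demo() {
        try {
            Path dir = Files.createTempDirectory("enc-demo");
            Path src = dir.resolve("Hello.java");
            // The escape in this literal yields e-acute, so this file stays ASCII
            // while Hello.java on disk contains the single byte 0xE9.
            String text = "public class Hello { static final String S = \"caf\u00e9\"; }\n";
            Files.write(src, text.getBytes(Charset.forName("windows-1252")));
            return compile(src, dir, "windows-1252");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("exit code: " + demo());
    }
}
```

Here the encoding is a per-invocation setting of the compiler, not something fixed by the source file itself.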



