Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The point is entertaining, but does not undermine Kolmogorov complexity.

First, for any single fixed language, the problem does not exist. That includes any single L_silly language.

Second, when considering the set of all languages, there is indeed an issue. However it can be remedied by requiring an implementation of the language - while we might say that program P_117 has, in language L, the semantics that it emits

2983483746923874560238475613849523984560238475620837456198736491378461938764134

(a fixed very long string), supposedly showing that P_117 which is of length 5 can emit such a long string, it doesn't really. To implement that language, you would need to encode that long string somehow in the implementation.

In other words, if you consider a single language, that's fine. If you consider all possible languages, you must require an implementation of them, not just an abstract definition. The implementation must be shared between all of them, which is equivalent to a single language in the first place. And a single language is really where the benefit and intuition of Kolmogorov complexity lies anyhow.

The confusion arises when we try to prove that the language "doesn't matter" as it's just a constant factor. There is something faulty with that proof, or rather it proves something other than what it is assumed to, but Kolmogorov complexity itself is still valid.



> There is something faulty with that proof, or rather it proves something other than what it is assumed to, but Kolmogorov complexity itself is still valid.

Well, Kolmogorov complexity depends on language choice, there's no way around that and the invariance theorem doesn't say anything different. It's more subtle than that, so we have to be precise.

The invariance theorem shows that given languages L and M, there's a constant c such that for any string x, |K_L(x) - K_M(x)| <= c. Intuitively, that constant c is the program size of an M interpreter in L (or vice versa). It's easy to see how K_L(x) can't be larger than <an M interpreter in L> plus <the shortest M program that outputs x>.

What this means is that given sufficiently large and complex strings x, K_L(x) is very close to K_M(x), because c is constant and finite, while you can construct a string x with arbitrarily large K_L(x). This gives us confidence that K_L measures something "real", even though our measurement is relative to the (arbitrary) definition of L.

Of course, for short programs, the constant factors dominate. You won't get a shorter Ruby program for your 30 character string by implementing a full Perl interpreter. In this case, the constant difference c is so large relative to |x| that the invariance theorem, while technically correct, is basically pointless.


Yes, exactly. I think the difficulty is that the invariance theorem talks about any two fixed languages - then we can bound the difference between them - but the article here talks about all possible languages. If you consider them all at once, things do seem peculiar, but it's meaningless: Considering any one is sufficient, and the invariance theorem justifies that by showing that no other one would be better.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: