
I thought 200x was too extreme as well. In the compression literature, is there a way to estimate an upper limit on the lossless compressibility of a given data set?


There is not, because there could always be some tricky underlying generator that you just haven't discovered, and discovering that pattern is in general equivalent to solving the halting problem: the Kolmogorov complexity of a string is uncomputable. (See https://en.wikipedia.org/wiki/Kolmogorov_complexity#Uncomput...)

As a trivial example, if your dataset is one trillion binary digits of pi, it is essentially incompressible by any ordinary compressor, yet a program that generates it fits in well under 1 kB.
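
To make that concrete, here is a minimal sketch of such a generator in Python, using Gibbons' unbounded spigot algorithm. It emits decimal rather than binary digits, and the name pi_digits is just illustrative, but the point stands: the whole program is a few hundred bytes.

    import itertools

    def pi_digits():
        # Gibbons' unbounded spigot algorithm: yields decimal digits
        # of pi one at a time, using only integer arithmetic.
        q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
        while True:
            if 4 * q + r - t < m * t:
                yield m  # the next digit is now certain; emit it
                q, r, m = 10 * q, 10 * (r - m * t), (10 * (3 * q + r)) // t - 10 * m
            else:
                # absorb one more term of the underlying series
                q, r, t, k, m, x = (q * k, (2 * q + r) * x, t * x, k + 1,
                                    (q * (7 * k + 2) + r * x) // (t * x), x + 2)

    print(list(itertools.islice(pi_digits(), 10)))  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]

A generic compressor like gzip or zstd sees those digits as statistically random and gains nothing, while the generator above reproduces them exactly.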


Cool. Thanks. How about lossy compression?


The same, since lossy compression can never do worse than lossless compression: a lossy codec can always fall back to emitting the lossless result, so its limits are at least as hard to pin down. (It is also messier, because you have to define your loss somehow. These Neuralink samples seemingly come as .wav files, but you probably wouldn't want to encode them with MP3!)
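
As a crude illustration of what "defining your loss" means, here is a hypothetical sketch (the quantization scheme and the SNR metric are my choices for illustration, not anything Neuralink uses): drop the low bits of 16-bit samples and measure how much signal survives.

    import numpy as np

    # Fake stand-in for a .wav channel: a sine wave plus noise, as int16.
    rng = np.random.default_rng(0)
    signal = (1000 * np.sin(np.linspace(0, 200, 30000))
              + rng.normal(0, 50, 30000)).astype(np.int16)

    # A crude lossy scheme: zero out the low 4 bits of every sample.
    bits_dropped = 4
    lossy = (signal >> bits_dropped) << bits_dropped

    # One possible loss metric: signal-to-noise ratio of the reconstruction.
    err = signal.astype(np.float64) - lossy.astype(np.float64)
    snr_db = 10 * np.log10((signal.astype(np.float64) ** 2).sum() / (err ** 2).sum())
    print(f"SNR after dropping {bits_dropped} bits: {snr_db:.1f} dB")

Whether that SNR is acceptable depends entirely on the downstream spike analysis; the choice of loss function sits outside compression theory.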



