Couldn't you use the same argument to reach the absurd conclusion that the 7zip source code contains the vast majority of Harry Potter?
A decent control would be to compare it to similar prose that you know for a fact is not in the training data (e.g. because it was written afterwards).
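Here's a toy version of that control using zlib's preset-dictionary feature as a stand-in for "training data" (the passages are placeholders, and zlib is just a stand-in for whatever compressor or model you're testing): text the compressor has effectively "seen" via the dictionary compresses to almost nothing, while fresh prose doesn't.

    import zlib

    def compressed_size(text: str, dictionary: str) -> int:
        # A preset dictionary lets deflate back-reference the "training data",
        # loosely analogous to a model having seen the text before.
        c = zlib.compressobj(zdict=dictionary.encode())
        return len(c.compress(text.encode()) + c.flush())

    training = "Mr. and Mrs. Dursley, of number four, Privet Drive..." * 50
    seen = "Mr. and Mrs. Dursley, of number four, Privet Drive..."
    unseen = "Prose written after the training cutoff looks nothing like this."

    print(compressed_size(seen, training))    # tiny: mostly a back-reference
    print(compressed_size(unseen, training))  # much larger: genuinely new text

A large gap between "seen" and "unseen" is what memorizing a specific book looks like; a general-purpose compressor with no dictionary shows roughly the same ratio on both.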
I think for that argument to work, you'd have to compare 7zip's compression against some other compression algorithm. Then you could say things like "7zip is a better/worse model of human writing", and that's probably a better way to talk about the model results as well.
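Concretely, that comparison is just compressed bits per byte on the same prose. A rough sketch (sample_prose.txt is a placeholder for whatever text you pick; LZMA is 7-Zip's default algorithm):

    import lzma, zlib

    text = open("sample_prose.txt", "rb").read()  # placeholder corpus

    for name, compress in [("lzma (7-Zip's default)", lzma.compress),
                           ("zlib/deflate", zlib.compress)]:
        bits_per_byte = 8 * len(compress(text)) / len(text)
        print(f"{name}: {bits_per_byte:.2f} bits/byte")

Whichever spends fewer bits per byte is the better statistical model of that text, which is exactly the sense in which a language model "contains" a book it compresses well.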
You're right that a better baseline would use books that aren't in the training set, to separate how much the model has learned about prose in general from how much it has memorized a specific book.