> I think generative AI should be able to provide links to similar source material in the training data
Except these aren't databases, so that's generally not possible, in the same way that it's not possible for you to provide links to the source material you drew on to write your reply. How much learning led to the weights on your neurons that allowed you to generate that? Where did you learn about using italics and its effect on how the words would be interpreted? Where did you learn the tone that would be appropriate in this particular forum?
> People should be able to opt out of having their content used for training
Okay... but then, if I write a book, should I be able to opt out of you being allowed to read it? What conditions should I be able to put on who can read my work? Religion? Skin colour? People who aren't good at memorizing?
Hopefully the idea of putting limits on who can acquire knowledge sounds absurd to you. Why are those same limits okay if they're on 'what' rather than 'who'?
> AI companies are just trying to avoid lawsuits by keeping it secret
Which has created a barrier to further research. Instead of me and Joe being able to collaborate on research and papers using the same datasets, we now hide our training data lest the luddites come to smash the machines because learning is only okay if not done too well.
> Except these aren't databases, so that's generally not possible
Not directly, and not in every case, but it IS possible to use embeddings to link to similar material. People are doing this pretty commonly with the RAG approach, and Bard is already providing sources, etc. It may not be perfect, but the onus is on the AI companies to figure out how to do it right, not just claim helplessness.
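The embedding-lookup idea above can be sketched in a few lines. This is a toy illustration, not a production retriever: the vectors and URLs are made up, and a real system would compute embeddings with an actual model (e.g. a sentence-transformer) over an indexed corpus, then rank by similarity.

```python
# Toy sketch of RAG-style source linking: embed the generated output,
# then surface the indexed documents whose embeddings are most similar.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical pre-computed embeddings for indexed source material.
corpus = {
    "https://example.com/articleA": [0.9, 0.1, 0.0],
    "https://example.com/articleB": [0.2, 0.8, 0.1],
    "https://example.com/articleC": [0.1, 0.2, 0.9],
}

def nearest_sources(output_embedding, k=2):
    """Return the k corpus URLs most similar to the output embedding."""
    ranked = sorted(corpus.items(),
                    key=lambda kv: cosine(output_embedding, kv[1]),
                    reverse=True)
    return [url for url, _ in ranked[:k]]

print(nearest_sources([0.85, 0.15, 0.05]))
```

Nothing here requires the model itself to be a database; the index of candidate sources lives alongside the model, which is exactly how RAG systems attach citations today.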
> Okay... but then, if I write a book should I be able to opt out of you being allowed to read it? What conditions should I be able to put on who can read my work?
Sites that don't want to appear in search results, or that have sensitive info they don't want indexed, can use robots.txt, a mechanism nearly as old as the web. There are many valid reasons to have a way to prevent content from being included in training data, and I would also argue this is a core feature necessary to spur adoption by businesses, as we've already seen. Otherwise, I'm not sure I understand your reasoning... people can publish websites and opt to have them excluded from search; the same should apply to AI.
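The same robots.txt mechanism already extends to AI training crawlers. A minimal sketch, using the user-agent tokens OpenAI and Google have documented for their training crawlers (GPTBot and Google-Extended); the paths are illustrative:

```text
# robots.txt — block AI training crawlers while still allowing search indexing.

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Everything else (including ordinary search crawlers) stays permitted:
User-agent: *
Allow: /
```

Compliance is voluntary, just as it is for search crawlers, but it gives publishers the same opt-out lever they've had for decades.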
Well said. Extending copyright to control content consumption and learning is a recipe for converting all of our mass media into businesses as abusive and usurious as textbook companies.