> I think generative AI should be able to provide links to similar source material in the training data
Except these aren't databases, so that's generally not possible, in the same way that it's not possible for you to provide links to the source material you drew on to write your reply. How much learning led to the weights on your neurons that allowed you to generate that? Where did you learn about using italics and its effect on how the words would be interpreted? Where did you learn the tone that would be appropriate in this particular forum?
> People should be able to opt out of having their content used for training
Okay... but then, if I write a book, should I be able to opt out of you being allowed to read it? What conditions should I be able to put on who can read my work? Religion? Skin colour? People who aren't good at memorizing?
Hopefully the idea of putting limits on who can acquire knowledge sounds absurd to you. Why are those same limits okay if they're on 'what' rather than 'who'?
> AI companies are just trying to avoid lawsuits by keeping it secret
Which has created a barrier to further research. Instead of me and Joe being able to collaborate on research and papers using the same datasets, we now hide our training data lest the luddites come to smash the machines because learning is only okay if not done too well.
> Except these aren't databases, so that's generally not possible
Not directly, and not in every case, but it IS possible to use embeddings to link to similar material. People are doing this pretty commonly with the RAG approach, and Bard is already providing sources, etc. It may not be perfect, but the onus is on the AI companies to figure out how to do it right, not just claim helplessness.
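The embedding-lookup idea above can be sketched in a few lines. This is a toy illustration, not a production retriever: the vectors and URLs are made up, and a real system would compute embeddings with an actual model (e.g. a sentence-transformer) over an indexed corpus, then rank by similarity.

```python
# Toy sketch of RAG-style source linking: embed the generated output,
# then surface the indexed documents whose embeddings are most similar.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical pre-computed embeddings for indexed source material.
corpus = {
    "https://example.com/articleA": [0.9, 0.1, 0.0],
    "https://example.com/articleB": [0.2, 0.8, 0.1],
    "https://example.com/articleC": [0.1, 0.2, 0.9],
}

def nearest_sources(output_embedding, k=2):
    """Return the k corpus URLs most similar to the output embedding."""
    ranked = sorted(corpus.items(),
                    key=lambda kv: cosine(output_embedding, kv[1]),
                    reverse=True)
    return [url for url, _ in ranked[:k]]

print(nearest_sources([0.85, 0.15, 0.05]))
```

Nothing here requires the model itself to be a database; the index of candidate sources lives alongside the model, which is exactly how RAG systems attach citations today.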
> Okay... but then, if I write a book should I be able to opt out of you being allowed to read it? What conditions should I be able to put on who can read my work?
Sites that don't want to appear in search results, or that have sensitive info they don't want indexed, can use robots.txt, a mechanism nearly as old as the web. There are many valid reasons to have a way to prevent content from being included in training data, and I would also argue this is a core feature necessary to spur adoption by businesses, as we've already seen. Otherwise, I'm not sure I understand your reasoning... people can publish websites and opt to have them excluded from search; the same should apply to AI.
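The same robots.txt mechanism already extends to AI training crawlers. A minimal sketch, using the user-agent tokens OpenAI and Google have documented for their training crawlers (GPTBot and Google-Extended); the paths are illustrative:

```text
# robots.txt — block AI training crawlers while still allowing search indexing.

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Everything else (including ordinary search crawlers) stays permitted:
User-agent: *
Allow: /
```

Compliance is voluntary, just as it is for search crawlers, but it gives publishers the same opt-out lever they've had for decades.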
Well said. Extending copyright to control content consumption and learning is a recipe for converting all of our mass media into businesses as abusive and usurious as textbook companies.