
I don't think that's a fair analogy. One forces 99% of websites to make a change, while the other is something that would need to be done by the big companies doing the scraping.

A legally binding Do Not Track flag would force small websites, e.g. a local restaurant's site, to implement something they are likely unaware of and lack the technical knowledge to implement.

A company that is mass-scraping data for its AI model is much more likely to understand that scraping the data has legal implications, and would be technically capable of implementing a scraping solution that accounts for a robots.txt.
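For what it's worth, honoring robots.txt takes very little code; Python ships a parser in the standard library. A minimal sketch (the bot name and rules here are made up for illustration):

```python
from urllib import robotparser

# Hypothetical robots.txt content a site might serve to block an AI crawler.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# The AI crawler is blocked; everyone else is allowed.
print(rp.can_fetch("ExampleAIBot", "https://example.com/page"))   # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/page"))   # True
```

In practice a scraper would fetch the live file with `RobotFileParser.set_url(...)` and `read()` before crawling, but the check itself is the same one-liner.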



I'm gonna guess it often isn't even their content but is user content they are protecting. So, sounds like a big subsidy/protection racket for Twitter or whatever to train on their users' public content but not let others.


If I understand the parent correctly, the restriction flag is opt-in? That turns copyright around completely, expecting every small content producer to implement something they are likely unaware of and lack the technical knowledge to implement.


At the very least, robots.txt dates from 1994; it has been part of the web almost from the start (the web became public in 1991, so within 3 years).

Claiming ignorance here would be just a little bit disingenuous.


The X-Robots-Tag header already exists, with "noai" and "noimageai" as directives. Scraping software like img2dataset respects these by default.



