
Why not rely on a robots.txt entry instead of having to explicitly include an HTTP header to opt out of AI crawling?
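For anyone unfamiliar, the robots.txt version would look something like this (the bot name and the header directive are placeholders; the project's actual user-agent token and the exact tag it checks for aren't spelled out in the thread):

    # Conventional site-wide opt-out in robots.txt:
    User-agent: ExampleAIBot
    Disallow: /

    # versus the per-response, header-style opt-out being discussed,
    # roughly along the lines of:
    X-Robots-Tag: noai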


Reading the related GitHub issues, the dev seems to just not understand HTTP or web-crawling etiquette, before you even get into the “actually AI is good for creators” pitches. The damage is probably done - even if this gets fixed, unethical people building datasets will just use the old versions.


Because - according to the developer - respecting robots.txt is unethical.

His contention is that denying content to AI tools deprives people of their right to better AI tools...


It’s a straw-man argument, which gives you a good look into the psyche of the dev.

If anything picks up a URL and uses it later, that is definitely a web crawler.


Seems pretty clear that it's meant to be malicious compliance with consent: consent is automatically assumed unless you say no to this specific scraper, as though there were a reasonable chance that millions of sites could possibly know about the exact tag.


Probably because he knows doing so would make his life harder and give him less data to scrape.


I'd also be curious what headers he sends, like the User-Agent.
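A polite crawler would both identify itself in the User-Agent and check robots.txt before fetching. A minimal sketch of that in Python, with a made-up bot name and contact URL (nothing here is taken from the project in question):

    import urllib.robotparser
    import urllib.request
    from urllib.parse import urlparse, urlunparse

    # "ExampleAIBot" and the contact URL are placeholders, not this project's.
    USER_AGENT = "ExampleAIBot/1.0 (+https://example.com/bot-info)"

    def polite_fetch(url):
        # Look up the target site's robots.txt before requesting the page.
        parts = urlparse(url)
        robots_url = urlunparse((parts.scheme, parts.netloc, "/robots.txt", "", "", ""))
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url(robots_url)
        rp.read()
        if not rp.can_fetch(USER_AGENT, url):
            return None  # The site opted out; skip it.

        # Identify the crawler honestly so site owners can block or contact it.
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req) as resp:
            return resp.read()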


That's what I was thinking too.



