
Why not rely on a robots.txt entry instead of having to explicitly include an HTTP header to opt out of AI crawling?
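For anyone unfamiliar, the robots.txt version would look something like this (the bot name and the header directive are placeholders; the project's actual user-agent token and the exact tag it checks for aren't spelled out in the thread):

    # Conventional site-wide opt-out in robots.txt:
    User-agent: ExampleAIBot
    Disallow: /

    # versus the per-response, header-style opt-out being discussed,
    # roughly along the lines of:
    X-Robots-Tag: noai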


Reading the related GitHub issues, the dev seems to just not understand HTTP or web-crawling etiquette, before you even get into the “actually AI is good for creators” pitches. The damage is probably done - even if this gets fixed, unethical people building datasets will just use the old versions.


Because - according to the developer - respecting robots.txt is unethical.

His contention is that denying content to AI tools deprives people of their right to better AI tools...


It’s a straw-man argument, which gives you a good look into the psyche of the dev.

If anything picks up a URL and uses it later, that is definitely a web crawler.


Seems pretty clear that it's meant to be malicious compliance with consent: consent is automatically assumed unless you say no to this specific scraper, as though there were a reasonable chance that millions of sites could possibly know about the exact tag.


Probably because he knows doing so would make his life harder and give him less data to scrape.


I'd also be curious what headers he sends, like the User-Agent.
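A polite crawler would both identify itself in the User-Agent and check robots.txt before fetching. A minimal sketch of that in Python, with a made-up bot name and contact URL (nothing here is taken from the project in question):

    import urllib.robotparser
    import urllib.request
    from urllib.parse import urlparse, urlunparse

    # "ExampleAIBot" and the contact URL are placeholders, not this project's.
    USER_AGENT = "ExampleAIBot/1.0 (+https://example.com/bot-info)"

    def polite_fetch(url):
        # Look up the target site's robots.txt before requesting the page.
        parts = urlparse(url)
        robots_url = urlunparse((parts.scheme, parts.netloc, "/robots.txt", "", "", ""))
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url(robots_url)
        rp.read()
        if not rp.can_fetch(USER_AGENT, url):
            return None  # The site opted out; skip it.

        # Identify the crawler honestly so site owners can block or contact it.
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req) as resp:
            return resp.read()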


That's what I was thinking too.



