jruohonen · 2026-02-18T21:48:13 1771451293

I wish it would stay there.

jruohonen · 2026-02-18T21:34:24 1771450464

"I don't know how long we can keep it up," he said.

Science is already beyond the point, I reckon.

jruohonen · 2026-02-14T20:04:00 1771099440

Apologies if it sounds like promotional material, but the topic has been discussed here a lot too. I also predicted a while back that vetting, curation, trust chains, and the like would be coming.

jruohonen · 2026-02-14T19:19:55 1771096795

It affects science too (and there you'd want solid archiving as much as possible). Increasingly, metadata is full of errors and general-purpose search engines for science are breaking down, including even things like Google Scholar. I suppose some big science publishers are blocking AI bots too.

shevy-java · 2026-02-14T19:30:50 1771097450

Google ruined its own search engine on top of that as well though.

We are increasingly becoming blind. To me it looks as if this is done on purpose actually.

terminalshort · 2026-02-15T02:23:58 1771122238

Did Google ruin it, or did adversarial activity between Google's algorithm and SEO ruin it? The latter seems more likely, because the incentives make sense, and it also seems inevitable.

visarga · 2026-02-15T07:09:21 1771139361

Google ruined it, maximizing ad sales no matter the outcomes. SEO adapted to Google, Google adapted only to maximize their own profits.

joquarky · 2026-02-15T23:08:34 1771196914

In practice, Doubleclick acquired Google, so they now cover both sides of the adversity.

salawat · 2026-02-14T19:37:05 1771097825

It was. Advertising is incompatible with accurate data retrieval/routing. We've also implemented "obligation to deindex". So providing an unbiased index of the web as she is is essentially (in the U.S.) verboten.

ninjagoo · 2026-02-14T19:27:16 1771097236

> I suppose some big science publishers are blocking AI bots too.

That's a travesty, considering that a huge chunk of science is publicly funded; the public is being denied the benefits of what they're paying for, essentially.

galleywest200 · 2026-02-14T19:28:39 1771097319

The public can still access the sites themselves.

ninjagoo · 2026-02-14T19:31:35 1771097495

> The public can still access the sites themselves.

Indefinitely? Probably not.

What about when a regime wants to make the science disappear?

thwarted · 2026-02-14T19:46:47 1771098407

So the solution is to allow the AI scraping and hide the content, with significantly reduced fidelity and accuracy and not in the original representation, in some language model?

mlnj · 2026-02-14T21:29:03 1771104543

Don't forget the onslaught of ads that will distort the actual publications even more going forward.

pa7ch · 2026-02-14T19:43:24 1771098204

What has that got to do with blocking AI crawlers?

ninjagoo · 2026-02-14T19:55:01 1771098901

If it's publicly funded, why shouldn't AI crawlers have access to that data? Presumably those creating the AI crawlers paid taxes that paid for the science.

JumpCrisscross · 2026-02-14T20:45:07 1771101907

> If it's publicly funded, why shouldn't AI crawlers have access to that data?

Because it costs money to serve them the content.

8bitsrule · 2026-02-15T02:12:03 1771121523

Crawlers accessing public data could be required to provide searchable access to the public data they collect. Value-for-value.

wyre · 2026-02-14T21:38:21 1771105101

If I build a business based off of consumption of publicly funded data, and that’s okay, why isn’t it okay for AI?

Is the answer regulate AI? Yes.

JumpCrisscross · 2026-02-14T22:35:23 1771108523

> If I build a business based off of consumption of publicly funded data, and that’s okay, why isn’t it okay for AI?

Because when you build it you aren't, presumably, polling their servers every fifteen minutes for the entire corpus. AI scrapers are currently incredibly impolite.

heavyset_go · 2026-02-15T12:40:18 1771159218

Plenty of publicly funded data isn't made freely and publicly accessible. Sometimes you need to pay, or get a license, etc., depending on what you're doing with it.

upboundspiral · 2026-02-15T17:46:05 1771177565

If anyone wants the surreal experience of seeing blogs and websites made by real humans they should check out https://marginalia-search.com

It's far from perfect but it does achieve its stated goal of resurfacing real people on the internet.

It recently got some NLNet funding and I hope to see it flourish - to my knowledge there aren't any other projects trying to claw back control of the internet towards the commons.

https://about.marginalia-search.com

asdff · 2026-02-14T21:59:59 1771106399

Thank god for PubMed and deterministic search operators.

jruohonen · 2026-02-11T06:58:00 1770793080

Now EC. I wonder how many countries were breached in one sweep?

jruohonen · 2026-02-05T20:26:24 1770323184

I cannot say how valid it is but it is interesting because everyone else is saying the contrary (i.e., according to his data, the volume has gone quite rapidly down post-2021).

jruohonen · 2026-02-04T19:52:50 1770234770

At his best. (And I don't mean business or politics per se but as a philosophical take on life.)

jruohonen · 2026-02-04T20:47:09 1770238029

Oh no, people are mean to me :-O. I mean, sure, we can talk about sociopaths and whatnot, who may or may not have their means, but do we need such role models? Ends, not means. Kant?

jruohonen · 2026-02-04T19:45:35 1770234335

Duplicate:

https://news.ycombinator.com/item?id=46884471

jruohonen · 2026-02-04T11:21:56 1770204116

So it is already happening, as predicted:

https://news.ycombinator.com/item?id=46678710

jruohonen · 2026-02-02T19:52:25 1770061945

OA, FTW and WTF.