> These scam sites load megabytes of junk, load slowly, have text interspersed with ads and modals that render right on top of them

Only if you're not Googlebot. The crawler sees a much nicer site.
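
For anyone who hasn't seen the mechanics: cloaking is usually just a branch on the User-Agent header. A minimal sketch in Python/Flask, purely hypothetical; the route, markup, and crawler token list are all made up for illustration:

    # Hypothetical user-agent cloaking, for illustration only.
    from flask import Flask, request

    app = Flask(__name__)

    CRAWLER_TOKENS = ("googlebot", "bingbot")  # assumed markers for this sketch

    @app.route("/article")
    def article():
        ua = request.headers.get("User-Agent", "").lower()
        if any(token in ua for token in CRAWLER_TOKENS):
            # Crawlers get the clean, fast page that ranks well.
            return "<article>Just the content.</article>"
        # Humans get the version buried under ads, modals, and trackers.
        return "<article>Content, plus megabytes of ads and modals.</article>"

    if __name__ == "__main__":
        app.run()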



which should — in theory — get them penalized for cloaking. But obviously it doesn’t. Reinforcing GP’s point.


Google has gotten pretty lenient about that: https://developers.google.com/search/docs/essentials/spam-po...

"If you operate a paywall or a content-gating mechanism, we don't consider this to be cloaking if Google can see the full content of what's behind the paywall just like any person who has access to the gated material and if you follow our Flexible Sampling general guidance."

I wonder if they just gave up


Hypothesis: Search, being Google's oldest product, is no longer prestigious to work in. It's in maintenance mode.


Does Google run other indexers for the purpose of catching cloaking? Are there other strategies that could be used? One of the problems with SO is that most of the valid content is out there and easily available without having to scrape the site, which may make penalizing bad content harder.
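
A crude version of that second strategy is at least easy to sketch: fetch the same URL with and without a Googlebot user agent and compare what comes back. (Python/requests; the user-agent strings, threshold, and size-only comparison are assumptions of the sketch. Real detection would have to render JavaScript, diff DOMs, and cope with cloakers that key off the requesting IP rather than the header.)

    # Fetch the same URL as "Googlebot" and as a regular browser, then compare sizes.
    import requests

    GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                    "+http://www.google.com/bot.html)")
    BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

    def looks_cloaked(url: str, threshold: float = 0.5) -> bool:
        as_bot = requests.get(url, headers={"User-Agent": GOOGLEBOT_UA}, timeout=10).text
        as_human = requests.get(url, headers={"User-Agent": BROWSER_UA}, timeout=10).text
        smaller, larger = sorted((len(as_bot), len(as_human)))
        # Wildly different payload sizes are a cloaking smell, nothing more.
        return larger > 0 and smaller / larger < threshold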


And the fact that Google is not detecting those is damning (to Google).


Does it even make sense to serve different content to a bot than what a human would see? Isn't the search engine trying to rank content made for humans?


It's an adversarial process. The search engine is, in theory, trying to rank by usefulness to the user, and the site owner is trying to maximize revenue by lying to the search engine. And the user.


I'm generally puzzled by Google's reluctance to do manual intervention in these cases. It's not like this is a secret. Just penalize the whole domain for 60 days every time a prominent site lies to the crawler.


There are very many sites where the content you see as a non-logged-in user is different from what you see if you have in your possession an all-important user cookie.


If Google's support is any indication, Google doesn't like to involve humans in its processes. There probably aren't enough humans to do the manual intervention you propose.


Then maybe the "crawler" should be an actual PC navigating to the page in a real browser, taking a screenshot (or live feed) of it, and processing that with AI.
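
Something close to that is already doable with a headless browser. Rough sketch with Playwright; the URL is a placeholder and the "process it with AI" step is omitted. Whether it scales to a web-sized index is the real question:

    # Render the page the way a human's browser would and capture what they'd see.
    from playwright.sync_api import sync_playwright

    def screenshot(url: str, path: str = "page.png") -> str:
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")
            page.screenshot(path=path, full_page=True)
            browser.close()
        return path

    screenshot("https://example.com/some-article")  # placeholder URL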


Eh, Google chose to be identifiable as Googlebot and to obey robots.txt for other reasons of "good citizenship", because not everybody wants to be crawled.


It makes sense if you know your content isn't nice for humans (e.g. full of ads and tracking stuff) but you want it to rank high anyway.


I wonder what I'll see if I change my browser's user agent.
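
Often nothing different, unfortunately. Careful cloakers check the requesting IP rather than trusting the header, using roughly the reverse-plus-forward DNS check that Google itself documents for verifying Googlebot. A sketch of that check (the accepted host suffixes follow my reading of Google's guidance and may be incomplete):

    # Verify a "Googlebot" request the way Google suggests sites do it:
    # reverse DNS must land in googlebot.com / google.com, and the forward
    # lookup of that host must resolve back to the same IP.
    import socket

    def is_real_googlebot(ip: str) -> bool:
        try:
            host, _, _ = socket.gethostbyaddr(ip)            # reverse lookup
            if not host.endswith((".googlebot.com", ".google.com")):
                return False
            return ip in socket.gethostbyname_ex(host)[2]    # forward lookup must match
        except OSError:
            return False

A site doing that will keep serving you the ad-stuffed human version no matter what your browser claims to be.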



