Yep. I 403 Turnitin and similar companies by User-Agent in my nginx configuration:

  if ($http_user_agent ~* (TurnitinBot|PaperLiBot|idmarch|FairShare|Lightspeedsystems|ZmEu|BPImageWalker|semrushBot|ias_crawler|360spider|copyrightinfringementportal|PetalBot|Adsbot|SlySearch|NPBot)) {
      return 403;
  }
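If you want to reuse the same list across several server blocks, the match can also be done once with nginx's `map` directive in the `http` context. This is a rough sketch, not a drop-in config. The `$blocked_bot` variable name is a made-up choice, and the bot names are the ones from the list above.

```nginx
# Sketch only: the map block belongs in the http {} context.
# $blocked_bot is a hypothetical variable name; it is 1 when the
# User-Agent matches (case-insensitively) any name in the list, else 0.
map $http_user_agent $blocked_bot {
    default 0;
    "~*(TurnitinBot|PaperLiBot|idmarch|FairShare|Lightspeedsystems|ZmEu|BPImageWalker|semrushBot|ias_crawler|360spider|copyrightinfringementportal|PetalBot|Adsbot|SlySearch|NPBot)" 1;
}

server {
    # ... your existing listen / server_name / root directives ...

    # nginx treats "0" and the empty string as false in an if condition.
    if ($blocked_bot) {
        return 403;
    }
}
```

Either way, this only catches crawlers that identify themselves honestly in the User-Agent. Anything sending a browser string gets through.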
Requests from legit Huawei IP ranges identifying as PetalBot were abusive: they definitely weren't obeying robots.txt, and they were searching for subsets of content in a way that suggested they were trying to identify political dissidents, with no interest in actually indexing the full site. I don't consider it a real search engine.
But yeah, maybe not a good fit for this list of educational and copyright parasites.
But my favorite robots.txt is,