> look at the network tab

The challenge there is automating it, though - usually the REST endpoints require some complex combination of temporary auth token headers that are (intentionally) difficult to generate outside the context of the app itself and that expire pretty quickly.



You can use the application context, while also automatically intercepting requests. Best of both worlds.

puppeteer: https://pptr.dev/api/puppeteer.page.setrequestinterception

playwright: https://playwright.dev/docs/network#network-events
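A minimal sketch of the Puppeteer variant, for illustration (the '/api/' filter and the target URL are hypothetical placeholders):

    import puppeteer from 'puppeteer';

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.setRequestInterception(true);

      // The app generates its own short-lived auth headers; we just
      // read them off the requests it makes.
      page.on('request', (request) => {
        if (request.url().includes('/api/')) {
          console.log('captured headers:', request.headers());
        }
        request.continue(); // pass everything through unmodified
      });

      // Responses come back already past whatever token dance the app does.
      page.on('response', async (response) => {
        if (response.url().includes('/api/')) {
          const body = await response.json().catch(() => null);
          console.log(response.url(), body);
        }
      });

      await page.goto('https://example.com/app', { waitUntil: 'networkidle0' });
      await browser.close();
    })();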


And the article is about using Puppeteer...


In browsers, 'Copy as cURL' is decent enough. Do the request through the command line.

If there are ephemeral cookies, they tend to follow a predictable pattern.
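For illustration, this is roughly the shape of what DevTools emits for an authenticated request (the URL, token, and cookie values are placeholders, not real):

    curl 'https://example.com/api/items' \
      -H 'authorization: Bearer <token copied from DevTools>' \
      -H 'cookie: session=<session cookie copied from DevTools>' \
      --compressed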


Static files are much easier to scrape. It's even easier to scrape a static page than it is to use APIs.


Care to provide some examples? The majority of sites submitted to HN don't even require cookies, let alone tokens in special headers. A site like Twitter is the exception, not the general rule.


As a scraping target, Twitter is closer to the rule than the exception.


Not sure about "scraping targets". I'm referring to websites that can be read without using JavaScript. Few websites submitted to HN use tokens in special headers to discourage users with JS disabled from reading them. Twitter is an exception, and its efforts to annoy users into enabling JavaScript are ineffective anyway:

https://github-wiki-see.page/m/zedeus/nitter/wiki/Instances


That's true... but that was already true.

Whatever method you were using before SPAs to authenticate your scraper (HTTP requests, browser automation), you can use that same method now.
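For instance, a sketch of the browser-automation route, with a hypothetical login page and selectors; the harvested session cookies can then back plain HTTP requests, exactly as a pre-SPA scraper would have reused them:

    import puppeteer from 'puppeteer';

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();

      // Log in through the real UI once (URL and selectors are hypothetical).
      await page.goto('https://example.com/login');
      await page.type('#username', process.env.SCRAPE_USER ?? '');
      await page.type('#password', process.env.SCRAPE_PASS ?? '');
      await Promise.all([
        page.waitForNavigation(),
        page.click('button[type=submit]'),
      ]);

      // The session cookies the app just set can now be replayed
      // from any HTTP client, same as before SPAs.
      const cookies = await page.cookies();
      console.log(cookies);

      await browser.close();
    })();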



