
I like the approach here! Saving to a simple zip file is elegant. I worked on a similar idea years ago [0], but made the mistake of building it as a frontend. In retrospect, I would make this crawl using a headless browser and serve it via a web application, like you're doing.

I would love to see better support for SPAs, where we can't just start from a sitemap. If you're interested, you can check out some of the code from my old app for inspiration on how to crawl pages (it's Electron, so it shares a lot of interfaces with Puppeteer) [1].

[0] https://github.com/CGamesPlay/chronicler/tree/master

[1] https://github.com/CGamesPlay/chronicler/blob/master/src/mai...



It tries to fetch a sitemap in case some links would otherwise be missed, but it starts from the root and crawls internal links. A new mode for SPAs was added this morning: the `--spa` option writes the original HTML instead of the generated/rendered one. That way some apps _will_ work better.
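
For anyone curious, the root-plus-sitemap crawl described above can be sketched roughly like this. It is not the tool's actual code, just a minimal illustration assuming a caller-supplied `fetch(url)` function that returns HTML or `None`; the names `crawl`, `internal_links`, and `sitemap_urls` are made up for the example, and it does no rendering or `--spa` handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(page_url, html):
    """Return absolute, fragment-free links on the same host as page_url."""
    parser = LinkParser()
    parser.feed(html)
    host = urlsplit(page_url).netloc
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlsplit(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            links.append(absolute)
    return links


def crawl(root, fetch, sitemap_urls=()):
    """Breadth-first crawl from root, following internal links only.

    sitemap_urls (hypothetical parameter) seeds pages that nothing links to.
    fetch(url) is supplied by the caller and returns HTML or None.
    """
    # dict.fromkeys drops duplicates while keeping the root first.
    queue = deque(dict.fromkeys([root, *sitemap_urls]))
    seen = set(queue)
    pages = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        for link in internal_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing `fetch` in keeps the traversal separate from how pages are loaded, so the same loop could sit on top of a plain HTTP client or a headless browser.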



