
I wrote a very simple Soup content downloader some time ago; you can get it here: https://github.com/urxvtcd/soup-io-downloader

It has some shortcomings, mainly that content is saved under a random file name without an extension. Hm, maybe I'll try to fix that now.
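
One possible way to recover the missing extensions, as a rough sketch and not code from soup-io-downloader: look at the first few bytes of each saved file and match them against well-known image signatures. The names `guess_extension` and `add_extension` are hypothetical, and only JPEG, PNG, GIF and WebP are covered here.

```python
from pathlib import Path

# Hypothetical helper table, not taken from soup-io-downloader.
# Each entry maps the leading "magic" bytes of a format to an extension.
SIGNATURES = [
    (b"\xff\xd8\xff", ".jpg"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
]


def guess_extension(header: bytes) -> str:
    """Return an extension for a known image signature, or "" if unknown."""
    for magic, ext in SIGNATURES:
        if header.startswith(magic):
            return ext
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return ".webp"
    return ""


def add_extension(path: Path) -> Path:
    """Rename an extensionless file based on its content; return the final path."""
    with path.open("rb") as f:
        header = f.read(12)
    ext = guess_extension(header)
    if not ext or path.suffix:
        # Unknown format, or the file already has an extension: leave it alone.
        return path
    new_path = path.with_name(path.name + ext)
    path.rename(new_path)
    return new_path
```

Running `add_extension` over every file in the download directory would leave unrecognized files untouched, so it should be safe to repeat.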



I'm working on my own (trust no one), which parses everything into a neater JSON file that I can use later. I download the files with just bash magic.

https://github.com/ikari-pl/downsouper

It still has some downsides, but it's getting better. It doesn't download the posts from discussions yet. You can get your own posts and links to full-size images (no resizing means both better quality AND faster downloads), and it includes the jq-xargs-wget pipes I used.
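
For anyone who would rather not use jq, here is a rough Python sketch of the same idea: pull every image link out of a JSON dump so the list can be handed to a downloader. It is not downsouper's code, and since the actual JSON layout isn't described here, it makes no assumption about key names and simply walks the whole structure looking for strings that look like image URLs. The function names and the suffix list are made up for illustration.

```python
import json
from urllib.parse import urlparse

# Assumed set of image suffixes; adjust to whatever the dump really contains.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def iter_image_urls(node):
    """Yield every string in a nested JSON value that looks like an image URL."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_image_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_image_urls(item)
    elif isinstance(node, str):
        try:
            parsed = urlparse(node)
        except ValueError:
            return  # not a parseable URL, skip it
        is_http = parsed.scheme in ("http", "https")
        if is_http and parsed.path.lower().endswith(IMAGE_SUFFIXES):
            yield node


def unique_image_urls(json_text):
    """Return image URLs from a JSON document, de-duplicated, in first-seen order."""
    seen = set()
    ordered = []
    for url in iter_image_urls(json.loads(json_text)):
        if url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered
```

Printing the result one URL per line gives a list that can be piped into `wget -i -` or `xargs`, much like the jq pipes.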

Then I want to add an exporter that would be able to convert it to, whatever, the WordPress export format? Cry and go to Tumblr? No idea. :(


I'm afraid that they will be gone sooner if folks start downloading their content in bulk :(


What else can we do? What else?

Surprisingly enough, the assets are on an awesome CDN. Getting the HTML pages took me ~8 hours. Getting 19 GB of images: maybe 10 minutes?



