
Turning a dump of Wikipedia (the kind you get on dumps.wikimedia.org) into a running instance of Wikipedia that you can browse is actually insanely hard. I put considerable effort into it a few years back and gave up, and I don't know what the state of this is right now. So unless the servers that survive are the ones running Wikipedia in production now, archaeologists will be out of luck.

But those won't survive, for the simple reason that the people running them are continuously changing them, and there is no guarantee that those changes preserve whatever information future archaeologists will find valuable (which we can't really know today).

And this is true for much simpler mediums than software as well. Take film, for example. For film printed on celluloid, all you need is for a good copy of each reel of each film to survive and you've got pretty much a guarantee that this piece of culture will be preserved somehow. Nowadays films are digital thingamajigs rather than celluloid artefacts, and the institutions tasked with looking after that digital cultural heritage go about it with an editorializing rather than an archival mindset. (e.g. taxpayer-funded BBC removed an old episode of Fawlty Towers, itself a taxpayer-funded BBC production, from their streaming platform for being racist after the George Floyd incident [1]).

Besides, with virtualization and the various forms of infrastructure abstraction, combined with encryption-based security models, even a hypothetical scenario where all human life ceases to exist but all servers somehow survive on the bare-metal layer would probably fail to preserve our digital cultural artefacts.

Hard disks owned by private individuals with a "digital hoarder" mindset would probably make for a more useful archaeological find than servers.

[1] https://en.wikipedia.org/wiki/The_Germans



> (e.g. taxpayer-funded BBC removed an old episode of Fawlty Towers, itself a taxpayer-funded BBC production, from their streaming platform for being racist after the George Floyd incident [1])

Just read up on that and am shaking my head in disbelief about the fragility of, well, everything really. Technology, emotions, interpretations, a general sense of having to pre-emptively react to anything and everything. Even at the time it was made, Basil Fawlty was a caricature of a deeply despicable man, and the other characters were equally so. Best leave it to John Cleese himself to sum it up:

> Cleese spoke against the removal of the episode due to the Major's use of racial slurs: "The Major was an old fossil left over from decades before. We were not supporting his views, we were making fun of them. If they can't see that, if people are too stupid to see that, what can one say?"


This very much reminds one of the contemporary crusades against The Adventures of Huckleberry Finn. One of the characters, a runaway slave by the name of [pejorative] Jim, is actively railed against by much of society for being an imbecile, uneducated, and so on.

Yet throughout the story Huck runs into all sorts of people who act like great people on the outside but invariably turn out to be horrible people on the inside (Huck himself included). The one exception is Jim, who turns out to be a selfless and good person, inside and out, from start to finish.

The whole story is a reminder that there is often a strong disconnect between what people pretend to be and what they are. That many schools have successfully banned the book from the classroom because of the pejorative used is perhaps one of the clearest reflections of the state of contemporary education. It would be like Germany banning Schindler's List because its lead character is a Nazi.


Indeed. If you censor the past, you’re doomed to repeat it. I absolutely support mandating proper context to aid understanding. That’s what school curriculums could be about. If you change the teaching of past events (or worse, the source material itself) according to contemporary tastes, then all of the past becomes largely meaningless, a tool to be wielded to further populist agendas.


> "Turning a dump of wikipedia (the kind you get on dumps.wikimedia.org) into a running instance of wikipedia that you can browse is actually insanely hard."

Given the state of the world, I recently looked into this: https://www.kiwix.org/en/

It takes 87 GB and a single click to create a local Wikipedia with pictures included. The software also supports downloading data from a vast array of other sources.


It lacks high-res images, category pages and, I think, "List of..." pages, though. The missing categories are especially a bummer.


You want to leave future archeologists something to do: "We could link these low-res JPEGs to some high-res WebP, even some SVG. It was a delicate task since we recovered it from an old Seagate Barracuda 2TB drive. We even had to break some ancient pre-quantum cryptography! We believe that we now have the whole collection of stylized ape pictures, traded for their high ritual value among the Cult of Eneftee."



