> From direct personal experience, I strongly believe the backend infrastructure...

evanelias · on May 17, 2024

No, nothing about this is related to analytics. My previous comment was strictly describing storage, caching, and compute for core product functionality. It is written from a decade of direct first-hand experience working on infrastructure for social networks, including the two social networks I referenced there.

Social networks store a lot of OLTP data just to function. Every user, post, comment, follow/friendship relation, like/favorite/interaction, media metadata -- that all gets stored in sharded relational databases and retrieved in order for the product to operate at all. For successful social networks, it adds up to trillions of rows of data (on the smaller end, for something like Tumblr) and that requires a lot of expensive infrastructure to operate. Again, none of this has any relation to analytics.
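For anyone unfamiliar with what "sharded" means here, a minimal sketch: each user's rows (posts, follows, likes) are assigned to one of many database servers by a function of the user ID. The names `NUM_SHARDS` and `shard_for_user` and the hash-modulo scheme are illustrative assumptions, not a description of how any particular social network did it; real deployments often use lookup tables or ID ranges instead.

```python
import hashlib

# Hypothetical shard count, chosen only for illustration.
NUM_SHARDS = 256


def shard_for_user(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index in [0, num_shards).

    Uses a stable hash so the same user always lands on the same shard,
    regardless of which application server computes it.
    """
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce.
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for uid in (1, 42, 1_000_000):
        print(f"user {uid} -> shard {shard_for_user(uid)}")
```

Keeping all of one user's OLTP rows on a single shard means rendering that user's content touches one server, but the total server count still grows with the total row count.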

As for usenet, what? It's basically dead, after becoming an unmanageable cesspool of spam (or worse) more than two decades ago. It was great in the 90s, but the internet population was substantially smaller then.

beeboobaa3 · on May 17, 2024

> As for usenet, what? It's basically dead, after becoming an unmanageable cesspool of spam (or worse) more than two decades ago. It was great in the 90s, but the internet population was substantially smaller then.

Yeah, because no one is interested in promoting it: it doesn't have analytics baked in, so you can't make money from doing so. Of course it deteriorated over the years. It's also cheap to run and can handle a massive number of users.

> store a lot of OLTP data just to function

Right, so they can run analytics. You could reduce your tracking data to aggregates, but then you can't go back and run analytics on your users. You don't need to keep that data forever.

Especially with modern social media where content older than a day is effectively dead and ignored.

> it adds up to trillions of rows of data

This was a lot of data a decade ago. Nowadays a single postgres instance will handle billions of rows without breaking a sweat, and social media content is exceptionally shardable.

evanelias · on May 17, 2024

> Right, so they can run analytics.

Stop gaslighting me, it's not OK! I'm describing first-hand experience of things that were not related to analytics IN ANY WAY, SHAPE, OR FORM.

Try running OLAP queries on a massively sharded MySQL 5.1 deployment, or any aggregation at all on a Memcached cluster. These technologies were designed for OLTP data, and were woefully incapable of useful analytics over massive data sets.

I was Tumblr's fourth full-time software engineering hire. When I joined (nearly 4 years after the company was founded) the only thing remotely related to analytics was a tiny Hadoop cluster, where logs were dumped and largely ignored. Nothing about analytics is "in their blood". All you needed to sign up for Tumblr was an email address. WTF do you even think they are "analyzing"? Your comments are completely fabricated BS.

> You could reduce your tracking data to aggregates

Once again, I'm not describing "tracking data"! I'm talking about things like content that users have posted, comments they have written, content they have favorited, users they are following. These are core data models of a social network. It has nothing to do with tracking or analytics.

> You don't need to keep that data forever.

The OLTP product data I'm describing does need to be kept forever. Users don't like it when content they have written on their blog suddenly disappears.

> Nowadays a single postgres instance will handle billions of rows without breaking a sweat, and social media content is exceptionally shardable.

Yes, but running a massive cluster of hundreds or thousands of sharded database servers is still very expensive.
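As a back-of-envelope sketch of why: divide the total row count by what one instance comfortably holds, then multiply by the number of copies kept of each shard. Every figure below (2 trillion rows, 5 billion rows per shard, 3 servers per shard) is an illustrative assumption, not a real number from any company, and `servers_needed` is a hypothetical helper.

```python
def servers_needed(total_rows: int, rows_per_shard: int, servers_per_shard: int) -> int:
    """Estimate database servers for a sharded deployment.

    servers_per_shard counts the primary plus its replicas.
    """
    # Integer ceiling division: a partially filled shard still needs a server.
    shards = -(-total_rows // rows_per_shard)
    return shards * servers_per_shard


if __name__ == "__main__":
    # All three inputs are assumptions picked for illustration only.
    total_rows = 2 * 10**12       # "trillions of rows"
    rows_per_shard = 5 * 10**9    # "billions of rows" per instance
    servers_per_shard = 3         # one primary plus two replicas
    print(servers_needed(total_rows, rows_per_shard, servers_per_shard))  # prints 1200
```

Under those assumptions the answer is 400 shards and 1200 servers, before counting caches or spare capacity.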

throwaway22032 · on May 18, 2024

I think that what they're getting at is that in the Usenet days half of what you've mentioned would be local data.

There is no central concept of "content they have favourited" or "users they are following"; that's all handled locally in that model.

evanelias · on May 18, 2024

> half of what you've mentioned would be local data

Not by volume. Posts and comments make up the vast majority of the storage requirements, and none of that can be purely client-side.

> There is no central concept of "content they have favourited" or "users they are following"

I'm aware, I used Usenet quite a bit in the 90s, as well as dial-up BBSs.

Usenet is a distributed forum / discussion board, which is related but not equivalent to the core functionality of social media applications being discussed here.

With Usenet's model, there's no concept of a profile aggregating content from a single user. This means you simply cannot replicate the primary experience of Facebook, Twitter, Instagram, Tumblr, Pinterest, DeviantArt, MySpace, Friendster, or any other social media site/app with Usenet's approach. Nor can it reproduce the experience of even modern forums like HN or Reddit.

Usenet also didn't actually scale massively. Every estimate I've seen of the peak Usenet userbase puts it at a tiny fraction of modern social media.

In any case, Usenet essentially failed. We already have empirical evidence about how these ideas play out! Why are we even seriously discussing this?

giantrobot · on May 17, 2024

> Check out mailing lists or usenet.

These aren't really good "social media" examples. Both mailing lists and Usenet have limited retention; with mailing lists, there may be almost no retention beyond what is required to deliver a message.

While low retention might be a desirable feature and something you might actually want in a FOSS social network, it means old content will disappear from the central server. If it's not archived by clients it can easily disappear or end up locked away only in private backups. Google's buyout of Deja News should be a cautionary tale of retention and the locking up of public data behind a private gate.

Usenet history today is largely only available because someone at Google hasn't yet noticed that Google Groups still exists and terminated it. If that happens tomorrow, there's no good complete archive of historical Usenet content. There's no guarantee Google won't kill those Usenet archives in the next year, let alone the next five years.