You have to be careful with the approach of using Postgres for everything. The way it locks tables and rows and the serialization levels it guarantees are not immediately obvious to a lot of folks and can become a serious bottle-neck for performance-sensitive workloads.
I've been a happy Postgres user for several decades. Postgres can do a lot! But like anything, don't rely on maxims to do your engineering for you.
Yes, performance can be a big issue with postgres. And vertical scaling can really put a damper on things when you have a major traffic hit. Using it for kafka is misunderstanding the one of the great uses of kafka which is to help deal with traffic bursts. All of a sudden your postgres server is overwhelmed and the kafka server would be fine.
It's worth noting that Oracle has solved this problem. It has horizontal multi-master scalability (not sharded) and a queue subsystem called TxEQ which scales like Kafka does, but it's also got the features of a normal MQ broker. You can dequeue a message into a transaction, update tables in that same transaction, then commit to remove the message from the queue permanently. You can dequeue by predicate, delay messages, use producer/consumer patterns etc. It's quite flexible. The queues can be accessed via SQL stored procs, or client driver APIs, or it implements a Kafka compatible API now too I think.
If you rent a cloud DB then it can scale elastically which can make this cheaper than Postgres, believe it or not. Cloud databases are sold at the price the market will bear not the cost of inputs+margin, so you can end up paying for Postgres as much as you would for an Oracle DB whilst getting far fewer features and less scalability.
Source: recently joined the DB team at Oracle, was surprised to learn how much it can do.
Agree but we really have to put a number on baseline traffic and max traffic burst in order to be productive in the discussion. I would argue that the majority of use cases never need to be designed for a max-traffic-number that PG can't handle
Definitely, this is also one of the direction Rails is heading[1]: provide a basis setup most of the people can use out of the box. And if needed you can always plug in more "mature" solutions afterwards.
I wish postgres would add a durable queue like data structure. But trying to make a durable queue that can scale beyond what a simple redis instance can do starts to run into problems quickly.
It would probably work fine, it would also put the jobs at risk of people who managed to convince their enterprises that a dumb but fast server (Kafka) was actually a good idea.
This is true of any data storage. You have to understand the concurrency model and assumptions, and know where bottlenecks can happen. Even among relational databases there are significant differences.
Exactly. Or worse, you treat one as a straightforward black box swap in replacement for another. If you're looking to scale, you _will_ need to code to the idiosyncraties of your chosen solution.
Postgres doesnt scale into oblivion, but it can take some serious chunks of data once you start batching and making sure a every operation only touches single row with no transactions needed.
Nearly true, but you dont need to run a cassandra cluster to ship your 3k msg/sec and you can take smaller locks if you have a small number of senders that delete sent messages and send in chunks
When people say "just use postgres" it's because their immediate need is so low that this doesn't matter.
And the thing is, a server from 10 years ago running postgres (with a backup) is enough for most applications to handle thousands of simultaneous users. Without even going into the kinds of optimization you are talking about. Adding ops complexity for the sake of scale on the exploratory phase of a product is a really bad idea when there's an alternative out there that can carry you until you have fit some market. (And for some markets, that's enough forever.)
You would typically want to use the same database instance for your queue as long as you can get away with it because then transaction handling is trivial. As soon as you move the queue somewhere else you need to carefully think about how you'll deal with transactionality.
Yes, I often use PG for queues on the same instance. Most of the time you dont see any negative effects. For a new project with barely any users it doesn’t matter.
True, but you have to have a really intensive workload to hit its limits; something in the order of tens of thousands writes per second; and even then, you can shard to a few instances. So yes, there is a limit - but in practice, not for most systems
I've been a happy Postgres user for several decades. Postgres can do a lot! But like anything, don't rely on maxims to do your engineering for you.