Hacker News

Not everyone runs some PHP damaged setup where connection pooling has to be pushed out to a proxy. Some people use sane applications in the first place. There is no reason for those people to add an extra layer of latency and potential failure to achieve what they already have.


True - it's only application insanity preventing a distributed collection of database clients from properly coordinating the allocation of postgres connections amongst themselves.

Now true, in the event of a network partition, you lose the ability to open a database connection. But that's a small price to pay, right?


I guess this is sarcasm, but it's not clear what you're trying to actually say. How is a connection-pooling proxy supposed to make an application more resilient to network partitions? If anything, it's worse, since you have twice as many points of failure -- if either the application can't talk to the proxy, or the proxy can't talk to the database, you're hosed.


Practically speaking, you typically run pgbouncer on the same box as postgres.

Even if you ran pgbouncer on a different box, the situation is still better. Suppose you have a probability P of having a partition between any two boxes. Then the probability of a partition taking down the system with pgbouncer is P.

If you instead had N clients coordinating among themselves how many connections to use, your odds of failure go way up. The probability of a partition existing between some pair of clients is 1-(1-P)^(N(N-1)/2) ~= N(N-1)P/2 (for very small P). Once two clients can't communicate, neither one can safely connect to the DB, because neither knows whether the others have pushed the DB over the limit. So all clients must stop connecting to the DB whenever any client is disconnected from the other N-1.

In a distributed system, ensuring that SUM(x[i]) remains bounded for all time is a tricky problem to solve.
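A minimal sketch of the math above, with P and N as defined in the parent comment (the value of P is a hypothetical, purely for illustration):

```python
# Partition-probability sketch for the argument above.
# With one pgbouncer on the DB box, only a single link matters, so the
# chance a partition halts new connections is just P.
# With N clients coordinating pairwise, there are N(N-1)/2 links, and
# any one of them failing halts new connections for everyone.

def coordinated_failure_prob(p: float, n: int) -> float:
    """Probability that at least one of the N(N-1)/2 client links fails."""
    links = n * (n - 1) // 2
    return 1 - (1 - p) ** links

p = 0.001  # hypothetical per-link partition probability
for n in (2, 10, 50):
    print(n, round(coordinated_failure_prob(p, n), 4))
```

Even at a one-in-a-thousand per-link partition probability, the coordinated scheme's failure probability climbs quickly as N grows, since the link count is quadratic in N.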


> Practically speaking, you typically run pgbouncer on the same box as postgres.

I've done this, long-term, in production. This is not how PGbouncer is designed to be used, and it causes serious problems under load. (I am still doing this, but it's on my todo list to remove this glitch from my architecture, as it caused my entire site infrastructure to fail once during a launch.) Put simply: PGbouncer is single-threaded, and it is not difficult to end up in situations where the one core PGbouncer is running on sits at 100% while the remaining cores (in my case, seven) running PostgreSQL are having no trouble with the traffic.


Just curious, how does that happen? I've never found myself in a situation where PGBouncer can't handle the load but Postgres can.

I'm guessing it's a situation where you are running a large volume of very easy queries, for example primary key lookups with no joins?


We do have this problem! It is only a matter of how small the PGBouncer machine is compared to the PostgreSQL one. In our case the PGBouncer had to deal with 16000 req/s, coming from 700 clients going through 100 connections to PostgreSQL. We now have 4 PGBouncers for this DB, and we're 'load balancing' them with DNS.
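For concreteness, a setup like the one described above could be expressed in a pgbouncer.ini roughly like this (the database name, host, and port are illustrative assumptions, not the actual config):

```ini
[databases]
; illustrative entry; real dbname/host are assumptions
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
pool_mode = transaction   ; a server connection is reused per transaction
max_client_conn = 700     ; the ~700 application clients mentioned above
default_pool_size = 100   ; funneled into 100 actual postgres connections
```

The 700-to-100 funnel is exactly the fan-in that makes pgbouncer itself the hot spot at 16000 req/s.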


I don't know if everybody does the same, but I simply enable enough connections for the total of all clients, and am done with it.

Clients talking among themselves to establish whether they can create another connection is madness. They'll keep their connections at the maximum all the time anyway.


Or you could give each client C/N connections.


This can easily result in too few connections at a local peak. If each box is allocated 3 connections (and postgres can handle 12), you get high latency in the event that your load looks like [6, 1, 1, 1]. This happens occasionally due to random chance.
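A toy illustration of that failure mode, using the numbers above (12 total connections split evenly across 4 boxes):

```python
# Static split: 12 connections / 4 boxes = 3 per box.
per_box_limit = 3
load = [6, 1, 1, 1]  # momentary demand per box

# Requests that queue behind the per-box cap, even though the
# database as a whole could serve everything at once.
queued = [max(0, demand - per_box_limit) for demand in load]
print(queued)  # the hot box has requests waiting; the others are idle
print(sum(load), "total demand vs", 4 * per_box_limit, "total capacity")
```

Total demand (9) fits comfortably within the database's 12 connections, yet the hot box still queues 3 requests because its static share is exhausted.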


Assuming multiple application servers, despite 99.9999% of websites not needing them, doesn't make much sense.


There are a number of applications that are sane and don't have PostgreSQL connection pooling built in (e.g. Django introduced it in 1.6).


Django's connections to the database are bounded by the number of threads and processes you set.

It didn't have a pool, but it's a mistake to think it just opened connections at will.
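As a sketch of that bound (the database name and worker counts here are hypothetical; CONN_MAX_AGE is the Django 1.6+ persistent-connection setting): with a typical WSGI deployment, each worker thread holds at most one open connection, so the total is simply processes times threads.

```python
# settings.py fragment: Django >= 1.6 persistent connections.
# Each worker thread holds at most one open connection, so total
# connections <= processes * threads_per_process.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "mydb",       # hypothetical database name
        "CONN_MAX_AGE": 60,   # reuse a connection for up to 60 seconds
    }
}

# e.g. a hypothetical deployment of 4 processes x 2 threads each:
max_connections = 4 * 2
print(max_connections)  # upper bound on simultaneous DB connections
```

So even without a pool, the connection count is deterministic once the process and thread counts are fixed.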


Django requires multiple processes, which makes sharing things like connections quite difficult. That is not sane.


What's your alternative to that?


A language without a global lock to make threads useless.


You have taken a FUD sound bite and repeated it without understanding. The language is not at issue here. CPython's GIL does not make threads useless at all, and there are already Python interpreters without a GIL anyway.

An addiction to globally shared state across threads multiplies race conditions and scaling problems, even if the GIL magically goes away. If you are incapable of using even processes effectively then I really fear for you when you have to scale out horizontally.

I'd say this more nicely if I could think of any way, but you don't know what you're talking about and you should read up before you begin talking about it again. It looks bad and you might mislead someone.


No, you have taken a "stick my head in the sand and pretend the last 30 years of progress didn't happen" attitude and are repeating it without understanding. I do web development in Haskell. I have thousands of concurrent connections to a single process, which is using 32 cores without problems. There has not been a single race condition, deadlock, mutex bottleneck, etc. Just because you are happy with an absurdly primitive language doesn't mean those of us in the 21st century are ignorant.



