198394549's comments

198394549 · on July 2, 2018

>This is not the case for most of the NoSQL databases where you'll pay for lack of certain features either by a) having to write a lot of code, or b) bad-to-crippling performance for use cases it wasn't meant to solve.

Can you give a common example of these? This article is about issues with row- vs column-oriented data stores, not SQL vs NoSQL.

stickfigure · on July 2, 2018

Having solved essentially the same customer-facing analytics problem with BigQuery (BQ), Keen, Mongo, and Postgres, I'll tell you specifically:

* Column stores like BQ and Keen don't let you efficiently slice and dice data by factors other than time. If you're slicing by customer or product, your queries become incredibly slow and expensive. You start writing hacky shit like figuring out when your customer's first sale was so you can narrow the time range slightly, but that barely helps.

* MongoDB doesn't do joins. So you denormalize big chunks of your data, and now you have update problems because 1) you have to hunt down all those copies and 2) you don't have transactions that span collections. Also, the aggregation language is tedious compared to SQL, requiring you to do most of the work of a query planner yourself.

* Someone else in this thread said MongoDB was faster than Postgres, but I found quite the opposite. For the same real-world workload (basic aggregations over an index), Postgres was much faster than Mongo. No idea what that other person is talking about.
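To make the update problem in the second point concrete, here is a minimal Python sketch, assuming plain dicts and lists stand in for two MongoDB collections (no real database is involved). The names `rename_customer`, `stale_copies`, `customer_name`, and `fail_after` are hypothetical. Each order carries a denormalized copy of the customer's name, so renaming a customer means hunting down every copy; if the process dies partway through and no transaction spans both collections, some copies are left stale.

```python
# Plain dicts/lists stand in for MongoDB collections (hypothetical data).
customers = {1: {"_id": 1, "name": "Acme"}}

# Each order carries a denormalized copy of the customer's name.
orders = [
    {"_id": 10, "customer_id": 1, "customer_name": "Acme", "total": 50},
    {"_id": 11, "customer_id": 1, "customer_name": "Acme", "total": 75},
]


def rename_customer(customer_id, new_name, fail_after=None):
    """Update the source record, then hunt down every denormalized copy.

    `fail_after` simulates a crash after that many copies were updated.
    With no transaction spanning both collections, nothing is rolled back.
    """
    customers[customer_id]["name"] = new_name
    updated = 0
    for order in orders:
        if order["customer_id"] != customer_id:
            continue
        if fail_after is not None and updated == fail_after:
            raise RuntimeError("crashed mid-update")
        order["customer_name"] = new_name
        updated += 1
    return updated


def stale_copies(customer_id):
    """Return IDs of orders whose copied name no longer matches the source."""
    current = customers[customer_id]["name"]
    return [
        o["_id"]
        for o in orders
        if o["customer_id"] == customer_id and o["customer_name"] != current
    ]


try:
    rename_customer(1, "Acme Corp", fail_after=1)
except RuntimeError:
    pass

print(stale_copies(1))  # [11]: the second order still says "Acme"
```

With a normalized schema and a join, there would be exactly one copy of the name to update.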

lomnakkus · on July 5, 2018

Very well put... and this was the point I was making about "decent" performance. If you have super-special requirements (you don't), you'll probably discover them along the way to SUCCESS. If you don't, any old SQL database will probably be more than sufficient AND it will be flexible enough to let you evolve your schema along the way.

198394549 · on July 2, 2018

What is this lack of flexibility you are speaking about? As in, actual specifics.

stickfigure · on July 2, 2018

I feel like this should be pretty obvious. I'm fairly sure there are students in a bootcamp somewhere learning "joins make it easy to construct complex queries; denormalization eliminates expensive joins but sacrifices flexibility and adds potential data inconsistency".

Real-world example: consider an Order table and a Visit table; conversion rates aggregate orders over visits. In Mongo you can denormalize some of the Visit data into Order, but what happens when you change the logic for computing conversion rates? Or when you want conversion rates broken down by web browser, source tag, or any other data element that lives in Visit but that you didn't denormalize ahead of time?
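A minimal sketch of why the join keeps this flexible, using Python's built-in `sqlite3` as a stand-in for Postgres. The table names `visits`/`orders`, the columns `browser`, `source_tag`, and `visit_id`, and the sample rows are all assumptions, and "conversion rate" is taken here to mean the fraction of visits with at least one order, which may not match any particular business definition. The point is that a new breakdown is just a different `GROUP BY` over the same two tables; nothing had to be copied into the orders ahead of time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE visits (id INTEGER PRIMARY KEY, browser TEXT, source_tag TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         visit_id INTEGER REFERENCES visits(id),
                         total REAL);
    INSERT INTO visits VALUES
        (1, 'Firefox', 'ad'), (2, 'Firefox', 'email'),
        (3, 'Chrome', 'ad'), (4, 'Chrome', 'ad'), (5, 'Chrome', 'email');
    INSERT INTO orders VALUES (100, 1, 20.0), (101, 3, 35.0), (102, 4, 10.0);
    """
)

# Only whitelisted column names are interpolated into the SQL below.
DIMENSIONS = {"browser", "source_tag"}


def conversion_by(dimension):
    """Fraction of visits with at least one order, grouped by a visits column."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    # LEFT JOIN keeps visits without orders; DISTINCT guards against
    # a visit with several orders being counted more than once.
    sql = f"""
        SELECT v.{dimension},
               COUNT(DISTINCT o.visit_id) * 1.0 / COUNT(DISTINCT v.id)
        FROM visits AS v
        LEFT JOIN orders AS o ON o.visit_id = v.id
        GROUP BY v.{dimension}
        ORDER BY v.{dimension}
    """
    return conn.execute(sql).fetchall()


print(conversion_by("browser"))     # [('Chrome', 0.6666666666666666), ('Firefox', 0.5)]
print(conversion_by("source_tag"))  # [('ad', 1.0), ('email', 0.0)]
```

Changing the conversion logic likewise means editing one query rather than backfilling copied fields.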

198394549 · on July 2, 2018

>The problem is that at some point you end up wanting joins.

It can join with the $lookup aggregation stage these days, although only to a "non-sharded collection". I don't know why it can't join to a sharded collection when the join is on the same shard, though.
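For reference, here is what a `$lookup` stage looks like in its equality-match form (`from`, `localField`, `foreignField`, `as`), plus a pure-Python approximation of what it produces, so the semantics can be seen without a server. The `lookup` helper and the sample `orders`/`visits` data are hypothetical; this is a sketch, not MongoDB's implementation. Like `$lookup`, it behaves as a left outer join: every input document is kept, with an array of matches (possibly empty) under the `as` field.

```python
from collections import defaultdict

# The stage as it would appear in an aggregation pipeline. Per the
# comment above, the "from" collection could not be sharded.
LOOKUP_STAGE = {
    "$lookup": {
        "from": "visits",
        "localField": "visit_id",
        "foreignField": "_id",
        "as": "visit",
    }
}


def lookup(docs, from_docs, local_field, foreign_field, as_field):
    """In-memory approximation of $lookup's equality-match form.

    Every input document is kept; `as_field` holds a (possibly empty)
    list of matching foreign documents. Assumes scalar, hashable join
    keys; array-valued fields are not handled.
    """
    index = defaultdict(list)
    for foreign in from_docs:
        index[foreign.get(foreign_field)].append(foreign)
    return [
        {**doc, as_field: list(index.get(doc.get(local_field), []))}
        for doc in docs
    ]


orders = [{"_id": 100, "visit_id": 1}, {"_id": 101, "visit_id": 9}]
visits = [{"_id": 1, "browser": "Firefox"}, {"_id": 2, "browser": "Chrome"}]

spec = LOOKUP_STAGE["$lookup"]
result = lookup(orders, visits, spec["localField"], spec["foreignField"], spec["as"])
# Order 101 has no matching visit, so its "visit" array is empty.
print(result)
```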

There is also the option of using $in with a list of IDs you pulled down in an earlier query.

Then there are client-side joins.
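The $in approach and a client-side join combine naturally: fetch the orders, collect the referenced visit IDs, fetch all those visits in one query with `$in`, then stitch the results together in application code. Below is a minimal sketch; `find` is a hypothetical in-memory stand-in that understands only this one filter shape, and the data is made up. With a real driver such as PyMongo, the second query would be something like `db.visits.find({"_id": {"$in": visit_ids}})`.

```python
ORDERS = [
    {"_id": 100, "visit_id": 1},
    {"_id": 101, "visit_id": 3},
    {"_id": 102, "visit_id": 1},
]
VISITS = [
    {"_id": 1, "browser": "Firefox"},
    {"_id": 2, "browser": "Safari"},
    {"_id": 3, "browser": "Chrome"},
]


def find(collection, query):
    """Hypothetical stand-in for a driver's find(); supports only
    a single {"field": {"$in": [...]}} filter."""
    field, cond = next(iter(query.items()))
    wanted = set(cond["$in"])
    return [doc for doc in collection if doc.get(field) in wanted]


def orders_with_visits(orders, visits):
    # Query 1 is assumed done: `orders` is what the server returned.
    visit_ids = sorted({o["visit_id"] for o in orders})
    # Query 2: one round trip for every referenced visit, via $in.
    fetched = find(visits, {"_id": {"$in": visit_ids}})
    by_id = {v["_id"]: v for v in fetched}
    # Client-side join: stitch the two result sets together in memory.
    return [{**o, "visit": by_id.get(o["visit_id"])} for o in orders]


for row in orders_with_visits(ORDERS, VISITS):
    print(row["_id"], row["visit"]["browser"])
# 100 Firefox
# 101 Chrome
# 102 Firefox
```

This costs two round trips instead of one server-side join, but it works regardless of how either collection is sharded.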

naranha · on July 2, 2018

> Then there are client-side joins.

AKA what you are doing when writing your SPAs with their own state management. Server-side joins rarely make sense in that context.

For reporting/analytics, yes. But those can be delegated to external systems/databases optimized for that task. With Elasticsearch, for example, you get very far very quickly without needing to write any SQL joins.

dvlsg · on July 2, 2018

Would you suggest Elasticsearch over SQL for analytics like these? We're actually looking at a very similar situation, and I have a hard time believing that aggregations in Elasticsearch (especially when no full-text indexes are required) are a better fit than SQL. That could just be my lack of experience with Elastic, though.

198394549 · on July 2, 2018

Why is it basically shit? It appears to store and retrieve the data as per my instructions.

nickserv · on July 2, 2018

Except when it doesn't. We've had data corruption issues related to the oplog, out-of-sync secondaries, and excessive resource usage on the primary. Those were the major problems. There were also a bunch of smaller problems, but in fairness those were on the Node.js/Mongoose side of things. Would not recommend.