
> (ETL) from mongoDB

Why not query the data directly in MongoDB?



Because queries, specifically those that aggregate, consume memory and CPU on the live prod DB, which a simple scan cursor doesn't do. If the resource consumption is prohibitive, which it often is in Mongo, and your use case is non-realtime, it's typically better to script the aggregation outside the DB query (or to query an ETL'd aggregation store, where locking it up doesn't impact customers).
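
To make "script the aggregation outside the DB query" concrete, here is a rough sketch. It assumes the documents arrive from a plain scan cursor and that each one has hypothetical `customer_id` and `amount` fields; a list of dicts stands in for the cursor so the snippet runs without a database.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def aggregate_totals(docs: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Count documents and sum `amount` per `customer_id`, entirely client-side."""
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"count": 0, "total": 0.0}
    )
    for doc in docs:  # consume the iterable one document at a time
        bucket = totals[doc["customer_id"]]
        bucket["count"] += 1
        bucket["total"] += doc.get("amount", 0.0)
    return dict(totals)


# Stand-in for a real scan cursor. With pymongo this might look like
# (assumed collection and field names):
#   docs = db.orders.find({}, {"customer_id": 1, "amount": 1})
sample_docs = [
    {"customer_id": "a", "amount": 10.0},
    {"customer_id": "b", "amount": 5.0},
    {"customer_id": "a", "amount": 2.5},
]

result = aggregate_totals(sample_docs)
print(result)
```

The grouping work happens in the script's process, so the database only has to serve the scan.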

Edit: changed "offline" to "non-realtime"


You should at the very least be running analytics queries on a replica; otherwise you could be hurting database performance (and the customer experience) in production.

But even if you do that, you'll find you need joins and aggregations that are painful to do in Mongo yet trivial in a system that is designed for them.
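
As an illustration of "trivial in a system that is designed for them", here is a small sketch that uses SQLite from the Python standard library as a stand-in for whatever store the ETL loads into. The `customers` and `orders` tables and their columns are made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the ETL target (an assumption;
# in practice this would be a warehouse or another relational store).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id TEXT PRIMARY KEY, plan TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id TEXT,
        amount REAL
    );
    """
)

# Hypothetical rows, as if extracted from two Mongo collections.
conn.executemany(
    "INSERT INTO customers (id, plan) VALUES (?, ?)",
    [("a", "pro"), ("b", "free"), ("c", "pro")],
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [("a", 10.0), ("b", 5.0), ("a", 2.5), ("c", 4.0)],
)

# A join plus a grouped aggregation in a single declarative query.
rows = conn.execute(
    """
    SELECT c.plan, COUNT(*) AS n_orders, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.plan
    ORDER BY c.plan
    """
).fetchall()

print(rows)
conn.close()
```

The same question in Mongo would need a `$lookup` stage followed by `$group`, executed on the database that holds the source data.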



