
Our article in question can be found here: https://questdb.io/blog/2022/05/26/query-benchmark-questdb-v...

The intent of the article was to showcase JIT-optimised WHERE clause and we did not use any indexes on QuestDB.



If your intent is to showcase the new optimization in the product, it is best to compare it to your own old version.


The comparison with the old version is actually in the article for the patient reader. It could go at the top, but I don't think it would make a difference. At the end of the day it is an article on the official QuestDB website, which gives the reader a spoiler about the bias.

I am intrigued what Timescale is going to publish next.


Right. The data is there, but it comes with a clickbait title which distracts from the awesome performance improvements the QuestDB guys achieved.


Agree. And for a blog post it can even have a story like: "We compared with ClickHouse and we were 10x slower, then we looked at this case and made it 100x faster. Thank you, benchmark and ClickHouse developers, for showing us a use case where we could do better."

For me, benchmarking is routine: "Why does this query take so long? We need to improve it. Sometimes by 1000x."


Right? How do the folks at QuestDB know that their new JIT engine is actually responsible for those performance improvements? My understanding is that, index or not, data is still sorted by time in QuestDB, which is exactly what the ClickHouse engineers are replicating in the new schema.


The query ClickHouse picked on does not actually leverage time order. Perhaps the ClickHouse vendors on this thread can comment on the relevance of date partitioning for this query. My best guess is that it might help the execution logic to create data chunks for a parallel scan.

QuestDB also uses partitions for this purpose, but we also calculate chunks dynamically based on available CPU to distribute load across cores more evenly.
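
To make the chunking idea concrete, here is a minimal sketch. It is not QuestDB's actual code: the function name `split_into_chunks` and the one-chunk-per-core policy are assumptions for illustration. It only shows how a row range can be divided evenly by CPU count instead of by fixed partition boundaries.

```python
import os


def split_into_chunks(row_count, workers=None):
    """Split [0, row_count) into near-equal half-open (start, end) ranges.

    Hypothetical helper: one chunk per worker, sized from the CPU count
    rather than from fixed partition boundaries.
    """
    if workers is None:
        workers = os.cpu_count() or 1
    # Never create more chunks than there are rows.
    workers = max(1, min(workers, row_count)) if row_count > 0 else 1
    base, extra = divmod(row_count, workers)
    chunks = []
    start = 0
    for i in range(workers):
        # The first `extra` chunks take one additional row each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue
        chunks.append((start, start + size))
        start += size
    return chunks
```

Each `(start, end)` pair could then be handed to a separate worker thread, so no core is left with a much larger share of rows than the others.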


That's fair enough and I get the broader point about ClickHouse being rather inflexible wrt query performance. It still seems like the initial sorting key for CH would've been the worst possible one for all benchmark scenarios.


> The intent of the article was to showcase JIT-optimised WHERE clause and we did not use any indexes on QuestDB.

Does not seem like it from the title.


[flagged]


What an extremely unfair comment. Having read QuestDB's blog, it’s quite clear they’ve taken great pains to point out that a single specific benchmark isn’t the be all and end all of DB analysis.

They quite clearly start out by saying they’re only looking to demonstrate the impact of a specific new DB feature they’ve created, and are using benchmarks that illustrate the difference. They make zero claims that QuestDB is faster than Clickhouse overall, and quite carefully point out that prospective users need to run their own benchmarks on their own data to figure out what DB will work for them.


> They make zero claims that QuestDB is faster than Clickhouse overall

Are you sure? Just one look at their website says differently.

https://questdb.io/time-series-benchmark-suite/

I don't use these tools. I just wanted to point out that what you're saying is disingenuous.


I’m commenting specifically on the blog post provided by GP, which the parent comment made some pretty derogatory comments about.

I’ve made no attempt to deeply research what QuestDB have said elsewhere, because I don’t care. I don’t use, or have a need for, any of the products mentioned in any of the linked articles. I’m only interested in the narrow discussion of the original blog post provided by GP, to which the OP is replying.


I am in fact very proud of my team, who worked very hard on both implementation and the article. It is disappointing to read unfounded insults where we made every effort to be fair.


I appreciate your benchmark and was interested to learn about how QuestDB processes TSBS queries efficiently. I work extensively with ClickHouse and it's always enlightening to learn about how other databases achieve high performance. Your descriptions of the internals are clear and easy to follow, especially since you included comparisons with older versions of QuestDB.

That said, I think I can understand how some users might be a little put off by the comparisons. Your article effectively says "ClickHouse is really slow" without giving readers any easy way to judge what was happening under the covers. I was personally a bit frustrated not to have the time to set up TSBS and dig into what was going on. I therefore appreciated Geoff's effort to look up the results and show that the default index choices didn't make a lot of sense for this particular case. That does not detract from QuestDB's performance, at least from my perspective.

Anyway congratulations on the performance improvement. As a famous character in Star Wars said, "we will watch your career with great interest."

edit: correct typo


I wonder what "every effort to be fair" means? The first thing you could have done is reach out to the ClickHouse community to ask for optimization suggestions.


"Fair" means that we are comparing apples to apples: an ad-hoc, unindexed predicate, compiled by QuestDB into AVX2 assembly (using AsmJIT), vs the same predicate compiled by ClickHouse (I'm assuming by LLVM). One can perhaps view this as comparing SIMD-based scans from both databases. Perhaps we generate better assembly, which incidentally offers better IO.

We all understand that creating a very specific index might improve the performance of a specific query. Great, ClickHouse geared the entire table storage model to be ultra specific for latitude search. What if you search by longitude, or another column? Back to the beginning.
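
A toy sketch of that trade-off, under stated assumptions: the rows, column names, and helper functions below are hypothetical and are not taken from either database. Data stored sorted by latitude can answer a latitude range with binary search, while a longitude range on the same storage still has to check every row.

```python
from bisect import bisect_left, bisect_right


def count_in_range_sorted(sorted_values, low, high):
    """Count values in [low, high] using binary search; requires sorted input."""
    return bisect_right(sorted_values, high) - bisect_left(sorted_values, low)


def count_in_range_scan(values, low, high):
    """Count values in [low, high] by checking every row; works on any order."""
    return sum(1 for v in values if low <= v <= high)


# Hypothetical rows of (latitude, longitude), stored sorted by latitude only.
rows = sorted([(10.0, 50.0), (20.0, 5.0), (30.0, 40.0), (40.0, 15.0)])
latitudes = [lat for lat, _ in rows]
longitudes = [lon for _, lon in rows]  # order follows latitude, so unsorted

lat_hits = count_in_range_sorted(latitudes, 15.0, 35.0)  # binary search is valid
lon_hits = count_in_range_scan(longitudes, 10.0, 45.0)   # must scan every row
```

The sorted path touches only a logarithmic number of entries, but only for the column the storage order was built around; every other column falls back to the scan.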

JIT-compiled predicates offer arbitrary query optimisation with zero impact on ingestion. This is sometimes useful.

What would you offer assuming that we reached out, other than creating an index?

ClickHouse does better than we do in other areas. It JITs more complicated expressions, such as some date functions. It optimises count() queries specifically. For example, we collect "found" row ids in an array; ClickHouse does not, specifically for count(). We still have work to do. On the other hand, we ingested this very dataset about 5x quicker than ClickHouse, which we left out because the article is not about "QuestDB is faster than Clickhouse".
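
The count() remark can be illustrated with a small sketch. Both functions are hypothetical and describe neither engine's real code path; they only contrast materialising matched row ids with counting matches directly.

```python
def matching_row_ids(values, predicate):
    """Collect the ids of matching rows; the caller counts them afterwards."""
    return [row_id for row_id, v in enumerate(values) if predicate(v)]


def count_matches(values, predicate):
    """Count matching rows directly, without materialising any row ids."""
    return sum(1 for v in values if predicate(v))
```

Both give the same count, but the first allocates and fills an array proportional to the number of matches, which is wasted work when only the count is needed.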


The million dollar question is, if we add the same index optimization to QuestDB, would QuestDB query faster than Clickhouse?


What if the purpose of the article is to compare queries without indexes?


Doesn't matter, since that clearly wasn't the purpose of the article. After all, they were totally happy to add an index for another competing DB as long as they happened to win that comparison. Then they crow about how they beat having an index.

Pretty sleazy.


So, maybe do not create specific scenarios for corner cases and then generalize the outcome? And write articles about common scenarios that are important for people who will use the technology on a daily basis.


My personal view is that having fast queries without indexes is quite a general outcome.


"while QuestDB utilizes its full indexing strategy to read just a tiny fraction of the actual data"

Can you please elaborate on this?


Full disclosure: I am CTO of QuestDB and I took part in the JIT implementation. The quote above is not mine; it was written by ClickHouse staff. The "utilizes its full indexing strategy" statement is false and is news to me.


So you do a full scan and it's ~50 CPU cycles per row (48 CPUs at 4 GHz), correct? This is possible, I guess? And in this case ClickHouse is wrong.
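
The back-of-the-envelope estimate above follows from one formula: total cycles available across all cores during the query, divided by rows scanned. A minimal sketch, assuming every core is fully busy for the whole query; the function name is hypothetical, and the elapsed time and row count in the example are placeholders chosen only to exercise the formula, not measurements from the article.

```python
def cycles_per_row(cores, clock_hz, elapsed_seconds, rows):
    """Upper bound on CPU cycles spent per row if all cores stay fully busy."""
    return cores * clock_hz * elapsed_seconds / rows


# Placeholder inputs (NOT figures from the benchmark): 48 cores at 4 GHz,
# a 0.25 s query over one billion rows.
example = cycles_per_row(cores=48, clock_hz=4e9, elapsed_seconds=0.25, rows=1e9)
```

With these placeholder inputs the result is 48 cycles per row, the same order of magnitude as the ~50 quoted in the comment.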


So, is QuestDB faster or not? I'm puzzled now!


Looks like QuestDB is faster if you don't optimize your table storage for 1 query.

But if you are okay with only a limited number of columns being scanned faster than the others, ClickHouse comes first.



