In the most non-inflammatory way possible: I am not sure I'm convinced of the performance benefits of crossing a JNI boundary to invoke a method in another language, versus letting the JVM JIT and optimize native JVM code.
It would be interesting to see benchmarks here. Do you know if there are any public ones available, specific to QuestDB's interop code?
In certain situations, crossing the JNI boundary can be advantageous. When data resides in "native" memory, outside the Java heap, the coordination of concurrent execution logic can be handled in Java, while the actual execution occurs in C++ or Rust. In this context, the negligible penalty for crossing the JNI boundary once per 10 million rows pales in comparison to the substantial benefits achievable through C++ optimization or CPU-specific assembly.
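To make the trade-off concrete, here is a minimal, hypothetical JNI sketch (the class and function names are illustrative, not QuestDB's actual API): a single native call receives the base address of an off-heap column and a row count, so the boundary is crossed once for the whole batch rather than once per row.

```c
#include <jni.h>
#include <stdint.h>

/*
 * Hypothetical native worker for a Java declaration like
 *   private static native double sumDoubleColumn(long address, long rowCount);
 * The JNI overhead is paid once; the loop then runs over millions of rows
 * of off-heap data, where the C compiler is free to optimize or vectorize.
 */
JNIEXPORT jdouble JNICALL
Java_org_example_Vect_sumDoubleColumn(JNIEnv *env, jclass cls,
                                      jlong address, jlong rowCount) {
    (void) env; (void) cls;  /* no callbacks into the JVM; data is off-heap */
    const double *col = (const double *) (intptr_t) address;
    double sum = 0.0;
    for (jlong i = 0; i < rowCount; i++) {
        sum += col[i];
    }
    return sum;
}
```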
I see... but that seems a little weak considering it's a funded product, the first adjective they use to describe it is "fast", and good old C++ would totally slay it. The author has a C++ background; maybe he could spend an afternoon trying that?
I've lost count of the number of great posts I've read about fast hash tables from e.g. Paul Khuong alone...
One of our distribution channels is Maven Central, where we ship a Java 11-compatible library. Embedded users preclude us from leveraging the latest Java features.
It is not always read-modify-write. There is no evidence of this pattern on Ubuntu when there is no memory pressure. The merge occurs when a block is partially updated after the kernel has lost the state of the block, which can happen under memory pressure.
On x86, and I think every architecture, when you write to a memory mapping that is not already backed by a writable page, the kernel is notified that user code is trying to write. And the kernel needs to fill in the contents of the page, which requires a read if the page isn’t already loaded.
It has to be this way! The write could be a read-modify-write instruction. Or it could be a plain store, but I’ve never heard of hardware with write-only memory with fine enough granularity to make this work.
The sole exception is if the page in question is all zeros and the kernel can know this without needing to read the file. This might sometimes be the case for an append-only database. I don’t know exactly what QuestDB does.
Also:
> As soon as you mmap a file, the kernel allocates page table entries (PTEs) for the virtual memory to reserve an address range for your file,
Nope. It just makes a record of the existence of the mapping. This is called a VMA in Linux. No PTEs are created unless you set MAP_POPULATE.
> but it doesn't read the file contents at this point. The actual data is read into the page when you access the allocated memory, i.e. start reading (LOAD instruction in x86) or writing (STORE instruction in x86) the memory.
What are these LOAD and STORE instructions in x86? There are architectures reasonably described as load-store architectures, and x86 isn’t one of them.
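For what it's worth, the VMA-vs-PTE distinction and the fault-on-first-touch behaviour can be sketched in a few lines of C (Linux-specific, error handling omitted, file name hypothetical):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = open("column.d", O_RDWR);   /* hypothetical data file */
    size_t len = 1 << 24;                /* 16MB mapping */

    /* Lazy mapping: the kernel only records a VMA; no PTEs, no I/O yet. */
    char *lazy = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* MAP_POPULATE asks the kernel to pre-fault the range up front. */
    char *eager = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_POPULATE, fd, 0);

    /* First touch of a lazy page: page fault; the kernel reads the page
     * from the file (unless it knows the block is zero) before the store
     * can complete. */
    lazy[0] = 1;

    munmap(lazy, len);
    munmap(eager, len);
    close(fd);
    return 0;
}
```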
> On x86, and I think every architecture, when you write to a memory mapping that is not already backed by a writable page, the kernel is notified that user code is trying to write. And the kernel needs to fill in the contents of the page, which requires a read if the page isn’t already loaded.
This is very true. Perhaps there wasn't enough context in the article for what it is describing. The read problem started to occur on a database that is subject to a constant write workload. Data is flowing in all the time at a variable rate. Typically blocks are "hot" and get filled fully within seconds, if not milliseconds.
Zeroing the file is an option to try. QuestDB allocates disk space with `posix_fallocate()`, which doesn't have the required flags. We would need to explore `fallocate()`. Thanks.
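If it helps, a hedged sketch of what that exploration might look like (assuming Linux with a filesystem such as ext4 or XFS; the helper name is made up): `fallocate()` with `FALLOC_FL_ZERO_RANGE` marks the range as zero-filled, typically as a metadata-only operation, so a later partial write has no old data that needs reading back.

```c
#define _GNU_SOURCE
#include <fcntl.h>

/* Pre-allocate a chunk (e.g. 16MB) as a guaranteed-zero range instead of a
 * plain posix_fallocate(). On filesystems that support it, the range is
 * recorded as unwritten/zero extents rather than physically written. */
int preallocate_zeroed(int fd, off_t offset, off_t chunk_len) {
    return fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, chunk_len);
}
```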
If you are allocating zeroed space and then writing a whole page quickly, then you may well avoid a read. And doing this under extreme memory pressure will indeed cause the page to be written to disk while only partly written by user code, and it will need to be read back in. Reducing memory pressure is always good.
I would expect quite a bit better performance if you actually write entire pages using pwrite(2) or the io_uring equivalent, though.
Messing with fallocate on a per-page basis is asking for trouble. It changes file metadata, and I expect that to hurt performance.
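A sketch of the pwrite() suggestion, assuming the block is assembled fully in user memory first (names illustrative):

```c
#include <unistd.h>

#define BLOCK_SIZE 4096  /* one page, assembled entirely in user space */

/* Writing a whole page in one syscall means the kernel never has to
 * fault-in (and possibly read) stale page contents under an mmap write. */
int flush_block(int fd, const void *block, off_t file_offset) {
    ssize_t written = pwrite(fd, block, BLOCK_SIZE, file_offset);
    return written == BLOCK_SIZE ? 0 : -1;
}
```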
Great, we are on the same page! `fallocate()` (or the POSIX one) is called on large swathes of the file, 16MB by default. Not too often to hurt performance. I wonder if zeroing the file with `fallocate()` will result in actual disk writes, or whether it is ephemeral?
I am excited for what the Memory API brings. The biggest pain with native memory management for us is "structs", or rather the lack of them. The memory arithmetic is extremely bug-prone, and I personally am very motivated to see the back of it.
FFI - we need to wait and see how this actually evolves.
We are not looking at any of this just yet. While it is undoubtedly fun, there are a few other things we're busy with.
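To illustrate the "structs vs. raw offsets" pain in concrete terms (illustrative C, not QuestDB code): a C struct gives named, compiler-checked field access, whereas off-heap code without structs is stuck with hand-written base-plus-offset arithmetic.

```c
#include <stdint.h>

typedef struct {
    int64_t timestamp;   /* offset 0  */
    double  price;       /* offset 8  */
    int32_t flags;       /* offset 16 */
} trade_rec;             /* hypothetical record layout */

/* With a struct, the compiler owns the offsets. */
static double price_via_struct(const trade_rec *rec) {
    return rec->price;
}

/* Without structs, the same access is manual pointer arithmetic; one wrong
 * constant silently reads the wrong field. Off-heap Java code today looks
 * like this, just with long addresses instead of pointers. */
static double price_via_offsets(const uint8_t *base) {
    return *(const double *) (base + 8);
}
```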
I couldn't agree more. I don't think compiler vectorization is that useful even for the columnar (!) database we're building. The specialized JIT doesn't even use AVX-512, because it's too much effort for little to no gain.
Vectorization (auto or manual) can really help optimize bottlenecks like evaluating simple comparisons once you're out of easy algorithmic wins. It takes so much attention it's only worth doing in the most critical cases IMO.
Auto-vectorization can be fragile.
Manual vectorization is a ton of work and difficult to maintain.
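As a rough illustration of the kind of manually vectorized comparison being discussed (a sketch only; assumes 256-bit AVX support and a column length that is a multiple of 4):

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Count how many values in a double column satisfy "value > threshold"
 * using 256-bit vector compares, four lanes at a time. */
int64_t count_greater_than(const double *col, size_t n, double threshold) {
    __m256d limit = _mm256_set1_pd(threshold);
    int64_t count = 0;
    for (size_t i = 0; i < n; i += 4) {
        __m256d v   = _mm256_loadu_pd(col + i);
        __m256d cmp = _mm256_cmp_pd(v, limit, _CMP_GT_OQ);
        /* movemask packs one bit per matching lane; popcount adds them up
         * (GCC/Clang builtin used here for brevity). */
        count += __builtin_popcount(_mm256_movemask_pd(cmp));
    }
    return count;
}
```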
Thank you for sharing the post! I'm Vlad, CTO of QuestDB. We went all the way trying to optimize CSV ingestion with io_uring, and ended up benchmarking against some OLAP peers in the process. Any feedback is greatly appreciated!
A comparison with the old version is actually in the article for the patient reader. It could go to the top, but I don't think it would make a difference. At the end of the day, it is an article on the official QuestDB website, which already gives the reader a spoiler about the bias.
I am intrigued what Timescale is going to publish next.
Agree. And for a blog post it can even have a story like: "We compared with ClickHouse and we were 10x slower; then we looked at this case and made it 100x faster. Thank you, benchmark and ClickHouse developers, for showing us a use case where we could do better."
For me, benchmarking usually goes: "Why does this query take so long? We need to improve it." Sometimes by 1000x.
Right? How do the folks at QuestDB know that their new JIT engine is actually responsible for those performance improvements? My understanding is that, index or not, data is still sorted by time in QuestDB, which is exactly what the ClickHouse engineers are replicating in the new schema.
The query ClickHouse picked on does not actually leverage time order. Perhaps the ClickHouse vendors on this thread can comment on the relevance of the date partitioning for this query. My best guess is that it might help the execution logic create data chunks for a parallel scan.
QuestDB also uses partitions for this purpose, but we additionally calculate chunks dynamically based on the available CPUs to distribute the load across cores more evenly.
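A hedged sketch of that kind of dynamic chunking (not QuestDB's actual code): split the row range into one chunk per online core so a parallel scan stays balanced even when partitions are uneven.

```c
#include <stdint.h>
#include <unistd.h>

typedef struct {
    int64_t row_lo;   /* inclusive */
    int64_t row_hi;   /* exclusive */
} scan_chunk;

/* Fills `chunks` (caller-sized to at least the online CPU count) and
 * returns the number of chunks produced. */
int plan_chunks(int64_t row_count, scan_chunk *chunks) {
    long cpus = sysconf(_SC_NPROCESSORS_ONLN);
    if (cpus < 1) cpus = 1;
    int64_t per_chunk = (row_count + cpus - 1) / cpus;   /* ceil division */
    int n = 0;
    for (int64_t lo = 0; lo < row_count; lo += per_chunk) {
        int64_t hi = lo + per_chunk;
        chunks[n].row_lo = lo;
        chunks[n].row_hi = hi < row_count ? hi : row_count;
        n++;
    }
    return n;
}
```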
That's fair enough and I get the broader point about ClickHouse being rather inflexible wrt query performance. It still seems like the initial sorting key for CH would've been the worst possible one for all benchmark scenarios.
What an extremely unfair comment. Having read QuestDB's blog, it's quite clear they've taken great pains to point out that a single specific benchmark isn't the be-all and end-all of DB analysis.
They quite clearly start out by saying they’re only looking to demonstrate the impact of a specific new DB feature they’ve created, and are using benchmarks that illustrate the difference. They make zero claims that QuestDB is faster than Clickhouse overall, and quite carefully point out that prospective users need to run their own benchmarks on their own data to figure out what DB will work for them.
I’m commenting specifically on the blog post provided by GP, which the parent comment made some pretty derogatory comments about.
I've made no attempt to deeply research what QuestDB have said elsewhere, because I don't care. I don't use, or have a need for, any of the products mentioned in any of the linked articles. I'm only interested in the narrow discussion of the original blog post provided by GP, to which the OP's post is replying.
I am in fact very proud of my team, who worked very hard on both the implementation and the article. It is disappointing to read unfounded insults when we made every effort to be fair.
I appreciate your benchmark and was interested to learn about how QuestDB processes TSBS queries efficiently. I work extensively with ClickHouse and it's always enlightening to learn about how other databases achieve high performance. Your descriptions of the internals are clear and easy to follow, especially since you included comparisons with older versions of QuestDB.
That said, I think I can understand how some users might be a little put off by the comparisons. Your article effectively says "ClickHouse is really slow" without giving readers any easy way to judge what was happening under the covers. I was personally a bit frustrated not to have the time to set up TSBS and dig into what was going on. I therefore appreciated Geoff's effort to look up the results and show that the default index choices didn't make a lot of sense for this particular case. That does not detract from QuestDB's performance, at least from my perspective.
Anyway congratulations on the performance improvement. As a famous character in Star Wars said, "we will watch your career with great interest."
I wonder what "every effort to be fair" means? The first thing you could have done is reach out to the ClickHouse community to ask for optimization suggestions.
"fair" means that we comparing apples to apples. Ad-hoc, unindexed predicate, compiled by QuestDB into AVX2 assembly (using AsmJIT) vs same predicate complied by Clickhouse (I'm assuming by LLVM). One can perhaps view this as comparing SIMD-based scans from both databases. Perhaps we generate better assembly, which incidentally offers better IO.
We all understand that creating a very specific index might improve a specific query's performance. Great, ClickHouse geared the entire table storage model to be ultra-specific to the latitude search. What if you search by longitude, or by another column? Back to the beginning.
JIT-compiled predicates offer arbitrary query optimisation with zero impact on ingestion. This is sometimes useful.
What would you have offered, assuming we had reached out, other than creating an index?
ClickHouse does better than we do in other areas. It JITs more complicated expressions, such as some date functions. It optimises count() queries specifically: for example, we collect the "found" row IDs in an array, whereas ClickHouse does not do this for count(). We still have work to do. On the other hand, we ingested this very dataset about 5x quicker than ClickHouse, which we left out because the article is not about "QuestDB is faster than ClickHouse".
Doesn't matter, since that clearly wasn't the purpose of the article. After all, they were totally happy to add an index for another competing DB as long as they happened to win that comparison. Then they crow about how they beat having an index.
So, maybe do not create specific scenarios for corner cases and then generalize the outcome? And write articles about common scenarios that are important for people who will use the technology on a daily basis.
Full disclosure: I am the CTO of QuestDB and I took part in the JIT implementation. The quote above is not mine; it was written by ClickHouse staff. The "utilizes its full indexing strategy" statement is false and is news to me.
Aggregation is also optimised quite a bit via SIMD and map-reduce; these paths are as fast as the "where" predicates. Aggregation keyed on multiple fields is not as optimal yet. I would also suggest trying our demo site (free and fully open) to see how the queries you use perform.
Cleanup is semi-manual for now. Time partitions can be removed or detached via SQL. We're working on automating that.
> Time partitions can be removed or detached via SQL. We’re working on automating that.
Cool! Will that be continuous queries that can be used for downsampling?
I'm working on load testing and monitoring tools. Since either can produce enough metrics to overflow available storage, the downsampling story ends up as important as write speed for me. I imagine that's true of a lot of metric database scenarios--what happens if they go on...forever?