> They should be random, but roughly-ordered over time so that your MySQL/Postgres indices stay fast and efficient as the dataset grows.
I'm currently facing an issue where the chosen UUID format was v4. From my understanding this is the least preferable because it's truly random, which causes hell for MySQL indices.
What I don't quite understand, however, is why sorted-random is better if I'm using an index in both cases? I'd like to know what I should be doing, and whether there's a way to migrate if possible.
As a rule of thumb, a table that grows over time probably has a "creation time" field, and a query that touches one row probably touches other rows created around the same time (like a weekly report). Since rows are probably stored on disk in chronological order, you can index the "creation time" field and do a good job, but if your primary key is already roughly chronological you may be able to get away with one fewer index and keep inserts/updates cheaper.
In MySQL (InnoDB) and some other database systems, the primary key is a clustered index. A clustered index defines the order in which rows are physically stored on disk.
If you have an auto-incrementing primary key like most sane people do, new rows sort to the very end of the table, so inserting into it essentially becomes an append-only operation, which is very fast. Inserting random values all over the place, on the other hand, is like inserting data into the middle of a file and pushing back everything that comes after it, over and over again. Even on modern SSDs, that is much slower than simply appending to the end of a file.
For best performance, your primary keys should be as close as possible to auto-incrementing integers. In other words, each new value should be greater than all the values that currently exist in the table. Timestamps are quite useful for this purpose.
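For example, a time-ordered ID can be built by hand from a timestamp plus random bits. This is a minimal sketch in the spirit of UUIDv7, not a compliant RFC 9562 implementation (the version/variant bits are omitted for brevity):

```python
import os
import time
import uuid

def uuid7_like() -> uuid.UUID:
    """Time-ordered UUID: 48-bit millisecond Unix timestamp in the
    high bits, 80 random bits below.  (Version/variant bits are
    skipped for brevity -- a sketch, not RFC 9562 UUIDv7.)"""
    ms = int(time.time() * 1000) & ((1 << 48) - 1)
    rand = int.from_bytes(os.urandom(10), "big")
    return uuid.UUID(int=(ms << 80) | rand)

# IDs generated in later milliseconds always sort after earlier ones,
# so a clustered index keeps appending at its right-hand edge.
a = uuid7_like()
time.sleep(0.002)  # force a later millisecond timestamp
b = uuid7_like()
assert a < b
```

Two IDs generated within the same millisecond may still sort either way (the low bits are random), but across milliseconds the ordering is monotonic, which is all the index needs.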
> What I don't quite understand, however, is why sorted-random is better if I'm using an index in both cases?
Data locality. The default index type in most RDBMSs is a B-tree, which keeps keys in sorted order, so keys that sort near each other end up in the same index page(s) and are cheaper to retrieve together. For PKs that means when you walk through your records, fetching siblings is cheap.
When the IDs are randomly distributed it's far more expensive to insert them, because they land at random positions (the writes are scattered, which is bad), and when you retrieve them you have to jump around the index loading a whole bunch of pages just to get one or two records from each.
Things are even worse if the table itself is clustered on the PK, because now your scattered index is also messing with your actual data reads.
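A back-of-the-envelope illustration of the read-locality point, in plain Python. `PAGE_SIZE` is a made-up rows-per-page number for the toy model (real InnoDB pages are 16 KB and hold many rows); the sketch counts how many index pages a batch of recently created rows spans under sequential vs. random keys:

```python
import random

PAGE_SIZE = 64  # rows per index page -- made-up number for illustration

def pages_touched(all_keys, wanted):
    """Keys are stored in sorted order, PAGE_SIZE per page.
    Count how many distinct pages hold the wanted keys."""
    position = {k: i for i, k in enumerate(sorted(all_keys))}
    return len({position[k] // PAGE_SIZE for k in wanted})

n = 10_000
sequential_ids = list(range(n))              # auto-increment style
random_ids = random.sample(range(10**9), n)  # v4-style scattered keys

# 64 rows created back-to-back (the last ones inserted):
assert pages_touched(sequential_ids, sequential_ids[-64:]) <= 2
# The same "recent" rows under random keys land all over the index:
assert pages_touched(random_ids, random_ids[-64:]) >= 20
```

With sequential keys the 64 newest rows sit in one or two adjacent pages; with random keys the same logical batch is smeared across dozens of pages, and each page is a separate read.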
The index's advantages for searching are the same either way. But for inserting (and deleting) it is much faster if the keys arrive already ordered: the DB engine doesn't have to splice new values into the middle of the tree. If they are 100% ordered, inserting a new one is just appending to the end of a list, which is much faster.
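The append-vs-splice difference can be sketched with a plain sorted list standing in for the B-tree (a toy model, not how a real engine stores pages):

```python
import bisect
import random

def shift_costs(keys):
    """Insert keys one by one into a sorted list; return, for each
    insert, how many existing entries had to shift right (0 = append)."""
    table = []
    costs = []
    for k in keys:
        pos = bisect.bisect(table, k)
        costs.append(len(table) - pos)
        table.insert(pos, k)
    return costs

ordered = list(range(1_000))
shuffled = random.sample(ordered, len(ordered))

assert sum(shift_costs(ordered)) == 0   # every insert is a pure append
assert sum(shift_costs(shuffled)) > 0   # mid-list inserts move existing data
```

Ordered keys always land at the end, so nothing moves; shuffled keys keep landing in the middle and pay for it on every insert.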