To me, the most important question is: how do I scale v7 in an environment of 20+ engineers?
When using v7, I need some sort of audit that checks every API contract for v7 usage and potential information leakage.
Detecting v7 UUIDs in the API contract would probably require me to enforce special key names (`uuidv7` for v7 and `uuid` for v4) to make the audit easier.
Engineers will get this wrong more than once, especially on a mixed team of junior and senior developers.
Also, the API contracts will look a bit inconsistent: some resources will be addressed by v7, others by v4. On top of that, by using v4 only on certain resources, I'd leak the information that the resources addressed by v4 contain sensitive information.
By sticking to v4, I'd have the same identifier for all resources across the API. When needed, I can expose the creation timestamp in the response separately. Audit is much simpler since the fields state explicitly what they will contain.
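A minimal sketch of the contract shape being argued for here (the field names and example values are illustrative, not prescribed above):

    # Every resource is addressed by an opaque UUIDv4; the creation time, when
    # needed, is an explicit field of its own, which keeps the audit trivial.
    resource = {
        "id": "0b7e3f0c-8f43-4f0e-9d2a-6a1f2c3d4e5f",  # always UUIDv4 (example value)
        "createdAt": "2024-05-01T12:34:56Z",            # timestamp exposed deliberately
    }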
UUIDv4 is explicitly forbidden in some high-reliability/high-assurance environments because there is a long history of engineers using weak entropy sources to generate UUIDv4 despite the warnings to use a strong entropy source, which is only discovered when it causes bugs in production. Apparently some engineers don't understand what "strong entropy source" means.
Mixing UUID types should be detectable because the version is part of the UUID. But then many companies have non-standard UUIDs that overwrite the version field, mixed in with standard UUIDs across their systems. In practice, you often have to treat a UUID as an opaque 128-bit integer with no attached semantics.
> Detecting v7 UUIDs in the API contract would probably require me to enforce special key names (`uuidv7` for v7 and `uuid` for v4) to make the audit easier.
Unless I'm missing something, check it on receipt and reject it if the version doesn't match: `uuid.replace("-", "")[12]` or `(uuid >> 76) & 0xf`.
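A minimal sketch of that check in Python, assuming identifiers arrive as strings (the function name is mine, not from any particular framework):

    import uuid

    def require_uuid4(value: str) -> uuid.UUID:
        """Reject any inbound identifier that isn't a UUIDv4."""
        parsed = uuid.UUID(value)   # raises ValueError on malformed input
        if parsed.version != 4:     # version nibble: 13th hex digit, i.e. (int >> 76) & 0xf
            raise ValueError(f"expected UUIDv4, got version {parsed.version}")
        return parsed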
Regardless of difficulty, this comes down to priorities. Potential security concerns aside (I maintain this really does not matter nearly as much as people think for the majority of companies), it's whether or not you care about performance at scale. If your table is never going to get over a few million rows, it doesn't matter. If you're going to get into the hundreds of millions, it matters a great deal, especially if you're using them as PKs, and doubly so if you're using InnoDB.
> By sticking to v4, I'd have the same identifier for all resources across the API. When needed, I can expose the creation timestamp in the response separately. Audit is much simpler since the fields state explicitly what they will contain
Good luck if you're operating at a decent scale, and need to worry about db maintenance/throughput. Ask the DBA at your company what they would prefer.
>>> Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs.
>> So this basically defeats the entire performance improvement of UUIDv7. Because anything coming from the user will need to look up a UUIDv4, which means every new row needs to create an extra random UUIDv4 which gets inserted into a second B-tree index, which recreates the very performance problem UUIDv7 is supposedly solving.
> This is only really true if leaking the creation time of the record is itself a security concern.
No, as "leaking the creation time" is not a concern when APIs return resources with properties representing creation/modification timestamps.
Where exposing predictable identifiers creates a security risk, such as exposing UUIDv7 or serial[0] types used as database primary keys, is that it enables attackers to synthesize identifiers matching arbitrary resources much more quickly than when random identifiers are employed.
With proper data permission checks, having predictable IDs is totally fine. And UUIDv7's random part is large enough that it's much harder to predict than an auto-increment ID.
If your security relies on the attacker not knowing your IDs (i.e. you don't do proper data permission checks), your security is flawed.
> With proper data permission checks, having predictable IDs is totally fine.
That qualification is doing a lot of work in this sentence. For supporting evidence as to why this is the case, a quick search for "CVE PHP security vulnerabilities" or "CVE NodeJS security vulnerabilities" will produce voluminous results.
> And UUIDv7's random part is large enough that it's much harder to predict than an auto-increment ID.
Usually. One common scenario where UUIDv7 primary keys in a persistent store can be exploited similarly to sequential integer IDs is when there are queries supporting pagination and/or queries leveraging the temporal ordering UUIDv7 provides intrinsically. For example:
`id > aSynthesizedUUIDv7Value`
Note that this does not require successful identification of either the `rand_a` or `rand_b` UUIDv7 fields[0].
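A sketch of how such a boundary value can be synthesized from nothing but a guessed timestamp (the helper name is mine; field layout per RFC 9562):

    import uuid

    def uuidv7_lower_bound(unix_ms: int) -> uuid.UUID:
        """Smallest possible UUIDv7 for a given millisecond timestamp, usable
        as a range probe (id > bound) without knowing rand_a or rand_b."""
        value = (unix_ms & 0xFFFFFFFFFFFF) << 80   # 48-bit unix_ts_ms in the top bits
        value |= 0x7 << 76                         # version field = 7
        value |= 0x2 << 62                         # variant bits 0b10
        return uuid.UUID(int=value)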
> If your security relies on the attacker not knowing your IDs (i.e. you don't do proper data permission checks), your security is flawed.
Again, I agree with this in theory. But as the saying[1] goes:
    In Theory There Is No Difference Between Theory and Practice, While In Practice There Is
There are two cases being discussed. A UUIDv7 is a bad secret, but it's fine for many other IDs. If I can guess your user ID, it shouldn't really matter, because your business logic should prevent me from doing anything with that information. If I can guess your password reset token, it's a different story, because I don't need anything else beyond that token to do damage.
But the random part of a UUIDv7 is 74 bits... larger than a 64-bit integer of random values. Larger than many systems use in total when generating random keys for such things. Likely a larger number of values than the total number of comments here on HN over a couple decades. It's emphatically NOT guessable.
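A back-of-the-envelope check of that claim, assuming the standard 12-bit rand_a plus 62-bit rand_b layout:

    # 74 random bits vs. a full 64-bit integer, purely for scale
    random_bits = 12 + 62       # rand_a + rand_b in a UUIDv7
    print(2 ** random_bits)     # 18889465931478580854784  (~1.9e22 possibilities)
    print(2 ** 64)              # 18446744073709551616     (~1.8e19, for comparison)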
Which part is in violation of the age discrimination laws here: the fact that k-sortable UUIDs divulge the information, or the fact that someone is using them to discriminate against a candidate?
If it’s the latter (which, reading Wikipedia’s summary, suggests it is), then the entire premise that k-sortable UUIDs are an “HR violation” is bunk.
The problem with arguing about timestamps leaking this kind of information is that _anything_ can leak this kind of vaguely dated information.
- Seen on a website that ceased to exist after 2010? Gotchya!
- Indexed by the Wayback Machine? Gotchya!
- Used <different uuid scheme> for records created before 2022? Gotchya!
The only way to prevent divulging temporal clues about an entity is to never reveal its existence in any kind of correlatable way (which, as far as I’m prepared to think right now, seems to defeat the point of revealing it to a UI at all).
20 years later I submit another application to the same company, using my existing 20-year-old user profile, and now I get rejected because somebody figures out I'm old by looking at my user ID?
Are there really any performances benefits of UUIDv7 over UUIDv4 that should ever come up in the context of an HR system? Just how many job applicants are you tracking?
I don't understand why you considered UUIDv7 in the first place.
We used to leak approximate creation time all the time back when everyone used sequential keys. If anything sequential keys are far worse: they leak the approximate number of records, make it easy to observe the rate at which new keys are created, and once you know that you can deduce the approximate creation date of any key.
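A toy illustration of that last point (the numbers are made up):

    # Two sightings of (sequential id, time) give you the creation rate, and
    # from the rate, the approximate age of any other id.
    id_a, t_a = 1_000_000, 0            # id observed on day 0
    id_b, t_b = 1_070_000, 7            # id observed a week later
    rate = (id_b - id_a) / (t_b - t_a)  # ~10,000 new records per day

    def age_in_days(some_id: int) -> float:
        return (id_b - some_id) / rate  # days before t_b it was created

    print(age_in_days(930_000))         # ~14 days old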
UUIDv4 removes all three of those vectors. UUIDv7 still removes two of three. It doesn't leak record count or the rate at which you create them, only creation time. And you still can't guess adjacent keys. It's a pretty narrow information leakage for something you routinely reveal on purpose.
I often see sequential order IDs that get incremented by one, so I can guesstimate the number of orders they get within a minute by creating my own orders. I watched this happen as I was intentionally removing and creating new orders (as they did not support modifying existing but not-yet-accepted ones). What could I do with this information as a user that would be damaging, though? Legitimate question, the intent is not harm, but I genuinely do not see how this is a bad thing.
I can see it being bad for tracking IDs, but not order IDs, unless you are allowed to view orders that do not belong to your account, which is just fundamentally bad security; using UUIDv4 or a random string there would simply be security by obscurity.
UUIDv7s are much worse for leaking creation time though, imo. For sequential IDs an attacker needs to have a lot of data to narrow down the creation time. That raises the barrier to entry considerably, to the point that only a committed attacker could infer the time.
With UUIDv7 the creation time is always leaked, without any sampling. A casual attacker could quite easily look up the time and become motivated to probe and link the account further.
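For instance, pulling the embedded time back out of a v7 identifier is a couple of lines (a sketch; the top 48 bits are milliseconds since the Unix epoch per RFC 9562):

    import datetime
    import uuid

    def uuidv7_timestamp(value: str) -> datetime.datetime:
        """Recover the creation time embedded in a UUIDv7."""
        ms = uuid.UUID(value).int >> 80   # top 48 bits = unix_ts_ms
        return datetime.datetime.fromtimestamp(ms / 1000, tz=datetime.timezone.utc)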
> For sequential IDs an attacker needs to have a lot of data to narrow down the creation time.
When sequential integer IDs are externalized, an attacker does not need creation times to perform predictive attacks. All they need to do is apply deltas to known identifiers.
I remember in the cracking days, when we were trying to crack ElGamal encryption and the like, we noticed when some code had been written in e.g. Delphi (which used a weak RNG based on the datetime). If you guessed when the code was compiled and the keys were generated, you got a rough time range, and if you brute-forced through that time range as a seed to the RNG and tried to generate the random ElGamal key from it, you would vastly reduce the range of possibilities (e.g. brute-forcing 10M ints instead of billions or more).
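The shape of that attack, as a rough sketch (Python's random.Random stands in here for the weak time-seeded RNG; the names are mine):

    import random

    def recover_seed(known_output: int, t_start: int, t_end: int):
        """Brute-force a suspected window of time-based seeds until the RNG
        reproduces an output we already know (e.g. part of a generated key)."""
        for candidate in range(t_start, t_end):      # a few million seconds is cheap
            rng = random.Random(candidate)           # stand-in for the weak RNG
            if rng.getrandbits(64) == known_output:  # regenerate and compare
                return candidate                     # seed (and creation time) recovered
        return None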
An online casino got hit a similar way a long time ago. IIRC someone realised the seed for a known PRNG was the system clock, so you could brute-force every shuffle either side of the approximate timestamp and compare the results to some known cards (i.e. the ones you’d been dealt); once you had a match you knew what everyone else had.
Always thought that was elegant (the attack, that is, not the use of the time as the seed).
I stopped airplane maintenance software from shipping with a particularly egregious form of this for SSL session key generation. It’s hard to get a good random seed on a real time operating system. I tell you hwut.
Depending on access to sensor data, it's possible to use a mix of various sensors as well as the time for seed generation. Though baseline static from RF is better, assuming that's available as well.
Of course, it's always possible for someone to do something stupid, like use a weak RNG.
There are some practical applications that are not necessarily related to security. If you are storing something like a medical record, you don't want to use it as a public ID for a patient visit, because the date is subject to HIPAA.
But they would have to relate that ID to patient data like their identity, right? The date alone cannot be a HIPAA issue. That would mean every date is a HIPAA violation, because people go to the doctor every day.
You wouldn't be publishing patient visits publicly; the only folks who'd legitimately see that record would be those with access to that visit, and they'd most likely need to know the time of said visit. This access should be controlled via authN/authZ and audited.
You'd also generally do a lot of time-based lookups on this data: what visits do I have today, this week, and so on. You might also want an additional DateTime field for timezones and offsets, but v7 is probably better than v4 for this use case.
Can you please give me a legitimate use case where you would have the ID of a medical case without also having the Date/Time of that corresponding record?
It's not that you can't possess the timestamp of an event. It's that you can't publish certain things that are deemed potentially identifiable.
Dates are specifically cited as potential vectors for de-anonymization. For example, you can't disclose that "Bob H presented to the clinic on October 10th" because that's a lot of information that can be used to find out who Bob H is.
Here's a practical example of what I'm talking about. Suppose you have an app for physicians that allows them to message each other to discuss a case. They can share relevant information for diagnostic purposes, e.g., "34y/o male from southern Louisiana presented with a rash." They share de-identified photos and chat about ddx, treatment protocol, etc. All of that is cool. However, if the record of that visit is identified with a UUIDv7, and that ID is used as part of the URL, you've exposed the time of the visit, and that would be a problem.
However, if your API has a (very common) createdAt field on these objects, the ability to get the creation time from the identifier is rather academic.
The concern is not limited to access of the full records. The concern extends to any incidental expression of identifiers, especially those sent via insecure side channels such as SMS or email.
In most cases this is a compliance matter rather than an open attack vector, but it nevertheless remains that one has to answer any question along the lines of "did you minimise the privacy surface?" in the negative, or at least with a caveat.
Email is not secure but sending an email with a link to "Information about your appointment" is fine. If that link goes to `/appointments/sjdhfaskfhjaksdjf`, there is no leaked data. If it goes to `/appointments/20251017lkafjdslfjalsdkjfa`, then the link itself contains PHI.
Whether creation date is PHI…I could see the argument being yes, since it correlates to medical information (when someone sought treatment, which could be when symptoms present.)
Notably, this is an absurd argument. Every system I’ve dealt with sends the date/time/location/practitioner in clear text in the email (or some variant thereof).
The only thing that seems to be protected is ‘reason for appointment’, and not all systems do that.
Everyone signs paperwork to authorize this when they first engage with the medical providers!
Your comment here has id 45622189 and the UI tells me in plain sight that you posted it 11h ago. Assuming the ids are sequential, these two combined tell me more about HN than a UUID "leaking" something that’s already expected to be public.
It's relatively common for it to be a privacy concern. Imagine if I'm making an online payment or something, and one of the IDs involved tells you exactly when I created my bank account. That's a decent proxy for my age.
1) I would argue that the year that you created your bank account is not a good proxy for age.
2) I would question where you think the UUID representing your age would leak to from your bank, considering it's still a bank account ID.
3) I would question whether you've considered that the vast majority of UUIDs aren't used for high-stakes IDs such as online banking IDs.
A bank account number (assuming that is what we are talking about, not some token) is already very sensitive information.
Like, information with legally protected status.
Knowing approximate age is a relatively small leak compared to that.
Bank account numbers are printed on every check you ever wrote. Most people don't write checks anymore, though online bill pay still sometimes sends physical checks. They never really were sensitive information.
Bank security does not depend on your bank account being private information. Pretty much all bank security rounds to the bank having a magic undo button, so they can undo any bad transactions after it comes to light that it was a bad transaction. Sure they do some filtering on the front-end now to eliminate the need to use the magic undo button, but that's just extra icing to keep the undo button's use to a dull roar.
It was a concern in the past, as people used password creation tools that were deterministic based on the current time.
There was previously an article linked here about recovering access to some bitcoin by feeding all possible timestamps in a date range to the password creation tool they used, and trying all of those passwords.