What can be done is to isolate part of the machine (CPU cores, interrupts) as much as possible and run the latency-sensitive workload on that part. In that case you can spin the CPU and use something like the timestamp counter (the rdtsc or rdtscp assembly instruction) for timing. That wastes power and prevents the CPU cores from going to sleep, but it makes good latencies achievable.
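To make the spinning idea concrete, here is a minimal sketch in C, not a tuned implementation. The helper names `read_ticks` and `spin_for_ticks` are made up for illustration. It assumes GCC or Clang, where `__rdtsc()` comes from `<x86intrin.h>`, and it falls back to `clock_gettime` on non-x86 machines so the snippet still builds there. Note that the TSC counts ticks, not nanoseconds, so a real deadline needs the TSC frequency calibrated separately.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime in the fallback path */
#endif
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the CPU timestamp counter (the rdtsc instruction). */
static inline uint64_t read_ticks(void) { return __rdtsc(); }
#else
/* Fallback for non-x86 builds: monotonic clock in nanoseconds. */
static inline uint64_t read_ticks(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Busy-wait until at least `ticks` counter ticks have elapsed.
 * The core never sleeps while this runs; that is the trade-off.
 * Returns the number of ticks actually spent spinning. */
static inline uint64_t spin_for_ticks(uint64_t ticks)
{
    uint64_t start = read_ticks();
    uint64_t now;
    do {
        now = read_ticks();
    } while (now - start < ticks);
    return now - start;
}
```

This only helps if the thread actually stays on the isolated core, for example by pinning it there with something like `taskset` or `sched_setaffinity` on Linux. Otherwise the scheduler can still preempt the spinning thread.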
As for the 50-80ns access, you are right. I would guess this workload represents the best-case scenario, with its data already in cache.