>The marketing department picks the number they want based on how they want customers to think about the chip, and which competitors they want you to compare it against. They just plug whatever numbers they need into the formula so that the result comes out the way they want.
Are you just describing product segmentation? I.e., how the Ryzen 5700X and 5800X are basically the same chip, down to the number of enabled cores, except for clocks and power limit ("TDP")?
Yep. The 5800X is a higher bin specifically because it can clock higher than the chips in the 5700X bin. That certainly makes them draw more power, so they give them a higher TDP number too. But the TDP doesn’t have anything to do with how much power the CPU will draw or how much heat it will generate in practice. Those numbers vary quite a lot; the CPU continuously adjusts its own frequency multiplier based on its own measured temperature, meaning it’ll draw more power if you cool it better.
>But the TDP doesn’t have anything to do with how much power the CPU will draw or how much heat it will generate in practice. Those numbers vary quite a lot; the CPU continuously adjusts its own frequency multiplier based on its own measured temperature, meaning it’ll draw more power if you cool it better.
I don't get it. Are you referring to the phenomenon that different workloads have different power consumption (e.g., a bunch of AVX512 floating-point operations vs. a bunch of NOPs), and therefore TDP is totally made up? I agree that there are a lot of factors that impact power usage, and CPUs aren't like a space heater that always consumes its rated power when you run it at full blast, but that doesn't mean TDP numbers are made up. They still vaguely approximate power usage under some synthetic test conditions, or at the very least are vaguely correlated with some limit of the CPU (e.g., the PPT limit on AMD platforms).
No, the TDP number doesn’t even vaguely approximate anything. You can’t use the number to predict anything, or to plan, or to estimate your electric bill, or anything like that.
Isn't TDP supposed to be an upper bound on the power budget for a chip running at maximum IPC (which implies an AVX512 workload spread across all cores, with all the test data in the L1 caches)? I guess that power budget can vary due to process imperfections and/or CPU bugs, but the claim that it doesn't approximate anything is hard to believe. What about PSUs, then? Is an 800W PSU rating a made-up number as well?
No, TDP is only supposed to be a marketing number. It would be nice if it were a real number that meant something, but CPU manufacturers don’t want to include really complicated information in their marketing. When they want to emphasize that a processor is powerful, they increase the TDP number! When they want you to buy an efficient laptop, they just lower the number! Same cpu, same number of transistors, same number of cores and PCIe lanes, different model number, different TDP number.
The power ratings of power supplies, on the other hand, are perfectly valid. Try to draw more than that and they will blow a fuse. Note, however, that a power supply’s efficiency varies nonlinearly with load. If your computer is really drawing 800W from the power supply, then the power supply is probably drawing 1000W from the wall, or maybe more. The difference is turned into heat during the conversion from 120V AC to 12V DC (and 5V DC, 3.3V DC, etc.). That’s an efficiency of 80%. But if your PC were drawing 400W from the same power supply, the efficiency might be 92% instead, and the supply would draw only about 435W from the wall. The right power supply for your computer is the cheapest one that is most efficient at the power level your computer actually needs. The Bronze/Gold/Platinum efficiency ratings are almost BS marketing, though, because all they tell you is that the supply hits a certain efficiency at _some_ power level, not that it does so at the power level you’ll typically run your computer at.
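The wall-draw arithmetic above is just DC load divided by efficiency. A minimal sketch; the two efficiency figures are the illustrative ones from the comment, not measurements of any particular supply:

```python
def wall_draw_watts(dc_load_watts: float, efficiency: float) -> float:
    """Power pulled from the wall for a given DC load and conversion efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return dc_load_watts / efficiency


def waste_heat_watts(dc_load_watts: float, efficiency: float) -> float:
    """Power lost as heat inside the supply during the AC-to-DC conversion."""
    return wall_draw_watts(dc_load_watts, efficiency) - dc_load_watts


# The two operating points discussed above (efficiencies are illustrative).
for load, eff in [(800, 0.80), (400, 0.92)]:
    print(f"{load} W load at {eff:.0%}: "
          f"{wall_draw_watts(load, eff):.0f} W from the wall, "
          f"{waste_heat_watts(load, eff):.0f} W of heat")
```

Real efficiency is a curve over load, so you'd look up `efficiency` for the load you actually expect rather than use a single rating.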
There is a similar but more extreme set of nonlinearities when talking about the power drawn by a CPU (or a GPU). The CPU monitors its own temperature and then raises or lowers its own frequency multiplier in response to those temperature changes. This means that the same CPU will draw more power and run faster when you cool it better, and will run more slowly and generate less heat when the ambient temperature is too high. There are also timers involved. Because so many of the tasks we actually give to our CPUs are bursty, CPU performance is also bursty. The CPU will run at a high speed for a short period of time, then automatically scale back after a few seconds. The exact length of that timer can be adjusted by the BIOS, so laptop motherboards set the timer really short (because cooling in a laptop is terrible), while gamer motherboards turn it way up (because gamers buy overbuilt Noctua coolers, or water cooling, or whatever). Intel and AMD cannot even give you a single number that encompasses all of these factors. Thus TDP became entirely meaningless and subject to the whims of marketing.
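To make the timer part concrete, here's a toy boost controller, loosely in the spirit of Intel's PL1/PL2/Tau power limits but not Intel's or AMD's actual algorithm. Assumptions: the chip may draw a short-term limit while an exponentially weighted moving average of its power stays below a long-term limit; `tau_s` plays the role of the BIOS-adjustable timer; all wattages and time constants are made up.

```python
import math


def simulate_boost(demand_w, long_limit_w, short_limit_w, tau_s,
                   dt_s=0.1, duration_s=60.0):
    """Return (time_s, power_w) samples for a toy boost controller.

    Simplified model: the package may draw up to short_limit_w while the
    moving average of its power is below long_limit_w; after that it is
    capped at long_limit_w. Starts from an idle (zero) average.
    """
    alpha = 1.0 - math.exp(-dt_s / tau_s)  # EWMA weight per time step
    avg_w = 0.0
    samples = []
    for i in range(int(round(duration_s / dt_s))):
        cap_w = short_limit_w if avg_w < long_limit_w else long_limit_w
        power_w = min(demand_w, cap_w)
        avg_w += alpha * (power_w - avg_w)
        samples.append((i * dt_s, power_w))
    return samples


def seconds_boosted(samples, long_limit_w, dt_s=0.1):
    """Time spent drawing more than the long-term limit."""
    return sum(1 for _, p in samples if p > long_limit_w) * dt_s


# Same hypothetical chip and workload, two hypothetical BIOS timer settings.
for label, tau in [("short timer (laptop-style)", 2.0),
                   ("long timer (desktop-style)", 28.0)]:
    trace = simulate_boost(demand_w=120, long_limit_w=65, short_limit_w=90, tau_s=tau)
    print(f"{label}: {seconds_boosted(trace, 65):.1f} s above 65 W, "
          f"then settles at {trace[-1][1]:.0f} W")
```

Only `tau_s` changes between the two runs, yet the time spent boosting differs by more than an order of magnitude, which is the point about one "TDP" number not capturing this.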
https://www.usenix.org/system/files/conference/cooldc16/cool... uses an 84W-TDP Haswell-architecture Intel i7-4770, and the authors constructed a synthetic microbenchmark with 1.67 IPC FPU and 3.86 IPC integer workloads. They then use RAPL (Running Average Power Limit), which I only learned about today, to measure power usage at the level of the whole chip package. The reported numbers are ~22W.
Considering that the microbenchmark uses only one core, and that this chip has 4 cores in total, could it really be that they would measure ~84-88W if they had designed the microbenchmark to use all of the cores? That would match the declared TDP.
They didn’t measure 22W, they measured 6W + 22.1W + 4.9W + 1.8W + 4.8W + 11.2W = 50.8W. Add in 66.3W for the other three cores and that would be 117.1W. Benchmark #2 measured a few watts less than that.
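The arithmetic above, spelled out. It assumes, as the comment does, that each extra active core adds only the per-core term while the other components stay fixed. The thread only names the "core" term, so the other keys are placeholders, not the paper's labels:

```python
# Per-component package power for the single-core benchmark, as quoted above.
# Only "core" is named in the thread; the other keys are placeholders.
components_w = {
    "component_1": 6.0,
    "core": 22.1,
    "component_3": 4.9,
    "component_4": 1.8,
    "component_5": 4.8,
    "component_6": 11.2,
}

single_core_total_w = sum(components_w.values())  # whole package, one active core
extra_cores_w = 3 * components_w["core"]          # assumption: only the core term scales
four_core_estimate_w = single_core_total_w + extra_cores_w
naive_estimate_w = 4 * components_w["core"]       # the "4 x 22W" reading of the figure

print(f"one core, whole package: {single_core_total_w:.1f} W")
print(f"three more cores:        {extra_cores_w:.1f} W")
print(f"four-core estimate:      {four_core_estimate_w:.1f} W")
print(f"naive 4 x core term:     {naive_estimate_w:.1f} W")
```

The gap between the two estimates comes entirely from whether the non-core terms are counted.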
But they don’t give the IHS temperature, so you could repeat the exact same experiment using the same hardware and get different numbers simply because your cooling setup was better or worse than theirs.
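A toy steady-state model of why the cooler changes the numbers. Assumptions, all hypothetical: die temperature is ambient plus power times a single thermal resistance; dynamic power scales roughly with frequency cubed (a common rough approximation, since voltage rises with clock); the CPU picks the highest multiplier that keeps the die under `t_max_c`. Real CPUs adjust continuously rather than settling on one point.

```python
def steady_state_operating_point(r_thermal_c_per_w, ambient_c=25.0, t_max_c=95.0,
                                 base_clock_mhz=100.0, multipliers=range(30, 51),
                                 static_w=10.0, k_w_per_ghz3=0.8):
    """Return (multiplier, power_w, temp_c) for the highest multiplier that
    stays under t_max_c, or None if even the lowest one is too hot."""
    best = None
    for m in multipliers:  # ascending, and power rises with m
        f_ghz = m * base_clock_mhz / 1000.0
        power_w = static_w + k_w_per_ghz3 * f_ghz ** 3
        temp_c = ambient_c + r_thermal_c_per_w * power_w  # steady state
        if temp_c <= t_max_c:
            best = (m, power_w, temp_c)
    return best


# Same hypothetical chip, two hypothetical coolers, then a hotter room.
for label, r, ambient in [("big cooler", 0.4, 25.0),
                          ("small cooler", 0.9, 25.0),
                          ("small cooler, hot room", 0.9, 35.0)]:
    m, p, t = steady_state_operating_point(r, ambient_c=ambient)
    print(f"{label}: x{m} ({m / 10:.1f} GHz), {p:.0f} W, {t:.0f} C")
```

Identical silicon and identical workload land at different clocks and different wattages depending only on `r_thermal_c_per_w` and `ambient_c`.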
My understanding, per Intel's documentation, is that RAPL gives them the power consumption of the whole package, so I believe the 22W for Cores (W) in their figure is correct. Other figures, such as the instruction decoder's, seem to be extrapolated from that, since RAPL doesn't and can't report at that level of granularity. I could be wrong, but that's how I interpret their data and why I think the figures shouldn't be summed together.
As for the cooling setup, I think I agree. This is something I didn't know, but it makes sense.
Right, RAPL just reports a total power usage figure for the whole CPU. The authors then develop a model which they believe splits that total into multiple components that correspond to parts of the CPU. This is possible because CPUs provide performance counters that measure what the CPU is actually doing. For example, if you write programs that are very similar but have different ratios of cache hits and misses, then they’ll draw different amounts of power. You can use those differences to devise a formula for the amount of power used by the cache.
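A minimal sketch of that idea with a single counter: fit package power against cache-miss rate across several benchmark variants using ordinary least squares. The data points are synthetic, and a real model (including the paper's) would regress on several counters at once; this just shows how the slope becomes "power attributable to misses".

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~= intercept + slope * xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic: five near-identical benchmark variants that differ only in
# cache-miss rate (millions of misses per second), with the package power
# a RAPL-style reading might report for each.
miss_rate_m_per_s = [0, 50, 100, 150, 200]
package_power_w = [30.1, 32.4, 35.0, 37.6, 39.9]

baseline_w, w_per_m_misses = fit_line(miss_rate_m_per_s, package_power_w)
print(f"baseline: {baseline_w:.2f} W, "
      f"{w_per_m_misses:.4f} W per million misses/s")
```

Because 1 W per (10^6 misses/s) is 1 µJ per miss, the slope can also be read as an energy-per-miss estimate.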
And indeed, they give their formula in section 4.2:
You can see that the power used by the whole package is the sum of six terms. The values they calculated for those six terms for each of their benchmarks are given in table 4. The 22W figure for the core(s) is just based on the frequency the CPU is running at.