https://www.usenix.org/system/files/conference/cooldc16/cool... uses an Intel i7-4770 (Haswell architecture, rated at 84W TDP), and the authors constructed a synthetic microbenchmark with 1.67 IPC FPU and 3.86 IPC integer workloads. They then use RAPL (Running Average Power Limit), which I only learned about today, to measure power usage at the level of the whole chip package. The reported numbers are ~22W.
Considering that the microbenchmark is utilizing only one core, and considering that this chip has 4 cores in total, could it really be that they would measure ~84-88W if they had designed the microbenchmark so that it utilizes all of the cores? This would then match the declared TDP.
They didn’t measure 22W, they measured 6W + 22.1W + 4.9W + 1.8W + 4.8W + 11.2W = 50.8W. Add in 66.3W for the other three cores and that would be 117.1W. Benchmark #2 measured a few watts less than that.
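The arithmetic above can be checked with a short sketch. It assumes, as the comment does, that each of the three idle cores would add the same 22.1W as the active one and that the other five terms would not change; the variable names are made up for illustration.

```python
# Per-term package power in watts, as quoted in the comment above.
# Only the 22.1 W term is identified (the single active core);
# the other terms are left unlabeled because the comment does not name them.
terms_w = [6.0, 22.1, 4.9, 1.8, 4.8, 11.2]
core_w = 22.1
total_cores = 4

# Package power with one busy core: the sum of all six terms.
one_core_package_w = sum(terms_w)

# Assumption: each additional busy core costs the same as the first one.
extra_cores_w = (total_cores - 1) * core_w
all_cores_package_w = one_core_package_w + extra_cores_w

print(f"one core:   {one_core_package_w:.1f} W")
print(f"extra:      {extra_cores_w:.1f} W")
print(f"four cores: {all_cores_package_w:.1f} W")
```

In practice the non-core terms would probably change too once all four cores are busy, so this is a rough extrapolation rather than a prediction.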
But they don’t give the IHS temperature, so you could repeat the exact same experiment on the same hardware and get different numbers simply because your cooling setup was better or worse than theirs.
My understanding, per Intel's documentation, is that RAPL gives them power consumption for the whole package, so I believe the 22W for Cores (W) in their figure is correct? The other figures, such as the one for the instruction decoder, they seem to extrapolate from that number, since RAPL doesn't and can't give information at that level of granularity? I could be wrong, but that's how I interpret their data and why I think the figures are not meant to be added together.
As for the cooling setup, I think I agree. This is something I didn't know, but it makes sense.
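For anyone who wants to look at the package-level number themselves, here is a minimal sketch of one way to do it. It assumes a Linux machine that exposes RAPL through the powercap sysfs interface at `/sys/class/powercap/intel-rapl:0`; whether the paper's authors read RAPL this way is not stated in the thread. The function names are made up for illustration.

```python
from pathlib import Path
import time

# Assumed location of the package-0 RAPL domain (Linux powercap interface).
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def average_watts(e0_uj, e1_uj, seconds, max_range_uj):
    """Average power between two energy-counter readings given in microjoules."""
    delta_uj = e1_uj - e0_uj
    if delta_uj < 0:
        # The counter wrapped around between the two readings.
        delta_uj += max_range_uj
    return delta_uj / 1e6 / seconds


def sample_package_watts(interval_s=1.0, rapl_dir=RAPL_DIR):
    """Read the package energy counter twice and return the average power."""
    max_range_uj = int((rapl_dir / "max_energy_range_uj").read_text())
    e0 = int((rapl_dir / "energy_uj").read_text())
    t0 = time.monotonic()
    time.sleep(interval_s)
    e1 = int((rapl_dir / "energy_uj").read_text())
    t1 = time.monotonic()
    return average_watts(e0, e1, t1 - t0, max_range_uj)
```

Calling `sample_package_watts()` returns one number for the whole package, which is the point being made here: the counter is an energy total, and anything finer-grained has to come from a model. Reading `energy_uj` may require elevated privileges on some systems.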
Right, RAPL just reports a total power usage figure for the whole CPU. The authors then develop a model which they believe splits that total into multiple components that correspond to parts of the CPU. This is possible because CPUs provide performance counters that measure what the CPU is actually doing. For example if you write programs that are very similar but have different ratios of cache hits and misses then they’ll draw different amounts of power. You can use those differences to devise a formula for the amount of power used by the cache.
And indeed, they give their formula in section 4.2:
You can see that the power used by the whole package is the sum of six terms. The values that they calculated for those six terms for each of their benchmarks are given in Table 4. The 22W figure for the core(s) is just based on the frequency the CPU is running at.
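To make the "similar programs with different cache miss ratios" idea concrete, here is a minimal sketch of fitting one term of such a model. The measurements are invented for illustration and are not from the paper, and a plain least-squares line is used as a stand-in; the paper's actual fitting procedure may differ.

```python
# Hypothetical runs: (cache misses per second, measured package watts).
# These numbers are made up for illustration only.
runs = [
    (1.0e7, 30.0),
    (2.0e7, 31.0),
    (4.0e7, 33.0),
]


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Watts per (misses per second) is joules per miss.
joules_per_miss, baseline_w = fit_line(runs)

print(f"energy per miss: {joules_per_miss * 1e9:.0f} nJ")
print(f"power with zero misses: {baseline_w:.1f} W")
```

With one such coefficient per counter, the package total from RAPL can be split into per-component estimates, which is roughly what the six-term sum expresses.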