
A Look at RX Vega 64 Efficiency


Killing Floor 2 Results:

I know I just said the results for Killing Floor 2 are weird, but I do not want to lead with the weirdness. I do want to mention here that the game originally released to Steam Early Access in 2015, which means two things are true about it. One is that it will not have modern, hardware-pushing graphics, and the other is that it should be just about as optimized as it will ever be, both from its own patches and from drivers. I believe these points factor into the weirdness, but that will come later.

For the Killing Floor 2 runs I played for five minutes on the Outpost map, on Normal difficulty, Short length, in Survival mode. I selected Outpost because I am very comfortable on the map, plus it has both interior and exterior locations. I started each recording when I spawned into the map. Between the waves I used the Skip Trader button to advance to the next wave sooner, but this button is in the escape menu, so opening it may momentarily disrupt performance. Like before, I have a recording of an example run, though I did not record any measurements for this:









The first set of graphs I want to show you are the average clock speed ones, which do have some interesting features to them.



Similar to what we have already seen, the two undervolted configurations are able to reach higher clock speeds, but there are a few runs where Stock actually clocked higher, on average. These runs might just be anomalies. I looked ahead at the average fan speed graphs and it does appear Stock may have been thermally throttling at the higher frame rates.

None of that might be all that interesting, but the average clock speeds for the 60 FPS through 100 FPS runs are, especially in the Stock set. For Stock, the average clock speeds for those runs are just below the P2 clock speed of 1084 MHz, except for the 100 FPS run at 1088 MHz, but remember that is an average. For the UV and UV +50 runs the clock speeds are just above the 1084 MHz level. Neither of the other games reached clock speeds this low, and certainly not into the P1-P2 range.



Looking at the average fan speed now, we see something rather interesting when comparing Stock to the other runs. From the 140 FPS run up through the Default run, Stock was running the fan faster, and for many of those runs it was also maxed out at the 2400 RPM limit, so there was likely thermal throttling in play. At 130 FPS, the average fan speed for Stock was lower. At 120 FPS Stock was the same as UV but lower than UV +50, and for everything from 110 FPS down Stock was running the fan slower. This indicates there was actually less heat being produced with the Stock configuration than with the undervolts. But that would also mean the power usage was less, right?



I said there would be weirdness, and I meant it. For Killing Floor 2, Stock settings used less power for the 60 FPS through 130 FPS runs I did, compared to both the UV and UV +50 runs. I was so surprised by this that I redid all of the Stock runs, and the power usage came out a little lower still. (I am using this second set of runs for the data, by the way.) Just look at the coloring of the columns and the gradient I configured. The Stock 60 FPS run used so little power that it is not even mapped on the gradient anymore. Both undervolted configurations actually used more power, despite being undervolted.





Looking at the plot of clock speed and power use we see the extremes of the Stock configuration, with it reaching lower than the undervolted configurations, but also higher, with some runs hitting the 225 W limit. While the UV and UV +50 sets do not drop to as low a power consumption as Stock does, they also do not reach as high, so we do see the undervolt having an impact. This does not, however, answer the question of why the lower frame rate targets used less power on Stock.

Before getting to my theory, I want to throw in these graphs that I have for all of the sets, but have not shown before. They are QQ plots of the power usage, so you can see the distribution of ASIC Power measurements for the runs.



These graphs show how the Stock runs are more spread out in how much power the ASIC used each second than the two undervolt configurations. We also see that at the high end, Stock not only was using more power but was consistently using more, with the dots almost painting a flat line.
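For readers who want to build similar plots from their own logs, here is a minimal sketch of how QQ points can be computed. It assumes a normal reference distribution, which is a guess since the article does not say what the plots are compared against, and the function name `qq_points` and the sample values are hypothetical, not measured data.

```python
from statistics import NormalDist, mean, stdev

def qq_points(samples):
    """Return (theoretical, observed) pairs for a normal QQ plot."""
    ordered = sorted(samples)
    n = len(ordered)
    # Reference normal distribution fitted to the samples (an assumption).
    reference = NormalDist(mean(ordered), stdev(ordered))
    points = []
    for i, observed in enumerate(ordered, start=1):
        # Plotting position (i - 0.5) / n keeps the probability inside (0, 1).
        probability = (i - 0.5) / n
        points.append((reference.inv_cdf(probability), observed))
    return points

# Hypothetical per-second ASIC power samples in watts, not measured data.
example_power = [118.0, 121.5, 119.2, 150.3, 122.8, 120.1, 224.9, 225.0]
for theoretical, observed in qq_points(example_power):
    print(f"{theoretical:8.1f}  {observed:8.1f}")
```

Samples pinned at a limit, like the two values near 225 W here, end up with almost the same observed value at different theoretical quantiles, which is what draws a flat line of dots at the top of such a plot.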

Now, what is my theory to explain this weirdness? For that I want to show you another set of graphs, and some data-filled tables I am going to hate having to type out.



Through the entire article I have only been talking about the GPU, completely neglecting to mention the HBM2. Partly that was because I wanted to save it until now, but also because I was not thinking about it much when collecting and processing the data. On a whim, before reaching this conundrum, I figured out how to generate tables showing how much time the GPU was spending in each P-state and set it up to do the same for the HBM2, but besides that, I barely thought about it. Now, looking at these graphs and tables, I think I have an explanation for what is happening and why the power usage was so much lower.


Stock:

HBM - Percentage of Time Spent at P-states, by Frame Rate Target (FPS)
P-state          60   70   80   90  100  110  120  130  140  150  160  170  180  190  200  Default
P0 - 167 MHz      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0        0
P1 - 500 MHz     25   33   25   21   15   14   12   12    7    8    5    4    2    1    4        1
P2 - 800 MHz     75   67   75   79   83   72   43   44   10    3    1    4    1    2    1        4
P3 - 945 MHz      0    0    0    0    2   14   45   45   84   89   94   92   97   97   95       95


UV:

HBM - Percentage of Time Spent at P-states, by Frame Rate Target (FPS)
P-state          60   70   80   90  100  110  120  130  140  150  160  170  180  190  200  Default
P0 - 167 MHz      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0        0
P1 - 500 MHz     33   33   19   18   18   11   11   10   14    9    6    5    5    6    2        1
P2 - 800 MHz      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0        0
P3 - 945 MHz     66   67   81   82   82   89   89   90   86   91   94   95   95   94   98       99

UV +50:

HBM - Percentage of Time Spent at P-states, by Frame Rate Target (FPS)
P-state          60   70   80   90  100  110  120  130  140  150  160  170  180  190  200  Default
P0 - 167 MHz      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0        0
P1 - 500 MHz     22   34   29    9   21    8   10   10    9    7    2    5    2    2    2        3
P2 - 800 MHz      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0        0
P3 - 945 MHz     78   66   71   91   79   92   90   91   91   93   98   95   98   98   98       97


Unlike the GPU, the HBM2 only runs at the fixed frequencies of its P-states, which are 167 MHz (P0), 500 MHz (P1), 800 MHz (P2), and 945 MHz (P3). When you are playing a game and letting it render as fast as your hardware and the engine allow, the memory is going to be in that P3 state for maximum performance. That is not what we are doing here, with FRTC limiting performance and therefore encouraging a different, lower P-state to be used. For the Stock runs at 130 FPS and below, the HBM2 spent more time in its P1 and P2 states combined than in the P3 state, and if you go back up to look at the ASIC Power graph you see these are also the runs where Stock used less power.
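Because the memory clock only takes those four values, a residency table like the ones above can be generated by counting logged samples per P-state. The following is a minimal sketch, not the tooling actually used for this article. It assumes a list of per-second memory clock readings in MHz, and the function name `pstate_residency` and the sample data are hypothetical.

```python
from collections import Counter

# HBM2 P-state clocks in MHz, as listed in the article.
HBM_PSTATES = {"P0": 167, "P1": 500, "P2": 800, "P3": 945}

def pstate_residency(memory_clock_samples):
    """Return the rounded percentage of samples spent at each HBM2 P-state."""
    if not memory_clock_samples:
        raise ValueError("need at least one sample")
    clock_to_state = {mhz: name for name, mhz in HBM_PSTATES.items()}
    counts = Counter()
    for mhz in memory_clock_samples:
        # Snap each reading to the nearest known P-state clock, in case the
        # logging tool reports small deviations (an assumption about the log).
        nearest = min(clock_to_state, key=lambda known: abs(known - mhz))
        counts[clock_to_state[nearest]] += 1
    total = len(memory_clock_samples)
    return {name: round(100 * counts[name] / total) for name in HBM_PSTATES}

# Hypothetical log: 3 samples at P1 and 9 samples at P2.
example_log = [500] * 3 + [800] * 9
print(pstate_residency(example_log))
```

Rounding each row independently is also why a column in such a table can sum to 99 or 101 rather than exactly 100.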

Based on this we can clearly see the HBM2 P-state is contributing to the drop in power, but why so large a drop? This is where my theory gets a bit more complicated, and could also be wrong. Unfortunately I do not have the background to claim this is an educated theory/guess, but there is a kind of logic to it in my head.

If you go back, way back, in this article you will find where I mentioned that the Memory P3 Voltage setting actually acts as a floor voltage, so the voltage to the GPU cannot be less than this. At around the same place I also said my setting of the P6 and P7 to below this floor voltage might come into play for Killing Floor 2. My theory is that the floor voltage follows the memory P-states, so when the HBM enters its P2 state, the floor voltage for the GPU also drops. If the voltage to the GPU was only being kept as high as it was because of this floor voltage, as might happen when limiting the frame rate, then this drop in the floor voltage would result in a drop of voltage across the GPU. Not only is the HBM2 now using less power, thanks to its lower P-state, but the GPU core will also use less power because its voltage is not being unnecessarily inflated.
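The floor-voltage idea can be put into a toy model. This is only a sketch of the conjecture above, not a description of how the driver or BIOS actually behaves. It uses the common first-order approximation that dynamic power scales with voltage squared times clock speed. The 950 mV figure is the stock HBM P2 voltage quoted in this article, while the 1050 mV P3 floor and the 900 mV requested GPU voltage are hypothetical placeholders.

```python
def effective_gpu_voltage(requested_mv, memory_floor_mv):
    """GPU voltage cannot drop below the floor set by the memory P-state."""
    return max(requested_mv, memory_floor_mv)

def relative_dynamic_power(voltage_mv, clock_mhz):
    """First-order approximation: dynamic power is proportional to V^2 * f."""
    return (voltage_mv / 1000.0) ** 2 * clock_mhz

# Hypothetical values, except the 950 mV stock HBM P2 voltage from the article.
REQUESTED_GPU_MV = 900   # what the GPU would ask for at this clock (assumed)
FLOOR_AT_HBM_P3 = 1050   # assumed floor while the memory is in P3
FLOOR_AT_HBM_P2 = 950    # floor if it follows the stock P2 voltage
CLOCK_MHZ = 1084         # GPU P2 clock mentioned in the article

power_p3 = relative_dynamic_power(
    effective_gpu_voltage(REQUESTED_GPU_MV, FLOOR_AT_HBM_P3), CLOCK_MHZ)
power_p2 = relative_dynamic_power(
    effective_gpu_voltage(REQUESTED_GPU_MV, FLOOR_AT_HBM_P2), CLOCK_MHZ)
print(f"power with P2 floor relative to P3 floor: {power_p2 / power_p3:.2f}")
```

In this model the clock speed never changes, yet the power estimate falls simply because the floor moved, which is the behavior the theory needs to account for Stock using less power at the same average clock.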

If you look back at the Clock vs Power graphs for Killing Floor 2 you will notice the regression line and the placement of the bins are more diagonal for the Stock set than for the UV and UV +50 sets, which have a nearly flat section to them. This makes sense if, on Stock, the GPU voltage is free to rise and fall along with the clock speed, while on the undervolted sets it is held up by a floor voltage that is too high.

It should also be mentioned that FRTC delays when the GPU starts drawing a frame, which means that even though the GPU is running at some frequency, there is a 'quiet' time when it is not doing work. This will reduce the amount of power used when the GPU is at the same frequency and would explain the power drop for the UV and UV +50 sets, as they also have a number of runs at the same average GPU clock speed, but with less power being used. Since these two sets already benefit from that quiet time, it cannot by itself explain the still lower power use of Stock.
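The quiet-time effect can be sketched as a simple duty cycle. This is an illustrative model under stated assumptions, not a measurement: it assumes each frame takes a constant render time at a fixed clock, and that the GPU draws one power level while rendering and another while waiting. All numbers and both function names are hypothetical.

```python
def busy_fraction(render_ms, target_fps):
    """Fraction of each frame interval spent rendering under a frame rate cap."""
    interval_ms = 1000.0 / target_fps
    # If rendering takes longer than the interval, the GPU is always busy.
    return min(1.0, render_ms / interval_ms)

def average_power(busy, active_watts, quiet_watts):
    """Time-weighted average of the active and quiet power levels."""
    return quiet_watts + busy * (active_watts - quiet_watts)

# Hypothetical: 5 ms of work per frame, 200 W while rendering, 40 W while waiting.
RENDER_MS = 5.0
for fps in (60, 100, 140):
    busy = busy_fraction(RENDER_MS, fps)
    watts = average_power(busy, 200.0, 40.0)
    print(f"{fps:3d} FPS target: busy {busy:.2f}, average power {watts:.1f} W")
```

In this model a lower frame rate target means a longer wait per frame and so less average power at an unchanged clock speed, which matches the pattern described for the UV and UV +50 sets.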

Why is the HBM2 not entering a lower P-state for the undervolted configurations? I am not sure, but my guess would be that the driver/BIOS considers it easy enough to enter and stay in the P3 state that it does not drop to a lower, yet still viable, P-state. For reference, you can view the voltages for the different P-states (of both the GPU and memory) by saving a Wattman profile and looking at the XML file it produces. The stock voltage of the HBM P2 state is 950 mV, but my undervolt sets the P3 state to use 965 mV. That might be close enough that, depending on the logic that decides when to enter each P-state, the driver/BIOS goes for the higher speed because there is plenty of power available to do so.

Now here is something else that is kind of funny. If you look back at the ASIC power graphs for the other games, you will notice there were runs in those sets as well where Stock used less power than the UV and UV +50 configurations. For those games it was close enough to consider margin of error, but with Killing Floor 2 the difference is too great to ignore. Also, as I look at the P-state tables for Shadow of War and The New Colossus, I can see the same behavior with Stock: it drops from the HBM P3 state more often than the undervolted configurations do. The resulting difference was not as profound, but it was present. Curiously, the undervolted configurations only had the HBM2 enter either P1 or P3, ignoring P2, though this might support my conjecture that my P3 voltage is too close to the P2 voltage for this lower state to be considered worth entering. (Sadly this would then suggest both that there is efficiency left on the table when undervolting and that the voltages of the lower P-states are not reduced when the only available P-state is changed. Now, testing the stability of a P2 or P1 undervolt might not be easy, but then we can set the maximum state in Wattman, so that could allow it. Perhaps I will try to do that now…)

Oh, and for those that want them, the frame time and display time graphs:





I think no one is surprised to see the frame rate targets being consistently hit here. Here are the course graphs, just the same:




  1. RX Vega 64 Efficiency - Introduction
  2. RX Vega 64 Efficiency - Procedure
  3. RX Vega 64 Efficiency - Wolfenstein 2: The New Colossus Results
  4. RX Vega 64 Efficiency - Middle-earth: Shadow of War Results
  5. RX Vega 64 Efficiency - Killing Floor 2 Results
  6. RX Vega 64 Efficiency - All Graphs and P-State Tables Part 1
  7. RX Vega 64 Efficiency - Conclusion
© 2001-2018 Overclockers Club ® Privacy Policy