Frame Capture and Analysis Tools Review
Reviewed by: ccokeman
Reviewed on: April 1, 2013
Over the past several years, evaluations of video card performance have centered on actual game play or the use of time demos. The premise is to measure how a particular video card plays through a selected game sequence, with the results captured either by an in-game frame rate counter or by FRAPS, a popular frame rate measuring tool. The problem is that the frame rates presented often do not match the gaming experience reported by the masses or, for that matter, by us as editors, especially when it comes to multiple GPU configurations. Lately there has been a tremendous amount of buzz around looking deeper into this phenomenon by again using FRAPS to measure, in conjunction with the reported frames per second, the length of time needed to generate each frame sent to the display.
However, you are still using a software tool that measures the FPS data output from the game engine to the DirectX layer, giving you a false sense of what is going on, which we will explain later. Measuring what actually gets to the viewer gives a better representation of the displayed FPS than taking the reported software value as the truth, especially when your lying eyes tell you something so different. It's not an entirely new line of thinking, but it is now getting traction as a way of determining the real performance for the end user.
Recently I was introduced to a set of hardware and software tools that NVIDIA has developed and has been using in-house for a little over two years now to improve gaming performance for the end user. For now the tools are provided by NVIDIA, with the expectation that going forward they will be adopted and expanded upon by the open source community to really dig a bit deeper into what drives gaming performance metrics. NVIDIA calls this tool FCAT, or Frame Capture Analysis Tool. Let's start with a look at the hardware requirements for capturing the data needed for the analysis.
Why do we need to look at a hardware-based capture solution when we have software solutions to do this for us? Software solutions count the frames generated by the game engine rather than what actually gets delivered to the end user's display. Measuring the data sent out of the DVI port requires a set of hardware tools that can capture and store that data. What we have is a multi-part solution that includes a specially built capture system, preferably Intel Z77-based, and a gaming system that has no special requirements. The capture system needs to be configured with a three to four disk RAID 0 array that can handle the massive data stream delivered by the capture card. A configuration set up on the Intel controller is preferred to handle data rates of up to 650MB/sec when capturing outputs up to 2560x1440; above that resolution the capture card cannot handle the throughput. A PCIe-based drive is another option for handling the data. At the heart of the capture system is the DataPath Limited VisionDVI-DL capture card, which captures the full resolution data at 60Hz. Splitting the display signal is the job of a Gefen DVI DL Splitter, which sends the signal to both the display and the capture card.
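As a rough check on the storage requirement, the raw data rate of an uncompressed capture can be estimated from the resolution and refresh rate. The sketch below assumes 24-bit color (3 bytes per pixel) and ignores blanking intervals and any file container overhead; the function name is hypothetical and is not part of the FCAT tools.

```python
def raw_video_rate_mb_per_s(width, height, refresh_hz, bytes_per_pixel=3):
    """Uncompressed video data rate in MB/s (1 MB = 10**6 bytes).

    Assumption: 3 bytes per pixel (24-bit color), no blanking, no overhead.
    """
    return width * height * bytes_per_pixel * refresh_hz / 1_000_000


rate_1440p = raw_video_rate_mb_per_s(2560, 1440, 60)
rate_1080p = raw_video_rate_mb_per_s(1920, 1080, 60)

print(f"2560x1440 @ 60Hz: {rate_1440p:.1f} MB/s")
print(f"1920x1080 @ 60Hz: {rate_1080p:.1f} MB/s")
```

Under these assumptions 2560x1440 at 60Hz works out to roughly 664MB/sec, in the same range as the 650MB/sec figure quoted above, while 1920x1080 needs about 373MB/sec. That gap is why a single conventional drive is not enough and a striped array or PCIe-based drive is called for.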
Software: The software portion of the FCAT toolbox includes both proprietary tools and programs readily available to the end user.
- Overlay: DXFrameOverlay.dll + Enable Overlay.exe. Overlays a fixed color sequence over the game while it’s running.
- Extractor: Extractor.exe. Output = CSV of overlay colors and scanlines.
- Analysis Tools: run_doall.bat, run_nv.bat, run_amd.bat. These are batch files that expedite the use of the Perl script files below.
- fcat.pl: Perl script that combines bars and identifies runts and drops. Output = CSV of HW frametimes (similar to FRAPS), and CSV of Original FPS, New FPS, runts, and drops.
- gen_percentiles.pl: Perl script that generates the 95th and 99th frametime percentile calculations.
- pivot.pl: Perl script that generates a summary CSV file (good for a pivot table).
- doall.pl: Perl helper script to generate a big batch file.
In addition to the batch files and Perl scripts, we will be using VirtualDub both as the capture medium and as the utility used to play back the recorded benchmark runs, which makes it easy to identify both runt and dropped frames. Datapath's Vision software is used to verify that the display output is what is being passed through to the capture card. Extractor is used to analyze the overlay data, which is output to a .CSV file for further analysis and incorporation into the output charts and graphs. Overlay is the software that provides the visual indicator for each frame during the benchmark sequence, again making it easy to identify runt and dropped frames.
So why go through all the trouble of setting up this kind of test regimen and investing in the hardware and software to visualize the actual performance in such low level detail? It's about digging deeper into the root causes of the performance issues driving gamers to write tomes on the subject in forums across the Web, especially when they are confronted with stuttering and with FPS numbers that just do not add up to the game play experience. In no way is this article going to be as comprehensive as I would like, but it gives a good start on how and why we will begin incorporating these metrics into our upcoming video card reviews.
The ultimate goal here is to measure what the end user sees when playing their favorite games, not what is reported by a software tool. FRAPS is not at fault for the results it provides, as it reads the frames as they are delivered from the game engine. To find the actual number delivered to the display we need to use the hardware and software shown earlier. Part of the problem is that the GPU can easily deliver more frames than a monitor with a standard 60Hz refresh rate can display, giving you an imbalance between the rate at which the frames are delivered and the rate at which they are viewed. By comparing what is viewable against what is measured as the frames leave the game engine we can get a clearer picture of where the issues lie. When frames are delivered but never seen, or are on screen for as little as 20 scan lines, they may as well not be counted.
Let's revisit the overlay software part of the FCAT tools for a minute so that the explanation of runts and drops makes more sense. Overlay uses a series of 16 colors in a never changing order. Because the order is static, it is easy to identify when a frame is dropped, as the sequence is then out of order. The order is White, Lime, Blue, Red, Teal, Navy, Green, Aqua, Maroon, Silver, Purple, Olive, Gray, Fuchsia, Yellow, and Orange. As you can see from the sequence of frames below, the order runs and repeats when all the frames are sent to the display.
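The idea of spotting drops from a fixed color order can be sketched in a few lines. This is an illustration only, not the logic used by Extractor.exe or fcat.pl: the function name is hypothetical, the input is assumed to be the list of overlay colors in the order they appear in the capture, and a repeated color is assumed to mean the same frame is still on screen.

```python
OVERLAY_COLORS = [
    "White", "Lime", "Blue", "Red", "Teal", "Navy", "Green", "Aqua",
    "Maroon", "Silver", "Purple", "Olive", "Gray", "Fuchsia", "Yellow", "Orange",
]


def count_dropped_frames(observed):
    """Count colors skipped between consecutive observed overlay colors."""
    position = {color: i for i, color in enumerate(OVERLAY_COLORS)}
    dropped = 0
    for previous, current in zip(observed, observed[1:]):
        # Distance travelled around the 16-color cycle, wrapping at the end.
        step = (position[current] - position[previous]) % len(OVERLAY_COLORS)
        if step == 0:
            continue  # same color again: treated as the same frame persisting
        dropped += step - 1  # a step of 1 is the expected next frame
    return dropped


# Blue never reaches the display, so one frame was dropped.
print(count_dropped_frames(["White", "Lime", "Red", "Teal"]))
```

One limitation of this simple approach is that a run of 15 or more consecutive drops cannot be told apart from a shorter run, because the sequence wraps around after 16 colors.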
Runts & Drops:
The overlay tool included in the FCAT toolbox allows the rest of the software to identify each frame and complete the analysis. Since the color pattern is static, it is easy for the FCAT tools to identify when a frame is out of sequence or is shown for only a short period of time, which is how dropped and runt frames are found. So what is a runt frame and a dropped frame? A dropped frame is one that was rendered but never displayed on the screen. A runt frame is one that is displayed, but for such a short period of time that it is not seen on screen either. Runt frames are around 20 scan lines or less; at 1920x1080 that makes them small enough that they normally pass too quickly to be seen or perceived. Usually, as seen below, a runt is accompanied by some screen tearing. The example on the right has a series of five different frames displayed at one time: one fully formed frame, two partial frames, and a pair of runt frames.
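The runt definition translates into a simple threshold test on how many scan lines each frame occupies. The sketch below is a hedged illustration, not FCAT's code: the 20 scan line cutoff is taken from the description above, while the function name, the label strings, and the sample scan line counts are made up for the example.

```python
RUNT_MAX_SCANLINES = 20  # assumption: cutoff based on "around 20 scan lines or less"


def classify_frames(scanline_counts, runt_max=RUNT_MAX_SCANLINES):
    """Label each captured frame slice as 'runt' or 'visible' by its height."""
    return ["runt" if lines <= runt_max else "visible" for lines in scanline_counts]


# Hypothetical 1080-line refresh holding five frames, similar to the example
# described above: three visible slices and two runts.
one_refresh = [600, 12, 250, 8, 210]
labels = classify_frames(one_refresh)
print(list(zip(one_refresh, labels)))
```

Dropped frames do not show up in this kind of list at all, since they never reach the display; they have to be inferred from gaps in the overlay color sequence.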
By identifying these frames as ones the user will never experience, you can pull them from the measurements and show a more realistic accounting of what the end user will see while gaming, especially when using dual GPU configurations. Let's take a look at some preliminary results to see what these new metrics will show us.
A tremendous amount of data is created during the capture and analysis process, including charts that plot out the 95th and 99th percentile frame rates as well as charts that show the observed FPS versus the FRAPS FPS. Below are some representations of the data showing that yes, even with the latest drivers, AMD is struggling with some frame time issues that drive the delivered FPS down to a level below that reported by FRAPS. Looking at the Frame Time chart you can see that when run in a CrossfireX configuration the frame time data is all over the board, whereas NVIDIA's multi GPU solution delivers a consistent line across the chart; it is slightly wider than the single GPU line but still much improved over the AMD configurations. Another look using the percentile chart illustrates this a little differently, showing that as the outliers are removed the FPS takes a downward swing. The results from one game do not tell the whole tale, so looking at BF3 as well as Far Cry 3 provides a pair of results to illustrate the point with a slightly larger base of video cards.
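For readers unfamiliar with percentile frame times, the calculation can be sketched as follows. How gen_percentiles.pl computes its values is not documented here, so the nearest-rank method used below is an assumption, and the frame time samples are invented purely for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical run: mostly 16.7 ms frames with a handful of slow ones.
frame_times_ms = [16.7] * 90 + [25.0] * 8 + [50.0] * 2

p95 = percentile(frame_times_ms, 95)
p99 = percentile(frame_times_ms, 99)
print(f"95th percentile: {p95} ms ({1000 / p95:.0f} FPS)")
print(f"99th percentile: {p99} ms ({1000 / p99:.0f} FPS)")
```

In this made-up run the average frame time stays close to 16.7 ms, yet the 95th and 99th percentile frame times are 25 ms and 50 ms, which is the kind of stutter an average FPS number hides.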
Another look is provided by the Run.Stat charts, which show the difference between what is reported by the FCAT tools and what is counted by FRAPS. Looking at the NVIDIA solution first, the blue and black lines match up fairly consistently, with no runt or dropped frames evident due to how NVIDIA manages the frame rates. AMD, on the other hand, shows the effects of the frame time issues on the observed and reported FPS, with a large section of dropped and runt frames evident that corresponds to choppy game play. It's mostly a graphic representation of a problem that has been a concern for some time. Each game will show a different result based on how well the engine is optimized for a particular graphics solution. The Far Cry 3 results are telling on their own, but moving to BF3 you can see some wild swings in the charts, again mimicking the game experience.
Trolling through the data can be pretty daunting initially as there is just so much to work through. To put it in perspective and in an easier-to-understand format, we can look at the true FPS output seen on the screen versus the average FPS we are used to seeing when using FRAPS. We have two numbers to look at: the Observed FPS, which is the FPS reported by FRAPS, and the New FPS that is reported by the FCAT tools.
As the charts show, when you take out the runt and dropped frames you can get a significantly lower result, depending on the game tested and the GPU combination used. The average FPS results for the NVIDIA combinations were almost identical once any runt and/or dropped frames were pulled from the averages, whereas the AMD combination took a performance hit due to the wildly varying frame time latencies and FPS delivered. Single GPU results do not suffer this kind of symptom due to the reduction in frame time latency. When you look at the charts it's clear that NVIDIA has been taking the results of this technology and tuning its dual GPU options to get the smoothest game play.
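The effect of pulling runts out of the average can be shown with a small worked example. This is a simplified sketch and not the calculation performed by fcat.pl: it assumes a 1080-line display at 60Hz, ignores blanking time, treats any slice of 20 scan lines or fewer as a runt, and uses an invented capture in which every refresh holds one large frame and one runt.

```python
def fps_summary(frame_scanlines, lines_per_refresh=1080, refresh_hz=60, runt_max=20):
    """Return (fps counting every frame, fps counting only non-runt frames)."""
    # Time covered by the capture, from total scan lines drawn (blanking ignored).
    seconds = sum(frame_scanlines) / (lines_per_refresh * refresh_hz)
    all_frames = len(frame_scanlines)
    visible_frames = sum(1 for lines in frame_scanlines if lines > runt_max)
    return all_frames / seconds, visible_frames / seconds


# Hypothetical one-second capture: 60 refreshes, each split 1060 / 20 lines.
capture = [1060, 20] * 60
all_fps, filtered_fps = fps_summary(capture)
print(f"counting every frame: {all_fps:.0f} FPS")
print(f"runts removed:        {filtered_fps:.0f} FPS")
```

In this contrived case a frame counter would see 120 FPS, while only 60 frames per second occupy enough of the screen to be perceived. Real captures are far less regular, but the direction of the correction is the same.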
So what does all this mean for OCC and the editors in general when it comes to video card performance evaluations? It means that we have a new way of capturing and aggregating the data for our performance metrics outside the confines of traditional software tools like FRAPS. Not that there is anything wrong with FRAPS, but as technology progresses there will always be a better mouse trap. Currently this one, called FCAT, is quite a bit more complex and time consuming to use, but it delivers data that paints an inherently more accurate picture than what we can get from FRAPS.
Is FCAT the be-all and end-all solution for our benchmarking results going forward? Who knows at this point, but it is a great start that will allow editors to dig deeper than ever into how each frame is delivered. As time goes by and the process becomes more automated we may see wider adoption by video card editors. Because the software tools are readily accessible and can be modified or compiled from scratch, even more data can be extracted, as the open source community proves over and over again.
With only a week or so worth of play time under my belt, it's tough to dig as deep as possible into the results. Getting more familiar with the scripts and tools will in the end provide all the data we need to use the FCAT tools more effectively. Frame times, runt and dropped frames, as well as GPU configuration all have an impact on performance that can be measured with NVIDIA's FCAT tools.
This article was meant to be a high level look at the tools and results. A more in-depth look at the technology and software can be found over at PcPer, as they have been part of the alpha and beta test phases of these testing tools for over a year now and really dig deep into the technology used. As a new way of looking at video card performance (specifically multi GPU solutions), the FCAT tools provide real, tangible results as long as you take the time to dig through the data.