# Serious Statistics Review

**Guest_Jim_* -**

## Testing Methods:

With access to every map, I picked one each from the beginning, middle, and end of the game and ran a course on each three times, collecting 300 seconds (5 minutes) of data per run. The maps I picked were Hatsheput, Dunes, and Thebes – Karnak, and I captured videos to show the courses I ran. (The Hatsheput video is shorter than 300 seconds because I stopped recording after completing the map instead of running around.)

Hatsheput is the first map of the game and, coincidentally, takes about five minutes to complete (unless you go for a specific secret, which I decided not to do). Unfortunately, this means it is always possible to complete the map before the timer runs out; when this happened I just ran around the final area so the screen had to keep updating. There are not very many enemies on this map, and you spend a lot of time on or inside a structure.

Dunes is the seventh map of the game, and its first 300 seconds, unlike Hatsheput, are spent completely outdoors with a lot of enemies. While I did learn where and when enemies would spawn, the nature of AI-controlled enemies means no two runs will ever be identical.

Thebes – Karnak is the game’s twelfth map and provided a fairly clear course to run with a mix of environments and combat. Sometimes you are within cramped structures and sometimes in large, walled-in yards. There is also a healthy number of enemies, so the experience is somewhere between Hatsheput and Dunes.

For each test I used the Ultra preset. Curiously, this does not put every option at its highest setting, but to keep things simple and uniform, I kept to the preset. Vertical sync was off, but it did come up in my testing, and I did some additional tests with it that are at the end of the article. I also played using the Borderless (Window/Fullscreen) option, which is my general preference when playing a game. I like being able to Alt-Tab safely, and with how many applications were hooking into the game, being able to Alt-Tab out of a crash can be the difference between needing to restart the computer or not. This decision came back to haunt me just a little, but I will explain all of that later.

Something I should mention is that OpenGL exhibited a strange issue on Dunes that neither Vulkan nor DirectX 11 suffered. For some reason the lighting seemed to be at least partially disabled under OpenGL, so its data may not be completely accurate on this map.

## Data Processing:

Before getting to the data itself, I want to cover what I pulled out of it in a common spot, so that it is easy to find.

Even though many people seem rather uninterested in them today, I will provide average frame times and framerates. The framerate is not calculated directly from the recorded frame times. Instead, I am taking advantage of the fact that every entry in the CSV is a completed frame, so I divided the length of the dataset (the number of frames) by the maximum time in seconds. This quotient is therefore the actual average frames per second over the 300 seconds of the test. (Curiously, this tended to differ slightly from the framerate implied by the average frame time, but only by a small, likely negligible amount.)
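As a rough illustration of the two ways of getting an average framerate, here is a minimal sketch in Python (not the R used for the article). The function names and the sample numbers are made up; the offset in the first timestamp is there only to show that the two results need not match, not to explain why they differed in my data.

```python
# Hypothetical sample data, not taken from the article's recordings.
# timestamps_s: time in seconds at which each frame was completed.
# frame_times_ms: recorded frame time in milliseconds for each frame.
timestamps_s = [0.020, 0.037, 0.0535, 0.0865, 0.103, 0.120]
frame_times_ms = [16.0, 17.0, 16.5, 33.0, 16.5, 17.0]


def fps_from_count(timestamps_s):
    """Average FPS as number of completed frames divided by elapsed seconds."""
    return len(timestamps_s) / max(timestamps_s)


def fps_from_mean_frame_time(frame_times_ms):
    """Average FPS implied by the mean frame time (1000 ms / mean ms)."""
    mean_ms = sum(frame_times_ms) / len(frame_times_ms)
    return 1000.0 / mean_ms


if __name__ == "__main__":
    print(round(fps_from_count(timestamps_s), 2))
    print(round(fps_from_mean_frame_time(frame_times_ms), 2))
```

The two functions agree exactly when the last timestamp equals the sum of the frame times, and drift apart whenever it does not.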

A lot of people seem interested in percentiles, so I have those as well at 0.1%, 1%, 99%, and 99.9%. Percentiles relate to how much of the data is on either side of a value, so the 99th percentile means 99% of the data is below that value, with 1% above it. Though I do not use it, the median is the 50th percentile, with one half of the data above and the other below. One reason to consider percentiles is that they can naturally exclude outliers that may throw off other characteristics, such as the average.
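To make the percentile idea concrete, here is a minimal Python sketch. It interpolates linearly between sorted values, which is one common rule among several; whether it matches the exact rule behind my numbers is an assumption, and the `percentile` helper and the sample data are hypothetical.

```python
import math


def percentile(values, pct):
    """Return the pct-th percentile (0-100) using linear interpolation
    between the two nearest sorted values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    rank = (len(ordered) - 1) * pct / 100.0  # fractional index into the sorted data
    lower = math.floor(rank)
    upper = math.ceil(rank)
    weight = rank - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


if __name__ == "__main__":
    # Made-up frame times in milliseconds, including one large outlier.
    frame_times_ms = [8.0, 8.2, 8.1, 8.4, 8.3, 8.2, 8.1, 8.0, 8.5, 45.0]
    for pct in (0.1, 1, 50, 99, 99.9):
        print(pct, round(percentile(frame_times_ms, pct), 3))
```

In this made-up sample the single 45 ms frame pulls the average up noticeably, while the median stays near 8.2 ms.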

As I mentioned when discussing R, I also have the ability to find the corresponding percentile for a given value. The values I considered correspond to 60 FPS, 50 FPS, 30 FPS, 20 FPS, and 15 FPS. I feel these values give a nice picture of how a game was performing and what framerate regions it tended to stick to. Spoiler: the game performed very well, so these turned out to be fractions of a percent, but for other games these points may prove more useful.
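Going the other way, from a value to a percentage, can be sketched like this in Python. It assumes one plausible reading of the numbers: the share of frames that were at or slower than each framerate. The function name and the sample data are hypothetical.

```python
def percent_at_or_slower_than(frame_times_ms, fps):
    """Percentage of frames whose frame time is at least 1000/fps ms,
    i.e. frames rendered at or below the given framerate."""
    threshold_ms = 1000.0 / fps
    slow_frames = sum(1 for t in frame_times_ms if t >= threshold_ms)
    return 100.0 * slow_frames / len(frame_times_ms)


if __name__ == "__main__":
    # Made-up frame times in milliseconds.
    frame_times_ms = [8.0, 9.0, 10.0, 12.0, 18.0, 25.0, 40.0, 8.5, 9.5, 70.0]
    for fps in (60, 50, 30, 20, 15):
        print(fps, "FPS:", percent_at_or_slower_than(frame_times_ms, fps), "%")
```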

The last piece of data I specifically pulled out to share is the percentiles of the differences of the frame time data. These data tell us how consistent the frame time is from one frame to the next, with small differences suggesting that trends in the frame times were smooth. Large differences indicate judder and, I believe, stutter as well, because the next frame must be much older than the last and so does not represent the most recent input.
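A minimal Python sketch of the difference calculation follows. It assumes signed differences between consecutive frames (next minus current) and reuses a linearly interpolated percentile; both helper functions and the sample data are hypothetical.

```python
import math


def consecutive_differences(frame_times_ms):
    """Signed change in frame time from each frame to the next."""
    return [b - a for a, b in zip(frame_times_ms, frame_times_ms[1:])]


def percentile(values, pct):
    """Linearly interpolated percentile (0-100) of the given values."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    weight = rank - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


if __name__ == "__main__":
    # Made-up frame times in milliseconds with one spike in the middle.
    frame_times_ms = [8.0, 8.1, 8.0, 8.2, 30.0, 8.1, 8.0, 8.3]
    diffs = consecutive_differences(frame_times_ms)
    for pct in (0.1, 1, 99, 99.9):
        print(pct, round(percentile(diffs, pct), 3))
```

A single spike shows up twice in the differences: once as a large positive jump and once as a large negative one when the frame time recovers.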

In addition to the pieces of data, I also produced a number of graphs. The first graphs for each section are the course graphs, which just show the frame times over the length of the course I ran. The blue line is a smooth regression of the data using a generalized additive model, which I am not well enough versed in statistics to explain properly. I added it because it is easy enough to do and I like having something there to show the trend of the data. The red lines on these graphs mark the 0.1%, 1%, 99%, and 99.9% marks. (They are green in the frame time overlay video to minimize potential chroma-subsampling issues.)

The next graphs are QQ distribution plots for each run. These show the same information as the course graphs, but the data is sorted and the x-axis relates to percentiles instead of time. The rectangles I added are to mark the 0.1%, 1%, 99%, and 99.9% values on the data points and the y-axis, which is still frame time.
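For a sense of what goes into such a plot, here is a minimal Python sketch that only builds the points and does not draw anything. It assumes a normal QQ plot with plotting positions (i - 0.5) / n, which may not be the exact convention behind my graphs; `qq_points` and the sample data are hypothetical.

```python
from statistics import NormalDist


def qq_points(frame_times_ms):
    """Pair each sorted frame time with a theoretical standard-normal quantile.

    Returns a list of (theoretical_quantile, frame_time_ms) tuples,
    ordered from the fastest frame to the slowest.
    """
    ordered = sorted(frame_times_ms)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # assumed plotting position, strictly between 0 and 1
        points.append((std_normal.inv_cdf(p), value))
    return points


if __name__ == "__main__":
    # Made-up frame times in milliseconds.
    for x, y in qq_points([8.4, 8.0, 30.0, 8.2, 8.1]):
        print(round(x, 3), y)
```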

The remaining graphs are all frequency plots of frame time, framerate, and display time. These tell you how much of the data is at any specific value, such as 16.667 ms, or 60 FPS. The display time graph would normally have milliseconds as the units on the x-axis, but I decided to change the unit to multiples of 16.667 ms (one refresh cycle at 60 Hz), so the axis shows how many refresh cycles later an image was sent to my display. A value of one means it was sent for the next refresh, while exactly 0 means the frame was dropped, as OCAT/PresentMon considers MsBetweenDisplayChange to be 0 when a frame is dropped. Frames sent between two ticks can potentially tear.
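The unit change on the display time axis amounts to a single division, sketched below in Python. The 60 Hz refresh rate follows from the 16.667 ms figure; the `classify` helper, its tolerance, and the sample values are hypothetical additions for illustration only.

```python
REFRESH_MS = 1000.0 / 60.0  # one refresh cycle at an assumed 60 Hz, about 16.667 ms


def to_refresh_intervals(ms_between_display_change):
    """Convert display times from milliseconds to multiples of a refresh cycle."""
    return [ms / REFRESH_MS for ms in ms_between_display_change]


def classify(intervals, tolerance=0.05):
    """Label a display time given in refresh cycles.

    0 is treated as a dropped frame, values within `tolerance` of a whole
    number (1 or more) as landing on a tick, and anything else as falling
    between two ticks, where tearing is possible.
    """
    if intervals == 0:
        return "dropped"
    nearest = round(intervals)
    if nearest >= 1 and abs(intervals - nearest) <= tolerance:
        return "on tick"
    return "between ticks"


if __name__ == "__main__":
    # Made-up MsBetweenDisplayChange values in milliseconds.
    sample_ms = [0.0, 1000.0 / 60.0, 25.0, 2000.0 / 60.0]
    for ms, cycles in zip(sample_ms, to_refresh_intervals(sample_ms)):
        print(round(ms, 3), round(cycles, 3), classify(cycles))
```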

Now I think we can get to the actual data.