Fermi: Codename GF100 Closer Look

ccokeman - 2010-01-14 20:57:36 in Video Cards
Category: Video Cards
Reviewed by: ccokeman   
Reviewed on: January 18, 2010


While ATI has been enthusiastically touting the features and performance of its 5 series lineup, NVIDIA has sadly been left with nothing that truly competes with the HD 5870 and HD 5970, and in the process has lost both the single-GPU and multi-GPU performance titles. ATI was the first to deliver a complete DX 11 lineup, while the green team stables had no DX 11 video card that could compete with the behemoths from ATI. NVIDIA has not been idle over the past months, however, and is now ready to pull back the curtain and show off what it has been working on for the past year. The public was treated to a sneak peek at the GF100's specifications during the GPU Technology Conference NVIDIA held back in September of 2009. That conference was geared towards the HPC crowd, but make no bones about it: the Fermi architecture is designed first and foremost as a way back to the top of the gaming performance ladder. It just so happens that the compute side is designed in as well, to take advantage of the next generation of gaming effects including ray tracing, fluid simulations and order-independent transparency. All the work behind the scenes is finally ready to bear fruit with the introduction of the GF100. GF100 is NVIDIA's internal designation for a GPU built on the Fermi architecture, with 100 as the series number. Overclockersclub was invited to a briefing on the Fermi architecture and the tools that take advantage of it, and was shown demos of how its performance compares to NVIDIA's current generation video cards, as well as NVIDIA 3D Vision Surround. Let's take a look at Fermi and what it brings to the table.


Closer Look:

NVIDIA's primary goals for the GF100 include top of the line gaming performance, excellent video quality, "film like geometric realism" and a "compute architecture" for gaming. To increase performance over the previous generation, the GF100 is equipped with double the number of CUDA cores and double the number of ROPs per ROP partition. First off, let's take a look at the GF100 block diagram and list the actual specifications; much of this has been discussed before, but I will list it all again. The GF100 is equipped with 4 GPCs (Graphics Processing Clusters) housing 512 CUDA cores, 16 geometry units, 4 raster units and 64 texture units, along with 48 ROP units and a 384-bit GDDR5 memory bus. There are a total of 6 memory controllers and 768KB of unified L2 cache. Each GPC is in essence its own entity, so the GF100 appears to be a quad-core GPU. Surely the architecture can be scaled down to fill lower price points.
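
As a quick sanity check, the headline totals can be derived from the per-unit counts quoted in this article. The sketch below is illustrative only: the variable names are made up, and the per-controller figures at the end are not quoted by NVIDIA here but are assumptions that follow from splitting the bus, the ROPs and the L2 cache evenly across the 6 memory controllers.

```python
# Figures quoted in the article for the GF100.
GPCS = 4
SMS_TOTAL = 16
CORES_PER_SM = 32
TEX_UNITS_PER_SM = 4
MEMORY_CONTROLLERS = 6
BUS_WIDTH_BITS = 384
ROPS_TOTAL = 48
L2_CACHE_KB = 768

# Totals that should match the block diagram.
sms_per_gpc = SMS_TOTAL // GPCS
cuda_cores = SMS_TOTAL * CORES_PER_SM
texture_units = SMS_TOTAL * TEX_UNITS_PER_SM

# Assumption: bus width, ROPs and L2 are divided evenly,
# with one ROP partition per memory controller.
bits_per_controller = BUS_WIDTH_BITS // MEMORY_CONTROLLERS
rops_per_partition = ROPS_TOTAL // MEMORY_CONTROLLERS
l2_kb_per_partition = L2_CACHE_KB // MEMORY_CONTROLLERS

print("SMs per GPC:", sms_per_gpc)
print("CUDA cores:", cuda_cores)
print("Texture units:", texture_units)
print("Bits per memory controller:", bits_per_controller)
print("ROPs per partition:", rops_per_partition)
print("L2 KB per partition:", l2_kb_per_partition)
```

The first three results line up with the 4 GPC, 512 core and 64 texture unit figures above; the last three hold only under the even-split assumption.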


Each of the 16 SMs (Streaming Multiprocessors) is equipped with 32 CUDA cores, which is 4 times the number per SM on the GT200 core, along with 4 texture units, 4 special function units, 16 load/store units, a single PolyMorph Engine and 64KB of on-chip configurable memory that can be set up as either 16KB of L1 cache and 48KB of shared memory or 48KB of L1 cache and 16KB of shared memory. Each of the 32 CUDA cores has both a fully pipelined integer arithmetic logic unit and a floating point unit. The new IEEE 754-2008 floating point standard is implemented, adding the FMA (Fused Multiply-Add) instruction for both single and double precision arithmetic. Because FMA rounds only once, at the end of the operation, it is a more accurate solution than MAD (Multiply-Add). Each PolyMorph Engine has five stages: Vertex Fetch, Tessellator, Viewport Transform, Attribute Setup and Stream Output. Once the work passes through the PolyMorph Engine it is sent to the Raster Engine, which turns the processed geometry into pixels.
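
To see why a fused multiply-add can beat a separate multiply followed by an add, here is a minimal sketch in Python. It is not how the GF100 hardware works internally: it uses Python's double precision floats, the function names mad and fma are my own, and the fused version is emulated with exact rational arithmetic so that only one rounding happens at the end.

```python
from fractions import Fraction


def mad(a, b, c):
    """Multiply then add, with a rounding after each step (two roundings)."""
    product = a * b      # rounded to the nearest double here
    return product + c   # rounded a second time here


def fma(a, b, c):
    """Emulated fused multiply-add: exact a*b+c, rounded once at the end."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so the exact product, 1 - 2**-60, cannot be held in a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print("MAD result:", mad(a, b, c))  # the tiny term is lost in the first rounding
print("FMA result:", fma(a, b, c))  # the tiny term survives
```

With these inputs the two-step version returns exactly zero, while the fused version keeps the small residual of -2**-60.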



Each SM in a GPC has a dedicated 64KB of on-chip memory that can be configured as 48KB of shared memory with 16KB of L1 cache for graphics use, or as 16KB of shared memory with 48KB of L1 cache for compute use. For compute programs, the shared memory and L1 cache work together to reduce off-chip traffic, which makes this an ideal solution for high performance CUDA apps. The 768KB of unified L2 cache is read/write enabled so the GF100 can support C/C++ programming. The comparison chart shows just how the memory architecture differs from that of the GT200.


Closer Look:

The bottom line for NVIDIA here is to improve gaming performance and visual quality and to move towards geometric realism. Hardware accelerated tessellation is one of the capabilities that the GF100 and other DX 11 cards will use to improve the realism of the finished frames. In basic terms, tessellation takes larger polygons and breaks them up into much smaller triangles, which gives a more detailed image. Looking at this first slide, you can see that the gun holster and right shoulder are not quite round, nor is the corrugated roof wavy. You may also notice that the model is wearing a hat, since creating realistic hair is resource intensive. In games the shading of the model has improved but the geometric realism is still quite modest, while in the movies you get the best of both worlds. The key is to get there for gaming, something the GF100 is designed for.
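
The basic idea of breaking one large triangle into many smaller ones can be sketched in a few lines of Python. This is plain midpoint subdivision, picked only because it is easy to follow; it is not the algorithm used by the DX 11 tessellator or by the GF100, and the function names are hypothetical.

```python
def midpoint(p, q):
    """Return the point halfway between two 3D points."""
    return tuple((a + b) / 2.0 for a, b in zip(p, q))


def subdivide(triangle):
    """Split one triangle into four smaller ones using its edge midpoints."""
    a, b, c = triangle
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def tessellate(triangles, levels):
    """Apply the four-way split to every triangle, 'levels' times over."""
    for _ in range(levels):
        triangles = [small for tri in triangles for small in subdivide(tri)]
    return triangles


# One coarse triangle becomes 4**3 = 64 triangles after three levels.
coarse = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))]
fine = tessellate(coarse, 3)
print(len(coarse), "triangle in,", len(fine), "triangles out")
```

On its own the finer mesh is still flat; the extra detail appears once the new vertices are moved, for example with a displacement map.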









With the GF100, NVIDIA has redesigned the ROP subsystem to improve efficiency and throughput. With this improvement it can implement 32x Coverage Sampling Anti-Aliasing (CSAA) to help increase the level of (perceived) geometric realism. This mode actually uses 8 color samples and 24 coverage samples for a total of 32 samples. In this picture from Age of Conan you can see the difference between TMAA with 16xQ anti-aliasing (8 multisamples + 8 coverage samples) on the GT200 and TMAA with 32x anti-aliasing (8 color + 24 coverage samples) on the GF100.



We were shown a few demos to show off the abilities of the GF100. The Water demo took a basic map and applied tessellation and displacement mapping to make the scene come alive. While the pictures are far from doing it justice, the water looked real and the scene was rendered at well over 100 fps. There did not appear to be any repeating waves during the minute or so I watched the demo. If you look at the photos, the difference is stunning.



The second demo was the "Hair" demo. How many games have you played where every model wore a hat or helmet, or where the hair moved in blocks or as a whole? This demo showed what realistic hair can look like when rendered. Each strand of hair appeared to be rendered independently of the others. You could animate the head by twisting and turning it, and apply force (wind) to the model to see how the hair behaved. The hair actually looked realistic and moved as hair should move.



We were treated to a ray tracing demo running on the GF100 GPU alongside a comparison system with a GTX 285 in it. At 2560x1600 the GF100 was about three times faster. Even though the frame rates were below one frame per second at maximum detail, there was a significant difference in processing power. When the resolution was decreased the performance scaled quite well. If you look at these images, a first and maybe a second glance will tell you that they are absolutely real, but they are in fact images rendered using ray tracing. This feature can be used in driving games in a showcase mode where you can show off the object of your endeavors.



NVIDIA has put together a demo it calls Supersonic Sled that illustrates all of the features of the GF100. The premise was to build a demo that would be fun yet functional. You get GPU PhysX effects for the fluids, smoke, dust and the pilot's joints. You get particle simulation for the rocket dust, fireballs and smoke trails, tessellation for the terrain and image processing for the motion blur. The object is to go as fast as you can without blowing up the sled. This simulation requires an immense amount of computing power and was run on a 3-Way SLI setup for maximum effect. In some of the shots you will notice the sled and driver are perfectly clear but the surrounding area is blurred, giving the illusion of speed.




One of the last demos we were shown was Dark Void, an upcoming third-person shooter from Capcom that makes use of much of the PhysX toolbox. The Disintegrator and the turbulence effects from the jet packs look absolutely great in game. The jet pack uses up to 100,000 particles to create a wispy looking smoke. Look for more on this game in the coming weeks, as I have a copy to put to the test.



Closer Look:

Everyone has heard of NVIDIA's stereoscopic 3D Vision system by now; if you haven't, you must have been living under a rock. With the introduction of the GF100 and its driver set you will be able to move up to a three-panel system NVIDIA has named 3D Vision Surround. This system requires two or more GF100 cards in an SLI configuration to push the roughly 746 million pixels per second displayed across 3 identical 3D Vision capable monitors at a resolution of 1920x1080 each. This means a refresh rate of at least 120Hz. When you add in tessellation, ray tracing, PhysX and compute shaders, you will need the horsepower two of these cards offer to get the best experience. If you do not use the 3D Vision system you get NVIDIA Surround, which is supported at resolutions of 2560x1600 x 3, with the only caveat being that the monitors must all share the same resolution. NVIDIA offers one other feature in bezel correction. This allows part of the game to be hidden behind the monitor bezels so that they seem to be part of the scene, much like looking out the window of a car or the cockpit view in a flight sim. The demonstration looked great and offered that additional level of visual quality and immersion. Unfortunately I was so engrossed in this that I forgot to snap a few pictures to show it off, so you will just have to settle for the press pictures.
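
The 746 million pixel figure only makes sense as a per-second rate, which a little arithmetic confirms. The helper below is a hypothetical name of my own, and the calculation assumes every pixel on every panel is redrawn at the full 120Hz refresh rate.

```python
def pixels_per_second(width, height, panels, refresh_hz):
    """Pixels drawn per second if every panel is refreshed at refresh_hz."""
    return width * height * panels * refresh_hz


# 3D Vision Surround: three 1920x1080 panels at 120Hz.
surround_3d_frame = 1920 * 1080 * 3
surround_3d_rate = pixels_per_second(1920, 1080, 3, 120)

# NVIDIA Surround without 3D: three 2560x1600 panels (pixels per frame only).
surround_2d_frame = 2560 * 1600 * 3

print("3D Vision Surround, pixels per frame:", surround_3d_frame)
print("3D Vision Surround, pixels per second:", surround_3d_rate)
print("NVIDIA Surround, pixels per frame:", surround_2d_frame)
```

A single frame across the three 1080p panels is about 6.2 million pixels, and at 120Hz that works out to 746,496,000 pixels every second. The 2560x1600 setup has roughly twice as many pixels per frame, at about 12.3 million.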







From everything discussed during the editors' day, it looks as though NVIDIA has put together a package that will rip the socks off the red team. We were shown some performance numbers from a few Far Cry 2 benchmark runs in which the GF100 soundly trounced the GTX 285 comparison system: the minimum observed frame rate was roughly 65.64 on the GF100 against 38.52 on the GTX 285, about 70 percent higher. There was no mention of what the final clock speeds will be, nor would NVIDIA budge on a time frame other than "later than today". All of the performance demos were designed to show what the GF100 has to offer. The ray tracing demo showed a threefold increase in performance over the GTX 285, the tessellation demos showed just how close to realistic graphics NVIDIA is getting, and the Supersonic Sled demo was a blast and showed off just how good the physics and visual effects are becoming. NVIDIA has put together a suite of tools, with Nexus for the developer and APEX for the artist. Watching the demonstrations of the capabilities of these tools was an eye opener. NVIDIA also gave a presentation showing just how much work is done on the back end during game development and how much the company actually does, from helping with coding to supplying its competitor's hardware to game developers to ensure compatibility; a pretty interesting presentation to say the least. Last but not least there was the 3D Vision Surround demonstration that brought 3D Vision to another level. All in all, a very informative day. I cannot wait to see whether the demonstration benchmarks are where the performance numbers will actually fall, or if this was just the tip of the iceberg with another bump up in the final revision. The winds are a-changing!