
Fermi: Codename GF 100 Closer Look

ccokeman    -   January 18, 2010
Category: Video Cards

Introduction:

While ATI has been enthusiastically touting the features and performance of its 5 series lineup, NVIDIA has been left with nothing to truly compete with the HD 5870 and HD 5970, and in the process has lost both the single and multiple GPU performance titles. ATI was first to deliver a complete DX 11 lineup, while the green team had no DX 11 video card that could compete with ATI's behemoths. NVIDIA has not been idle over the past months, though, and is now ready to pull back the curtain and show off what it has been working on for the past year. The public was treated to a sneak peek of the GF100's specifications during the GPU Technology Conference NVIDIA held back in September of 2009. That conference was geared towards the HPC (high performance computing) crowd, but make no bones about it: the Fermi architecture is designed first and foremost as a way back to the top of the gaming performance ladder. The compute side is designed in as well, to take advantage of the next generation of gaming effects including ray tracing, fluid simulations and order-independent transparency. All the work behind the scenes is finally ready to bear fruit with the introduction of the GF100. GF100 is NVIDIA's internal designation for a graphics processor built on the Fermi architecture, with 100 as the series number. Overclockers Club was invited to a briefing on the Fermi architecture and the tools to take advantage of it, and was shown some demos of how its performance relates to NVIDIA's current generation video cards, as well as NVIDIA 3D Vision Surround. Let's take a look at Fermi and what it brings to the table.

 

Closer Look:

NVIDIA's primary goals for the GF100 include top of the line gaming performance, excellent video quality, "film like geometric realism" and a "compute architecture" for gaming. To increase performance over the previous generation, the GF100 is equipped with double the number of CUDA cores and double the number of ROPs per ROP partition. First off, let's take a look at the GF100 block diagram and list the actual specifications; much of this has been discussed before, but I will list it all again. The GF100 is equipped with 4 GPCs (Graphics Processing Clusters) that house 512 CUDA cores, 16 geometry units, 4 raster units and 64 texture units, along with 48 ROP units and a 384-bit GDDR5 memory interface. There are a total of 6 memory controllers and 768 KB of unified L2 cache. Each GPC is in essence its own entity, so the GF100 looks like a quad-core GPU. Presumably the architecture can be scaled down to fill lower price points.
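To make the chip-wide numbers easier to picture, here is a small sketch that divides them across the 4 GPCs and the 6 memory controllers. The totals come straight from the specification list above; the assumption that every resource is spread evenly, and that the ROP units are grouped one partition per memory controller, is ours and is not confirmed by the briefing material quoted here. The names `GF100`, `per_unit` and `breakdown` are hypothetical.

```python
# Chip-wide figures quoted in the article for the GF100.
GF100 = {
    "gpcs": 4,
    "cuda_cores": 512,
    "geometry_units": 16,
    "raster_units": 4,
    "texture_units": 64,
    "rop_units": 48,
    "memory_bus_bits": 384,
    "memory_controllers": 6,
}


def per_unit(total: int, units: int) -> int:
    """Split a chip-wide total evenly, refusing splits that leave a remainder."""
    quotient, remainder = divmod(total, units)
    if remainder:
        raise ValueError(f"{total} does not divide evenly across {units} units")
    return quotient


def breakdown(spec: dict) -> dict:
    """Derive per-GPC and per-memory-controller figures (assumes an even split)."""
    gpcs = spec["gpcs"]
    controllers = spec["memory_controllers"]
    return {
        "cuda_cores_per_gpc": per_unit(spec["cuda_cores"], gpcs),
        "geometry_units_per_gpc": per_unit(spec["geometry_units"], gpcs),
        "raster_units_per_gpc": per_unit(spec["raster_units"], gpcs),
        "texture_units_per_gpc": per_unit(spec["texture_units"], gpcs),
        # Assumption: one ROP partition sits alongside each memory controller.
        "rop_units_per_controller": per_unit(spec["rop_units"], controllers),
        "bus_bits_per_controller": per_unit(spec["memory_bus_bits"], controllers),
    }


if __name__ == "__main__":
    for name, value in breakdown(GF100).items():
        print(f"{name}: {value}")
```

Under those assumptions each GPC carries a quarter of the shader, geometry and texture hardware, which is what makes the "quad core" comparison a reasonable way to think about the chip.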

 

Each of the 16 SMs, or Streaming Multiprocessors, is equipped with a total of 32 CUDA cores (4 times the number per SM on the GT200), 4 texture units, 64 KB of configurable on-chip memory that can be set up as 16 KB of L1 cache and 48 KB of shared memory or as 16 KB of shared memory and 48 KB of L1 cache, 4 special function units, 16 load/store units and a single PolyMorph Engine. Each of the 32 CUDA cores has both a fully pipelined integer arithmetic logic unit and a floating point unit. The floating point units follow IEEE 754-2008, the new floating point standard, which adds the FMA (Fused Multiply-Add) instruction for both single and double precision arithmetic. FMA is a better solution than MAD (Multiply-Add) because it rounds only once, after the addition, instead of also rounding the intermediate product. Each PolyMorph Engine has five stages: Vertex Fetch, Tessellator, Viewport Transform, Attribute Setup and Stream Output. Once the work has passed through the PolyMorph Engine it is sent to the Raster Engine, which rasterizes the geometry into pixels.
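The difference between FMA and MAD is easier to see with numbers. The sketch below is a software model only, not a description of how the GF100 or GT200 hardware is wired: it uses exact fractions and rounds to single precision either twice (our stand-in for an unfused multiply-add) or once (FMA). The helper names `to_f32`, `mad` and `fma` are hypothetical, and the model assumes round-to-nearest for the intermediate product, where older hardware may well have truncated instead.

```python
import struct
from fractions import Fraction


def to_f32(value: Fraction) -> Fraction:
    """Round an exact value to the nearest single-precision number.

    float(value) goes through double precision first; that is harmless for
    the inputs used here because they are all exactly representable as doubles.
    """
    packed = struct.pack("<f", float(value))
    return Fraction(struct.unpack("<f", packed)[0])


def mad(a: Fraction, b: Fraction, c: Fraction) -> Fraction:
    """Unfused multiply-add: the product is rounded, then the sum is rounded."""
    return to_f32(to_f32(a * b) + c)


def fma(a: Fraction, b: Fraction, c: Fraction) -> Fraction:
    """Fused multiply-add: exact product and sum, one rounding at the end."""
    return to_f32(a * b + c)


# a * b is exactly 1 + 2**-11 + 2**-24; the last term does not fit in the
# 24-bit single-precision significand, so rounding the product drops it.
a = b = Fraction(1) + Fraction(1, 2**12)
c = -(Fraction(1) + Fraction(1, 2**11))

if __name__ == "__main__":
    print("MAD result:", float(mad(a, b, c)))
    print("FMA result:", float(fma(a, b, c)))
```

With these inputs the two-step version loses the small term when the product is rounded and returns zero, while the fused version keeps it and returns 2^-24. Cancellation cases like this are where a single final rounding pays off.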

 

 

Each SM in a GPC has a dedicated 64 KB of on-chip memory that can be configured as 48 KB of shared memory with 16 KB of L1 cache for graphics use, or as 16 KB of shared memory with 48 KB of L1 cache for compute use. For compute programs, the shared memory and L1 cache work together to reduce off-chip traffic, which makes them an ideal fit for high performance CUDA apps. The 768 KB of unified L2 cache is read/write enabled so the GF100 can support C/C++ programming. The comparison chart shows just how the memory architecture differs from that of the GT200.
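As a rough illustration of what the two settings add up to across the whole chip, the sketch below models the per-SM split and multiplies it out over all 16 SMs. The figures are the ones quoted in this article; the class and function names (`SmMemoryConfig`, `chip_totals`) and the `GRAPHICS`/`COMPUTE` labels are hypothetical and simply follow the graphics/compute pairing described above.

```python
from dataclasses import dataclass

SM_COUNT = 16            # SMs on a full GF100, per the article
ON_CHIP_KB_PER_SM = 64   # configurable shared memory + L1 per SM
L2_CACHE_KB = 768        # unified L2 shared by the whole chip


@dataclass(frozen=True)
class SmMemoryConfig:
    """One way of splitting an SM's 64 KB between shared memory and L1."""
    shared_kb: int
    l1_kb: int

    def __post_init__(self) -> None:
        # Only splits that use exactly the 64 KB pool are accepted.
        if self.shared_kb + self.l1_kb != ON_CHIP_KB_PER_SM:
            raise ValueError("shared memory and L1 must add up to 64 KB")


# The two splits described in the article.
GRAPHICS = SmMemoryConfig(shared_kb=48, l1_kb=16)
COMPUTE = SmMemoryConfig(shared_kb=16, l1_kb=48)


def chip_totals(config: SmMemoryConfig) -> dict:
    """Total on-chip memory in KB if every SM uses the same split."""
    return {
        "shared_kb": config.shared_kb * SM_COUNT,
        "l1_kb": config.l1_kb * SM_COUNT,
        "l2_kb": L2_CACHE_KB,
    }


if __name__ == "__main__":
    print("graphics split:", chip_totals(GRAPHICS))
    print("compute split: ", chip_totals(COMPUTE))
```

Either way the SMs hold 1024 KB between them; the setting only changes how much of that behaves as software-managed shared memory and how much as hardware-managed cache.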

 




© 2001-2014 Overclockers Club ® Privacy Policy
