Asus HD 4870 – An Evolutionary Design

The HD4870 Lineage

The road to the HD4870, and to this Asus-designed card, has been a long and eventful one for ATi and their parent company AMD. From the dismal disappointment that was the hot-running, underperforming HD2900 series, through the comeback (if a slightly mediocre one) that we know as the HD3800 class, we arrive at the R700 HD4800 series, a card worthy of bearing Ruby and the ATi name, and in particular the EAH4870. It’s not so much a revolution as a long evolution in design, a continued refinement that has seen steady improvement in the two years since the move from the venerated X1900 cards. It would be easy to say that AMD got from the HD2900 to the HD4800 by simply reducing the process size, but there’s a bit more to it than that. To find out, we need to take a look at the granddaddy of the series, the card that was a failure at release but had plenty of room for improvement: the HD2900.

The HD2900 and R600 – The Originator

To know where we are, we need to know where we came from, and the same is very true when looking at computer architectures, since most aren’t as much revolutionary designs as they are incremental evolutions from the previous version. The same rings true of ATi’s HD4870.

The R600 (Pele) is a bit of a special animal in the GPU world. It’s actually a second generation GPU, since it is in reality the second AMD/ATi GPU to be built on a unified shader architecture, the first being the “Xenos” (C1) GPU that is found in the Xbox 360. Where previous architectures utilized separate processors for each different graphics function, a unified architecture leverages many flexible processors (better known as stream processors, more on that later) which can be scheduled to process a variety of shader types, significantly increasing GPU throughput in theory.
What exactly is a stream processor? Stream processing allows some applications to more easily exploit a limited form of parallel processing. It simplifies parallel software and hardware by restricting the parallel computation that can be performed. Given a set of data, or stream, a series of operations (kernel functions) is applied to each element in the stream, usually as uniform streaming, where one operation is applied to all elements in the stream. The operations are usually pipelined, and the local on-chip memory is reused to minimize external memory bandwidth. Since these processes expose data dependencies, compiler tools are needed to optimize on-chip management tasks. Stream processing hardware can use a technique called scoreboarding, for example, to launch DMAs (direct memory accesses) at runtime, when the dependencies become known. The elimination of manual DMA management can reduce software complexity, and the elimination of hardware caches reduces the amount of die area not dedicated to computational units such as ALUs.
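To make the idea a bit more concrete, here is a minimal Python sketch of uniform streaming: one kernel applied to every element of a stream, with two kernels chained into a pipeline. The kernel names and the sample data are made up purely for illustration and have nothing to do with any real ATi tooling.

```python
from typing import Callable, Iterable, Iterator


def run_kernel(kernel: Callable[[float], float],
               stream: Iterable[float]) -> Iterator[float]:
    """Apply one kernel function to every element of a stream, in order."""
    for element in stream:
        # Each element is processed independently of the others, which is
        # what lets real hardware spread the work over many ALUs at once.
        yield kernel(element)


def scale_and_bias(x: float) -> float:
    """Hypothetical kernel: one multiply-add per element."""
    return 2.0 * x + 1.0


def clamp(x: float) -> float:
    """Hypothetical kernel: clamp the value to the range [0, 4]."""
    return max(0.0, min(4.0, x))


# Kernels can be chained into a pipeline: each stage consumes the stream
# produced by the previous stage.
pixels = [0.0, 0.5, 1.0, 2.0]
result = list(run_kernel(clamp, run_kernel(scale_and_bias, pixels)))
print(result)
```

On a GPU the loop body would run across hundreds of units at once rather than one element at a time, but the programming model is the same.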
The new unified shader functionality was based upon a Very Long Instruction Word (VLIW) architecture in which the core executes operations in parallel. The R600 used 64 superscalar unified shader clusters, each consisting of 5 stream processing units, for a total of 320 stream processing units. (Superscalar describes an architecture that implements instruction-level parallelism within a single processor, allowing higher throughput than would otherwise be possible at the same clock rate.) The RV610 and RV630 variants had some of the shaders removed, containing a total of 40 (5×8) and 120 (5×24) stream processors respectively. Each of the first 4 stream processing units was able to retire a finished single precision floating point MAD (or ADD or MUL) instruction, dot product (dp), or integer ADD per clock. The fifth unit was more complex and could additionally handle special transcendental functions such as sine and cosine. Each of the 64 shader clusters was able to execute 6 instructions per clock cycle (peak), consisting of 5 shading instructions plus 1 branch.
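The unit counts quoted above are simple multiplication, and a quick sketch makes the relationship between clusters and stream processors easy to check. The cluster counts come from the paragraph above; the function names are just illustrative.

```python
UNITS_PER_CLUSTER = 5      # four simple units plus one transcendental-capable unit
BRANCHES_PER_CLUSTER = 1   # one branch instruction per cluster per clock (peak)

# Cluster counts as given in the text.
CLUSTERS = {"R600": 64, "RV630": 24, "RV610": 8}


def stream_processors(clusters: int) -> int:
    """Total stream processing units for a given number of shader clusters."""
    return clusters * UNITS_PER_CLUSTER


def peak_instructions_per_clock(clusters: int) -> int:
    """Peak instructions per clock: 5 shading instructions plus 1 branch per cluster."""
    return clusters * (UNITS_PER_CLUSTER + BRANCHES_PER_CLUSTER)


for chip, clusters in CLUSTERS.items():
    print(chip, stream_processors(clusters), peak_instructions_per_clock(clusters))
```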
All this improvement didn’t come without issues, especially for the VLIW architecture. As noted above, the R600 architecture is highly parallel by nature, which brings a host of issues, chief among them maintaining optimal instruction flow. Additionally, the chip simply couldn’t co-issue instructions when one depended on the result of another. The performance of the R600 GPU was highly dependent on the mixture of instructions being used by the application and on how well the real-time compiler in the driver could organize those instructions. As anyone who has tried folding or programming for an ATi R600 series GPU will tell you, this makes the entire process harder than it would be for the competitor’s cards, especially with the release of NVIDIA’s CUDA platform. So in short, we got a card that was elegant and highly parallel, but lacked the raw power that their competition, NVIDIA, seemed to have in spades.
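The co-issue problem is easier to see with a toy example. The sketch below is a simplified greedy packer, not ATi’s actual driver compiler: it fills 5-wide bundles in program order and only lets an instruction issue once everything it depends on has finished in an earlier bundle. The instruction names and the two sample workloads are hypothetical.

```python
from typing import Dict, List

BUNDLE_WIDTH = 5  # one slot per stream processing unit in a shader cluster


def pack_bundles(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Greedily pack instructions into bundles of up to BUNDLE_WIDTH.

    `deps` maps each instruction name to the names it depends on. An
    instruction may only issue once all of its dependencies have completed
    in an earlier bundle, so dependent instructions can never share a bundle.
    """
    done = set()
    remaining = list(deps)  # dict order is program order
    bundles = []
    while remaining:
        ready = [i for i in remaining if all(d in done for d in deps[i])]
        bundle = ready[:BUNDLE_WIDTH]
        if not bundle:
            raise ValueError("cyclic or unknown dependency")
        bundles.append(bundle)
        done.update(bundle)
        remaining = [i for i in remaining if i not in done]
    return bundles


def utilization(bundles: List[List[str]]) -> float:
    """Fraction of issue slots that actually did work."""
    used = sum(len(b) for b in bundles)
    return used / (len(bundles) * BUNDLE_WIDTH)


# Five independent instructions fit in a single bundle...
independent = {name: [] for name in "abcde"}
# ...while a chain where each result feeds the next needs five bundles.
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["d"]}

print(pack_bundles(independent), utilization(pack_bundles(independent)))
print(pack_bundles(chain), utilization(pack_bundles(chain)))
```

The same five instructions keep every slot busy in the first case and only one slot in five busy in the second, which is why the instruction mix and the quality of the compiler mattered so much to real-world performance.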


The HD3800 and RV670 – Process Shrunk
To put it simply, by November of 2007, ATi was getting their butt kicked. NVIDIA had the performance lead easily with the release of the G80 series graphics cards, and this was made all the more depressing by the fact that ATi’s own next generation R600 GPU was losing to their previous R580 Rodin GPUs in gaming benchmarks. Something had to be done. The company was losing market share and rapidly losing mind share as well. The response from AMD? Shrink the process.
The main difference between the RV670 and the R600, when boiled down to the bone, was the process. The RV670 was manufactured on a 55 nm fabrication process with a 256-bit memory controller and a die size of 192 mm² holding 666 million transistors, way down from the massive 420 mm² die of the HD2900XT (R600), yet with the exact same 320 stream processing units as the R600 core. While it seems like AMD didn’t really do much to improve the R6XX series, I neglected to mention a few key details in the story of the R600. To say the HD2900 ran hot would simply be an understatement. Overclockers and benchmarkers of the time remarked that the card had potential and could pull some serious weight, but it was almost a prerequisite that the card be run under water or other forms of high end cooling to get the most out of the architecture. The shrink of the RV670 to the 55 nm process not only reduced the heat and the massive power consumption that plagued the R600, but also allowed AMD to push up the clock speeds and try to stand toe to toe with NVIDIA, especially with the release of the G92 and G94 series of cards. The efforts paid off significantly when ATi took back the performance lead, albeit briefly, with the release of the HD3870X2, which placed two RV670 dies on one PCB. The card was able to achieve a peak single-precision floating point performance of 1.06 TFLOPS, becoming the world’s first single-PCB graphics product to break the 1 TFLOPS mark. Now the stage was set for the HD4870 to shine.

The HD4800 and R700 – Back on Top

Now we arrive back at the HD4870, which we’ll begin the process of reviewing. Based upon the RV770 and released on June 25, 2008, the HD4800 series is the latest in the current evolution of DX10 cards from ATi. The RV770 extends the R600’s architecture by increasing the stream processing units to 800 (up from 320 in the R600 and RV670), grouped into 10 SIMD cores. The RV770 also has 40 texture units and 16 ROPs (raster operation units). The transistor count is also up from the 666 Million of the RV670 and is now a whopping 956 Million. To put this into perspective, the quad core variant of Nehalem only weighs in at 731 Million transistors. That is still well short of the roughly 1.4 Billion transistors on the GTX280. The RV770 features a 256-bit memory controller with a memory bandwidth of 115.2GB/s, up from the 57.6GB/s and 72.0GB/s of the previous HD3870, and is the first GPU to support GDDR5 memory, which runs at 900MHz giving an effective speed of 3600MHz. The internal ring bus from the R520 and R600/RV670 has been replaced by the combination of a crossbar and an internal hub.
The actual memory clock is much, much higher than what is listed in GPU-Z. More on that later.
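For those who like to check the math, the bandwidth figure falls straight out of the bus width and the effective memory speed. The sketch below uses only numbers quoted in this review (the 750MHz core clock is the reference clock mentioned further down). Counting a MAD as two floating point operations is an assumption on my part, although it is the usual convention behind peak TFLOPS figures.

```python
# Figures taken from the review text.
BUS_WIDTH_BITS = 256
REPORTED_MEMORY_MHZ = 900
CORE_CLOCK_MHZ = 750
STREAM_PROCESSORS = 800

# Assumptions for this sketch:
TRANSFERS_PER_REPORTED_CLOCK = 4  # 3600MHz effective / 900MHz reported
FLOPS_PER_MAD = 2                 # one multiply plus one add per MAD


def effective_memory_mhz(reported_mhz: float) -> float:
    """Effective data rate for a given reported GDDR5 memory clock."""
    return reported_mhz * TRANSFERS_PER_REPORTED_CLOCK


def memory_bandwidth_gb_s(bus_width_bits: int, effective_mhz: float) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz * 1e6 / 1e9


def peak_tflops(stream_processors: int, core_mhz: float) -> float:
    """Peak single precision TFLOPS if every unit retires one MAD per clock."""
    return stream_processors * FLOPS_PER_MAD * core_mhz * 1e6 / 1e12


effective = effective_memory_mhz(REPORTED_MEMORY_MHZ)
bandwidth = memory_bandwidth_gb_s(BUS_WIDTH_BITS, effective)
tflops = peak_tflops(STREAM_PROCESSORS, CORE_CLOCK_MHZ)
print(f"{effective:.0f}MHz effective, {bandwidth:.1f}GB/s, {tflops:.1f} TFLOPS peak")
```

These are theoretical peaks only; real games never keep every unit and every memory channel busy on every clock.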
With all this, AMD again strikes a coup in the way they both market and price their cards. The cost of graphics cards hadn’t really dropped much in the past couple of months until AMD released an actually competitive card. Before the release of the HD4800 series, NVIDIA was pretty much content to charge what they wanted, since they owned the whole market, from the low end to the high. This is further reflected in the path they took when designing the GTX280 and its brethren. The GTX280 is huge: the transistor count is roughly 1.4 Billion, the cost is astronomical and the power requirements are through the roof, putting the card firmly in the realm of what is known as the uber enthusiast. AMD, on the other hand, both out of necessity and in a stroke of genius, realized that most gamers cannot afford to drop $500 on a new graphics card. We simply don’t have that kind of dosh lying around. So while NVIDIA was busy occupying a small space of the market with their $500+ cards, content to leave the previous generation cards for everyone else, ATi was taking over the midrange with the HD4850, and is now taking back the high ground with the HD4870.
The response, to put it mildly, was deafening. NVIDIA has slashed prices on all its products to try and ease the damage, and ATi, of course, has responded in kind.

The particular HD4870 we have here is a card based off the reference design and made by Asus. Asus, as you all know, is well known for its first class enthusiast and gamer boards as well as what I consider to be the best sound cards on the market, the Xonar series. Asus is also well known for their laptops, such as the Eee PC and their barebook class, where the user can fully customize the laptop however they want, as well as for a variety of mobile phones and PDAs. Known for quality and performance, they are Tier One in every sense of the words. The EAH4870 512, as this card is known, runs at the reference clock rate of 750MHz and the same reference memory clock of 900MHz (3.6GHz effective speed). It does however utilize a cooler different from the standard design, what Asus calls the Gladiator cooler. We’ll see what that does for the noise and the heat that the R700 has unfortunately become known for.



The Packaging and Contents

The actual box that the EAH4870 arrives in is quite large and contains all the marketing blurbs and information about the product, including what’s inside (as if you already didn’t know), the specs, the requirements and all the rest of that good stuff that will catch your eye as well as the near standard fantasy/sci-fi image on the box.
[timg]DSC00518.JPG[/timg] [timg]DSC00519.JPG[/timg]
Opening the Box yields the same finds.
As well as for the back of the box
The inside does come with a pretty snazzy interior box made from what appears to be corrugated cardboard, which is actually a box for another set of boxes containing the manual and driver CD, the accessories like the Crossfire bridge, component cables, DVI-HDMI and DVI-VGA adapters as well as an S-Video adapter and of course, the EAH4870 herself. It’s nice to see that even in small things, Asus isn’t skimping out. The boxes are nice quality and everything has a bespoke feel to it, even though it is mass market.
[timg]DSC00524.JPG[/timg] [timg]DSC00525.JPG[/timg]
A well put together box. Nice touch.
[timg]DSC00526.JPG[/timg] [timg]DSC00528.JPG[/timg]
They even include a pretty sweet mouse pad.
How glorious she looks…


The EAH4870 In Depth

In the first shot we can get a clear view of the differences between the reference HD4870 and the previous generation X1950XTX which is similar in size. This is a definite plus since it shows that the size of the card is going to be in the normal range, even for those with mATX cases, although a full ATX case would be advisable in terms of airflow.
The second shot takes a look at the reference cooler. Take a good look. There are key differences between it and Asus’s Gladiator.
[timg]IMG_5339.jpg[/timg] [timg]IMG_5366.jpg[/timg]
Images Courtesy Jeff Clark Photography
[timg]DSC00531.JPG[/timg] [timg]DSC00534.JPG[/timg]
Notice the larger size for the fan on the Asus Model
[timg]DSC00535.JPG[/timg] [timg]DSC00536.JPG[/timg]
The Archer and her backside
[timg]DSC00540.JPG[/timg] [timg]DSC00541.JPG[/timg]
At a side



The Included Extras

Asus also includes some nice programs such as its Smart Doctor utility, which allows on the fly adjustment of clock speed and memory speed as well as fan management, which is important with the quite loud HD4870. Also bundled is the Gamer OSD, which allows the user to take screenshots and screen captures in .jpeg and .mpeg formats respectively. Catalyst and ATi’s CCC are also included on the software disc. In truth it’s not much, but it’s a definite plus to see that they at least take the time to include it.


[img]Screen01.JPG[/img] [img]Screen02.JPG[/img] [img]Screen03.JPG[/img]


Please note that programs like SmartDoctor, GPU-Z and the Catalyst Control Center will read the memory speed as 900MHz because GDDR5 is clocked differently from earlier DDR memory types. These tools report the 900MHz command clock, while the data moves on a separate write clock running at twice that rate (1800MHz) and is transferred on both edges of it, for an effective memory speed of 3600MHz.


In the Case

[timg]DSC00543.JPG[/timg] [timg]DSC00544.JPG[/timg] [timg]DSC00545.JPG[/timg] [timg]DSC00546.JPG[/timg]


A Closer Look At the Gladiator Cooler

[timg]DSC00547.JPG[/timg] [timg]DSC00548.JPG[/timg] [timg]DSC00549.JPG[/timg] [timg]DSC00550.JPG[/timg]

[timg]DSC00551.JPG[/timg] [timg]DSC00552.JPG[/timg] [timg]DSC00553.JPG[/timg]






We now move on to the actual tests where we can see how the EAH4870 stacks up against the competition, but first, a brief note on procedure and the test bed setup.
Temperatures were recorded via the Asus Smart Doctor software and cross-checked against RivaTuner’s hardware monitor and the Catalyst Control Center, with the average of the three being the result posted.
Overclocking was done via the bundled Asus Smart Doctor utility, as was fan control.
All synthetic benches were run at their standard settings, with no changes made other than resolution.
Games were tested at 4XAA/AF for every card at all resolutions with VSync turned off.
Decibels were recorded with an Extech 407732 Type 2 35 Decibel Meter at a constant distance of 1M from the target zone. The decibels were measured with the case door on, since this would be the environment that the card would typically be run in. This also applies to the way the temperatures were recorded.
Decibel (dB) is a logarithmic unit of measurement that expresses the magnitude of a physical quantity (usually power or intensity) relative to a specified or implied reference level, in this case relative to 0dB. Since it expresses a ratio of two quantities with the same unit, it is a dimensionless unit. A decibel is one tenth of a bel (B). In our case, we are using decibels to measure sound pressure, that is, the local pressure deviation from the ambient (average, or equilibrium) pressure caused by a sound wave. Because decibels are logarithmic units, their scale is not linear: a reading of 2dB-A @1M is not twice the intensity of 1dB-A @1M, and every additional 10dB corresponds to a tenfold increase in intensity.
Since it isn’t easy to visualize a logarithmic function, let’s scale it.
Image Courtesy Hong Kong Environmental Protection Department
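If you would rather see the logarithmic relationship as numbers than as a chart, here is a minimal sketch using the standard decibel definitions (10·log10 for intensity ratios, 20·log10 for sound pressure ratios). The function names are just illustrative, and none of the values below are measurements from this review.

```python
import math


def intensity_ratio(delta_db: float) -> float:
    """How many times more intense a sound is for a given dB difference."""
    return 10 ** (delta_db / 10)


def pressure_ratio(delta_db: float) -> float:
    """How many times higher the sound pressure is for a given dB difference."""
    return 10 ** (delta_db / 20)


def db_from_intensity_ratio(ratio: float) -> float:
    """dB difference that corresponds to a given intensity ratio."""
    return 10 * math.log10(ratio)


# A 1dB step is only a modest change, 10dB is ten times the intensity,
# and doubling the intensity adds only about 3dB.
for delta in (1, 3, 10, 20):
    print(f"+{delta}dB -> x{intensity_ratio(delta):.2f} intensity, "
          f"x{pressure_ratio(delta):.2f} pressure")
print(f"x2 intensity -> +{db_from_intensity_ratio(2):.2f}dB")
```

This is why a card that measures only a few dB louder than another can still be noticeably noisier in practice.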


The Testing

Test Bed


Processor – Intel Core 2 Quad Q6600 B3 @3.2GHz
Cooler – Thermalright Ultra 120 Extreme
Motherboard – DFI DK-X38-T2R
Memory – Mushkin Enhanced DDR2-800 4GB @4-4-4-12
Power Supply – Coolmax 700W Modular
Hard Drive – WD Raptor 74GB SATA
Case – Silverstone TJ06
OS – Windows Vista Ultimate x64 SP1
Ambient Temperature – 65°F (18.3°C)
Thermal Interface – Shin-Etsu
Ambient Noise – 36dB-A
Display Driver Version – Catalyst v8.7



Synthetic Tests

[img]3DMark06 SM2.0.png[/img]

[img]3DMark06 HDR-SM3.0.png[/img]

[img]3DMark Vantage.png[/img]

[img]PCMark05 Graphics Suite.png[/img]

[img]OpenGL Bench.png[/img]



Race Driver: GRID is the latest addition to the TOCA Touring Car series, developed and published by Codemasters. It uses Codemasters’ own Ego engine, an evolved version of the Neon engine already used in their previous release, Colin McRae DIRT. The damage code has been completely rewritten to allow for persistent damage environments. The game has a native resolution of 720p, offers excellent gameplay and puts a good load on the graphics card.
[img]GRID FPS.png[/img]

Crysis is the latest FPS from German developer Crytek, known for their previous game Far Cry. Like its predecessor, Crysis is full of sprawling non linear gameworlds and intense combat, and continues the tradition of being graphically intense. This game loads every card heavily and is a good stress test for a video card. It does tend to favor NVIDIA cards.



[img]Crysis Average.png[/img]


Call of Duty 4: Modern Warfare is a first-person shooter video game developed by Infinity Ward and published by Activision. It runs on a proprietary engine with features that include true world-dynamic lighting, HDR lighting effects, dynamic shadows, and depth of field. “Bullet Penetration” is calculated by the engine, taking into account factors such as surface type and entity thickness.


Call of Juarez is a Western-themed first-person shooter from the Polish developer Techland. The North American release of the PC version is one of the first games to utilize Microsoft’s DirectX 10.

[img]Call of Juarez.png[/img]

Lost Planet: Extreme Condition is a third-person shooter video game created by Capcom for the Xbox 360 and Microsoft Windows. The Windows version includes enhanced graphics and, when run in Windows Vista, DirectX 10 support. The game comes packaged with its own built in benchmark.

[img]LP DX10.png[/img]

Unreal Tournament 3 is actually the fourth game in the Unreal Tournament series and the eighth Unreal game, but it has been numbered in terms of the engine it runs on. This game also tends to favor NVIDIA GPUs.

[img]Unreal T3.png[/img]

[img]Fan Speed To Decibel Correlation.png[/img]

[img]Temperature to Fan Speed Correlation.png[/img]

[img]Temperature to Fan Speed Correlation Load.png[/img]


After three generations of being an also-ran, AMD has finally released a competitive card. Looking at the results for the EAH4870, even where the card loses to its rival, the GTX280, it comes so close that the difference is imperceptible, especially when you factor in that you get much better performance per dollar compared to the NVIDIA offerings. Factor in the ability to use Crossfire on any Intel chipset with 2x 16x PCIe lanes and the deal gets even sweeter. This time AMD hasn’t had to settle for second best. They took the fight to NVIDIA.

The Asus branded card’s cooler did work as advertised and ran 7°C cooler than the reference ATi card. The card was able to run relatively quietly and still run cool. Let’s hope Asus delivers on the Matrix cooler and puts out a special edition of this card with it. Even so, I highly recommend the EAH4870 for gamers and enthusiasts.

