Phenom II 940 Review: Clock for clock Deneb vs. Yorkfield


Phenom II, by JumpingJack

Unless you have been on Mars or trapped on a deserted island for the past six months, you are well aware that there is a global recession and everyone is hurting. As gruesome as the current economic climate appears, the situation has not stopped our favorite high-tech companies from generating new products and introducing higher-performance parts in a calculated move to squeeze your already squeezed pocketbook. Today we shall take a look at one of the most recent, and most anticipated, products: AMD’s sequel to their original quad core, the Phenom II. While Intel’s code names tend to be geographical locations, AMD’s generally follow a theme such as cities or stars; Deneb, for instance, is the brightest star in the constellation Cygnus.

Similarly, unless you have been living in that constellation for the past year or so, or have been a hermit living in the caves of Saskatchewan, you will also know that AMD’s first attempt at a native quad-core design fell well short of expectations, struggling to compete with their competitor’s quad solution built from two dual-core dies. The recent introduction of Phenom II looks to change that, giving AMD an opportunity to introduce some architectural tweaks as well as a new process technology at the 45 nm lithographic node.

The novelties that AMD brings with their new processors, originally released as Shanghai for servers, are a larger L3 cache, some tweaks to the core architecture, and a 45 nm process technology built around two key features: ultra-low-k dielectrics and immersion lithography. We will discuss the 45 nm portion in more detail in a moment.

[img]AMD-slide1.jpg[/img]

Taking the instructions per clock (IPC) gains and the clock speed improvements together, AMD is touting a 20% performance improvement over the top-bin 65 nm Phenom 9950. AMD has launched Phenom II in two speed bins, one at 2.8 GHz and their multiplier-unlocked Black Edition at 3.0 GHz. Both clock speeds are a significant improvement over the 65 nm counterpart, and in the weeks and months leading up to launch AMD was most certainly eager to demonstrate the vastly improved clock headroom their new 45 nm process affords. Using extreme cooling measures such as liquid nitrogen and phase-change technology, AMD has demonstrated parts functioning in excess of 6 GHz. Before we launch right into benchmarks and performance, let’s take a closer look at what AMD has put into this new CPU.

AMD has not been as vocal about the tweaks they made to the architecture as they were about the original native quad-core design. Since Phenom II is launching on a new process technology a little over a year after the original Phenom, the architectural changes are minor, as is usually the case with a process transition. Architecturally, Phenom II is based on, and remains very much the same as, Phenom I.

 

[img]AMD-slide2.jpg[/img]

 

The first, most obvious, and (as a consequence of smaller transistors) most natural step is to increase the cache size. In the past, AMD’s caches have usually been smaller than Intel’s, a luxury afforded to them by the performance gains granted by having the memory controller on-die and close to the cores. Stepping out of form, however, AMD has taken the die-shrink advantage of their 45 nm process and made a huge leap in cache: the L3 has grown from 2 MB in Phenom I to 6 MB in Phenom II. As a general rule of thumb, a larger cache boosts performance over a smaller one, with one simple caveat: the latency must not increase. The corollary to this rule is that latency usually does increase with cache size, or more aptly as the total cache area on the die grows. By moving to the 45 nm lithography node (more on this later), AMD is able to maintain or slightly reduce the total die size and thereby keep latency relatively constant, and they implemented additional measures to ensure that latency did not grow with the larger cache, which is where the process side of the story comes into play.
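To see why the latency caveat matters, here is a rough average memory access time (AMAT) sketch. The hit rates and cycle counts are illustrative assumptions chosen for the sake of the arithmetic, not measured Phenom or Phenom II figures.

```cpp
// Back-of-envelope average memory access time (AMAT) model. All numbers below
// are illustrative assumptions, not measured Phenom/Phenom II latencies.
#include <cstdio>

int main() {
    const double l3_hit_latency = 45.0;   // cycles to service a hit from L3 (assumed)
    const double memory_latency = 180.0;  // cycles for a round trip to DRAM (assumed)

    // Assumed L3 hit rates for a cache-sensitive workload.
    const double hit_rate_2mb = 0.70;     // hypothetical 2 MB L3 (Phenom I class)
    const double hit_rate_6mb = 0.85;     // hypothetical 6 MB L3 (Phenom II class)

    auto amat = [&](double hit_rate, double hit_latency) {
        return hit_rate * hit_latency + (1.0 - hit_rate) * memory_latency;
    };

    std::printf("2 MB L3:               %.1f cycles\n", amat(hit_rate_2mb, l3_hit_latency));
    std::printf("6 MB L3, same latency: %.1f cycles\n", amat(hit_rate_6mb, l3_hit_latency));
    std::printf("6 MB L3, +15 cycles:   %.1f cycles\n", amat(hit_rate_6mb, l3_hit_latency + 15.0));
    return 0;
}
```

With these made-up numbers the larger cache cuts the average access time noticeably, but if its latency also climbed by 15 cycles, much of that gain would be eaten away, which is exactly the trade-off AMD had to manage.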

 

The less obvious, and less advertised, differences in Phenom II include improvements to the branch predictors, load/store buffering, and floating-point register-to-register moves. The latter will not have a significant impact for most desktop users; it is targeted at certain workloads such as virtualization and floating-point-heavy throughput applications (i.e. tweaks intended mostly for the software run on the server variant). The branch prediction improvements, however, are a very significant step forward, and one that AMD needed, as Phenom I's branch prediction was certainly weaker than the branch prediction Intel implemented in both Core 2 Duo/Quad and Core i7.

 

Since the advent of the superscalar, pipelined microprocessor, designers have been challenged to keep performance high by keeping the execution pipeline full. Professional software engineers will tell you that branching from one segment of code to another is a fundamental necessity of programming. Boiled down to the essentials, branching is the ability to make a comparison and, based on the result, jump from one segment of execution to another. That is, whenever the code must follow one path or another depending on a true or false outcome, a branch arises.
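As a trivial illustration, the function below contains exactly this kind of decision. The comparison compiles down to a compare instruction followed by a conditional jump, and that jump is what the predictor has to guess.

```cpp
// A branch in its simplest form. The comparison compiles to a compare
// instruction followed by a conditional jump; the predictor must guess which
// side will be taken before the comparison has actually been evaluated.
#include <cstdio>

int clamp_to_zero(int x) {
    if (x < 0)        // the branch: taken or not taken?
        return 0;     // one code path
    return x;         // the other code path
}

int main() {
    std::printf("%d %d\n", clamp_to_zero(-5), clamp_to_zero(7));   // prints "0 7"
    return 0;
}
```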

 

When a branch instruction is encountered in the instruction stream, the branch predictor logic attempts to predict which direction will be taken before the branch instruction actually executes, and fetches subsequent instructions along that path. If the branch is predicted correctly, everyone is happy: the next segment of code is already through the 10-15 stages of decoding, reordering, and alignment, and can be scheduled for execution immediately. The problem comes when the branch is predicted incorrectly. In that case, the instructions fetched behind the branch are no longer valid, so the CPU must stop what it is doing (i.e. execution stalls) while the front-end logic flushes the pipeline and fetches the correct block of code. This takes time and wastes cycles. In fact, the penalty for a mispredicted branch is enormous; it is one reason why Intel's NetBurst architecture, with its very long pipeline, suffered so tremendously in performance, as it could take up to 30 dead cycles to repopulate the pipeline. Even the CPU in the Xbox 360 must do branch prediction, and mispredicted branches there can cost as many as 20-24 wasted CPU cycles.
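For the curious, the sketch below is one rough way to see that cost on almost any modern CPU: sum the same data through a data-dependent branch, once with the values in random order (the predictor guesses roughly 50/50) and once sorted (the predictor is nearly always right). The data size and timings are illustrative, this is not part of our benchmark suite, and an aggressively optimizing compiler may turn the branch into a branchless select and hide the effect.

```cpp
// Sorted vs. unsorted data through the same data-dependent branch. On random
// data the branch mispredicts roughly half the time; on sorted data it is
// predicted almost perfectly, so the same work runs noticeably faster.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

static long long sum_over_threshold(const std::vector<int>& v) {
    long long sum = 0;
    for (int x : v)
        if (x >= 128)        // hard to predict on random data, easy on sorted data
            sum += x;
    return sum;
}

static double time_ms(const std::vector<int>& v) {
    auto t0 = std::chrono::steady_clock::now();
    volatile long long s = sum_over_threshold(v);   // volatile keeps the call from being dropped
    (void)s;
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    std::vector<int> data(1 << 24);                 // ~16 million values
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist(0, 255);
    for (int& x : data) x = dist(rng);

    double unsorted_ms = time_ms(data);
    std::sort(data.begin(), data.end());
    double sorted_ms = time_ms(data);

    std::printf("random order: %.1f ms, sorted: %.1f ms\n", unsorted_ms, sorted_ms);
    return 0;
}
```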

 

There are several different methods for branch prediction and a few different kinds of branches to be predicted. We will not go into detail here, other than to say that many types of applications, by the nature of their code, have many points where branching occurs; as such, good branch prediction is a must for good performance. Games, which are often the driving force behind the purchase of high-end computers, fall into such a class. Those interested in just how much branch prediction occurs in games can read this paper.

Thus, with improved branch prediction one would expect Phenom II to show some overall improvement in gaming code, beyond the gains expected from the larger cache.

 

 

AMD and IBM have been co-developing process technology in a symbiotic relationship for many years now. The two main features advertised in the 45 nm process technology that AMD now utilizes are the implementation of immersion lithography and ultra-low-k dielectrics. Let’s look at the immersion piece of the process first.

As transistors have been made smaller and smaller in accordance with Moore’s Law, developing the feature sizes necessary to maintain the performance curve that Moore’s Law describes has become more challenging. One underlying problem lies in the physics of focusing light and resolving features at the nanometer level. If the feature you are trying to print becomes smaller than the wavelength of light used for the exposure, physics tends to resist: the light diffracts to the point where the feature is no longer resolvable. This is a lot to comprehend, but conceptually you would find it difficult, and most likely impossible, to use a 1″ wrench to tighten a 1/8″ nut.

There are a few ways to work around the diffraction limitations imposed by physics and, in fact, to use the phenomenon to one’s advantage. Phase shifting is one such method, using the diffraction properties of light to resolve smaller dimensions. Double-exposure techniques (what Intel uses for their 45 nm process) have also been shown to enable smaller printable features. AMD, however, chose to implement immersion lithography, which works around the problem by changing the refractive index of the medium the light passes through between the lens and the wafer. By choosing a liquid of the right refractive index (water in this case), AMD is able to bend the light further than would be possible if the beam simply passed through air.

[img]Immersion_lithography.png[/img]

In immersion lithography, the lens that focuses the light and exposes the wafer to form the pattern is literally immersed in a droplet of liquid, carefully selected so that its index of refraction relative to that of the lens lets the system focus correctly. Exploiting these physical characteristics allows AMD to print the features necessary for the 45 nm node and produce the required transistors.
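The payoff can be estimated with the textbook Rayleigh criterion, where the minimum printable half-pitch is roughly k1 · λ / NA and the numerical aperture NA = n · sin θ. The sketch below uses generic 193 nm ArF scanner numbers as assumptions, not AMD/IBM process parameters.

```cpp
// Rayleigh resolution estimate: half-pitch ~ k1 * lambda / NA, NA = n * sin(theta).
// The k1, lambda and NA values are textbook 193 nm ArF figures, assumed for
// illustration only.
#include <cstdio>

int main() {
    const double k1     = 0.30;            // assumed process factor
    const double lambda = 193.0;           // nm, ArF excimer laser wavelength

    const double na_dry = 0.93;            // dry lens: n(air) ~ 1.0 caps NA below 1
    const double na_wet = 1.44 * 0.93;     // same lens angle with water (n ~ 1.44) between lens and wafer

    std::printf("dry 193 nm lithography: ~%.0f nm half-pitch\n", k1 * lambda / na_dry);
    std::printf("water immersion:        ~%.0f nm half-pitch\n", k1 * lambda / na_wet);
    return 0;
}
```

The water raises the effective numerical aperture above 1, which is the "over-bending" of the light described above, and that is what pushes the printable feature size down far enough for a 45 nm class process.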

The second, and arguably more important, feature is the ultra-low-k film. After the transistors are printed in the lithography process, they are wired together using metal lines supported by an oxide infrastructure. An inherent problem with this process is that as devices are made smaller, the metal lines that make up the wires end up much closer together. As a result, the parasitic capacitance between neighboring metal lines grows larger, which limits how high the device can be clocked and increases the power required. Utilizing materials with a lower ‘k’, or dielectric constant, counteracts the effect of the shrinking spacing and keeps the parasitic capacitance in check.
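A back-of-envelope way to picture this is to treat two adjacent metal lines as a parallel-plate capacitor, C = k · ε0 · A / d. The dimensions in the sketch below are purely illustrative, not actual interconnect geometry.

```cpp
// Parallel-plate approximation of wire-to-wire capacitance: C = k * eps0 * A / d.
// Halving the spacing d doubles C; dropping k from ~3.9 (SiO2) toward ~2.5
// (an assumed ultra-low-k value) claws much of that back. Geometry is made up.
#include <cstdio>

int main() {
    const double eps0 = 8.854e-12;          // permittivity of free space, F/m
    const double area = 100e-9 * 1e-6;      // assumed 100 nm tall line, 1 um of parallel run

    auto cap_fF = [&](double k, double spacing_m) {
        return k * eps0 * area / spacing_m * 1e15;   // capacitance in femtofarads
    };

    std::printf("SiO2 (k=3.9), 90 nm spacing:    %.3f fF\n", cap_fF(3.9, 90e-9));
    std::printf("SiO2 (k=3.9), 45 nm spacing:    %.3f fF\n", cap_fF(3.9, 45e-9));
    std::printf("low-k (k=2.5), 45 nm spacing:   %.3f fF\n", cap_fF(2.5, 45e-9));
    return 0;
}
```

Since both the RC delay of a wire and the dynamic power (roughly C·V²·f) scale with that capacitance, trimming C directly helps both clock speed and power, which is exactly the benefit claimed for the ultra-low-k film.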

[img]AMD_slide3_ulk.jpg[/img]

It is most likely, though not conclusive, that a combination of good circuit design and much lower-k dielectric materials contributes to both the excellent power characteristics of Phenom II and its ability to reach higher clock speeds than its 65 nm counterpart. Combined with the larger cache and core tweaks, AMD has produced a much better CPU than Phenom I: higher IPC, a higher top clock speed, and better overclocking potential. This in turn closes the gap significantly against their arch nemesis and restores some competitive balance between AMD and Intel, something every enthusiast should applaud.

So, with those explanations done and a clearer picture of what AMD did for Phenom II, let’s move on to the performance question.

 

The goal of this review is to look at CPU-intensive benchmarks, comparing these two architectures clock for clock. We would have liked to run some gaming benches, but motherboard and graphics combinations would have been too difficult to make fair. All tests were run three times with the median score selected as the representative result. If an outlier (more than 3% variation from the statistical average) was encountered, a second set of three runs was completed. This only occurred once on each setup.
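For clarity, the scoring rule works out to something like the sketch below, where run_benchmark() is a stand-in placeholder for whichever test is being scored, not an actual tool used in this review.

```cpp
// Median-of-three scoring with a 3% outlier rule: if any run deviates more than
// 3% from the set's average, a second set of three runs replaces the first.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <numeric>
#include <utility>
#include <vector>

// Placeholder for whichever benchmark is being scored; returns one run's result.
double run_benchmark() { return 100.0; }

double representative_score() {
    auto one_set = []() {
        std::vector<double> runs{run_benchmark(), run_benchmark(), run_benchmark()};
        double mean = std::accumulate(runs.begin(), runs.end(), 0.0) / runs.size();
        bool outlier = std::any_of(runs.begin(), runs.end(), [&](double r) {
            return std::fabs(r - mean) / mean > 0.03;   // more than 3% from the set's average
        });
        std::sort(runs.begin(), runs.end());
        return std::make_pair(runs[1], outlier);        // (median of three, outlier flag)
    };

    auto [median, outlier] = one_set();
    if (outlier)                     // outlier encountered: complete a second set of three
        median = one_set().first;
    return median;
}

int main() {
    std::printf("representative score: %.2f\n", representative_score());
    return 0;
}
```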

 

AMD Setup:

  • AMD Phenom II 940
  • 4GB G. Skill DDR2-1066 5-5-5-15 RAM
  • Samsung 500GB Spinpoint T-Series Hard Drive
  • DFI LanParty Jr. 790GX-M2RS Motherboard
  • Sapphire HD 4870 Graphics Card
  • In-Win Commander 1200w PSU
  • Windows 7 Build 7000 Beta x64

Intel Setup:

  • Intel QX9770 (Emulating a Q9550 with multiplier set at 8.5 and FSB adjusted accordingly to match the AMD Setup’s clockspeed)
  • 4GB G. Skill DDR2-1066 5-5-5-15 RAM
  • Samsung 500GB Spinpoint T-Series Hard Drive
  • Asus Maximus Formula II P45 Motherboard
  • 2 x Asus HD 4850 in CF
  • OCZ GameXstream 700w PSU
  • Windows 7 Build 7000 Beta x64

 

Benchmarks Used:

  • Cinebench R10 64-Bit x CPU Test
  • CPU Free BenchMark 2.2
  • Fritz Chess Benchmark
  • Geekbench 32-Bit
  • NovaBench 2.0.330.0
  • SiSoft Sandra 2009 SP2
  • WinRar 3.80
  • wPrime 2.0 1024m

 

 

In a change of pace from the original Phenom CPUs, Phenom II has a fair amount of overclocking headroom. Simply raising the unlocked multiplier on the Phenom II 940 allowed a 24/7-stable speed of 3.8 GHz. This is up from 3.0 GHz and represents a respectable 27% overclock. The NB was also overclocked to 2400 MHz, with the HT link running at 2200 MHz. These results are pretty much in line with what average overclockers have shown for 24/7-stable settings. One of the best qualities of this CPU is the unlocked multiplier, something you pay a hefty premium for on the Intel side of the coin. The G. Skill DDR2-1066 was run in “unganged” mode at its rated speed of 1066 MHz with stock 5-5-5-15 timings.

The Intel settings were 353 x 8.5 for 3.0 GHz, with memory running at DDR2-1058 and stock 5-5-5-15 timings. The 3.8 GHz speeds were achieved with 447 x 8.5.

 

Rather than narrate each result, we will let the raw numbers speak for themselves.

[img]Cinebench.png[/img]

 

[img]CPU Free Benchmark.png[/img]

 

[img]Fritz Chess Bench.png[/img]

 

[img]Geekbench.png[/img]

NovaBench is a short program that tests three common CPU tasks: floating point operations per second, integer operations per second, and MD5 hashes generated per second. It takes about as long to run as Geekbench and scales well with clockspeed and extra threads (or cores). In this test we see Intel’s largest margin of victory, at 31%. We haven’t seen this bench widely used, and the wide gap in results could be part of the explanation why.

 

[img]NovaBench.png[/img]

 

WinRAR is a tool for compressing and extracting (or in this case “raring”) files. Measured in KB/s, its built-in benchmark doesn’t scale with clockspeed as well as the others. We see that the Phenom II 940 is able to outperform the Q9550 in this test, possibly due to AMD’s use of an IMC (integrated memory controller).

 

[img]Winrar.png[/img]

 

wPrime 2.0 is the newest version of a widely used benchmarking tool in the “competitive” benching arena. One of the best features of 2.0 is that it automatically detects the number of threads to use, removing any chance of forgetting to change the thread count and losing your top score of the night. If you have ever participated in competitions like this, you will understand the frustration. Opposite to the NovaBench results above, here we see AMD’s largest margin of victory, a 12% advantage over Intel.

 

[img]wPrime.png[/img]

 

If you have read any of this author’s other reviews, you will know that he and the other writers/editors here take SiSoft Sandra’s CPU benchmark results with a grain of salt, particularly the multi-core efficiency and power efficiency tests (not included here to avoid any confusion). We don’t feel that Sandra reports bad results, just that the real-world value of these results is a bit hard to apply in many scenarios. We will let the numbers do the talking on these.

 

[img]Sandra Arithmetic ALU.png[/img]

 

[img]Sandra Arithmetic iSSE3.png[/img]

 

[img]Sandra Cryptography Bandwidth.png[/img]

 

[img]Sandra Multicore Intercore Bandwidth.png[/img]

 

[img]Sandra Multicore Intercore Latency.png[/img]

 

[img]Sandra Multimedia Doublex4.png[/img]

 

[img]Sandra Multimedia Floatx8.png[/img]

 

[img]Sandra Multimedia Intx16.png[/img]

We thought it would be interesting to see whether both CPUs scaled evenly when overclocked. Not being 100% sure they would scale equally, given that they come from different platforms and take different approaches to system memory, we were pleased with the results. Going from 3.0 GHz to 3.8 GHz equates to a ~27% overclock on the core speed. Averaging the gain in performance from overclocking across each test, we see that both CPUs scale very well in the benchmarks used, nearly perfectly in fact, with roughly a ~24% increase in performance. The difference between the two is so marginal that it falls within the 3% margin of error we allow for a sample of this size, and so is not statistically significant.
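The arithmetic behind that statement is simple enough to spell out; the sketch below just reproduces the percentages quoted above and turns them into a scaling efficiency figure.

```cpp
// Quick check of the scaling math: a 3.0 GHz -> 3.8 GHz bump is ~27%, and a ~24%
// average benchmark gain works out to roughly 90% scaling efficiency.
// Purely illustrative arithmetic using the figures quoted in the text.
#include <cstdio>

int main() {
    const double base_ghz = 3.0, oc_ghz = 3.8;
    const double clock_gain     = (oc_ghz - base_ghz) / base_ghz;   // ~0.267
    const double avg_bench_gain = 0.24;                             // averaged gain from the charts above

    std::printf("clock increase:     %.1f%%\n", clock_gain * 100.0);
    std::printf("benchmark increase: %.1f%%\n", avg_bench_gain * 100.0);
    std::printf("scaling efficiency: %.0f%%\n", avg_bench_gain / clock_gain * 100.0);
    return 0;
}
```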

 

[img]Overclock Scaling.png[/img]

 

We conclude that AMD has a winning chip for existing AM2+ system owners, one that overclocks much better (and more consistently) than its predecessor. Performance is not quite what we hoped to see, but it is in line with what other reviews have found. Clock for clock, Deneb is equal to or slightly slower than Yorkfield, rarely beating it in this bench suite, but it is certainly an improvement over the original Phenom. Gaming tests are showing good results for AMD, especially in CrossFire setups with AMD/ATI graphics; some chipset optimizations are probably to be credited for the boosted performance. Whether you are an AMD fan or an Intel fan, you should be happy with the results that Phenom II brings to the table. From the AMD side you can now get the best-performing, best-overclocking quad core ever released by your favorite company, all with the ability to drop into your current AM2+-compatible board. Just make sure to check your board manufacturer’s website for the latest BIOS and double-check compatibility. From the Intel side, if you are still on the LGA 775 platform, your system will not be consistently beaten by anything at the same system price point. We should all be happy to see a competitive CPU, as we all know that competition between manufacturers is a winning situation for the end user (us!).

 

Basically, if you currently have a Phenom II-capable setup and are looking to upgrade, you will do well going with a Phenom II X4 940. If you currently use an Intel-based setup and need to upgrade your CPU, you are best off with a Q9xxx-series quad. Building from scratch? Wait for the AM3 results (due Q2 2009) and build whichever is the best bang for your buck at that point. Hopefully AMD will continue the trend of releasing more frugal-minded Black Edition (denoting an unlocked multiplier) processors. Can anyone hope with us for a 45 nm “Propus” (AM3) based “budget” Black Edition quad?

 

Pros:

  • Drop in compatibility (for the most part) with current AM2+ motherboards
  • Unlocked multiplier (the Phenom II 920 is locked) allowing for higher and easier overclocking
  • Greatly improved overclocking over Phenom series CPUs
  • Good performance for the money (aka: good “bang for your buck”)
  • For “Xtreme” users only: Has virtually no attainable “cold bug,” even under Liquid Helium
  • Hits very high clockspeeds for quad-core benchmarking goodness under Xtreme cooling (single stage phase, cascade, dry ice, LN2, LHe)

 

Cons:

  • Not worth switching platforms if you currently have LGA 775
  • Doesn’t overclock quite as high as the average Intel quad under normal 24/7 cooling conditions
  • Performance doesn’t consistently match Yorkfield, an older technology by ~9 months

Discuss this review here. We welcome your feedback and questions! Please feel free to ask or request any extra information regarding this, or any other topic, in the forums.
