Lost Planet: The Battle Ground — When CPU and GPU Collide


While benchmarking for a comprehensive battle of the clocks between Intel's best and AMD's best, one gaming benchmark offered up a nice opportunity for a side track, and it is important enough to dedicate the time and energy to writing up the information. That benchmark is Lost Planet: Extreme Condition. The game provides a built-in performance test which scripts two scenes, one called Snow and another called Cave. During the benchmarking runs the progression of the data made it clear that the Snow script could be throttled at the GPU, whereas Cave was clearly CPU limited. In and of itself, this is not so interesting; what was interesting was the number of resolution options available, from the lowest (640×360, an odd WxH combination) to the highest supported by the LCD monitor I am using (1280×1024). In fact, the game provides some 25 different resolution settings over that span. This made it possible to observe, with a high degree of resolution (not to be confused with screen resolution), the behavior of the CPUs and GPU under carefully matched conditions.

The data presented here should be studied carefully, because several points fall out of the observations, and they explain some odd anomalies found when scouring the web for other reviews that used this benchmark to evaluate CPU performance. Although the experiment was designed to compare and observe one CPU against the other, this particular game and benchmark also provides a different insight into benchmarking with games in general, and into what is more important, the GPU or the CPU. Other sites provide similar information, but not quite at the level of detail shown here. The upshot can be summarized as follows: a gaming system is best determined by the strength of the GPU, with the CPU chosen such that its computational performance on non-graphical code is good enough not to hold the game back from some minimal threshold frame rate. Only under that condition is a proper gaming experience ensured. Frankly, testing CPU/GPU combinations to encompass all possible scenarios within the gaming environment is, for all practical purposes, impossible. Nonetheless, reviewers throughout the web give it their best shot and, for the most part, provide at least some information that helps make the appropriate decisions.

What will be shown are two cases in one game in which, ultimately, the choices are not always clear. What is meant by this is that depending upon the scene, the complexity of the rendering, and the action occurring on screen, the GPU can be less critical than the CPU and vice versa. Finally, when compiling the data and putting together something that can be analyzed and discussed, it is clear that the choice of conditions and the choice of scene can vastly impact the overall conclusions, depending upon which component one is attempting to evaluate. In short, be wary of sites that provide only one set of conditions and draw generalizations from their data sets.

The test setup for all the data uses profile 1.1; the entire software platform, OS details, and system installation are documented here. The important details are as follows: system memory and software versions were the same on both systems. The benchmarks were run on an nVidia 8800 GTX with the nVidia driver set to 'Let Application Decide', and, though it does not affect performance, the slider was set to balanced on both systems; that is, default driver install settings. The OS for this study was Windows XP Home Edition, so all the data falls under the 32-bit category running in DirectX 9.0c mode, which is important to note.

The Phenom BE was set to auto for all CPU-related parameters in the BIOS, which yields a stock CPU clock of 2.5 GHz and a northbridge frequency of 2.0 GHz. The RAM, of course, defaulted to DDR2-800 CL5 per the SPD, and each system was verified to have the same memory settings. It has been shown in some cases that running the memory faster can affect Phenom performance; in Lost Planet this amounts to just a few frames per second (by far the most heavily impacted game bench is FEAR). The QX9650 was set to 2.5 GHz by selecting 7.5 as the multiplier and leaving the FSB at the stock 333 MHz. Doing so effectively under-clocked the CPU in order to match the Phenom 9850 BE clock for clock. While this is not the optimum performance situation for either CPU, it is a far more reliable setup, free of problems, than attempting to overclock the 9850 BE to QX9650 speeds (though that test is in the plans for the future). Running the QX9650 under these conditions results in a CPU which is not commercially available, so any comparative analysis between the two CPUs is strictly in the interest of comparing, clock for clock, the IPC capabilities of an AMD Phenom to a 12 MB Intel Yorkfield quad core. Since cache size is a factor in determining IPC, the relative clock for clock comparison is sane, though a commercial CPU with 12 MB of cache will also arrive at higher clocks; as such, the potential for a clock scaling article exists, depending upon the interest. Both processors were run at standard stock VID voltages.
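As a quick sanity check on the under-clock: 7.5 × 333 MHz ≈ 2,500 MHz, which matches the Phenom 9850 BE's 2.5 GHz stock clock.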

The Lost Planet game was set to minimum settings on all graphical options except effect quality (which was determined to have a significant CPU component) and filtering; these two options were set to high. A screen shot of the graphics settings page is shown below.

 

Running the benchmark is straightforward: from the main menu the resolution would be set by going to the Settings / Video Graphics page and changing to the desired resolution. No other processes were running in the background except MS Paint, which stayed open to paste screen dumps into. The resolution would be applied, and the Performance Test selected from the main menu. The benchmark loops indefinitely, so the loop was allowed to complete once. Once completed, an instance of CPUID would be started and a screen capture generated. The CPUID/screen capture was done in such a fashion as to ensure that the average FPS reports for Snow and Cave were not refreshed and that the numbers represent a clean running system without the overhead of starting up CPUID. After the screen dump was saved, CPUID was closed and the benchmark loop exited back to the main menu, where the next resolution setting was selected and the process repeated. All available resolutions were tested except one, the option just after 800×600, which offered no real benefit to the data set. The resolutions at 640×480, 800×600, 1024×768, and 1280×1024 were collected 3 times (data which will be used in future articles) to estimate the run-to-run noise, which amounted to roughly 2-4 FPS.

Screen dumps showing the actual data, authenticated with CPUID runs, can be found here in our gallery. The image names are self-explanatory and follow the format:

_Speed_Multiplier_Bus_Memoryspeed_Memory Timings_Benchmark_resolution_run number

 

QX9650 – 2.5G Screen shots
Phenom 9850BE – 2.5G Screen shots

 

Data analysis is straightforward: the plots are generated as average FPS for the respective scripts vs. total pixels rendered, which is simply the product of the resolution's width and height. The load on the GPU is directly proportional to the number of pixels that must be shaded, and, under GPU-constrained conditions, there should be a linear relationship between total pixels rendered and the average FPS observed.
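For anyone wanting to reproduce the charts, the arithmetic is trivial. The sketch below is my own illustration, assuming Python and matplotlib rather than the tooling actually used for the figures; the FPS values themselves come from the benchmark's screen dumps, so only the pixel calculation and plot layout are shown.

```python
# A minimal sketch, assuming Python + matplotlib, of the analysis described
# above. The FPS numbers come from the benchmark's screen dumps; only the
# pixel arithmetic and the plot layout are shown here.
import matplotlib.pyplot as plt

# Lowest and highest resolutions tested, plus a few in between (the game
# offers roughly 25 settings in total).
resolutions = [(640, 360), (640, 480), (800, 600), (1024, 640),
               (1024, 768), (1280, 1024)]
pixels = [w * h for w, h in resolutions]
print(pixels[0], pixels[3], pixels[-1])  # 230400, 655360, 1310720 (~230K, ~655K, ~1310K)

def plot_run(avg_fps, label):
    """avg_fps: the reported average FPS per resolution, in the same order."""
    plt.plot(pixels, avg_fps, marker="o", label=label)

# Example usage, with the measured averages filled in:
# plot_run([...], "QX9650 2.5 GHz - Snow")
# plot_run([...], "QX9650 2.5 GHz - Cave")
plt.xlabel("Total pixels rendered (W x H)")
plt.ylabel("Average FPS")
plt.show()
```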

The concepts are all straightforward, or at least they should be to anyone who engages in this hobby with any degree of seriousness. Game code is a complex interaction between rendering the scene, calculating collision physics, tracking characters, monitoring character health, and running the AI. The GPU is currently the rendering workhorse: storing textures in video memory, receiving pixel and vertex information based on the point-of-view perspective, and literally painting a 2-dimensional image transposed from an imaginary 3-dimensional space. Collectively we call this the eye candy. The CPU is traditionally used for physics, determining collision boundaries, running the AI, and such. During game play, the rate at which the graphical presentation shows itself to the end user will depend upon one or the other component completing its task in a timely manner. Ideally, the rate at which the GPU renders the image would not be influenced by the rate at which the CPU can calculate the position and actions of objects within the 3D environment and feed that information back to the GPU. This, however, is never the case: either the CPU is waiting on the GPU to finish a frame in order to provide the next frame of information, or the GPU is waiting on the CPU in the same manner.

In the forums you will hear "GPU throttled" or "CPU throttled", or "the GPU is the bottleneck", and so on. Here, when the GPU is the determinant component establishing the observed frame rate, the term GPU-limited will be used, and when the CPU determines the frame rate, the term CPU-limited will be used.

Conceptually, it is not difficult to understand when and how this will be the case: in the limit of low resolution and no extra or fancy post-image processing, the GPU has a very light load and the possibility (even probability) is that the CPU will be the limiting factor, whereas the opposite is true as the resolution increases. The data presented on the following pages is intended to provide examples of the behavior of both GPU-limited and CPU-limited cases. So let's begin.
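To make the two regimes concrete before looking at the figures, here is a toy model of my own, not anything pulled from the game engine, and with made-up constants: per-frame CPU work is treated as roughly constant, per-frame GPU work scales with the pixels shaded, and the observed frame rate follows whichever side is slower.

```python
# Toy bottleneck model: frame time is set by the slower of the two components.
# Both constants below are hypothetical, chosen only to illustrate the shape
# of a CPU-limited plateau followed by a GPU-limited fall-off.
CPU_FRAME_MS = 6.7        # assumed CPU cost per frame (~150 FPS worth of work)
GPU_MS_PER_MPIXEL = 8.0   # assumed GPU cost per million pixels shaded

def predicted_fps(width, height):
    gpu_frame_ms = GPU_MS_PER_MPIXEL * (width * height) / 1e6
    frame_ms = max(CPU_FRAME_MS, gpu_frame_ms)  # the slower side sets the pace
    return 1000.0 / frame_ms

for w, h in [(640, 360), (1024, 640), (1280, 1024)]:
    print(f"{w}x{h}: {predicted_fps(w, h):.0f} FPS")

# The low resolutions sit flat at the CPU ceiling, and once the GPU cost
# overtakes the CPU cost the curve drops with pixel count: the same
# flat-then-falling shape described for the CPU- and GPU-limited cases below.
```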

The results in Figure 1 below show the behavior of the Snow and Cave scripts in Lost Planet on a QX9650 at 2.5 GHz. Starting at 230K pixels (the 640×360 resolution case), Snow progresses linearly downward to the final load of 1310K pixels (the 1280×1024 resolution); the total pixels rendered track the observed FPS nicely and linearly, as expected. This behavior illustrates a clearly GPU-limited benchmark, in that increasing the load on the GPU results in lower average FPS. The corollary is shown in the Cave results, which remain flat across all resolution settings, yielding about 91 FPS for the QX9650 at every setting.

 


Figure 1

The CPU-limited case is further illustrated in the Phenom 9850 BE results, in which Snow remains at or near 150 average FPS up to a total pixel loading of 655K pixels (the 1024×640 resolution); beyond that point the GPU becomes the limiter, and increasing the load further produces the appropriate drop in average frame rate. As in the QX9650 case above, Cave remains flat and even at around 77 FPS throughout the entire resolution range, demonstrating the CPU limitation on the rendered scene.

 


Figure 2
Head to Head Snow

The data is now re-plotted (Figure 3) to show the head-to-head, clock-for-clock comparison of the Phenom 9850 BE against the QX9650 at 2.5 GHz. The data set is particularly clear: the Phenom 9850 BE will limit the 8800 GTX under these conditions. Between 230K and 655K pixels rendered, the 8800 GTX is fast enough to show no response, and for the Phenom 9850 BE this is a CPU-limited condition. The QX9650, on the other hand, is capable of maintaining a GPU-limited condition in this region of the settings.


Figure 3

Interestingly, as the resolution increases and the number of pixels approaches the 1 million mark, the Phenom is able to support slightly higher average frame rates. The reason for this is not immediately clear, but I would speculate that it is related to AMD's superior interconnect technology or a better implementation of the PCIe 2.0 spec, which is responsible for sending large quantities of data to the video card. In cases where larger data sets are needed to feed the GPU, bandwidth considerations could play a factor. This is purely speculation; further experiments would be needed to reach any form of conclusion.

Invariably, comparisons will be made between these two CPUs, and using gaming data to ascertain architectural strength is sketchy at best. The most one can pull from this data set is that in the lowest-resolution regime, where the CPU should make the most difference, for Snow in Lost Planet (and Lost Planet only), the QX9650 is roughly 28% faster clock for clock, and even this is an underestimate since the QX9650 data never shows a CPU-limited condition. However, at the higher resolutions the Phenom shows about 9% higher frame rates clock for clock, which under GPU-limited conditions is suspected to be due to something other than the intrinsic architecture of the CPU.

As with Snow, I provide a re-plotted chart of the Phenom 9850 BE vs the QX9650 at the same clock speed. In both cases, Cave is CPU limited; there is no indication that increasing the load on the GPU produces any response in the observed frame rate. Here, the QX9650 retains on average a 14 FPS lead over the Phenom 9850 BE, which amounts to an 18% clock-for-clock performance advantage.
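Working from the averages already quoted (about 91 FPS for the QX9650 and 77 FPS for the Phenom in Cave), the gap is 91 − 77 = 14 FPS, and 14 / 77 ≈ 0.18, or roughly 18% at equal clocks.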

 

The data set is relatively simple and straightforward, though it was time consuming to gather, and there are several key points that need to be made. The first and most obvious is between the CPUs themselves: architecturally, at the core, the Intel Core architecture is significantly faster at running the game code in Lost Planet, clock for clock. The benchmark was set up and run specifically to remove any discrepancy between the CPUs, and the results are indeed very clear. However, 640×360 is not a typical gaming resolution; 17” and 19” LCD monitors are now the norm, and 22” and 24” are gaining popularity as they drop in price. In the Snow case we can see the platform subsystem (speculatively) helping the Phenom maintain reasonable competitiveness. The extent would need further investigation, but the data in this particular script/run is clear. Only in scenes such as Cave, where the CPU is certainly the limitation in all cases, does the Intel Core architecture really show its significant advantage. One might translate this into better overall minimum frame rates, but the data set does not support that conclusion at this time.

The second point of the data set revolves around what is more important in a gaming platform, the CPU or the GPU. Again, the data set is clear: in complex scenes such as Snow, the GPU is critical in ensuring good gaming performance. Cave, by contrast, is not GPU limited, and there one only needs a CPU sufficient to maintain more than 60 FPS (or, to some, 30 FPS; I prefer at least the refresh rate of the monitor or higher). Using games to evaluate the worth of a CPU is, and should be, secondary, though I understand completely why games are weighted so heavily in review-site data: their audience is composed primarily of gamers.

This brings me to the third point, and perhaps the most important: in evaluating the capability of a processor there are two ways to ask the question. One is "which processor is computationally superior", and the other is "how well does each support a realistic gaming experience". These questions often get muddled in the context of most reviews seen around the web; too many times do reviewers use games as the metric, yet move into the GPU-limited regime and then draw conclusions about the capability of the processor. That is hogwash in that regard, yet it is also necessary to show that someone running Lost Planet at 1900×1200 with 4x AA will not be hampered by a slower CPU. Only when the GPU outruns the CPU does the argument of "future-proofing" become a good one. Nonetheless, my conclusions about the QX9650 vs the Phenom 9850 would be different if I took only the Snow script and ran it at 1280×1024 or higher resolution, which of course is more a commentary on the GPU's limitation in Snow than on the CPU's ability to crunch. Thus, I warn anyone evaluating review data to understand the consequences of how the reviewer chose to set up the benchmark. Ideally, a good reviewer shows both conditions and allows readers to make up their own minds about the data and which philosophy to follow.

So again, in summary, it is clearly the case that the Intel CPU is the better CPU at crunching gaming code, when considering only gaming code; clock for clock the race isn't even close. However, if you are a Phenom owner or plan to become one, the real-world answer is that it doesn't matter, if we take the Lost Planet data and generalize (which is not kosher in this case, but a comprehensive 25-game comparison will be provided shortly to show this is true). In other words, a Phenom paired with a nice GPU will deliver as good a gaming experience (at typically played resolutions) as an Intel CPU; the CPU only needs to be strong enough to support playable frame rates. As such, and with a strong emphasis on objectivity, AMD fans should not be offended by what is shown here and sporadically demonstrated across the web: the Phenom is a fine CPU for gaming tasks.

Comment/Discuss in our Forums
