As promised in the closing comments of Part I, Rage3D is back with even more testing of the new ATi 3870X2, this time abandoning the new-age glitter of DX10 titles in order to explore the well-established maturity of DX9 ones. Stick with us to see what happens when a pair of new GPUs deals with an old challenge.
DX9 Games
The DX9 games on show today: The Witcher, Call of Duty 4, Unreal Tournament 3, S.T.A.L.K.E.R. and Half-Life 2: Lost Coast.
All of the games being tested are patched to their latest version, except S.T.A.L.K.E.R., which was patched to version 1.005, with version 1.006 released after testing was completed.
Testing Methodology
The testing methodology employed is fairly simple, with 3 major ways of testing utilized: the games' in-built benchmarks, custom testing applications (UT3Bench) and FRAPS.
Irrespective of the method employed, the results you'll see will always be the average of 3 runs (in some cases, this means actually doing 4 runs and discarding the first, as its results didn't line up with the following 3 due to the loading and caching taking place during the first run). This will be detailed in each game's dedicated section.
Details of the particular timedemo or level used will be provided, in their respective sections, for each of the games tested with either custom testing applications or FRAPS.
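The averaging described above can be sketched in a few lines of Python. This is only an illustration: the function name, the `discard_warmup` flag and the sample numbers are our own assumptions, not part of any tool used in the review.

```python
from statistics import mean


def average_fps(run_results, discard_warmup=False):
    """Average per-run FPS figures, optionally dropping the first run.

    The first run is sometimes skewed by loading and caching, so it can
    be treated as a warm-up and left out of the average.
    """
    runs = list(run_results)
    if discard_warmup:
        runs = runs[1:]  # drop the first (warm-up) run
    if not runs:
        raise ValueError("no runs left to average")
    return mean(runs)


# Hypothetical numbers, for illustration only: four runs, first one discarded.
four_runs = [61.0, 72.0, 71.0, 73.0]
print(average_fps(four_runs, discard_warmup=True))
```

With `discard_warmup=False` the function simply averages whatever runs it is given, which matches the plain three-run case.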
Unless otherwise specified (and there are only two circumstances when it'll happen), all graphical settings are set to their maximum values in the games' menus. This often means going over the highest available preset; we're assuming that those interested in high-end GPUs, like the 3870X2, also want the best possible quality. All tests were run at 1920x1200 with sound enabled.
We'll employ a fixed structure for presenting the data: for each game we'll first look at absolute performance numbers expressed in FPS for both a single 3870X2 and dual 3870X2s, after which we'll look at the benefits of adding a second X2 and going for a Quad CrossfireX solution.
Review System
As in Part I, the review system specs are as follows:
And now, at last, on to the benchies!
Unreal Tournament 3
Version used: 1.2
Testing method: UT3Bench using the WAR-Torlan_fly timedemo; the results represent the average of three 1-minute runs
The third UE3 based game of the day, Unreal Tournament 3 is possibly the most optimized incarnation of the engine to date, although not quite the most graphically demanding. Still, it's another one of those benchmarks that's never missing from a review, so it most certainly couldn't be lacking here.
The level of AF was adjusted through UT3Bench.

As expected, the level of performance is high. Enabling AF does incur a fairly hefty 13 FPS drop (going from no AF to 16X AF), but framerates remain over a very comfortable 70 FPS level. Since UT3 is framerate capped at 60 FPS (framerate cap disabled for testing purposes), that means you'll have the best possible experience when gaming with the 3870X2. QuadCF doesn't seem to bring much benefit here - shall we investigate?

%Δ performance was calculated using the following formula:
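As a minimal sketch, and assuming the conventional definition of a percentage change taken relative to the single 3870X2 result (this definition, the function name and the sample values are our assumptions), the calculation for the scaling graphs might look like this:

```python
def percent_delta(single_fps, quad_fps):
    """Percentage change in FPS when going from a single X2 to dual X2s.

    Positive values mean the dual configuration is faster, negative
    values mean it is slower, and zero means no scaling at all.
    """
    if single_fps <= 0:
        raise ValueError("single_fps must be positive")
    return (quad_fps - single_fps) * 100.0 / single_fps


# Hypothetical values, for illustration only.
print(percent_delta(50.0, 100.0))  # a doubling of the framerate
print(percent_delta(80.0, 80.0))   # no scaling at all
```

Under this definition, perfectly linear scaling from one X2 to two shows up as +100%, and a CPU-limited result shows up as a value close to 0%.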
Unexpectedly (sigh, we guess it would've been too simple an analysis if all went according to expectations), scaling is virtually non-existent, even with 16X AF enabled at 1920x1200. Confused by this, as everyone else seems to be getting better scaling from UT3, we asked ATi to verify our numbers. In response, they explained that, without AA, UT3 is CPU limited most of the time and that one would see scaling at 2560x1600 with 16X AF. Since neither AA nor a large enough monitor was available to us, we had to accept that we wouldn't be able to induce sufficient GPU pressure to extract benefits from going to a dual 3870X2 configuration. Sorry guys.
Half-Life 2: Lost Coast
Version used: latest available through Steam
Testing method: In-built Video Stress Test; the results represent the average of 3 runs
Given its age, the inclusion of the Source engine in the testing suite might seem surprising. No longer cutting edge, it isn't the most demanding of benchmarks. In spite of that, Source-based titles remain popular, so seeing how the X2 does with one can be quite useful. We’ve opted for the Lost Coast technical showcase because Valve used higher quality assets in it, trying to push the engine as far as it could go. We’re aware that Episode 2 includes overhauled shadow rendering, but we’re not certain that the assets used are at the same level as those in Lost Coast.

Here we're looking at framerates achieved with varying levels of AF. The game remains CPU limited with dual X2s, and up to 4:1 anisotropy with the single X2. The performance level is very high, which is to be expected considering the aspects outlined in the introductory part. Perhaps AA can make those GPUs sweat:

Well, except for the 8 sample + Narrow Tent and 8 sample + Wide Tent modes, the single X2 seems to cope with the task of rendering Lost Coast just as well as the Quad configuration, and both of them produce very high levels of performance. If you find the 16X AA results to be anomalous, you're right: currently, 16X AA works only in dual-GPU mode, which means that if you enable it, the second 3870X2 in a QuadCF configuration sits idle. This is being worked on in ATi's driver labs.
You'll also notice that there are no Edge-Detect numbers. This is due to the erratic behavior we underlined in the AA and AF analysis in the introductory part of the article.
Given the fact that all but 4 scenarios seem to be CPU limited on both cards, there's not much scaling to be expected, but let's take a peek at the graphs nonetheless:


%Δ performance was calculated using the following formula:
No surprises here: only with 16X AF or with 8X AA coupled with either Narrow or Wide Tent custom resolve filters does Lost Coast stress the single 3870X2 enough for it to be outpaced by the dual X2 configuration - everywhere else we're CPU limited. We guess that's what one gets for playing old(ish) games on cutting-edge hardware.
Call of Duty 4
Version used: 1.5 (1.4 for the Single-player executable)
Testing method: FRAPS run through the "Crew Expendable" level; results represent the average of three 3-minute runs
This game needs little introduction, as it's been an immense hit. Technically, it's not a cutting-edge engine, but still manages to look nice. Personally, we'd rate it as similar to Source in terms of sheer graphical prowess, but Infinity Ward has to be congratulated for exploiting it fully. As you'll soon see, COD4 is both very optimized and VERY (there's a reason for the capitalization) multi-GPU friendly.
The benchmarking run consisted of playing through the first part of the "Crew Expendable" mission. In order to minimize variation between runs, we tried to stay as close as possible to Captain Price, whilst doing pretty much the same things. Since we've yet to achieve machine status, some variability between runs existed, but, as before, results ended up being quite consistent (if that wasn't the case, we obviously wouldn't have mentioned it, eh?).



In case you didn't believe us before, here's the confirmation: COD4 is a multi-GPU setup's best friend. Both the single X2 and the dual configuration produce very playable framerates but, due to COD4's friendliness towards AFR (the preferred rendering scheme for multi-GPUs), the dual setup shows a pronounced improvement. Increasing the degree of anisotropy doesn't affect performance significantly on either of the tested configurations, with only maximum FPS on the QuadCF one suffering a significant drop due to enabling higher levels of AF.



With 115 FPS average with 8X AA enabled, QuadCF makes a strong case for itself. Again we see 2X and 4X AA being strangely close in terms of performance, and 16X AA seems to behave even more strangely here, with it being significantly slower on the dual X2s compared to the single (remember, currently 16X AA is restricted to dual-GPUs, thus no benefits from Quad-GPUs can be experienced with it on). The Tent Modes seem to incur a hefty performance hit with levels of AA greater than 4 samples, but given the extreme blurring they bring in this title we hardly consider them a viable option.
In the scaling graphs, we'll show you what ATi would probably show you if they were allowed to use numbers from only a single game in their promotional effort for QuadCF:


%Δ performance was calculated using the following formula:
First off, the 16X AA numbers are quite odd (the erroneous behavior was consistent across benchmarking runs, which is why we actually posted them). The scaling experienced here is somewhat amazing: under certain settings linear gains are seen. This left us a bit baffled, as personally we doubted that linear gains could actually be achieved outside of synthetic testing scenarios, so we asked ATi's Will Willis about it. He again confirmed that our numbers were accurate, and that the excellent scaling in COD4 is a result of a torrid love affair between ATi's drivers and the game itself. OK, those weren't his exact words ... what he said was that their driver is very solid for COD4 and that the game favors multi-GPU setups greatly. Those interested should be aware that even with this tight partnership, a bug made its way into current drivers: when water quality is set to normal, on certain maps (like the one where the game's tutorial takes place), framerates plummet to about 25 FPS. This didn't happen in the testing scenario we used, but we experienced it in gameplay.
If we think about it a little, it's not that surprising really: the engine COD4 uses is not a very recent one, but an older one brushed up. It's safe to assume that the brushing up process didn't involve going crazy with very recent techniques that make use of persistent resources or other multi-GPU "breakers", and that the load COD4 puts on the GPUs is one that's very easy to distribute through AFR. Summing those up, the results seem less outlandish...but still immensely impressive. If we were to hazard a guess, COD4 will be heavily used in the coming weeks to show the benefits of multiple GPUs, as Nvidia releases its single-card dual-GPU solution.
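To make the AFR point a bit more concrete, here's a minimal sketch of how alternate frame rendering hands out work: whole frames are assigned to the GPUs in turn. The function names are ours, and the "ideal" figure assumes no frame depends on data produced while rendering an earlier frame, which is precisely the condition that persistent resources would break.

```python
def afr_assign(frame_count, gpu_count):
    """Return, for each frame index, the GPU that renders it under AFR.

    Frames are handed out round-robin: frame 0 goes to GPU 0, frame 1 to
    GPU 1, and so on, wrapping around after the last GPU.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return [frame % gpu_count for frame in range(frame_count)]


def ideal_afr_fps(single_gpu_fps, gpu_count):
    """Upper bound on the AFR framerate, assuming fully independent frames."""
    return single_gpu_fps * gpu_count


# Six frames spread over the four GPUs of a QuadCF setup.
print(afr_assign(6, 4))
```

Real-world results fall short of that upper bound whenever the CPU, the driver or inter-frame dependencies get in the way, which is why linear gains like the ones seen in COD4 are the exception rather than the rule.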
S.T.A.L.K.E.R.
Version used: 1.0.0.6
Testing method: In-built "demo_play" command using the ixbt3 timedemo, courtesy of the guys over at iXBT/Digit-Life; results represent the average of 4 runs
Out of all of the games in the DX9 category, S.T.A.L.K.E.R. is perhaps the most interesting, as it's the only one that uses an engine (X-Ray) based on deferred rendering (if you recall, UE3 employs deferred rendering only for its shadowing), and such solutions might become a lot more common in the near future. GSC, S.T.A.L.K.E.R.'s developers, haven't skimped on implementing advanced features: SM3.0 support, parallax mapping, one of the nicest implementations of HDR currently on the market, fully dynamic lighting and high-resolution shadow maps are all present. On the flipside, as with all deferred renderers running in DX9, there is no support for AA, so we'll have only AF to play with.
It's important that we admit we were not entirely happy with the S.T.A.L.K.E.R. testing - initially, we intended to use FRAPS here as well, but due to some mix-up involving my copy of the game (or rather, its artificially extended "exile"), the tests could only be performed very late in the reviewing process (at least this allowed us to use the 1.0.0.6 patch). Given the above, we opted for the timedemo that the guys at iXBT use in their reviews, with future S.T.A.L.K.E.R. numbers coming from FRAPS-based testing.



AF is rather costly in S.T.A.L.K.E.R., due to the fact that the game in general, and the timedemo used in particular, tend to render large open spaces where there are a lot of textures being filtered. Coupled with only 4 texture units per RV670 chip, this explains why you lose 10 FPS from your average when going from no AF to 16X AF on the 3870X2. Looking at the minimums and the maximums, you'll notice that they're not behaving very "rationally" - this is due to the fact that some inter-run variations exist even with a pre-recorded timedemo, and since minimums and maximums represent (surprise) the single lowest/highest FPS value experienced in a run, they can end up being quite different from one run-through to another. By the way, this also, and perhaps to a greater extent, applies to all of the games we've tested with FRAPS: as much as we'd like to be able to produce perfect repeatability, the fact that we’re human prevents us from doing it. The averaging process should eliminate some of the inconveniences, but as we see with S.T.A.L.K.E.R., that's not always the case. The average framerates were very tightly packed around the value you see above, though, which shows that the differences in minimums and maximums didn't impact the overall performance in any significant way.
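A quick sketch shows why minimums and maximums are so much noisier than averages: they each depend on a single sample, while the average draws on all of them. The per-second FPS values below are made up purely for illustration, and `summarize` is a hypothetical helper, not part of FRAPS or of the timedemo tools.

```python
from statistics import mean


def summarize(fps_samples):
    """Return (minimum, average, maximum) for one run's FPS samples."""
    if not fps_samples:
        raise ValueError("need at least one FPS sample")
    return min(fps_samples), mean(fps_samples), max(fps_samples)


# Two hypothetical runs of the same timedemo (per-second FPS samples).
run_a = [40, 55, 60, 58, 62]
run_b = [48, 54, 59, 57, 57]

for name, run in (("run A", run_a), ("run B", run_b)):
    lowest, average, highest = summarize(run)
    print(f"{name}: min={lowest} avg={average} max={highest}")
```

Both made-up runs share the same average, yet one momentary dip or spike is enough to move the minimum or maximum by several FPS.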
QuadCF seems to be helping everywhere here, and also there are hints of it being CPU limited with no AF and 2X AF - let's look at the synthesized scaling results to get a clearer picture:

%Δ performance was calculated using the following formula:
There are some decent, though not spectacular, gains to be had by going to a dual X2 configuration, although we suspect that a triple CrossfireX setup would produce similar results, and that S.T.A.L.K.E.R. is another one of the games that are clamped to 3 GPU scaling for now. S.T.A.L.K.E.R. has traditionally been a tough spot for both ATi in general and Crossfire in particular, so getting good performance out of a single X2 (which relies on Crossfire to do its thing, make no mistake about that) and scaling out of dual X2s is fairly encouraging.
The Witcher
Version used: 1.2
Testing method: FRAPS run through the "Old Vizima" level; results represent the average of three 3-minute runs
Just like in the DX10 section, we'll conclude the DX9 section with a game that can easily make the crossover to the next one. The Witcher (in spite of it being an awesome awesome game) is not quite a game you expect to see being benchmarked. It relies on a heavily modified Aurora engine, and though not as feature-laden as UE3 or the X-Ray engine, it follows COD4's example of looking very nice without going overboard with cutting edge rendering techniques.
The testing run consisted of crossing the "Old Vizima" level from the gate to the Swamps to the gate to the Dike, making a sharp turn, going near the large abandoned tower that dominates the map, fighting a Cemetaur and a Graveir with the help of a bunch of elves, coming back to the gate to the Dike and finishing up near the closed-off well in the centre square. We won't be playing the same used-up tune regarding inter-run consistency and variability as you probably know it by heart now.



Respectable numbers generally, with the curse of the single-digit minimums making an unwanted comeback. Little red lights are starting to blink when looking at QuadCF numbers though, and if you're already thinking back to how Hellgate: London behaved, you're not very far off. Let's check out the performance with various levels of AA:



Performance remains OK even with 8 sample AA enabled...if one ignores the horrendous minimums (which can be ignored on the single X2 because they occur only twice upon resource upload, but can't be ignored on the dual X2s - more on that in a bit). On the single X2, the Tent filters seem to incur a significant performance drop when coupled with levels of AA greater than 2 sample, whilst the QuadCF configuration bumps that up a notch, making the Tent filters feasible combined with 4 sample AA. Sadly, this is another game where QuadCF produces an almost unplayable experience, in spite of appearances.
With varying levels of AA

With varying levels of AF

%Δ performance was calculated using the following formula:
With AF, going to dual X2s actually hurts performance. Since the game itself is single-threaded as far as we can tell, we're probably running into CPU limitations again, and since there are some inefficiencies related to splitting the load across progressively more GPUs, in such scenarios QuadCF will be somewhat slower than DualCF or a single GPU - if they're all limited by the CPU, mind you.
Now it's time to explain the very low minimums. With QuadCF enabled, The Witcher behaves similarly to Hellgate: the framerate is very jumpy, it falls down to single digits only to go up the following moment, and when looking at certain parts of the level, like the abandoned tower, it's always crawling...which made FRAPSing through the level quite a pain, trying to navigate these sticking points in the same manner each run. As was to be expected, we've informed ATi of this and they're looking into it (you've probably grown accustomed to this particular phrase by now), but The Witcher is one of the games that underwent little testing in the driver labs. It's worth mentioning that this isn't nearly as apparent with a single X2, although some slight stuttering can be experienced at times even in that configuration.
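The reasoning above can be illustrated with a deliberately crude toy model. Everything in it is an assumption made for the sake of the example (the function, the 2% overhead per extra GPU and the sample figures); it is not a measurement or a description of how the driver actually behaves.

```python
def estimated_fps(cpu_limit_fps, per_gpu_fps, gpu_count,
                  overhead_per_extra_gpu=0.02):
    """Toy estimate of the framerate for a multi-GPU setup.

    The framerate is capped by whichever is lower: what the GPUs could
    deliver together, or what the CPU can feed them. Each extra GPU is
    assumed (hypothetically) to cost the CPU side a fixed fraction.
    """
    gpu_bound = per_gpu_fps * gpu_count
    cpu_bound = cpu_limit_fps * (1.0 - overhead_per_extra_gpu * (gpu_count - 1))
    return min(gpu_bound, cpu_bound)


# Hypothetical CPU-limited case: each GPU could do 80 FPS, the CPU only 60.
for gpus in (1, 2, 4):
    print(gpus, estimated_fps(60.0, 80.0, gpus))
```

In this made-up CPU-limited case, adding GPUs only adds overhead, so the estimate drops slightly with every GPU; when the GPUs are the bottleneck instead, the same model shows gains until the CPU limit is reached.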
If it weren't for the above, the scaling experienced with AA is actually encouraging, and we’re quite hopeful that once the above issues are fixed in a future driver the 3870X2 will prove to be a very good solution for experiencing The Witcher in all of its glory.
This concludes the DX9 section. To sum up our results:
With all the information given to you we expect you need a break to digest it all, so we’ll call it a day and end Part 2 here. Don’t forget to come back soon to read the 3rd and final part, dealing with Crossfire and some less than standard game benchmarks!