Rage3D ATI 3870X2 CrossfireX Review - Part I: The DX10 Games
By Alex 'Morgoth Bauglir' Voicu - reviews@rage3d.com
April 4th, 2008




Introduction

This review is fairly late - at least by today's standards of having a whole slew of benchmarks and a full evaluation ready for every new architecture the second the NDAs are lifted. Whilst there are a number of reasons for that, no one can change the above truth, so we'll have to point out some of the advantages of being late...lest our readership goes to light the torches and sharpen the pitchforks.

First and foremost, the extra time helps in bringing out a better rounded product, exploring all the nooks and crannies that might get skipped under the pressure of releasing an article as soon as possible. Aside from that, the reviewer can get better acquainted with the product under review and see how much he likes it in day-to-day activities. Finally, since we've gotten the politically correct explanations out of the way, it's time to introduce the real advantage: the controversy created by the early reviews tends to show the path the investigation should take and what should be a priority. In our case, the recent noise about how "healthy" benchmarking should be conducted has generated a number of fresh ideas.

The History of the HD3870X2

When two little guys become preferable to a single big one

The 28th of January was a significant date in the calendar of most, if not all, GPU enthusiasts. It was the day when, after a complex mélange of rumours, leaked benchmarks and cryptic hints, ATi's (or AMD's, if you fancy saying it like that) comeback to the high-end occurred. In order to grasp why this is significant, and why we are talking of a comeback, a walk down memory lane is in order (don't worry, it'll be a short one).


ATi's woes started with the somewhat ill fated R6XX GPU family, which was released as the 2X00XT. It was late, hot, power hungry, had less than stellar yields and, most importantly, couldn't properly tackle the high-end segment of the market. Whilst the high-end isn't exactly the most profitable of segments, the general perception created by having a part that rules there tends to trickle down to the lower ones and thus it's important to be at least competitive on that front. Having a big part that was caught in limbo, competing with lower-high-end parts from your main competitor (Nvidia) was not exactly the stuff dreams were made of for ATi, so they got to work on fixing everything that could be fixed, because it was too early for a complete architectural overhaul.

The RV670

The result of those efforts ended up being the RV670: a relatively small 55nm chip that turned out great. It came back from the fab in tip-top shape, thus allowing for an earlier than planned release; the RV670 was cooler than its predecessor, and could be priced very aggressively. While not being a high-end competitor itself, the RV670 sold quite well, and managed to erase some of the unpleasant memories the R600 had seeded. Around this time, down the grapevine came hushed rumours of an R680 part that would re-establish ATi as a high-end GPU provider.

The R680

There was much speculation surrounding the R680: it went from being a huge monolithic chip, to being an MCM (multi-chip module) with two dies on a single package, to being Crossfire on a single PCB/card, like some AIB-made dual 2600XT solutions were, or like Nvidia's 7950GX2. The sheer density of these rumours ensured that at least some of them were right, as we'll soon find out for ourselves.

On January 28th, the R680 materialized, proving to be a beastly card: comprised of two RV670 chips crammed on a single PCB and linked by a PLX bridge chip, armed with 512MB of GDDR3 memory per GPU and marketed (quite correctly and decently, in our humble opinion) as the HD 3870X2. As expected, this generated an entire spectrum of reactions, from unabashed enthusiasm to condescending giggles, depending on whether one was an ATi or Nvidia fan. What should be clarified right from the get go is that the 3870X2 is neither the be-all-end-all of GPUs, nor is it some fluke part, like the 2900XT arguably was. As with all things in life, the R680 is neither pure black nor immaculate white: it's a grey! Translation: it's a very good card, with both strong and weak points and, as shall be shown throughout this review, it's a solid high-end contender.

Little Guys and Big Guys

Time to explain the subtitle, as many of you are likely scratching your heads trying to figure out what little guys and big guys have to do with complex pieces of silicon. Whilst 3D rendering has gone from strength to strength since 3Dfx awed us with bilinear filtering (alas, poor point-sampling...for we knew it well), adding more and more power which in turn enabled doing more and more complex rendering work, it's still not at the point where one can say that it's enough to accurately recreate reality; at best, we're around the entrance to the Uncanny Valley in this area (see here in order to figure out what the theory behind the concept is). So graphics power has to continue to scale upwards, thus making chips become larger and larger in spite of being built on progressively advanced process technologies that achieve incredibly small transistor sizes. The trouble with this is that, at some point, you find yourself with all of your eggs in a single basket, with your huge top-end GPU being dependent on the latest, not completely mastered process technology, getting delayed due to unexpected bugs in the silicon and yielding in a completely unsatisfactory fashion. It needn't happen all the time, but when and if it happens, it's a very complex and hard situation to tackle.

An alternative to building constantly larger, more powerful chips is using more, and weaker, chips in tandem, aimed at solving graphics rendering woes. The concept is not new, having been around for quite a while. Over the years 3Dfx, ATi, XGi and Nvidia (quoted in chronological order) have employed it. Whilst it was nice for providing impressive paper specs, and when it worked it did so in an equally impressive fashion, the inherent redundancies and inefficiencies of this approach, coupled with the fact that straight chip scaling still had a lot of life left in it, made it less than successful overall. Only in fairly recent times, with Nvidia's SLi (Scalable Link interface, not Scan Line Interleave which was 3Dfx territory) and ATi's Crossfire, was a proper foothold established. Both technologies were employed in order to cater to a very select niche who wanted the absolute best performance rather than to create a flagship product aimed at rounding off a product line (the 7950 GX2 could be considered an exception, but it flopped due to a number of reasons and being caught in the wake of the G80), were tied to certain chipsets and, most importantly, required you to buy two (expensive) high end cards for at best a 70% improvement.

A Market in Transition

As GPUs near the one billion transistor mark, the risks are growing and the graphics war is moving towards a more trench-based conflict rather than all-out battle on the open-field. Neither IHV can afford to have another NV30/R600 debacle, and sustaining the accelerated development that we've grown accustomed to solely on the back of bigger and more complex chips might create risks that aren't justified by the possible rewards, so using multi-GPUs to continue scaling graphics performance becomes an increasingly attractive alternative. If you will, it's the inverse of how a product line used to be built. Prior to this, you had the big guy on top who got scaled down for inferior market segments, whilst the future seems to be (at least on ATi's side) starting in the middle and scaling upwards by adding more GPUs and downwards by messing with clock rates/functional unit disabling. Arguably, this should simplify the entire process. Another possible and likely scenario is that we'll have a staggered approach to GPU progression, with huge monolithic chips being released with a large interval between them, in order to gain a good grasp on process technologies and to ensure the squashing of all bugs, and with refreshes happening between these releases by means of multi-GPU cards. One way or the other, the future is certainly interesting to say the least.

A Different Investigation

Having established the ground rules, it's time to introduce our hero: the HIS HD3870X2 1GB video card. We won't bore you with a plethora of pictures; the 3870X2 has been pictured to death all across the web, with nudies of the cooling solution, the chips, the PCB and so on being just about everywhere. There are only a few things that have to be mentioned with regard to these cards' physical characteristics:

Summarizing the RV670

Although beyond the scope of this article, a certain understanding of the 3870X2's architecture, which is to say, the architecture of the RV670 twins powering it, could prove useful toward arriving at one's own conclusions. That being said, we'll direct you to two fairly excellent articles that detail it, one by Beyond3D, dealing with the R600 architecture, with which the RV670 is identical save for a number of internal optimizations and a 256-bit bus instead of a 512-bit one, the other by TechReport, dealing directly with the RV670 itself.

If you're not up for a read, I'll sum things up:

With the 3870X2, you take the above and double them, for paper specifications only, as 100% scaling is unlikely to happen outside of contrived theoretical scenarios.

The 3870X2

The 3870X2 differentiates itself from the 3870 with a higher core clock (825 MHz vs. 775 MHz), lower clocked GDDR3 RAM replacing GDDR4 RAM (900 MHz vs. 1126 MHz) and the addition of the PLX 8547 PCI Express Switch directly on the PCB in order to have a transparent and motherboard agnostic Crossfire implementation. This is as good a time as any to clear something up: all of the X2s currently on the market are PCIE 1.1 parts, because the PLX chip itself is a PCIE 1.1 part. It provides 48 lanes, of which 16 go to each GPU and the remaining 16 extend from the card to the motherboard. The mess created with PCIE 2.0 was born out of the enthusiasm of some sites who, in their early reviews, quoted a marketing slide which showed some significant improvements with 2.0 over 1.1 that had absolutely nothing to do with the 3870X2 and its typical usage scenarios. For a high-end chip, equipped with 512MB or more of RAM, the benefits of PCIE 2.0 over PCIE 1.1 add up to something in the interval of [0, 1] %, which you can interpret as unimportant. If you're actually in a situation where a faster PCIE connection matters significantly, you're probably outgrowing your RAM real-estate and thus the cards are using system RAM, which is slow enough for it to be a situation you don't want to be in at any rate, so the point remains moot. ATi's argument for using the 8547 chip is a logical one: they already had experience with it, it had good availability, and was relatively cool and cheap. The 2.0 supporting parts from PLX have only recently entered production and have a price-tag of about $75 and, given the non-existent benefits, it would have made little sense to use them.
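As a rough illustration of the lane budget and the raw link speeds under discussion, here is a minimal Python sketch. The per-lane figures (250 MB/s for PCIE 1.1 and 500 MB/s for PCIE 2.0, per direction, after 8b/10b encoding) are the commonly quoted theoretical values and are not taken from this review; the function and constant names are made up for the example.

```python
# Theoretical per-lane, per-direction bandwidth in MB/s (assumed textbook values).
PCIE_MB_PER_LANE = {"1.1": 250, "2.0": 500}


def link_bandwidth_gb(generation: str, lanes: int) -> float:
    """Return the theoretical one-way bandwidth of a link in GB/s."""
    return PCIE_MB_PER_LANE[generation] * lanes / 1000


# Lane budget of the PLX switch as described in the text.
SWITCH_LANES = 48
LANES_PER_GPU = 16
NUM_GPUS = 2

# Whatever is not routed to the GPUs is left for the motherboard slot.
uplink_lanes = SWITCH_LANES - NUM_GPUS * LANES_PER_GPU

print("Uplink lanes:", uplink_lanes)
print("PCIE 1.1 uplink, GB/s:", link_bandwidth_gb("1.1", uplink_lanes))
print("PCIE 2.0 uplink, GB/s:", link_bandwidth_gb("2.0", uplink_lanes))
```

Doubling the theoretical link speed only helps when the link is actually the bottleneck, which is the point being made about cards with enough local memory.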

[Image: ATi HD3870X2]

With ATi's design, the HD3870X2 is seen by the OS as a single discrete card, relieving users of messing around with Crossfire enabling or disabling, as it will automatically be on at all times. In short, for all intents and purposes, the 3870X2 cards are no different in apparent behaviour from existing single-GPU boards. As with all things, a caveat exists: Crossfire is very dependent on both solid driver support and developers' coding. You need both a driver that ensures that you get scaling to the greatest extent possible and a piece of software that doesn't do things that break the aforementioned scaling. The preferred rendering algorithm for Crossfire, Alternate Frame Rendering (AFR), has a fairly simple task: it has each GPU render a different frame, say GPU1 gets frame n and GPU2 gets frame n+1. The trouble arises when data from frame n is needed for working on frame n+1. In that case, AFR falters and you get performance on par with a single GPU. This can be worked around by developers, of course, but the take home here is that they need to code with multi-GPUs in mind so that situations like the above don't happen.
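The AFR behaviour described above can be illustrated with a toy scheduling model. This is only a sketch under simplifying assumptions (every frame costs the same, there is no CPU or transfer overhead, and a dependent frame cannot start until the previous one is completely finished); it is not how the Catalyst driver works internally, and the function name is invented for the example.

```python
def afr_throughput(num_frames, num_gpus, frame_time, depends_on_previous):
    """Frames per unit of time under an idealised AFR schedule.

    Frames are handed out round-robin: frame n goes to GPU n % num_gpus.
    If depends_on_previous is True, frame n+1 may only start once
    frame n has finished, which serialises the GPUs.
    """
    gpu_free_at = [0.0] * num_gpus  # time at which each GPU becomes idle
    prev_finish = 0.0               # finish time of the previous frame

    for frame in range(num_frames):
        gpu = frame % num_gpus
        start = gpu_free_at[gpu]
        if depends_on_previous:
            start = max(start, prev_finish)
        prev_finish = start + frame_time
        gpu_free_at[gpu] = prev_finish

    return num_frames / max(gpu_free_at)


# Independent frames: two GPUs double the throughput.
print(afr_throughput(10, 2, 1.0, depends_on_previous=False))
# Frame n+1 needs data from frame n: back to single-GPU throughput.
print(afr_throughput(10, 2, 1.0, depends_on_previous=True))
```

Real games sit somewhere between these two extremes, which is why scaling varies so much from title to title.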

What Makes This Investigation Different

This portion of the article was named "A Different Investigation" because there are several things that will be not quite in tune with what you expect to see in a review. Firstly, there are no synthetic benchmarks: no 3DMark, RightMark, or LightsMark...in fact, no Mark is coming in today. The reasoning here is that we feel that synthetics simply don't provide much information to the average user; it's nice to explore instruction rates, texturing rates and such things in isolation, but a game tends to push a lot of buttons simultaneously, with the ecosystem being heterogeneous enough to ensure that no single test can paint an adequate picture of how your card will actually perform. Another aspect is that synthetics are all around the web, just like nudies of the cards, so rehashing what has already been said and done wouldn't be really useful.

Another diverging point was born from the recent "Real life vs. Timedemo" debate that has heated up forums all around, whilst generating significant traffic for those who started it. Since we believe that, generally, the best answer for a problem is found by mixing characteristics of two supposedly diametrically opposed solutions, we'll employ a mix of timedemos and gameplay runs in our attempt to get a good image of what this card (these cards: the plural will be explained a bit further in the article) can do.

Finally, before getting down to counting Frames Per Second (FPS), we'll show you what the shrinkers of little chart-bars everywhere, Anti-Aliasing (AA) and Anisotropic Filtering (AF), look like, how they map out from marketing papers to real-world gaming and, most importantly, why you should care about them beyond their obvious usability as a universal shrink-ray for bar-charts.

Testing: Methods and Setup

DX10 games represent the future; while some would have you believe that DX10 is useless and that DX9 is the only API you'll ever need, this assertion is both misguided and naive. We won't comment on how optimally these games make use of DX10 though; suffice to say these represent first attempts at using the new API. In this category we've included the following titles:

All tested games are patched to their latest version.



The testing methodology employed is fairly simple, with three major methods of testing being utilized:

Irrespective of the method employed, the results you'll see will always be the average of 3 runs (in some cases, this means actually doing 4 runs and discarding the first, as it produced results that didn't align with the following 3 due to the loading and caching taking place during the first run). This will be detailed in each game's dedicated section.
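For readers who want to reproduce the averaging at home, the procedure can be sketched in a few lines of Python. The function name and the sample numbers below are purely illustrative; they are not results from this review.

```python
from statistics import mean


def reported_fps(runs, discard_first=False):
    """Average exactly three runs, optionally dropping a warm-up run first."""
    kept = runs[1:] if discard_first else runs
    if len(kept) != 3:
        raise ValueError("expected exactly 3 runs after discarding")
    return mean(kept)


# Hypothetical batch of 4: the first run is slower because of loading/caching.
batch = [20.0, 30.0, 31.0, 32.0]
print(reported_fps(batch, discard_first=True))
```

Dropping the first run keeps a one-off caching penalty from dragging down a figure that is meant to reflect steady-state performance.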

Details with respect to the particular timedemo/level used for testing will be provided for each of the games being tested with either custom testing applications or FRAPS, in their respective section.

Unless otherwise specified, all settings pertaining to graphics are set to their maximum values in the games' menus. This often means going over the highest available preset: we're assuming that those who are interested in high-end GPUs, like the 3870X2, will also want the best possible quality. All tests were run at 1920x1200, and sound was enabled.

We'll employ a fixed structure for presenting the data: for each game we'll first look at absolute performance numbers expressed in FPS for both a single 3870X2 and dual 3870X2s, after which we'll look at the benefits of adding a second X2 and going for a Quad CrossfireX solution.

Finally, a short description of the system that was used for conducting the tests:

All other drivers were fully up to date.


Crysis DX10

Version used: 1.2

Testing method: Crysis Benchmarking Tool 1.0.0.5, Assault_Harbor Timedemo, the results represent the average of the last 3 runs out of a batch of 4 (the first run was discarded)

No GPU review is trendy these days without including Crysis numbers, that much is certain, so we couldn't go against the flow in this respect. Crytek has always been preoccupied with pushing the technical envelope, and as we'll soon see, they've done it again (somewhat). We'll be using the Assault_Harbor timedemo as it is far more indicative of how the game will effectively perform, containing AI, Physics, particle effects and most of the other goodies the CryEngine 2 offers.

Let's start things off by looking at performance in DX10 with Very High settings and various levels of AA (the X64 version of the game was employed):


[Chart: Crysis Avg AA]

[Chart: Crysis Min AA]

[Chart: Crysis Max AA]

The numbers from 0 to 8 along the horizontal axis represent the level of AA. What jumps out is the very low performance experienced, with both the solitary X2 and its beefier dual incarnation. This is one of Crysis' characteristics: it's regarded as a devourer of GPUs, due to its liberal use of modern rendering techniques. The game certainly does a lot of work per pixel when using the High or Very High settings, and something to consider is that the Very High settings are supposedly aimed at cards with more than 512MB of VRAM, so this might contribute to the low level of performance. Something else to note is the very low hit experienced from enabling AA and then progressively going to higher levels of it. This might suggest that the game is either CPU or ALU limited (we're leaning more towards the second variant).

With the above in mind, let's see if we can extract more performance out of it by playing with the resolution (the AA tests above were run at 1920x1200):


[Chart: Crysis Avg Resolution]

[Chart: Crysis Min Resolution]

[Chart: Crysis Max Resolution]

The numbers remain unimpressive: although by going from 1920x1200 to 800x600 we've reduced the number of pixels by a factor of 4.8, we only gain about 10 FPS on either of the tested configurations, and we still are in the realm of mediocrity with our framerates. Since reducing the resolution didn't help, let's try reducing the quality settings, whilst keeping the resolution constant at 1920x1200:
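The pixel-count reduction between the two resolutions is simple arithmetic, shown here as a small Python check (the helper name is ours):

```python
def pixel_count(width, height):
    """Number of pixels rendered per frame at a given resolution."""
    return width * height


high = pixel_count(1920, 1200)  # 2,304,000 pixels
low = pixel_count(800, 600)     # 480,000 pixels

# How many times fewer pixels the GPU has to shade at the lower resolution.
reduction_factor = high / low
print(reduction_factor)
```

If the game were purely limited by per-pixel work, framerates would be expected to rise far more than they do for such a reduction, which is what makes the small gain noteworthy.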


[Chart: Crysis Avg Quality]

[Chart: Crysis Min Quality]

[Chart: Crysis Max Quality]

First off, it should be clarified that the Low settings look absolutely horrible, so in spite of them being available we doubt anyone should ever realistically consider using them. The Medium settings are...well, medium, nothing out of the ordinary, and the only settings that make Crysis look like a true jewel of technology are High and Very High. How optimized can we consider Crysis to be when even with the horrid looking and awfully simple Low settings it doesn't go over 60FPS with high-end cards is as good a question as any, but that should be answered elsewhere.

Finally, let's look at the benefits of going from a single X2 to dual X2s in each of the three scenarios above:


[Chart: Scaling vs. AA]

[Chart: Scaling vs. Resolution]

[Chart: Scaling vs. Quality]

%Δ performance was calculated using the following formula:

[Formula image: Performance Calculation]
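The formula itself is only available as an image, so the following is an assumption rather than a transcription: if %Δ is the usual relative change of the dual-X2 (QuadCF) result against the single-X2 result, it could be computed as in this Python sketch. The function name and the sample values are hypothetical.

```python
def percent_delta(single_x2_fps, dual_x2_fps):
    """Relative change, in percent, of the dual-X2 result versus the single X2.

    Assumed definition: (dual - single) / single * 100.
    Positive values mean QuadCF is faster, negative values mean it is slower.
    """
    return (dual_x2_fps - single_x2_fps) / single_x2_fps * 100.0


# Hypothetical example: 40 FPS on one X2, 50 FPS on two.
print(percent_delta(40.0, 50.0))
# Hypothetical example of a regression: minimum FPS dropping from 20 to 15.
print(percent_delta(20.0, 15.0))
```

Under this definition a negative %Δ on the minimum FPS charts simply means the second card made the worst-case framerate lower.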

Performance Thoughts: Scaling is quite unimpressive, with QuadCF affecting minimum FPS in a negative manner in almost all circumstances and its benefits being very pronounced only on the maximum FPS. Even in scenarios where one would expect to get significant benefits from adding GPU power (higher resolutions/higher levels of AA), the improvement in average framerates doesn't surpass 25.59%. This is mostly due to the nature of Crysis: it's simply not very multi-GPU friendly currently, and its developers seem to have enjoyed using a number of the techniques we identified earlier as being hazardous to multi-GPUs. When we questioned ATi on the issue, they also shared that currently the game can only scale to 3 GPUs, due to its peculiarities, and that the driver, based on this knowledge, is clamped at 3-GPU usage as well; basically, the 4th GPU is standing idle when playing Crysis on a Quad Crossfire configuration. Work is underway to remove this limitation, but without some help from Crytek themselves this'll be a tough nut to crack.
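To put the 3-GPU clamp into perspective, here is a back-of-the-envelope Python sketch of the best-case gain from adding a second X2. It assumes perfectly linear scaling with the number of active GPUs, which never happens in practice, so the output is a ceiling and not a prediction; the function and parameter names are invented for the example.

```python
def ideal_gain_percent(base_gpus, added_gpus, driver_gpu_cap=None):
    """Best-case percentage gain from adding GPUs, assuming linear scaling.

    driver_gpu_cap models a driver that refuses to use more than
    a given number of GPUs for a particular title.
    """
    total = base_gpus + added_gpus
    effective = total if driver_gpu_cap is None else min(total, driver_gpu_cap)
    return (effective / base_gpus - 1.0) * 100.0


# One X2 (2 GPUs) plus a second X2 (2 more GPUs), no clamp.
print(ideal_gain_percent(2, 2))
# Same hardware, but the driver only uses 3 GPUs for this title.
print(ideal_gain_percent(2, 2, driver_gpu_cap=3))
```

With the clamp in place, even a perfectly scaling title could gain at most half as much from the second card as it otherwise would.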

Overall, performance in Crysis is nothing to write home about: the game is playable only up to Medium settings at 1920x1200, after which the minimums plummet whilst the average framerates remain somewhat acceptable, or, with indulgence, up to 1024x768 with Very High settings and a QuadCF configuration, a scenario in which the minimum framerates are almost bearable. It seems that both driver work on ATi's side, and development work on Crytek's, is required before we can enjoy Crysis in all of its glory at a somewhat decent resolution, with either High or Very High settings.

Call of Juarez DX10

Version used: 1.1.1.0

Testing method: In-built DX10 benchmark, results represent the average of the last 3 runs out of a batch of 4 (the first run was discarded)

Call of Juarez isn't the newest of games, but upon the release of the R600, its engine (Chrome) received a significant overhaul, being updated to include DX10 support, with a number of modern rendering techniques being implemented. As a consequence of this update, the game became quite demanding and entirely GPU limited, as well as a permanent member of benchmarking suites around the web. It is worth mentioning that Techland (the makers of COJ) worked closely with ATi during the development of the DX10 Enhancement Pack, and as such it has received some flak for supposedly favouring ATi. For what it's worth, it does seem to be more closely tailored to the strengths of ATi HW. That being accounted for, here are the numbers:


[Chart: Call of Juarez Avg AA]

[Chart: Call of Juarez Min AA]

[Chart: Call of Juarez Max AA]

Quite a jump from the tiny framerates we grew accustomed to during the Crysis tests. Again, the numbers from 0 to 4 along the horizontal axis represent the level of AA. AF isn't tested as it cannot be manipulated through the benchmark's configuration utility, and forcing it through the Catalyst Control Centre isn't the best of ideas: with current games employing numerous texture layers, forcing AF makes the GPU perform filtering on all of those aforementioned layers, in spite of the fact that some (perhaps most) of them don't require it, thus creating an artificially high performance drop. The elegant solution is for devs to tag the textures that require AF and to provide a means of controlling AF in-game (COJ the game includes it, it's only the benchmark that oddly doesn't have one), so that the GPU does only the optimal amount of work. This line of reasoning will be applied to all games that lack an in-built AF control (with 1 exception, which will be explained).

The performance level here is quite high, and we seem to be getting good scaling from going to QuadCF (more on that in a bit). AA is fairly costly though, with a nearly 20 FPS drop in average FPS associated with going from 0 to 4 samples. COJ DX10 is one of the games that employs its own custom resolve pass, in order to get HDR accurate AA; this means that here the playing field will be level between the 3870X2 and its competition, as both will do shader-based resolve.

And now, just how good is scaling to the Quad configuration?


[Chart: Call of Juarez Scaling]

%Δ performance was calculated using the following formula:

[Formula image: Performance Calculation]

Performance Thoughts: As we've already mentioned, after the DX10 overhaul, COJ became entirely GPU bound, so scaling is quite impressive, with only the minimums experiencing a relatively less significant gain. The major difference from Crysis, which by all accounts should be just as GPU bound, is that the collaboration between ATi and Techland seems to have paid off, as the developers appear to have written code that's quite friendly towards multi-GPUs. It's probably safe to assume that the drivers were quite properly fine-tuned for this title, due to this close collaboration. Summing up, after the less than stellar showing in Crysis, both the single X2 and the dual configuration seem to have picked up the pace. Let's see if things keep going in a similar vein, or if the excellent COJ performance was only a fluke.

Company of Heroes: Opposing Fronts DX10

Version used: 2.202.0.11

Testing method: In-built performance test, the results are the average of the last 3 runs out of a batch of 4 (the first run was discarded)

Company Of Heroes (COH) was one of the first games to gain a DX10 specific rendering path, with some improvements and features being added through it. Since then, a number of patches and the Opposing Fronts (OF) expansion have come out, so it should be safe to assume that the DX10 path in this game is quite mature. Let's verify this assumption:


[Chart: Company of Heroes Avg AA]

[Chart: Company of Heroes Min AA]

[Chart: Company of Heroes Max AA]

Like COJ before it, COH allows the 3870X2(s) to show some muscle, albeit the minimum FPS figures are somewhat disappointing. The relatively small drop in average framerates associated with going from 2X AA to 4X AA is a pleasant surprise, and we'll see across our tests that this behaviour is not quite singular and that there seems to be a certain preference shown by the 3870X2 for 4X AA.

QuadCF seems to be benefiting performance here, but let's see to what degree:


[Chart: Company of Heroes Scaling]

%Δ performance was calculated using the following formula:

[Formula image: Performance Calculation]

Performance Thoughts: Whilst not matching the excellent showing that it had in COJ, QuadCF provides solid benefits in COH: OF. Surprisingly though, after seeing progressively increasing scaling by following the 0->2->4 sample AA path, at 8 sample AA the performance added by QuadCF actually diminishes compared to that added at 4 sample AA, contrary to what common sense would indicate: since this is an even more GPU bound scenario, scaling should be even better. The fact that with 4 sample AA and 8 sample AA the minimum framerates are actually hurt by QuadCF is also bothersome. These two aspects are probably the result of a combination of factors: first, data has to be uploaded to two more GPUs (this data upload process causes the "hitching" sometimes experienced on first load of a level with multi-GPUs), and thus the minimums could be hitting a lower absolute level, and second, COH: OF in DX10 is another title that's currently clamped at 3-GPU scaling, and probably in need of additional driver work.

All in all, we'd lean towards saying that the 3870X2 did well in COH: OF, both as a single challenger and with a sibling attached.

World in Conflict DX10

Version used: 1.0.0.6

Testing method: In-built performance test, average of the 3 last runs of a batch of 4 (the first run was discarded)

Since it launched, World in Conflict (WiC) has been another favourite of testers all around, and for good reason: the engine it employs is quite modern, being multithreaded and supposedly designed with DX10 considered from the get-go, instead of it being a late afterthought. Initial performance through the DX10 path was quite low, but subsequent patches have improved it to a certain extent, although you should keep in mind that development work with ATi DX10 parts started very late (mostly due to the tardiness of the R600's introduction), and that the developers themselves acknowledged that more work was needed to extract optimal performance from ATi hardware. Time to see if the 3870X2 packs enough punch to overcome that limitation:


[Chart: World in Conflict Avg AF]

[Chart: World in Conflict Min AF]

[Chart: World in Conflict Max AF]

For starters we're looking at numbers with varying levels of AF, so you'll have guessed that the numbers along the horizontal axis represent the degree of anisotropy being employed.

AF seems to affect the 3870X2 primarily with regard to the maximum framerates it achieves, with quite a hefty hit being associated with going from no AF to 16:1 AF (33.33 FPS, to be exact), with the averages and the minimums being affected to a lesser degree. Here QuadCF shines, as it doesn't lose much, if any, performance due to increasing the level of AF. Again we see that Quad produces lower minimums than the single X2, though. The overall performance level is quite good for a strategy game. Will AA change the state of things?


[Chart: World in Conflict Avg AA]

[Chart: World in Conflict Min AA]

[Chart: World in Conflict Max AA]

It seems like it does: enabling 2X AA causes a greater performance drop than enabling 16X AF on both the single and dual X2 configurations, with 4X AA causing an even more sizeable drop. We should mention that enabling AA caused the game to behave somewhat erratically, with some benchmarking runs producing lower than expected results, behaviour that, in the case of 2X AA, only went away with a game restart. For 4X AA, restarting the game didn't quite fix things on the QuadCF configuration, and this is reflected in the huge drop in maximum framerates: about 1 in 4-5 runs produced results that were in line with what was to be expected (roughly double the maximum FPS), with averages being equal to those you see above. Since it was so difficult to attain the higher maximums, I opted for showing you the value that was the result of averaging the runs which were consistent (the ones with lower maximums). We've talked to ATi about this and the driver guys are looking into it. On to the scaling evaluation:

With varying levels of AF

[Chart: World in Conflict Scaling AF]

With varying levels of AA

[Chart: World in Conflict Scaling AA]

%Δ performance was calculated using the following formula:

[Formula image: Performance Calculation]

Performance Thoughts: With no AA and no AF, scaling is non-existent. Once we get to higher levels of either, some scaling benefits are experienced, but nothing earth-shattering. The maximum framerates seem to benefit the most, with averages being improved only when going to 4:1 or 16:1 AF or 2X and 4X AA. It's safe to say that WiC isn't exactly scaling friendly on the 3870X2...which pretty much lines up with what ATi themselves have stated in their Catalyst 8.3 slides: WiC is another one of the games clamped at 3 GPU scaling, and actually it's shown to be the one with the lowest scaling of all DX10 titles.

Hellgate: London DX10

Version used: 1.35.44.4020 X64

Testing method: FRAPS run through the "Tottenham Court Road" level; results represent the average of three 3 Minute long runs

Here we have a title meant to demonstrate what DX10 brings to the table. It was supported/promoted by both Microsoft and Nvidia, with Nvidia holding an active role its development. Whilst its value as a game are of no interest here, technically it makes use of some rather advanced techniques, including accurate soft particles or complex SM4.0 shaders and, as a consequence, it's interesting to see how the X2 will handle it, and what benefits, if any, QuadCF brings in such a recent title. we should also mention here that this is another title that took ATi cards into account quite late in its development cycle, and that getting access to its DX10 code proved to be quite difficult for the boys in red (or green, although that could cause some confusions), so this might prove to be another uphill battle.

The FRAPS run consisted of going through the "Tottenham Court Road" level and clearing it of monsters, without making haste, for 3 minutes. It is worth mentioning that one of Hellgate's features is that it randomly generates its levels. As such, each run was made through a different looking map, but performance was very homogeneous between runs, and even the choke points (more on them in a little bit) seemed to be placed similarly, so I can safely state that the random generation aspect didn't affect results in this case (with larger maps, the story could be different though).
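For readers who want to reproduce the "average of three runs" step, here is a minimal sketch. It assumes each run is summarised by a minimum, average and maximum framerate, and that each statistic is averaged across runs separately; that aggregation, the class and field names, and the sample numbers are all assumptions made for illustration.

```python
from statistics import mean
from typing import NamedTuple


class FrapsRun(NamedTuple):
    """Summary of one timed run (field names are hypothetical)."""
    min_fps: float
    avg_fps: float
    max_fps: float


def summarize(runs: list[FrapsRun]) -> FrapsRun:
    """Average each statistic across runs (assumed aggregation)."""
    if not runs:
        raise ValueError("need at least one run")
    return FrapsRun(
        min_fps=mean(r.min_fps for r in runs),
        avg_fps=mean(r.avg_fps for r in runs),
        max_fps=mean(r.max_fps for r in runs),
    )


# Hypothetical values for three 3-minute runs, not the review's data.
runs = [
    FrapsRun(20.0, 35.0, 60.0),
    FrapsRun(22.0, 36.0, 58.0),
    FrapsRun(21.0, 37.0, 62.0),
]
print(summarize(runs))
```

Keeping the three statistics separate matters here, since the minimum and maximum framerates behave very differently from the averages in this game.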


Hellgate London Avg AF

Hellgate London Min AF

Hellgate London Max AF

Hellgate has only two variants for AF: it's either OFF or it's ON. Visual inspection suggests that ticking the checkbox applies a high degree of anisotropy, either 8:1 or 16:1, but the configuration files provide no indication of which is the correct value. The performance level is, shall we say, mediocre, and enabling AF seems to hurt the dual X2 configuration whilst seemingly having no effect on the single X2. Minimums are again a low-point (pun intended) of QuadCF, whilst maximums get the most benefit. Yet in a very short while we'll see that, for the moment, QuadCF isn't a viable alternative for this game. Before that, here are the AA numbers:


Hellgate London Avg AA

Hellgate London Min AA

Hellgate London Max AA

Things don't change much when testing AA, only the overall performance level is lower. QuadCF still suffers a heavy defeat in the minimum framerate battle, whilst coming back with a vengeance on the maximum front...which, predictably or not, results in only moderate advantages in average framerates. It's time for you to look at CrossfireX scaling and for me to explain why QuadCF isn't an option for Hellgate with current drivers:

With varying levels of AF:
Hellgate London Scaling AF
With varying levels of AA:
Hellgate London Scaling AA

%Δ performance was calculated using the following formula:

Performance Calculation

Performance Thoughts: In the above graphs, what stands out most prominently is the fact that QuadCF negatively impacts minimum framerates in a very significant manner, whilst helping maximum framerates in a similarly significant manner. Sadly, the second part of the above is hardly reason for joy, as you'll be experiencing the minimums far more often than you'd like. Currently, Hellgate is unplayable with QuadCF: framerates are jumpy, going from low to high then back to low in a very annoying dance. This happens for about 75% of the tested level, and in the "Holborn Station" hub that precedes it. Some jumpiness is still experienced with a single X2, but it isn't nearly as obvious or upsetting, something that's underlined by the single X2's consistently higher minimums.

Upon noticing this behaviour, we contacted ATi and they're looking into things, although Hellgate's random levels make it fairly hard to test, which makes it one of the titles that undergoes less verification in the driver labs.

Bioshock DX10

Version used: 1.1

Testing method: FRAPS run through the "Welcome to Rapture" level; results represent the average of three 3-minute runs

One of the best games of 2007, Bioshock packs a graphical punch in its own right, employing the widely used UE3 engine and adding its own DX10 goodies to the mix. Higher detail shadowing, accurate soft particles and advanced water rendering come to mind as it creates a very good looking art-deco underwater city. An aspect to consider is that, in spite of the aforementioned graphical goodies, the assets used are not of the highest possible detail, be it polygon count or texture resolution, due to the game's console heritage. The FRAPS run takes place in the "Welcome to Rapture" level, begins right after the acquisition of the first plasmid (the cherry-popping episode) and ends in the Kashmir restaurant, where Johnny and his mate get their heads checked. It also involves fighting some Splicers along the way. The playthrough was kept as consistent as possible by looking at the same things, taking the same path and attempting to fight the Splicers in the same way. Of course some variability between runs existed, but, again, results were ultimately consistent.

We also cheated a bit here as, lacking an in-game control over it, the AF level was adjusted through the game's configuration file.


Bioshock Avg

Bioshock Min

Bioshock Max

Little to discuss here, as the numbers speak for themselves. Performance is very high, and the 3870X2 seems to like Bioshock a lot. AF doesn't incur a huge performance penalty, and the QuadCF configuration actually seems to like high AF scenarios better. This is probably due to the fact that even at 1920x1200 with 16:1 anisotropy we're still CPU limited, in spite of using a 3.2GHz Core2 Extreme CPU. As shown in the following graph, QuadCF seems to be scaling quite well across the board:


Bioshock Scaling

%Δ performance was calculated using the following formula:

Performance Calculation

Performance Thoughts: After Call of Juarez, this is the first time we see consistent scaling, even if not as significant as in COJ, being achieved across all settings and in all framerate types: average, minimum and maximum. With more CPU power, or at a higher resolution, it probably would've been even more significant, as Bioshock isn't that heavy on current GPUs, even when running it through its DX10 path with all bells and whistles on. With such a high level of performance, it can be said that Bioshock is, after COJ, another feather in the 3870X2's cap.

Lost Planet DX10

Version used: 1.4

Testing method: In-built performance test, average of the 3 last runs out of a batch of 4 (the first run was discarded)
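The discard-then-average step can be sketched as below. This is only an illustration: the function name and the sample framerates are hypothetical, and it assumes the in-built test reports one average FPS figure per run.

```python
from statistics import mean


def average_after_warmup(avg_fps_per_run: list[float], discard: int = 1) -> float:
    """Drop the first `discard` runs, then average the rest."""
    kept = avg_fps_per_run[discard:]
    if not kept:
        raise ValueError("no runs left after discarding warm-up")
    return mean(kept)


# Hypothetical batch of four passes; the first one is thrown away.
batch = [24.0, 30.0, 31.0, 29.0]
print(average_after_warmup(batch))  # 30.0
```

Dropping the first pass keeps one-off costs such as initial asset loading out of the reported average.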

The penultimate title in the DX10 lineup is another poster-boy for the new API, and has been used in promoting its benefits quite a few times. The engine that Lost Planet is based on was developed by Capcom with DX10 in mind from the get-go, and the 1.4 patch adds more DX10 goodies to the mix, with a high quality motion blur algorithm, high quality fur and improved shadow filtering being most notable.

Now that we've got the good part out of the way, let's deal with what's less likeable about Lost Planet: first, ATi informs us that the game currently doesn't scale beyond 2 GPUs, so there will be no QuadCF numbers shown as they're virtually a carbon copy of the single 3870X2 results. Second, ATi admits that Lost Planet is one of the titles needing significant work on their part, so we shouldn't expect it to produce huge FPS. The second part was to be expected, since the game was developed primarily on Nvidia's G80, and since the strengths of the G8x and the R6xx lines are quite different, making a title optimized for one run great on the other is not a trivial task.


Lost Planet Avg AF

Above we have the performance with AF, and again we have both positive and less than positive aspects. The good news is that increasing the degree of anisotropy doesn't have a performance impact...the bad news is that the overall performance of the 3870X2 in this game is very low, at least in this author's opinion. It seems that ATi wasn't joking that this was a difficult game for them. Now, some AA numbers for your viewing pleasure (or displeasure):


Lost Planet Avg AA

Performance Thoughts: With AA there is an odd glitch where 2 sample AA actually performs worse than 4 and 8 sample AA. We've rerun the test, but the numbers came out pretty much the same. We've also checked to see if 4 and 8 sample AA were being properly enabled, and they were, as there was a noticeable difference in edge quality between the two modes. That being said, the performance drop associated with enabling AA isn't all that great, and coupling that with what we've seen before with AF we'd lean towards assuming that Lost Planet is either ALU limited on the X2, or there's some driver glitch at work that hampers performance. All in all, Lost Planet was a poor showing for today's hero.

Gears of War DX10+DX9

Version used: 1.0.3340.131

Testing method: FRAPS run through the "Impasse" level; results represent the average of three 3-minute runs

We will finish up the DX10 section with a title that doesn't use the new API to implement any additional effects, and thus the DX9 and DX10 paths can be directly compared, which makes it a good crossover point. Gears uses a newer version of the UE3 engine than Bioshock, and is also somewhat more graphically heavy, using higher resolution textures and higher polygon models, among others.

Gears uses 16:1 anisotropy by default, and is quite intent on keeping those default settings (changes in the configuration file are reset upon running the game); we've decided to respect its wishes and not work around the game's behaviour. This means the results you'll be seeing were obtained with 16X AF enabled, something to keep in mind when evaluating them.


Gears of War Avg

Gears of War Max

Gears of War Min

Ouch. Maybe Gears isn't exactly adept at using DX10. As you can see, the performance drop incurred going to DX10 is nothing short of catastrophic. Consider that the game's framerate is capped at 64 FPS, then look at the DX9 numbers, where the X2 and the Quad configuration average at around that cap, and wonder what the heck happened in DX10. So did we, and we don't yet have an adequate answer. It seems that the game itself is doing something wrong there, as this reduction of performance for no logical reason (remember, both versions are visually indistinguishable, and the DX10 path brings no new effects to the table) also occurs on parts from Nvidia, although not to a lesser extent.
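To illustrate why the 64 FPS cap matters when reading the drop, here is a small sketch. When the DX9 path is pinned at the cap, the measured drop is only a lower bound on the real one. Apart from the 64 FPS cap stated above, every number and function name here is hypothetical and not taken from the graphs.

```python
FPS_CAP = 64.0  # framerate cap stated in the text


def capped(fps: float, cap: float = FPS_CAP) -> float:
    """Framerate a capped game would actually report."""
    return min(fps, cap)


def percent_drop(before: float, after: float) -> float:
    """Drop from `before` to `after`, as a percentage of `before`."""
    return (before - after) / before * 100.0


# Hypothetical: DX9 could render 96 FPS uncapped, DX10 manages 32 FPS.
dx9_uncapped, dx10 = 96.0, 32.0
measured = percent_drop(capped(dx9_uncapped), dx10)  # uses the capped 64 FPS
actual = percent_drop(dx9_uncapped, dx10)            # uses the uncapped rate
print(measured, actual)
```

In this made-up case the measured drop is 50%, while the drop against the uncapped DX9 rate is larger, so the DX9 bars may well understate how much is being lost.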

A small explanation on the last column, the one marked "DX10+AA": UE3 employs a deferred approach in doing shadow rendering, which is incompatible with traditional AA. With the arrival of DX10, however, developers can enable AA in deferred renderers by making use of some of the new features of the API, which is what Epic did here, allowing for 4X AA in their DX10 path. With performance being abysmal, though, it is of no use currently.


Gears of War Scaling

%Δ performance was calculated using the following formula:

Performance Calculation

Performance Thoughts: Ouch again. This is the first time when QuadCF hurts performance across the board. As you might have guessed, we've contacted ATi about it and they're looking into the issues.

If there's any good news here, it is that the DX9 path is very fast on the X2, so currently using it is a no-brainer. Likewise, with AA support for UE3 games running in DX9 soon to come for Crossfire configurations (currently it's only available for single card configurations, so no X2 loving for the moment), there will be little reason to run the DX10 path. In short, Gears of War: DX9-good, DX10-horrid, DX10+AA-slightly more horrid, and QuadCF currently on its way.

Conclusion

ATi HD3870X2

With Gears of War, we've finished the DX10 games evaluation of this series and the experience is something of a mixed bag:

With this we wrap up Part I of our HD3870X2 CrossfireX review. Be sure to come back again for Part II, where we take a look at how AA and AF work with the HD3870X2, and stress the card thoroughly with DX9 titles.