The portent of the end of days is here: we've actually managed to finish our 4870X2 review! Kind of ... it's probable that the story behind the delay will be slightly more thrilling than the review itself - which doesn't mean that the 4870X2 is a disappointment! On the contrary, it's undoubtedly a great card, stable and fast, which equates to being boringly good most of the time. Its performance is very much in line with what you saw with the Crossfired 4870s, except in the scenarios where the added VRAM allows it to leap in front. But we're getting ahead of ourselves, so let's first handle introductions properly.
Meet the 4870X2

What we have here is the finishing touch to what can safely be considered ATi's most impressive, and successful, product line in history. It's aimed at those that care solely about the best performance, the nutters that get QX processors, piles of RAM, large LCDs and other such toys ... you know, guys like us. Whilst the 4870 managed to be surprisingly competitive here, it couldn't truly outmatch its higher-priced opponents overall; the X2 takes care of that though.
If you're familiar with the 3870X2, there are few visual and even technical surprises in store for you here. The board more or less looks the same, although removing the cooler shows that a certain amount of component shuffling happened. Also the 10.5” PCB is black, a departure from ATi's traditional red color-scheme. You'll note there aren't pictures of this patched around this section, seeking to do mean things with our bandwidth ... the net is already quite flooded with 4870X2 nudies as it is, adding our own would hardly have contributed any meaningful information.
PLX 2.0
What the 4870X2 has going for it, as opposed to its predecessor, is the upgrade that the PLX switch has received: it's now a PCI-E 2.0 part, effectively doubling the bandwidth theoretically available. This should cover certain situations where the GPUs were limited by the PLX port; resource upload comes to mind, as that was one of the points where an X2 solution didn't fare all that well. Be aware that we're primarily talking about the initial resource upload process that takes place prior to effectively starting to play (textures, Vertex and Index Buffers etc). If you're moving data amounts that are large enough to saturate the bandwidth that the old bridge provided as you're playing through a level performance will be crappy anyhow, so the improvement there would be from crap to less crap ... of course, this is prevented by the increased VRAM, but we'll get back to that in just a bit.
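For those who like to see the arithmetic behind "doubling", here's a quick sketch of the theoretical per-direction numbers. The per-lane signalling rates and the 8b/10b encoding overhead are standard PCI-E figures; the 16-lane width is our assumption for illustration, and the function name is made up.

```python
def pcie_bandwidth_gbs(gigatransfers_per_s, lanes=16):
    """Theoretical one-direction bandwidth in GB/s with 8b/10b encoding.

    lanes=16 is an assumption for illustration, not a statement about
    how the bridge on the card is actually wired.
    """
    usable_gbits = gigatransfers_per_s * 8 / 10  # 8 data bits per 10 bits on the wire
    return usable_gbits / 8 * lanes              # bits -> bytes, times lane count


gen1 = pcie_bandwidth_gbs(2.5)  # PCI-E 1.x: 2.5 GT/s per lane
gen2 = pcie_bandwidth_gbs(5.0)  # PCI-E 2.0: 5.0 GT/s per lane
print(f"PCI-E 1.x x16: {gen1} GB/s, PCI-E 2.0 x16: {gen2} GB/s")
```

These are ceilings, of course; what a real upload achieves will be lower.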
Sideport Interconnect
This is as good a place as any to discuss the rather mystical Sideport Interconnect. This part of the X2 has heated quite a few imaginations in the interval prior to the launch of the card. After the launch, it caused a number of hissy fits. Why is that, you ask? Simple: it's not enabled in current drivers ... so when something that was supposed to change life, the universe and everything else gets disabled, conspiracy theories and imaginative reasoning spring up like in a “Lost” episode.
The less glamorous truth is that changing the world as we know it was never really on the Sideport's “to-do” list. Summing it up, it's pretty much an extra PCI-E 2.0 link that interconnects the GPUs directly, without going through a bridge. This path has a slightly lower latency associated with it, compared to the PLX one, but not to a notable extent. The bandwidth it provides is insufficient for doing really adventurous stuff like a shared memory pool. So is it completely worthless then? Like all things, that depends: it will not be all that helpful when doing typical AFR, but it could be useful in alternative schemes. We'd urge you to go through our interview with Mr. Eric Demers for slightly more information about the interconnect (and a number of other interesting topics). It's possible to enable the interconnect in current drivers, although until ATi decides to use it there's no benefit from doing that (and no, we won't detail how to do it ... not that it's some big secret and arcane ritual, mind you).
2GB GDDR5? Oh My!
One final bit of novelty is the rather awe-inspiring amount of GDDR5 that this black-terror sports: 2 GB, split evenly and fairly between the 2 GPUs, each of them having its own 1GB pool to do nasty things with (remember, no shared memory pool this round). This should cover the increasing number of cases where 512MB becomes less than sufficient - this bit should have a few panties up in a wad, based on the reactions this opinion, expressed in another article of ours, generated. So, let's slowly and carefully untangle the underwear, shall we?
The misunderstanding in this case stemmed from seeing the aspect of memory management in a binary suck/rock key: if it does not suck it rocks (in other words, if I'm not forever pegged at 1 FPS, then it's all good ... my averages are still quite good). Well, that's not exactly how things are: you should see things in a [suck, rock] continuum, with quite a few intermediary steps in between. The basic idea is that in an ideal scenario, you'd upload all the stuff you need for rendering the current level/cell/whatever (depends on how the developer opted to partition his gameworld) to VRAM at once and be done with it. This happens only if all the resources you need fit in there. Once developers start piling on the shadowmaps, normal maps, treasure maps (joke alert) and high resolution textures, whilst using multiple render-targets, it becomes an increasingly tight fit. Add a high enough resolution coupled with a decent amount of AA and odds are your GPU will face the same problem that Oprah faces when choosing a skirt: it's too small (the VRAM, that is).

What happens now? Well, magic! Okay, not really: all is fine and dandy as long as the GPU doesn't need something that's not in VRAM. If this occurs, the driver must evict something that's already there, to make room for the new resource. An LRU (least recently used) scheme is employed, in which the resource that was accessed least recently (sic) gets flushed. What you'll perceive is that your framerate will go down. For how long? Well, that depends on how much data needs to be evicted/uploaded - and herein lies the source of the conundrum! Since most games today are really aimed at 512MB SKUs (at best/worst, depending on what your stance is on graphical advancement), it's likely that their requirements at extreme settings won't significantly surpass what's available (if they surpass it at all); in translation, the occurrence of the whole enchilada outlined above won't be very frequent. The more the requirements surpass available VRAM, the more frequent the shuffling. In a worst case scenario, you'd have to flush and repopulate the entire memory space every few frames - but at this point you're likely to have given up on the game or reduced your settings.
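To make the eviction dance a bit more tangible, here's a minimal sketch of an LRU-managed VRAM pool. It's a toy model and most certainly not how ATi's driver is implemented: the VramPool class, the resource names and all the sizes are made up for illustration.

```python
from collections import OrderedDict


class VramPool:
    """Toy model of a VRAM pool with least-recently-used eviction.

    Hypothetical illustration only; a real driver tracks far more state.
    """

    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.resident = OrderedDict()  # name -> size in MB, oldest access first
        self.uploaded_mb = 0           # total data pushed over the bus so far

    def use(self, name, size_mb):
        """Touch a resource for rendering, uploading it first if needed."""
        if name in self.resident:
            # Already resident: just mark it as most recently used.
            self.resident.move_to_end(name)
            return
        if size_mb > self.capacity_mb:
            raise ValueError(f"{name} can never fit in the pool")
        # Evict least recently used resources until the newcomer fits.
        while sum(self.resident.values()) + size_mb > self.capacity_mb:
            self.resident.popitem(last=False)
        self.resident[name] = size_mb
        self.uploaded_mb += size_mb


def simulate(capacity_mb, frames=10):
    """Render `frames` frames that each touch the same working set."""
    working_set = [("textures", 300), ("shadowmaps", 120),
                   ("render_targets", 150)]  # made-up sizes, 570 MB in total
    pool = VramPool(capacity_mb)
    for _ in range(frames):
        for name, size_mb in working_set:
            pool.use(name, size_mb)
    return pool.uploaded_mb


print("512MB pool uploaded:", simulate(512), "MB")
print("1GB pool uploaded:  ", simulate(1024), "MB")
```

Be aware that touching every resource in the same order each frame is the worst possible case for LRU, so this sketch lands squarely in the "flush and repopulate every frame" end of the continuum. A game that only marginally overshoots VRAM, with a less regular access pattern, would shuffle far less data than the toy numbers suggest.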
So, if you've been paying attention (probably not, as no one actually reads this stuff in practice), what the rather lengthy paragraph above says is that average framerates can still be decent even when VRAM constrained - they're not what you should be looking at. It's the minimum framerates that will invariably suffer. The averages can and will be affected, but the extent of this is highly dependent on just how frequently the driver starts doing its balancing act. Ultimately, VRAM requirements in a game also depend on the level/chapter/whatever being played. If one level needs X amount of VRAM there's no guarantee that the next won't need twice that. The only way to get an idea of what's going on is to monitor VRAM allocation ... and even that's a bit tricky under Vista.
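Here's a small sketch of why the average hides what the minimum reveals. The frame-time traces are invented, and the minimum is taken from the single slowest frame; benchmarking tools generally report minimums over longer windows, so treat this as an exaggerated illustration rather than a model of any particular tool.

```python
def fps_stats(frame_times_ms):
    """Return (average FPS, minimum FPS) for a list of per-frame times in ms."""
    total_s = sum(frame_times_ms) / 1000.0
    average_fps = len(frame_times_ms) / total_s
    minimum_fps = 1000.0 / max(frame_times_ms)  # slowest single frame
    return average_fps, minimum_fps


# Made-up trace: 100 frames at 20 ms each, i.e. a steady 50 FPS.
smooth = [20.0] * 100
# Same trace, but two frames stall for 100 ms while resources get shuffled.
hitchy = [20.0] * 98 + [100.0] * 2

for label, trace in (("smooth", smooth), ("hitchy", hitchy)):
    average, minimum = fps_stats(trace)
    print(f"{label}: average {average:.1f} FPS, minimum {minimum:.1f} FPS")
```

Two stalled frames out of a hundred barely dent the average, yet the minimum collapses, which is exactly the kind of hitch you feel whilst playing.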
Considering all of the above, the increase from 512MB to 1GB makes sense given the typical usage patterns that these cards should see (resolutions of 1920x1200 and higher, high levels of AA, highest possible in-game settings), as well as the trend towards higher and higher VRAM consumption that's noticeable with more recent games (:cough: Crysis :cough:). It's more of a forward-looking thing at this point in time: you can count the games where it makes a difference today without having to borrow a hand or rely on mucky feet. The coming months might force you to employ at least your toes for that count though. Oh, and we hope no one is upset over the above paragraphs ... they're tongue-in-cheekish to the extreme, because the best way to present info is the friendly neighborhood joker way.
Aside from all of the above, there isn't much else differentiating the 4870X2 from a pair of 4870s. Frequencies are the same (750MHz core/900MHz RAM) and install is painless. Oh, we nearly forgot: the X2 is “smarter” when it comes to thermal/power management. It has a full Powerplay implementation, meaning that it downclocks and downvolts the GPU cores, and downclocks the memory as well. The 4870 implements only the most basic Powerplay level, merely downclocking the GPU core, without touching voltage and RAM. We'll (shamelessly) plug our content once more here, and direct you to our interview with Mr. Demers for more elaboration on the topic.
So now comes the part where we show you pretty charts, right? If only it had been that easy...
When FedEx delivered the samples at 1PM on the 11th of last month, it was a rather happy moment. After all, with an epic crunch, the possibility of having the (a) review up in time for the NDA lift existed ... perhaps (it's worth noting that we had had access to the review drivers and had done the 4870CF groundwork). So with great haste did we proceed to unbox and plug the (heavy) cards into the eager PCI-E slots of our main testing-rig.

Now, as many of you might've noted, there's a strong phallic motif associated with the 4870X2 (just look at its cooler) ... unfortunately, the Abit IX38 motherboard we use was less than impressed with that message and decided to be an upright girl - it refused to work with the cards! Thanks lass, good time to set some boundaries. After a reasonable amount of tinkering, we got the couple to uneasily coexist by setting jumper switch #2 to ON (it's located on the back of the card under the crossfire connector, and not all cards seem to have it, although both our samples did). This jumper controls whether or not the PLX bridge chip enables PCI-E 2.0 (setting it to ON disables it, counterintuitively enough). Alas, this was not enough though: semi-randomly losing the video signal is not nice, and when writing a review it's actually a killer. After exchanges with ATi and Abit, it became clear that Abit needed to update the BIOS for the motherboard ... when that will happen, if ever, is anybody's guess (the alternative of using the version 12 BIOS exists, for those having the board and looking for a present, rather than future solution).
At this point, we had missed the NDA lift. As a consequence, we opted for doing a few new/extra things around this review. And made what can, in hindsight, be considered a less than wise choice.
Meet the Phenom
Since a lot of people (including ourselves) were curious just how a Phenom would behave when teamed with the graphics hardware and put through our array of tests, we decided to build a Phenom rig for this review ... since we had a bit of extra-time on our hands anyhow. Of course the sane choice would've been to simply grab another Intel motherboard and do the review on our usual setup, but sane choices look good only in retrospect.
We'd like to tell you that the Phenom adventure was as nice as playing with Nicole Scherzinger's nicely rounded and tanned bottom ... sadly, it was more like giving a big wet kiss to Tyra Banks' currently (that might've changed) flabby posterior. You'd still do it, but hardly with the same enthusiasm, and the memories would be not quite as ... satisfactory.

The first Phenom that we bought was a 9850 that suffered from a very bad identity crisis: it refused to function as itself and could only behave when clocked as a 9750; anything above made it quite unstable. This caused another week of delay as we tried to solve it. Finally, we applied the best solution: we went out and bought another Phenom, this time a 9950. Providence smiled on us as this one was actually a good chip, which even overclocked to 2.8GHz - this is what you'll see used in the review. We have a forthcoming 790GX investigation that will detail the Phenom woes/experience; for now, knowing that it was a rather bumpy ride should suffice.
What is relevant in the context of this review is that we had to go from a 3.6GHz Core 2 Quad to a 2.8GHz Phenom, and the Core 2 was already becoming a limiting factor at that frequency - it doesn't take all that much thought to realize that the Phenom pretty much smashed us into the wall of CPU limitation head-first. As opposed to what some will tell you, and as you'll soon see for yourself, the CPU is hardly irrelevant in today's games, even at high resolutions/quality settings. There are still many things that the CPU has on its plate, and many places where it can become the bottleneck, especially so when using fast GPUs.
As a consequence of the above, we have deferred testing Quad Crossfire to a later date, when we'll have a functional Intel rig, or we'll extract a hefty overclock from the Phenom(s). For the time being, showing QuadCF numbers would've meant simply copying and pasting the single X2 numbers, since the CPU couldn't keep up with the task of feeding 4 fast GPUs (yes, we're factoring the fact that scaling is sub-linear with AFR).
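A back-of-the-envelope way to see why QuadCF numbers would simply mirror the single X2 ones: treat the delivered framerate as the lesser of what the CPU can feed and what the GPUs can render. This is only a toy sketch. The function name, the 0.8 AFR efficiency and all the FPS figures are made up for illustration and are not measurements from our testing.

```python
def delivered_fps(cpu_fps, single_gpu_fps, gpu_count, afr_efficiency=0.8):
    """Toy bottleneck model: the slower of the CPU and GPU sides sets the pace.

    afr_efficiency is a hypothetical figure for how much each extra GPU
    adds under AFR (1.0 would be perfectly linear scaling).
    """
    gpu_fps = single_gpu_fps * (1 + (gpu_count - 1) * afr_efficiency)
    return min(cpu_fps, gpu_fps)


# Made-up numbers: one GPU renders 40 FPS, the slow CPU can feed 60 FPS,
# a hypothetical faster CPU can feed 150 FPS.
for cpu_label, cpu_fps in (("slow CPU", 60.0), ("fast CPU", 150.0)):
    for gpus in (2, 4):
        fps = delivered_fps(cpu_fps, 40.0, gpus)
        print(f"{cpu_label}, {gpus} GPUs: {fps:.0f} FPS")
```

With the slow CPU, going from 2 to 4 GPUs changes nothing since both configurations sit at the CPU ceiling; only the faster CPU lets the extra pair show up in the numbers.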
Now, without sharing any more details you're likely not interested in, here's the deal:
Trivia about the big black monster:
the cooling solution is dual-slot, and modeled after a certain part of the anatomy of a fellow named digitalwanderer - ignoring this tidbit, it does a good job (the cooler, not the fellow), but tends to become rather noisy as soon as the GPUs start sweating; it is quite silent at idle
due to the complete Powerplay implementation, idle temperatures are significantly better for the X2 compared to its 4850/70 brethren, however, don't pluck the card out immediately after ending a longer session - your fingers will not appreciate it at all
just like the 3870X2 before it, the R700 reference PCB is 10.5” long
the card needs one 6-pin and one 8-pin power connector
A chunk of copy pasting now, so that the authour (sic) can have his coffee:
We've moved from having a large heterogeneous mix of in-game settings to testing according to 3 exact presets:
BASELINE: This is the game set to its maximum quality settings, but without any AA or AF enabled
HIGH QUALITY: Same as above, but 4X AA gets enabled alongside 16X AF
EXTREME QUALITY: Game settings remain at their maximum respective value, but AA gets bumped to 8X, whilst AF remains pegged at 16X
CROSSFIRE EXTREME QUALITY: Still maximum in-game settings and 16X AF, but we're using the Crossfire exclusive 16X AA mode - this mode can only be forced through the CCC and thus will only work in games that support such forcing; we're investigating it since it probably is of interest to quite a few people out there
For games that have no support for AA, including Stalker, Timeshift, Bioshock DX10, and Gothic 3 in this investigation, the baseline setting has 16X AF enabled
Most in-built benchmarking utilities have been relinquished in favor of FRAPS runs - whilst we most certainly don't consider this to be the be-all end-all of testing, it's probably a better way of showing how the games tested will actually perform in practice
We've also moved from averaging 3 three minute runs with FRAPS to averaging 6 of them - this should help remove some of the inherent variability associated with FRAPS testing
Unless otherwise specified, all tests are run at 1920x1200
For each game, a graph will be presented to you showing the percentage increase of card A vs. card B - the percentage increase is calculated as (A-B)/B and expressed as a percentage
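The last two points of the methodology can be sketched in a few lines: average the runs for each card, then apply (A-B)/B. The FPS values below are invented purely to show the calculation and the function name is our own.

```python
from statistics import mean


def percent_increase(a_fps, b_fps):
    """Percentage increase of card A over card B: (A - B) / B * 100."""
    return (a_fps - b_fps) / b_fps * 100.0


# Made-up average FPS from six three-minute FRAPS runs per card.
card_a_runs = [61.0, 59.5, 60.2, 60.8, 59.9, 60.6]
card_b_runs = [50.3, 49.8, 50.1, 49.6, 50.4, 49.8]

card_a = mean(card_a_runs)
card_b = mean(card_b_runs)
print(f"A: {card_a:.1f} FPS, B: {card_b:.1f} FPS, "
      f"increase: {percent_increase(card_a, card_b):.1f}%")
```

A positive result means card A is faster than card B, a negative one means it's slower.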
Here are the specs for the system we used during testing:

Since the R700 is pretty much two 4870s on the same PCB, with a side of extra VRAM, logic dictates that it'll end up being equal to that configuration unless the extra memory is needed, in which case it'll end up equal or faster when it comes to average framerates (read the memory management chunk), but will have higher minimum framerates.
Most games will illustrate equality, and, as a consequence, we'll not discuss them directly in their section, but rather as part of the conclusion (the premise of saying “As the graphs show, both cards are equal” repeatedly is quite unappealing really). We'll only comment when there are relevant aspects to outline.
Quoting Elvis, a little less conversation, a little more action!
Version used: 1.2 X64
Testing method: FRAPS run through the first part of the Assault level, the results are the average of 6 three minute runs
We're controlling anisotropic filtering in-game with the r_TexMaxAnisotropy console command. Be advised that surfaces to which POM is applied won't receive any AF, and since those form the majority, AF won't be very noticeable in-game. AA is controlled through the game's menu.



Crysis retains its dual “angel-demon” nature: it's an angel because, being an application that has its Very High preset aimed at cards with more than 512MB, it allows us to illustrate how and when the 4870X2 will make sense, whilst also being a demon due to its CPU-bound nature. The last part should be detailed a tad, so that angry “Morunz, Crysis is the GPUz-rapist, no CPUz needed, yo!” reactions don't show up.
Crysis is an application that's demanding on a system level, rather than placing emphasis on a singular subsystem - it's GPU demanding, gulps quite a bit of VRAM, system RAM, and CPU cycles as well - look at the Draw Primitive count per frame in outdoor scenes (indoors it's much better, probably due to the lack of foliage since the DP count seems to be tied to the foliage system).
We had the GPU and RAM angles covered, but the Phenom wasn't quite up to the task ... heck, even the QX9650 would've needed some extra MHz (read a few hundred of em based on some testing we did internally at a previous point in time) to ensure that no CPU bottleneck manifested itself. So the picture that you see painted isn't entirely accurate, as the X2 would've had additional performance reserves across the board, whilst the 4870CF could have been faster with no AA (the other settings create VRAM limitation issues). Keep this in mind for future encounters with Crysis.

Version used: 1.1.1.0
Testing method: FRAPS run through first part of the second sub-level of the first episode, the results are the average of 6 three minute runs
AA and AF levels are controlled through the game's menu.



One slight note here: you'll notice that for the Extreme settings there are only 4870X2 numbers - a driver glitch prevented the 4870CF configuration from working properly: framerate would be stuck at 2-3 FPS. ATi seems to have adjusted the way it handles memory management a tad in these latest drivers, so this is probably the source of the issue. A single 4870 worked correctly, and we've already reported the issue and it's being taken care of.

Version used: 2.300.0.24
Testing method: FRAPS run through the entire Wolfheze mission (the first mission you play with the Germans), the results are the average of 3 playthroughs
We're controlling AA and AF through the game's menu.




Version used: 1.0.0.8
Testing method: In-built performance test, average of 6 runs
We're controlling AA and AF through the game's menu.



Version used: 1.18704.70.4256 X64 (Latest single-player patch)
Testing method: FRAPS run through the "Tottenham Court Road" level, results are the average of 6 three minute runs
We're controlling AA and AF through the game's menu.



Version used: 1.1
Testing method: FRAPS run through the "Welcome to Rapture" level, results are the average of 6 three minute runs


Version used: 1.4
Testing method: FRAPS run through the first level of the game, results are the average of 6 three minute runs
AA and AF are controlled through the game's menu.




Version used: 1.2 and 1.1
Testing method: FRAPS run through Damascus, results are the average of 6 three minute runs
AA only goes up to 4 samples. We're controlling both it and AF through the game's configuration file.


We opted to do two interesting things with Assassin's Creed:
we moved our testing runs to Damascus, attempting to explore a more GPU bound scenario (as well as look at a more realistic in-game load, as you'll be spending more time in the cities than in Masyaf)
Since the odds of Ubisoft actually re-enabling 10.1 support in AC are probably equal to the odds that the writer of this review will miss http://www.azcentral.com/ent/celeb/articles/2008/08/21/20080821fox.html in full HD resolution (odds=-infinity), we chose to show you how the game behaves in its initial, 10.1 enabled state- 20% extra performance with AA is noaice, is it not?

Version used: 1.0.3340.131
Testing method: FRAPS run through the "Impasse" level, results are the average of 6 three minute runs
Gears of War got demoted to DX9, in order to allow for more setting granularity (AA higher than 4X becomes possible), so ignore the DX10 mention in the settings screen-shot, it's redundant. We hope this change will be one you appreciate.
8X AA appears to be less than ideal for 512MB cards, albeit to be honest those minimum framerates are also affected by what the application is doing with regards to its resource-streaming, so they should be higher than they actually are on both candidates.





Version used: 1.3
Testing method: FRAPS run using the UT3Bench with the WAR-Torlan_bot timedemo, results are the average of 6 three minute runs
We're controlling AA through the CCC and AF through UT3Bench's menu.





Version used: whichever Steam decided was the latest
Testing method: FRAPS run through the rocket launch part of the "T-Minus One" chapter, the results are the average of 6 three minute runs
AA and AF levels are controlled through the game's menu.





Version used: 1.7
Testing method: FRAPS run through the "Blackout" level, results are the average of 6 three minute runs
AA and AF were controlled through the game's menu, with the mention that for enabling 8X AA we used the r_aaSamples console command after which we triggered a reboot of the video subsystem in order to apply it.





Version used: 1.0.0.6
Testing method: FRAPS run through the “Cordon” level from Sidorovich's basement to the military blockade at the railway, results are the average of 6 three minute runs
Stalker is a deferred renderer, so no AA for it until the Clear Sky expansion lands later this year. We have 16X AF enabled for the tests.


Version used: 1.3
Testing method: FRAPS run through the "Old Vizima" level, results are the average of 6 three minute long runs
AF is controlled through the game's menu, whilst we use the CCC to force the differing levels of AA.


With recent drivers, ATi has removed the option to force AA through the CCC, since that could possibly cause VRAM to overflow on 512MB cards when forcing high AA levels at high-enough resolutions. The game's developer obviously considered the probability of this quite high, since the level of AA exposed through the menu is conditioned by the amount of RAM a card has and by the chosen resolution. At 1920x1200 512MB cards have only the 2X AA option, whilst on the 1GB 4870X2 4X AA is allowed.
Considering the above, it's obvious that no more 8X AA numbers can be shown, and that High Quality performance is only explored on the 4870X2.
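To get a feel for why AA level and resolution end up being tied to VRAM size, here's a naive estimate of what a multisampled render target costs. It assumes 4 bytes of colour plus 4 bytes of depth/stencil per sample and ignores compression, resolve buffers and everything else the game keeps in memory, so it's a rough sketch and not a description of what this particular game actually allocates. The function name is made up.

```python
def msaa_target_mb(width, height, samples, bytes_per_sample=8):
    """Naive size in MB of a multisampled colour + depth target.

    bytes_per_sample=8 assumes 4 bytes of colour plus 4 bytes of
    depth/stencil per sample; compression and resolve buffers are ignored.
    """
    return width * height * samples * bytes_per_sample / (1024 * 1024)


for samples in (2, 4, 8):
    size_mb = msaa_target_mb(1920, 1200, samples)
    print(f"1920x1200 @ {samples}X AA: {size_mb:.1f} MB")
```

The cost doubles with every step up in AA, and that's for a single target: once a few of these sit next to the textures on a 512MB card, things get tight in a hurry.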
Version used: 1.6
Testing method: FRAPS run going from Gotha, passing by Montera on to Silden, following the road, results are the average of 6 three minute runs


Version used: 1.02
Testing method: FRAPS run through the opening level of the game, "Arrival", results represent the average of 6 three minute long runs


Version used: 1.0.0.1
Testing method: FRAPS run through the first level after entering the Al-Khali complex, results are the average of 6 three minute runs
AA is controlled through the game's menu, whilst AF is managed through its configuration file.



Let's focus on the 8X AA numbers: we see a hefty benefit from the extra 512MB per GPU when it comes to minimum FPS, but a non-significant increase in averages. This is simply due to the fact that at the settings we're using, Jericho goes only slightly over what a 512MB framebuffer can hold, so memory management takes place only for very short intervals (little data to flush/upload), not enough to affect the averages. However, the hitches are annoying in practice, so the experience with the 4870X2 is better, in our opinion.
It would (will, perhaps) be interesting to see how things pan out at 2560x1600 and 8X AA - it's safe to say that the gap in minimums would be wider, whilst averages are more than likely to be more detached from one another.

Version used: 1.1.0.0
Testing method: FRAPS run through the "San Francisco Grand Prix B" race, results are the average of 6 three minute runs
The game supports AA up to 8X through its menu. AF is forced through the CCC.



We'll close today's “event” with Race Driver: GRID. As you'll recall, this was the game we used to exemplify the 512MB “dilemma”, when running it with 8X AA. As you can see, with the extra RAM the 4870X2 goes through this troublesome combination without much thought. Disregard the 4870CF numbers though: as we've already mentioned, there's a driver bug involved. However, take a peek at the minimum FPS the single 4870 (single cards aren't affected) achieves to quantify just how helpful extra VRAM can be at times.

This was probably the most difficult part in terms of writing our 4870X2 review. It's hard to draw many conclusions since we lack a clear image of things ... the Phenom obviously wasn't the best choice for this review. Performance of the X2 is hardly surprising: we basically knew how it would perform at release the second we finished testing the 4870CF combo, a while ago. The trouble is that we're not making the GPUs sweat enough due to the CPU limitations. As we said above, the CPU still has a lot left on its plate and the size of the portion increases the faster the GPU(s) become: filling the command buffer for a 4870X2 is hardly easy, since the GPUs tend to burn through their tasks quite rapidly. With this in mind, there are a few points to be made:
the X2 is the fastest card at this point in time
our experiences with it have been quite good, and the drivers were in rather good shape - there are still some edges left to polish, but nothing really obvious and we were sure to discuss things with ATi
what's better, a 4870CF configuration or a single X2? Our humble opinion is that the X2 is preferable due to it being a single card, having a better thermal profile than two 4870s (we're considering stock cooling here), and the fact that you're more likely to find 1 6 pin and 1 8 pin connector on your PSU, as opposed to 4 6 pins - the last one is not a deal-breaker
irrespective of what interested parties might tell you, a fast CPU is still required for high-end gaming - consider that when setting up your new PC
these cards are aimed at high-quality gaming, meaning high resolutions, high AA/AF and best in-game settings - keep that in mind as well, lest you be disappointed about the fact that Solitaire is just as fast on an X2 as it is on an IGP
In our subjective evaluation, the 4870X2 is an excellent card ... but we're probably its target audience anyhow. Some of the things we've heard muttered behind closed doors keep us quite interested in future developments (sorry, we can't detail further). However, this is our evaluation - we absolutely urge you to look at more reviews and arrive at your own conclusions based on a data-set that's as comprehensive as possible. Having said that, we'll be back soon (and this time we mean it!) to round things up nicely by looking at QuadCF when paired with a beefy CPU. Until then, have fun!