Company: AMD
Author: Alex 'Morgoth Bauglir' Voicu
Editor: Charles 'Lupine' Oliver, Eric 'Ichneumon' Amidon
Date: June 24th, 2008


The RBEs were another primary focus for improvement, mainly in order to increase AA performance, another R600 weak point. Gone is the dedicated alpha/fog unit; its use was rather questionable in terms of benefits anyhow. Depth and stencil capacity has been doubled, which means that the RV770 is actually capable of Quad-Z/Stencil rates without AA (the R600 was capable of Double-Z/Stencil). We've tested this with Archmark 0.50:
A few things to note here. The RV670 has a higher color fillrate due to its higher core clock (775 MHz vs 625 MHz). Most other fillrate testing apps, aside from Archmark, fail to expose Quad-Z for the RV770 in no-AA scenarios, producing erroneous 2.5X-2.6X rates (too low to match what the architecture should be capable of, too high for a Dual-Z solution). On the other hand, Archmark tends to overestimate Z-only fillrate for the RV670 (that's why the little asterisk is there). Stencil-only numbers are in line with what both architectures should be achieving. Correlating these numbers with the ones we'll be showing you in just a bit for AA scenarios, we'd lean towards considering the numbers Archmark produces for the RV770 correct.
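To make the clock-versus-width argument concrete, here is a rough back-of-the-envelope sketch of the theoretical peaks. It assumes 16 color pixels per clock for both chips (the article states this for the RV770; for the RV670 it is our assumption) and uses the Dual-Z and Quad-Z multipliers discussed above. The function and constant names are purely illustrative, and these are paper peaks, not measured Archmark results.

```python
# Theoretical peak rates in millions of operations per second (clock given in MHz).
# Assumption: both chips output 16 color pixels per clock; the Z multipliers
# (2 for Dual-Z, 4 for Quad-Z) follow the article's description.

def peak_rate_mops(core_mhz, pixels_per_clock, z_multiplier=1):
    """Return the theoretical peak rate in millions of pixels (or Z samples) per second."""
    return core_mhz * pixels_per_clock * z_multiplier

RV670_MHZ = 775          # core clock quoted in the article
RV770_MHZ = 625          # core clock quoted in the article
PIXELS_PER_CLOCK = 16    # assumed identical for both chips

rv670_color = peak_rate_mops(RV670_MHZ, PIXELS_PER_CLOCK)
rv770_color = peak_rate_mops(RV770_MHZ, PIXELS_PER_CLOCK)
rv670_z = peak_rate_mops(RV670_MHZ, PIXELS_PER_CLOCK, z_multiplier=2)  # Dual-Z
rv770_z = peak_rate_mops(RV770_MHZ, PIXELS_PER_CLOCK, z_multiplier=4)  # Quad-Z

if __name__ == "__main__":
    print("RV670 color:", rv670_color, "Mpixels/s")
    print("RV770 color:", rv770_color, "Mpixels/s")
    print("RV670 Z-only:", rv670_z, "Msamples/s")
    print("RV770 Z-only:", rv770_z, "Msamples/s")
```

Under these assumptions the RV670 leads on color fillrate purely through clock speed, while the RV770's Z-only peak should sit at four times its own color rate, which is the ratio a fillrate tester needs to show in order to expose Quad-Z.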
Z and Stencil compression rates are the same as on the R600: still 16:1 with no MSAA and 128:1 with 8X MSAA.
Having gotten that out of the way, it's time to tackle the AA question, one that has been asked repeatedly. As is obvious from Mr. Hartog's presentation, the most significant RBE improvements are centered around this area: the RV770 handles 4 AA samples per clock and is thus capable of outputting 16 pixels per clock even with 4X AA, whereas the R600 could only manage 2 samples per clock, which resulted in it outputting only 8 pixels per clock with 4X AA enabled. This means an overall doubling of fillrate with AA compared to the R600. As a bonus, non-AA fillrate is also doubled for FP64 (16 bits per component) color formats, going from 8 pixels per clock to the full 16 pixels per clock.
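The arithmetic behind those figures can be sketched with a very simple model: if each RBE pipe can process a fixed number of AA samples per clock, a pixel with more samples than that needs extra passes, and pixel throughput drops accordingly. The function below and its parameter names are our own illustration, not a description of how the hardware schedules work, and it assumes a base rate of 16 pixels per clock for both chips.

```python
import math

def pixels_per_clock(aa_samples, samples_per_clock, base_pixels_per_clock=16):
    """Simplified model: pixel throughput for a given AA level.

    aa_samples            -- samples per pixel (1 means no AA)
    samples_per_clock     -- AA samples each pipe can handle per clock
    base_pixels_per_clock -- assumed peak pixel rate without AA
    """
    # A pixel needing more samples than the pipe handles per clock takes several passes.
    passes = max(1, math.ceil(aa_samples / samples_per_clock))
    return base_pixels_per_clock // passes

R600_SAMPLES_PER_CLOCK = 2   # from the article
RV770_SAMPLES_PER_CLOCK = 4  # from the article

if __name__ == "__main__":
    for aa in (1, 2, 4):
        r600 = pixels_per_clock(aa, R600_SAMPLES_PER_CLOCK)
        rv770 = pixels_per_clock(aa, RV770_SAMPLES_PER_CLOCK)
        print(f"{aa}X: R600 {r600} px/clk, RV770 {rv770} px/clk")
```

With 4 samples per pixel the model gives 8 pixels per clock for the R600 and 16 for the RV770, matching the figures quoted above. It deliberately ignores bandwidth and compression effects, so real measurements will deviate from it.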
Many of these improvements are tied to the redesign of the Color Blender block (CB), which took away unnecessary functionality and allowed the aforementioned doubling of AA rates. Here, the main design focus was on significantly increasing performance rather than on reducing area.
We checked how color and Z fillrates vary with the number of AA samples using Mdolenc's Fillrate Tester (which seems to produce accurate results under such circumstances):

The contrast with the RV670 is fairly striking: based on what we're seeing here, we should expect very impressive AA performance under real-world conditions. The behavior of the RV670 with 2X AA is surprising, though, as it should in theory be capable of taking 2 AA samples per clock. Looking at the Z numbers, we can see that the RV770 actually does Quad-Z whilst the RV670 is only Dual-Z.
The sampling patterns for AA remain the same as those shown in our AA&AF investigation. We'll be having a more in-depth look at image quality quite soon, but for the moment you can check those out and know that the RV770 uses the same ones. Note that DX10.1 actually allows the application to control the sampling pattern being used if it so chooses. We'll also be looking at Edge-Detect CFAA, which has received significant improvements in terms of performance and compatibility; until then, try it in your DX9 games, as you'll probably end up liking it quite a lot.
A final point to touch upon is resolve: is it fixed (was it broken?)? Well, to the glee of many, the fixed "box" resolve is handled by dedicated hardware in the RBEs. The dedicated HW can run at full rendering rate, but tends to be memory bandwidth limited, as it's a fast, simultaneous read/write operation in memory. Instantaneous resolve rate is also affected by the amount of fragmentation within a pixel, but the average number of fragments over all pixels is almost always very close to one.
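For readers unfamiliar with the term, a box resolve simply averages all the samples of a pixel with equal weights. The snippet below is a minimal software sketch of that idea, with illustrative names and hand-picked sample values; it says nothing about how the dedicated hardware in the RBEs is actually implemented.

```python
def box_resolve(samples):
    """Average a pixel's color samples with equal weights (box filter).

    samples -- list of (r, g, b) tuples, one per AA sample
    """
    count = len(samples)
    channels = len(samples[0])
    return tuple(sum(s[c] for s in samples) / count for c in range(channels))

# A 4X MSAA edge pixel: two samples hit a white triangle, two hit black background.
edge_pixel = [(1.0, 1.0, 1.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]

# An interior pixel: all four samples come from the same fragment.
interior_pixel = [(0.25, 0.5, 0.75)] * 4

if __name__ == "__main__":
    print(box_resolve(edge_pixel))
    print(box_resolve(interior_pixel))
```

The interior case is the common one: every sample comes from a single fragment, so the average is just that fragment's color. This is the situation the "close to one fragment per pixel" remark refers to.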
Moving on to ATi's proprietary CFAA (yes, this is a proprietary implementation that relies on cooperation between dedicated hardware in the RBEs and in the Shader Core), it has also been overhauled on the RV770 compared to the R600, with performance being much better. For the moment, CFAA is a DX9 and OpenGL affair only, with DX10 support available very soon.









