ADFS 2011 Wrapup & AMD Lynx Platform Tests



Company: AMD
Author: James Prior
Editor: Charles Oliver
Date: July 18th, 2011

Llano x86 Testing

We used SiSoft Sandra 2011 to measure performance across configurations, both stock and overclocked, and with different discrete GPUs. We ran the Sandra Overall score tests, plus the CPU and GPU cryptographic modules. For overclocked configurations, the APU runs at a 150MHz base clock with a 23.5x CPU multiplier and a 775MHz GPU clock, with DDR3-2000 9-10-9-24 2T RAM.
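The 3.5GHz figure quoted for the overclocked APU throughout this article follows from the base clock and multiplier. A minimal sketch of that arithmetic (the function name is ours, purely for illustration):

```python
def effective_clock_mhz(base_clock_mhz: float, multiplier: float) -> float:
    """Core clock is the base (reference) clock times the CPU multiplier."""
    return base_clock_mhz * multiplier

# Overclocked settings from our test configuration.
oc_clock = effective_clock_mhz(150, 23.5)
print(f"Overclocked CPU clock: {oc_clock:.0f} MHz (~{oc_clock / 1000:.1f} GHz)")
# prints "Overclocked CPU clock: 3525 MHz (~3.5 GHz)"
```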

Arithmetic Performance

APU stock with standard memory speeds, plus overclocked APU (3.5GHz/775MHz DDR3-2000).

Processor arithmetic performance is unaffected by memory speed. Negative scaling with the APU overclocked indicates that power management is capping x86 clock speed. This is likely because the extra voltage and clock speed of the GPU keep the GPU's share of the TDP high, and the GPU takes precedence over CPU performance even when no load is applied to the GPU cores.

Multi-Media Performance

Results of the SiSoft Sandra Processor Multi-Media, .NET Multi-Media and Video Rendering tests, for the stock APU with different memory speeds, the O/C APU (3.5GHz/775MHz + DDR3-2000) and the Dual Graphics configuration.

Again, varying memory speed has a negligible effect on CPU performance. Overclocking tanks performance as the TDP cap kicks in. Video Rendering performance increases with more clock speed and with Dual Graphics: a 101% increase for Dual Graphics vs. the APU's HD 6550D with DDR3-1333 memory.

Memory & Video Bandwidth

Results of the SiSoft Sandra 2011 SP3 memory bandwidth test.

Video memory bandwidth scales nicely with increasing RAM speed, with the biggest incremental gains coming from the move from DDR3-1333 to DDR3-1600 (+15%) and from DDR3-1866 to DDR3-2000 (+16%). The overclocked APU with DDR3-2000 increases memory bandwidth by 46% over the stock APU with DDR3-1333.
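For context, the theoretical peak bandwidth at each memory speed can be estimated from the transfer rate. The sketch below assumes a dual-channel configuration with a 64-bit (8-byte) bus per channel; these are our assumptions about the setup, not measured values, and real Sandra results will land below these peaks.

```python
BYTES_PER_TRANSFER = 8  # 64-bit DDR3 channel (assumed)
CHANNELS = 2            # dual-channel configuration (assumed)

def peak_bandwidth_gbs(mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s for a given DDR3 speed grade."""
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * CHANNELS / 1e9

for speed in (1333, 1600, 1866, 2000):
    print(f"DDR3-{speed}: {peak_bandwidth_gbs(speed):.1f} GB/s theoretical peak")
```

On paper, the step from DDR3-1333 to DDR3-1600 is the largest of the three at about 20%, which is in line with it being one of the two biggest measured gains.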

Using Dual Graphics and discrete GPUs we can see how the APU memory bandwidth compares:

Replacing the onboard HD 6550D with an HD 6670 with GDDR5 nets a 169% increase in video memory bandwidth vs. the stock APU. This is to be expected, as it moves to a dedicated 128-bit memory controller with low-latency, high-speed GDDR5. Using the HD 6670 in Dual Graphics mode also shows an increase in video memory bandwidth, 117% better than stock but 19% lower than the dGPU alone. Overclocking the APU closes the gap a little: Dual Graphics is now only 5.5% behind the dGPU alone, and 74% ahead of the overclocked APU on its own.
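These percentages are relative to different baselines, which makes them easy to misread. A short sketch, normalizing the stock APU to 1.0 and using only the rounded figures quoted above, shows how they relate to one another (the helper and variable names are ours):

```python
def pct_change(new: float, old: float) -> float:
    """Percentage change of `new` relative to `old`."""
    return (new / old - 1.0) * 100.0

stock_apu = 1.00                     # baseline: stock APU, DDR3-1333
oc_apu = stock_apu * 1.46            # +46%: overclocked APU, DDR3-2000
dgpu_6670 = stock_apu * 2.69         # +169%: HD 6670 alone
dual_graphics = stock_apu * 2.17     # +117%: APU + HD 6670 Dual Graphics
oc_dual_graphics = dgpu_6670 * (1 - 0.055)  # 5.5% behind the dGPU alone

print(f"Dual Graphics vs dGPU alone:  {pct_change(dual_graphics, dgpu_6670):+.0f}%")
print(f"O/C Dual Graphics vs O/C APU: {pct_change(oc_dual_graphics, oc_apu):+.0f}%")
# prints -19% and +74%, matching the figures in the text
```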

Interestingly, the HD 5550 with GDDR5 trails the pack here, despite also featuring a 128-bit memory interface and low-latency, high-speed GDDR5. It's still 107% faster than the stock APU, and 41% faster than the overclocked APU. The HD 6770 as dGPU is a negligible 1% ahead of the overclocked APU + HD 6670 Dual Graphics setup. The 256-bit equipped Radeon HD 6850 records a stonking 249% increase over stock APU, 139% more than overclocked APU, 60% increase over Dual Graphics, and 30% more than overclocked APU Dual Graphics.

Cryptographic: AES256

First we compare the stock APU at different memory speeds with the O/C APU (3.5GHz/775MHz DDR3-2000):

The five-SIMD GPU outperforms the APU's four x86 cores in this test by about 68%, regardless of memory speed. When overclocked, this increases to 200%: clearly the APU power sloshing is restricting x86 CPU core performance in preference to GPU performance. Higher performance in this test is due to increased GPU engine speed, not increased memory speed.

Dual Graphics and the APU's CPU cores paired with a discrete GPU were tested with memory at DDR3-1333 speeds. For O/C, the APU was clocked at 3.5GHz/775MHz with DDR3-2000. CPU results are omitted other than to show they are the same as previously recorded: ~438MB/s for the stock CPU cores, ~333MB/s for the O/C CPU cores.

Dual Graphics increases throughput by 153% over the stock APU GPU, and 89% over our overclocked APU GPU scores. Overclocking the APU again tanks CPU performance but increases GPU performance: overclocked Dual Graphics is 125% ahead of the overclocked APU alone and 19% ahead of the stock Dual Graphics configuration. Overclocking the CPU cores and memory but underclocking the GPU, while running the HD 6670 as the single GPU, gains 11.5% over the overclocked APU alone.

The APU at stock with HD 5550 as GPU nets an 8% gain in performance over stock APU performance and a decrease of 19% compared to the overclocked APU. The dual graphics configuration is ~133% higher than a single stock APU, and overclocked APU Dual Graphics is 179% higher than stock APU. The HD 6770 is 80% faster than the overclocked APU, and 140% faster than stock APU, but 5% slower than APU Dual Graphics with HD 6670. Only the Radeon HD 6850 is faster than APU Dual Graphics, by 16%, an increase of 195% over stock APU alone.

Cryptographic: SHA256

Stock APU with different memory speeds, vs O/C APU (3.5GHz/775MHz DDR3-2000):

The four x86 cores are not memory bandwidth limited in this test. The HD 6550D is 350% faster than the x86 cores with DDR3-1333. Moving to DDR3-1600 increases throughput by 4%, with DDR3-1866 a 7.5% increase over DDR3-1333. Overclocking the APU once again cripples CPU performance as all the TDP budget is given to the Radeon cores; luckily that power is put to good use in this test.

Two results are odd here: the dGPU HD 6670 and the dGPU HD 6770. Both of these configurations underperformed for their specifications. This anomaly is likely the result of an application/driver issue, but was repeatable in our testing. The APU Dual Graphics result is within 5% of the overclocked APU result. Overclocked APU Dual Graphics is 25% faster than the overclocked APU alone, and 85% higher than the stock APU. As before, overclocking the APU reduces x86 performance. Using a dGPU HD 5550 GDDR5 is about the same as running the APU at stock, with DDR3-1333 3% slower and DDR3-1600 1.5% faster. With the dGPU HD 6770 we'd expect to see performance similar to Dual Graphics. The dGPU HD 6850 is 65% faster than the second fastest configuration here, and 205% faster than the stock APU's HD 6550D alone; that is an order of magnitude faster than the best x86 quad core result, in fact an increase of 1280%.
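The order-of-magnitude claim can be roughly cross-checked by chaining the two rounded percentages quoted in this section: the HD 6550D's 350% lead over the x86 cores, and the HD 6850's 205% lead over the HD 6550D. This is only a sketch (the helper name is ours), and it assumes both figures share the same stock DDR3-1333 baseline, which may not exactly match the "best x86 result" used for the 1280% figure.

```python
def chain_increase(*pct_increases: float) -> float:
    """Combine successive 'X% faster' steps into one overall percentage."""
    ratio = 1.0
    for pct in pct_increases:
        ratio *= 1.0 + pct / 100.0
    return (ratio - 1.0) * 100.0

# HD 6550D vs x86 cores (+350%), then HD 6850 vs HD 6550D (+205%).
hd6850_vs_cpu = chain_increase(350, 205)
print(f"HD 6850 vs x86 cores: roughly +{hd6850_vs_cpu:.0f}%")
```

The chained estimate lands a little above 1270%, close to the 1280% recorded; the gap is plausibly down to rounding in the quoted percentages.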

Power Consumption

Using a Kill A Watt power meter we recorded total system power draw for three use cases under different conditions. Idle represents Windows 7 idle with Aero enabled, MS Office 2010 Word and Excel running and a PDF document open in Adobe Reader X. DivX playback is recorded while watching a fullscreen 1080p DivX HD video in Cyberlink PowerDVD 10 MkII with AMD Picture Perfect enabled. 3DMark11 is recorded during 3DMark11's GPU Test 2 running with Extreme settings at 1920x1080 resolution.

The results highlight that our test system PSU is way over-specified for our setup. A performance system using Llano doesn't need a 600W power supply; a quality 300-450W unit is going to be plenty even with a discrete GPU and several additional devices. At these load levels the Zalman ZM600-HP is about 80% efficient, lower than most newer supplies. This means that power draw may be artificially inflated compared with real-world use of a more appropriately sized and more efficient modern power supply.
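To illustrate how PSU efficiency inflates wall readings, here is a rough sketch. Only the ~80% efficiency figure comes from our setup; the 100W wall reading and the 87% efficiency of the alternative supply are hypothetical values chosen for illustration, not measurements.

```python
def dc_load_watts(wall_watts: float, efficiency: float) -> float:
    """Power actually delivered to the components for a given wall reading."""
    return wall_watts * efficiency

def wall_draw_watts(dc_watts: float, efficiency: float) -> float:
    """Wall draw needed to deliver a given DC load at a given efficiency."""
    return dc_watts / efficiency

measured_wall = 100.0                          # hypothetical wall reading (W)
dc_load = dc_load_watts(measured_wall, 0.80)   # ~80% efficient test PSU
better_wall = wall_draw_watts(dc_load, 0.87)   # hypothetical 87% efficient PSU
print(f"DC load: {dc_load:.0f} W, wall draw with better PSU: {better_wall:.0f} W")
# prints "DC load: 80 W, wall draw with better PSU: 92 W"
```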

In these results we can see that the power containment features absolutely favor GPU performance. Overclocked, our 3.5GHz CPU gets lower results in CPU-limited circumstances, as the overclocked GPU is eating up the power budget. We believe that this is due to the overclocked GPU core being unable to reduce power to give TDP budget back to the CPU cores which, thanks to the boosted vCore, are running with more power than stock at lower clock frequencies. Dropping the APU GPU core clock to 300MHz didn't give us back any CPU performance, indicating that we are at the TDP wall, although this might be a limitation of the beta BIOS.

The APU GPU responds very well in these tests to more core clock speed and more memory bandwidth, as was expected. For A-series APU builds, the sweet spot would appear to be DDR3-1600 with decent timings like 8-8-8-24 1T. DDR3-1866 would be a waste of money: while it does scale performance for code running on the GPU, it's off the bang/buck curve that this platform is aimed at.

Clearly, this is not AMD's enthusiast and high performance platform, so it's suited to general desktop and media consumption duties. There are a few mainstream applications that use the GPU to accelerate processing, most notably video and image processing applications but also Microsoft Office 2010, plus HTML5/WebGL/Flash enabled web browsers. Office 2010's GPU acceleration relies on DirectX 9 features, so just about anything can help on that front, but the more GPU horsepower you've got, the better the performance increase - and Llano outclasses every other integrated GPU solution out there with ease, probably including most platforms with a competitor's 65W CPU and another competitor's ~35W dGPU. The other applications are decidedly consumer oriented, except for video editing, which certain departments in some organizations will use right now. This means a lot of the potential of Llano is yet to be tapped; we're waiting for software lifecycles to catch up and take advantage of the up to 500 GFLOPS of parallel compute power in Llano APUs (which is more than 4x that of Intel's more expensive Sandy Bridge).
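As a rough sanity check on that compute figure, peak single-precision throughput for the GPU portion can be estimated as shader count times clock times two operations per cycle (one multiply-add). The 400 Radeon Cores and the 775MHz overclock are from this article; the 600MHz stock GPU clock and the two-FLOPs-per-cycle factor are our assumptions, and the contribution of the x86 cores is ignored.

```python
def peak_gflops(shader_cores: int, clock_mhz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak GFLOPS: cores x clock x FLOPs per core per cycle."""
    return shader_cores * clock_mhz * 1e6 * flops_per_cycle / 1e9

print(f"Stock (600MHz, assumed): {peak_gflops(400, 600):.0f} GFLOPS")
print(f"Overclocked (775MHz):    {peak_gflops(400, 775):.0f} GFLOPS")
```

Under these assumptions the GPU alone accounts for most of the headline figure at stock clocks.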

For business desktop and productivity use, AMD's A-series offers a powerful platform. Four full x86 CPU cores, 400 Radeon Cores, UVD 3 and a TDP of 100W make for a well-rounded and comprehensive feature set. The display output options are varied, and A-series APUs offer dual-display configurations natively. It's very disappointing that Eyefinity from the motherboard isn't supported; ultimately it could have been the key to mainstream adoption of the technology, even if not for gaming purposes. It's possible to run Eyefinity using a dGPU, but that negates the power advantage the platform had, even if you do gain Dual Graphics CrossFireX.

Storage

Results of the SiSoft Sandra 2011 SP3 disk performance using the Corsair F-120 SSD, with different memory speeds and O/C (150MHz bClk, 3.5GHz CPU/775MHz GPU, DDR3-2000 memory).

There is no change in performance with different RAM speeds until we hit DDR3-2000, which suggests that it is the bClk adjustment, rather than memory speed, that bumps performance up here, by 3%.

It is disappointing that there is no SSD-caching feature, not even the most basic form of mirroring writes to a mechanical drive and using the SSDs for reads. Six full-speed 6Gbps SATA ports are plenty for most configurations, and software RAID 0 or 1 is useful. Gigabyte's BIOS offers granular hotswap control on the SATA ports, and permits two ports to be used in IDE mode independently of the rest. Overclocking and memory speed appear to have little effect on SATA drive performance.