AFDS 2011 Wrapup & AMD Lynx Platform Tests

Company: AMD
Author: James Prior
Editor: Charles Oliver
Date: July 18th, 2011

Sabine and Lynx Platform Launch

The first mainstream accelerated processing units (APUs) to come to market are aimed at notebooks. Mobility is a very competitive and lucrative market, and the one AMD decided to attack first. Getting the right balance of compute power between x86 and graphics is key to a good user experience in a portable computer, as is battery life. AMD's A-series APUs, seven of which were launched on June 14th, excel on both fronts, offering dual or quad x86 core configurations with 240, 320 or 400 on-die Radeon Cores. The desktop parts, launched on June 30th, are similar: two quad core x86 parts sporting 400 or 320 Radeon Cores.

APU Product Naming

A-series APUs are denoted A4, A6 or A8 with a four digit suffix to show their relative performance and positioning. The graphics cores are still named for their relative position in the Radeon stack, ranging from 6400G through to 6600G (mobility) and 6500D for desktop. Confusingly, an A4 is dual core, while A6 and A8 are quad cores. The model number doesn't track clock speed either; it runs from 3300 to 3500 for mobility and from 3600 to 3800 for desktop. There's no immediate correlation between name or number and product specification, and AMD seems to be trying to compete with Intel on how arbitrary the numbers can be. At least an A6-3410 has higher numbers than a Core i5-2410. For the graphics, you get a D suffix for desktop parts and a G suffix for notebook parts, to differentiate APU graphics from discrete desktop and mobility graphics parts.
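The naming rules above are easier to see as a lookup than as a sentence. The sketch below is only an illustration of what the text describes: the function and class names are made up for this example, and accepting optional trailing letters after the four digits is an assumption, not something the naming scheme above spells out.

```python
import re
from dataclasses import dataclass

# Core counts per tier as described in the text:
# A4 is dual core, A6 and A8 are quad core.
CORES_BY_TIER = {"A4": 2, "A6": 4, "A8": 4}


@dataclass(frozen=True)
class ApuName:
    tier: str       # "A4", "A6" or "A8"
    model: int      # four digit number, e.g. 3410
    x86_cores: int  # looked up from the tier, not encoded in the number


def parse_apu_name(name: str) -> ApuName:
    """Split a name such as 'A6-3410' into tier, model number and core count."""
    # Assumption: any trailing letters after the four digits are ignored.
    match = re.fullmatch(r"(A[468])-(\d{4})[A-Z]*", name.strip())
    if match is None:
        raise ValueError(f"not an A-series APU name: {name!r}")
    tier, model = match.group(1), int(match.group(2))
    return ApuName(tier, model, CORES_BY_TIER[tier])


def graphics_segment(gpu_name: str) -> str:
    """Classify a Radeon HD name by suffix: D = desktop APU, G = notebook APU."""
    suffix = gpu_name.strip()[-1].upper()
    return {"D": "desktop APU", "G": "notebook APU"}.get(suffix, "discrete")
```

Note that the core count comes from the tier lookup and nothing else; the four digit number only orders parts against each other.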

AMD A-series Mobility line-up

AMD A-series APUs are manufactured at GlobalFoundries using their 32nm SOI process. This is a new process for AMD CPUs; the STARS K10.5 CPU architecture it carries was previously manufactured on 45nm SOI as Opteron, Athlon II and Phenom II products. The CPU architecture in Llano moves on from the most recent STARS design and adds some efficiency and design tweaks, like a crossbar interconnect between the cores and improved schedulers. Each of the four x86 cores features a 128-bit FPU, 128KB of L1 cache (64KB for data, 64KB for instructions) and a 16-way 1MB L2 cache. That L2 is doubled from the 512KB per core of the 45nm STARS cores. Like the Athlon II and Phenom II 800 series products, there is no L3 cache. Instead, that transistor budget is used for more useful things, like system interconnects and discrete performance level graphics. AMD estimates a ~6% increase in CPU performance, clock for clock, against the previous generation 45nm STARS cores, although this will vary with application.

AMD A-series Desktop line-up

The big benefit of the APU is the GPU's ability to use system RAM for processing data without the need to copy the data to another memory store. The GPU inside the APU gets a protected memory region of up to 1GB, just like previous chipset integrated graphics solutions did, and memory access is scheduled by software or the OS. Rage3D readers, being discerning high performance enthusiasts, are used to seeing GPUs with memory bandwidths measured in the 100s of GB/s, while Llano uses the CPU memory controller and the DDR3 memory common to modern CPU platforms. For the FS1 notebook package, two SODIMMs are supported at up to DDR3-1600 speeds for 25.6GB/s bandwidth and a total of 32GB capacity.
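The 25.6GB/s figure is the theoretical peak for dual-channel DDR3-1600, and it falls straight out of the arithmetic. Here is a minimal sketch, assuming the standard 64-bit DDR3 channel width and decimal gigabytes; the function name is just for illustration.

```python
def ddr3_peak_bandwidth_gbs(transfers_mt_s: float,
                            channels: int = 2,
                            bus_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_bits // 8  # 8 bytes per transfer on a 64-bit channel
    return transfers_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


# Dual-channel DDR3-1600: 1600 MT/s x 8 bytes x 2 channels, about 25.6 GB/s
print(ddr3_peak_bandwidth_gbs(1600))
```

This is a ceiling, not a measurement; sustained bandwidth will be lower, and the GPU shares it with the x86 cores.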

In the FM1 desktop package, four DIMMs are supported, increasing maximum capacity to 64GB, and dual DIMM configuration speed increases to 1866MHz for 29.8GB/s bandwidth. Both 1.35V low power and 1.5V DDR3-1333 DIMMs are supported. Compared with dedicated graphics memory, the result is memory that is lower in bandwidth and higher in latency, but cheaper and expandable - and non-dedicated. At this point it is important to remember that this product is designed to offer mainstream discrete graphics performance in an integrated package at preferential platform cost vs. a previous generation CPU and discrete GPU. Two features, supported under OpenCL, are Pin-in-place and Zero Copy, and they are the key to improved compute performance. Combined with the APU's ability to simultaneously read and write on different memory channels (four DIMMs required, so desktop only and max speed DDR3-1600), this significantly boosts the compute capabilities vs. a discrete GPU and separate CPU, where the latencies of data copying and non-shared memory locations slow compute execution and increase complexity.
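To get a feel for what Zero Copy avoids, it helps to estimate the cost of shuttling a buffer to a discrete card and back. The sketch below is an idealized, back-of-envelope model only: it assumes a PCI-Express 2.0 x16 link at its theoretical 500MB/s per lane per direction, ignores latency and driver overhead, and the 256MB buffer size is a made-up example, not a measured workload.

```python
def pcie2_bandwidth_gbs(lanes: int = 16) -> float:
    """Theoretical PCI-Express 2.0 bandwidth per direction, 0.5 GB/s per lane."""
    return lanes * 0.5


def copy_overhead_ms(buffer_mb: float, lanes: int = 16) -> float:
    """Idealized time to copy a buffer to a discrete GPU and back again, in ms."""
    buffer_gb = buffer_mb / 1000.0
    one_way_s = buffer_gb / pcie2_bandwidth_gbs(lanes)
    return 2 * one_way_s * 1000.0  # to the card and back


# Hypothetical 256MB working set on a discrete card: about 64 ms of pure copying.
# With a shared, zero-copy buffer that term drops out of the model entirely.
print(copy_overhead_ms(256))
```

Real transfers are slower than the theoretical rate, so if anything this understates what a discrete card pays per round trip.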

AMD Llano APU Die

Battery life for mobility platforms was a weak point for AMD until the APU era, and AMD has put a lot of work into creating technology that enables 'all day' battery life. This is marketing-speak, so the literal interpretation is not quite accurate, as it's measured as around 8 hours of battery life idling at the Windows 7 Aero desktop. This is still quite impressive, but not as impressive as AMD's claim of being able to watch two full-length Blu-ray movies on a single charge. Hewlett Packard was keen to tell us of their extended life battery packs that can offer up to 26 hours idle with A4 APUs. Real-world figures are closer to 5.5 hours idle on a six cell battery for a 15" notebook with an A6.

All-day Power

Per core power gating enables each x86 core to be powered off (C6), and the package can be powered down by an explicit Sleep or OS halt request. Per core activity is monitored digitally to estimate workload and define total chip TDP, creating a temporary TDP budget. This is used by Turbo CORE to boost the clock frequency of the x86 cores, and incorporates the TDP used by the GPU cores; if the GPU is active, less Turbo CORE is applied to the CPU. The Radeon cores and UVD 3 are each power gated to save energy when not in use, and PowerPlay clocks minimize power use when they are active. Llano's power monitoring technology favors GPU performance over CPU; when both the CPU and GPU are active, power will be saved on the CPU cores to give the GPU preference. As we'll see later, this has a knock-on effect for overclocking.
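The GPU-first budgeting described above can be pictured with a toy model. This is not AMD's algorithm, which isn't public in this level of detail; it is a sketch of the stated priority only, and every wattage in it other than the 35W package figure is a hypothetical number picked for illustration.

```python
def split_tdp_budget(package_tdp_w: float,
                     gpu_demand_w: float,
                     cpu_base_w: float) -> dict:
    """Toy model: serve the GPU first, give the CPU whatever power is left.

    Any CPU allowance above cpu_base_w is treated as Turbo CORE headroom.
    """
    gpu_w = min(gpu_demand_w, package_tdp_w)          # GPU gets preference
    cpu_w = package_tdp_w - gpu_w                     # CPU takes the remainder
    turbo_headroom_w = max(0.0, cpu_w - cpu_base_w)   # room to boost x86 clocks
    return {"gpu_w": gpu_w, "cpu_w": cpu_w, "turbo_headroom_w": turbo_headroom_w}


# 35W mobility package, hypothetical 20W CPU base allowance.
print(split_tdp_budget(35.0, 2.0, 20.0))   # GPU nearly idle: plenty of headroom
print(split_tdp_budget(35.0, 15.0, 20.0))  # GPU busy: no headroom left for Turbo CORE
```

The same shape explains the overclocking side effect: anything that raises GPU power draw eats directly into what the x86 cores are allowed to use.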

Sumo, the APU graphics core

The GPU, codenamed Sumo and resplendent with Radeon Cores, is based on AMD's Evergreen technology and is a derivative of the Redwood design from the AMD Radeon HD 5500 and HD 5600 series. Naturally this is branded as the AMD Radeon HD 6400, 6500 and 6600, depending on its relative performance (compared to the mobility series of Radeon HD 6000 GPUs, not the desktop), despite the 'last gen' technology. Partly this is due to longer design and validation cycles for CPUs than GPUs, as the APU had to be tested for many more circumstances and cases than a consumer GPU add-in board would be. These tests delay introduction, meaning that while AMD had working APUs at the time the Evergreen series was fully introduced to the market (and dominating), it didn't have the ability to sell them yet. The Radeon cores are upgraded from the original Redwood design, but the exact nature of the changes, and how they compare to the Turks improvements, is not known.

Asymmetric CrossfireX, known as Dual Graphics, appears on the Llano and Sabine platforms, allowing a discrete graphics processor (dGPU) to be added to the system for increased performance. As the CrossfireX method title implies, the dGPU doesn't have to be identical to the APU one, but does need to be close in terms of SIMD ratio; around a 1:4 delta in Radeon cores at the most. So an A4-3300 with HD 6480G, with 240 Radeon Cores, can be partnered with a Radeon HD 6500M series card which, if you remember, is a Redwood rebrand itself. So it's the same GPU core partnering with the same GPU core, under the HD 6000 series moniker, with UVD 3 integrated into the Llano die. Bus Alive Chip Off (BACO) allows the dGPU to be powered down for great idle and light load power usage, waiting until the extra graphics or parallel processing power is needed to jump into action. Driver level control of the dGPU allows OEMs and users to set policies on which applications activate the dGPU under which circumstances, for fine-tuning battery life and user experience.

Dual Graphics Options

For the desktop platform, Lynx, dual graphics is also supported, in the same manner as on the mobility platform Sabine - an additional card can be added, from Radeon HD 6450 to Radeon HD 6570 and HD 6670. Llano supports DisplayPort 1.1a, HDMI 1.4a and DL-DVI as well as VGA outputs. Two of these outputs can be active at a time; there is no Eyefinity support baked in. This product came to be in a time frame where the success of Eyefinity wasn't well known and to play it safe, it wasn't included. D'oh! The AMD Vision Engine Control Center enables the dual graphics configuration, and AMD recommends dual-channel system memory be installed - without sufficient bandwidth, dual graphics will not engage. The Vision Engine will allow you to select which GPU is considered primary. This serves to allow the user to choose between maximum power savings and Eyefinity through a discrete card, while still permitting dual graphics performance. Select the on-die GPU and use the mainboard provided outputs and the discrete card can go into ultra-low power state (ULPS) when it is not required. Use the discrete GPU as primary and the on-die GPU also goes into low power state, while the dGPU can provide an Eyefinity configuration. The on-die GPU cannot be disabled, which is a shame if you're using a dGPU that isn't compatible with asymmetrical CrossfireX, or it doesn't make sense to use it.

Fusion Controller Hub

The Fusion platforms consist of an AMD A-series APU and a Fusion controller hub (FCH). The APU itself integrates x86 cores, Radeon cores, a dual channel memory controller, UVD 3 and platform interfaces including 16 PCI-Express 2.0 lanes, into a 35W to 45W package for mobility, and 65W to 100W for desktop. The FCH takes over from the SouthBridge and handles SATA (v3.0, 6Gbps), PCI and USB (2.0 and native 3.0), as well as offering an additional 4 PCI-Express 2.0 lanes, in a startlingly low 2W package for mobility. There are two variants of the FCH for mobility (Sabine), the A60M and the A70M. The main difference is USB support - the A70M has four USB 3.0, ten USB 2.0 and two USB 1.1 ports, while the A60M drops the power-gobbling USB 3.0 ports and gets USB 2.0 ports instead, for a total of fourteen. For the desktop platform (Lynx) the two FCHs are the A75 and A55. As with the A60M, the A55 drops USB 3.0 support in favor of USB 2.0, and it also loses FIS-based switching on the SATA ports. All the FCH packages are manufactured on a 65nm process, and the desktop parts include RAID 0/1/10 support, with a TDP of ~8W. Mobility TDP is around 2W.
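The port counts for the two Sabine hubs are easier to compare side by side. The sketch below just encodes the numbers from the paragraph above; the class and constant names are made up for illustration, and USB 1.1 ports are left out because the A60M's count isn't stated.

```python
from dataclasses import dataclass

APU_PCIE_LANES = 16  # PCI-Express 2.0 lanes on the APU, per the text
FCH_PCIE_LANES = 4   # additional PCI-Express 2.0 lanes on the FCH


@dataclass(frozen=True)
class MobileFch:
    name: str
    usb3_ports: int
    usb2_ports: int

    @property
    def fast_usb_ports(self) -> int:
        """USB 2.0 and USB 3.0 ports combined (USB 1.1 not counted)."""
        return self.usb3_ports + self.usb2_ports


SABINE_FCHS = {
    "A70M": MobileFch("A70M", usb3_ports=4, usb2_ports=10),
    "A60M": MobileFch("A60M", usb3_ports=0, usb2_ports=14),
}


def platform_pcie_lanes() -> int:
    """Total PCI-Express 2.0 lanes across APU and FCH."""
    return APU_PCIE_LANES + FCH_PCIE_LANES


for fch in SABINE_FCHS.values():
    print(fch.name, fch.usb3_ports, fch.usb2_ports, fch.fast_usb_ports)
```

Both hubs end up with the same number of USB 2.0-or-better ports; the A60M simply swaps the four USB 3.0 ports for USB 2.0 ones.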