AFDS 2012 Day 4

Author: James Prior
Editor: Charles Oliver
Date: June 29th, 2012

AFDS Day 4: Thursday, June 14th

The final day of the AMD Fusion Developer Summit 2012 was Thursday, June 14th, and consisted of three keynotes in the morning and departures in the afternoon. The morning keynotes were markedly less well attended due to varying departure times out of Sea-Tac, which is a shame, as there was some great content from the three presenters.

Before the keynotes was breakfast, where we bumped into two very interesting people - our old friend Mike Houston of AMD, and a gentleman from Japan whose name I didn't catch, wearing a Sony badge. Mike Houston presented at last year's AFDS, spilling the compute dirt on the then-unannounced Graphics Core Next architecture with his co-architect, Mike Mantor. Mike and Mike are responsible for large swaths of the awesome in the GCN designs, and Mike Houston was back at AFDS keeping an eye on things, as he's been moved into heading up a future APU design team. While on one hand we mourn the loss for discrete graphics, for APUs this can only be a great thing. We didn't get to converse extensively with the Sony gentleman, but he and his cohorts (AFDS attendees sport a variety of interesting company badges, from Sony to Intel to IBM and more; we didn't see any NVIDIA badges, though) were extraordinarily interested in optimizing code for APUs. I wonder why?

Keynote #1 - Amr Awadallah, Cloudera

The first keynote of the final day was from Cloudera, explaining how they are dataholics and need Hadoop for their fix of information. Apache Hadoop is a new way to approach data, enabling them (in Amr's words) to 'jump in the data and swim in it.' This is enabled by two key differences from structured database storage systems: the underlying storage network architecture, and the manner in which data is processed.

Typical data storage systems store their data in compute nodes which leverage storage networks for redundancy, reliability and throughput. Nodes are replicated to maintain access, and storage networks incorporate redundancy for reliability. The data is formatted before being entered into the database - you've got to know what you've got before you've got it, and know what you're going to do with it - with each dataset being broken down into relationships, attributes and indices. Superfluous data is discarded, and new views of relationships are expensive to build. Structured data is fast on reads, but limits the questions you can ask and is slow to adapt and change. Data storage is a problem, and old data is sent to archive where it effectively dies, being cheap to store but expensive to retrieve and process.

Unstructured data is different, in that it's stored on the basis that you don't know what's useful, so you grab as much information about non-obvious, ordered events as you can. In this model, storage fabrics aren't robust enough to provide the throughput that is required - the compute is too far away from the data. Cloudera systems put the processing next to the storage by using systems with 12, 16 or 24 disks in a chassis. They then took the opposite approach to virtualization and made their nodes into one giant filesystem. Using Hadoop, they apply a schema-on-read approach, which parses the data when it's read. Each data consumer can have their own view, which changes with the data, and allows any question they want to ask to be put forward.
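The schema-on-read idea is easy to sketch in a few lines of Python (the log format and the `movement_view` helper here are hypothetical illustrations, not Cloudera's actual implementation): raw events are stored unparsed, and each consumer imposes its own structure at read time.

```python
import json

# Raw events are stored exactly as they arrive; no schema at write time.
raw_log = [
    '{"player": "alice", "x": 10, "y": 4, "zone": "forest"}',
    '{"player": "bob", "chat": "hello"}',  # different fields, still kept
    '{"player": "alice", "x": 11, "y": 5, "zone": "forest"}',
]

def movement_view(lines):
    """One consumer's view: parse only movement events, at read time."""
    for line in lines:
        event = json.loads(line)
        if "x" in event and "y" in event:
            yield event["player"], (event["x"], event["y"])

print(list(movement_view(raw_log)))
```

A chat-analysis consumer would define a different view over the same raw lines; nothing needs to be re-ingested or re-partitioned when a new question comes along.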

Hadoop scales without bounds (except financial); there is native scaling without a need to repartition or change architecture - it just gets faster as you add nodes - and it has 1/10th to 1/100th the cost/GB of traditional systems at large sizes. Hadoop is a freight train to structured databases' sports cars: small queries take longer on Hadoop than on traditional clusters, but large queries can be accelerated from days to minutes. This allows data informatics on a scale not previously considered, where all aspects of a customer's interactions can be analyzed and used for improvement. Consider all the information associated with MMOs: a theoretical 50GB per day of data about player movement and environment info could be generated. Not only can Cloudera Hadoop systems capture that data, they can process it fast enough to make storing it actually possible and worthwhile.
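Hadoop's processing model is MapReduce: a map phase turns each record into key/value pairs, and a reduce phase aggregates values per key, with both phases running next to the data on each storage node. A toy single-process sketch in Python (the player records are made up for illustration):

```python
from collections import defaultdict
from itertools import chain

# Hypothetical event records, standing in for the MMO telemetry above.
records = [
    ("alice", "move"), ("bob", "move"), ("alice", "attack"),
    ("alice", "move"), ("bob", "chat"),
]

def map_phase(record):
    """Emit a (key, value) pair per record; runs in parallel on each node."""
    player, _action = record
    yield (player, 1)

def reduce_phase(pairs):
    """Sum values per key; Hadoop would shuffle pairs to reducers by key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

events_per_player = reduce_phase(chain.from_iterable(map_phase(r) for r in records))
print(events_per_player)  # {'alice': 3, 'bob': 2}
```

The same two functions scale from this toy list to petabytes, because the framework handles distributing them; that's the 'no repartitioning' point from the keynote.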

Keynote #2 - Phil Pokorny, Penguin Computing

Following Cloudera was Penguin Computing, a top-tier sponsor of AFDS and a company that gave AMD a big scare after the announcement of AMD Llano APUs - they put them in servers. Consumer Fusion products in a server? What!? Sandia's secure labs were looking to build their 'Teller' cluster, and Penguin Computing likes to do things differently than the commodity OEM guys, so boom - there's a heterogeneous compute cluster a full year ahead of AMD's Trinity-based Opteron FirePro APU.

Penguin Computing analyzed trends and found that on average there's a disparity between core doubling and memory doubling - cores double every 28 months on average, where memory doubles every 34 months. This is a problem: for large jobs needing large numbers of cores and the memory to feed them, checkpoints - saving what's been done so far - put massive strain on the disk subsystem. Phil is not very enthusiastic about traditional server design, saying there is a lot of 'unnecessary crap in there' when referring to modern OEM server boxen.
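Those two doubling rates compound against each other, so memory per core shrinks over time. A quick back-of-the-envelope calculation from the figures above:

```python
# Cores double every 28 months, memory every 34 (Penguin's averages).
months = 60  # look five years out
core_growth = 2 ** (months / 28)    # roughly 4.4x the cores
memory_growth = 2 ** (months / 34)  # roughly 3.4x the memory
memory_per_core = memory_growth / core_growth
print(f"memory per core after {months} months: {memory_per_core:.0%}")
```

Five years out, each core has only about three quarters of the memory it has today, yet every checkpoint still has to flush that growing total to disk, which is exactly the strain Phil was describing.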

Phil also had a very cool live demonstration of AMD's power management technologies on a Trinity APU-based server. Using a pocket oscilloscope they could monitor the power feed into the chip, and then adjust the load in real time to see how it varied under different artificial workloads. This led to very interesting real-time graphing showing how, as the CPU and GPU inside the APU were subjected to workloads, the power management applied in different ways to keep the whole package inside its thermal envelope. AMD's Turbo CORE and PowerTune technologies are in their infancy but are already effective, especially in a full server platform with smart firmware, at tuning performance to customer requirements - if you don't want the published TDP of the chip, but prefer a different number, you can tune that in the BIOS. Over several hundred boxes, adjusting an 80W chip to 72W might not make any difference to performance depending on your workloads, but makes a significant difference in power and cooling cost.
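To put rough numbers on that 80W-to-72W example: assuming a hypothetical 500-node cluster and a facility overhead (PUE) of 1.5, both our assumptions rather than Penguin's figures, the savings add up quickly:

```python
nodes = 500          # hypothetical cluster size, not a Penguin figure
tdp_default = 80     # watts, published TDP
tdp_tuned = 72       # watts, BIOS-tuned cap
pue = 1.5            # assumed facility overhead (cooling, distribution)

chip_saving_w = (tdp_default - tdp_tuned) * nodes   # watts saved at the chips
facility_saving_w = chip_saving_w * pue             # watts saved at the wall
kwh_per_year = facility_saving_w * 24 * 365 / 1000
print(f"~{kwh_per_year:,.0f} kWh saved per year")
```

Around 50,000 kWh a year under these assumptions, before counting the reduced cooling plant capacity - a significant line item for a cap that may cost nothing in real-world performance.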

Penguin's call to arms for their keynote was to get changes in the hardware platform. Ditch SATA and get SAS on the mainboard, inside the chipset. Ditch card-slot GPUs and get them socketed in the same package as CPUs or APUs, so that they can fully customize the mix of scalar and vector processing happening in the box for the customer, tunable and upgradable as time goes by. Penguin is a very customer-focused company, shipping their clusters in crates already racked, stacked and ready to go to the customer site; plug into the network and power and start validation, not unpacking manuals. It will be interesting to see how much response AMD can provide to Penguin's requests, as some of their ethos definitely meshes with AMD's stated visions, but they've got to overcome the ancient inertia of AMD's internal departments to get what they want.

Keynote #3 - Mark Papermaster, AMD

The final keynote of the summit was by AMD's Chief Technology Officer, Mark Papermaster. The keynote began with a summary of the big announcements that happened at AFDS, and demonstrations of the technologies heterogeneous compute is bringing to life, such as professional graphics products running on APUs, new rendering techniques for games (AMD Leo demo), and biometric security. A nice demonstration of the power of HSA came through a demo of a software product from Fluendo called Moovida Universe, a gesture-controlled media center for your media content. The software is free to download and is optimized for AMD A-series APUs but works on any hardware. The demonstration showed full-arm gestures controlling the PC to navigate the library and play movies, with the explanation from AMD's Phil Taylor that on an HSA platform the gestures would be smaller due to the increased processing power that would be available.

Mark also laid out some fuzzy timelines for the next iterations of the 'big core' x86 processor family, Bulldozer, with Steamroller core designs appearing in 2013 and Excavator in 2014. The 'little core' x86 was also addressed, with Bobcat evolving into Jaguar on a 28nm process - AMD will be all-28nm soon, from both TSMC and GlobalFoundries. Kaveri is coming in 2013, a new APU with more HSA-like features and using Steamroller cores, but no news on which GPU architecture is inside. Kaveri will fit in the very low power range, with 15-35W models for use in ultra-thin notebooks, but also scale up into mainstream notebook and some desktop applications. This will be the first APU with virtual shared memory, really advancing the HSA model and enabling the HSA software infrastructure. Kabini is a 9-25W APU that uses up to 4 Jaguar cores, completely focused on perf/W at low power. A new sub-5W tablet-focused APU is also coming, Temash. This is the follow-up to Hondo, which is due to replace the Brazos-based C-50 this year for Windows 8 AMD tablets, so we're looking two generations out from the current stuff. This gives AMD a lot of overlap from tablet through mainstream notebook, the biggest slice of the market and the current growth areas.

On the server side, the first generation Bulldozer architecture Interlagos chips will be replaced by Abu Dhabi, which is promised to offer 'more performance per dollar' and 'more virtual machines per watt' to customers. Valencia is replaced by Seoul, and Zurich is replaced by Delhi. All new server processors are due in the second half of 2012 and will use the Piledriver revision of Bulldozer, with Abu Dhabi being another multi-chip module akin to Interlagos, based on the Seoul design. Nothing was mentioned about Vishera, the desktop equivalent of Seoul. A nice little sub-presentation came from Andy Feldman, who joined AMD with its acquisition of SeaMicro; he talked about making more efficient, more powerful servers, and showed off the first AMD Opteron SeaMicro server, just 9 weeks after the acquisition.

AMD had a little bit of fun - Mark showed off what he told us was the AMD FirePro W9000. The slides behind him looked very different from the card in his hand, and the specifications were different from any product released at the time (check the video); it turns out there were two teases going on here. The W9000 is the FirePro version of the AMD Radeon HD 7970 GHz Edition, and the 4.3TFLOPS SP and 1.03TFLOPS DP compute performance was pulled from that card. The card in his hand, however, had a triple-fan cooler and dual 8-pin power inputs, and was 'a dual-GPU product we haven't released yet' according to AMD PR. This is New Zealand, the long-awaited dual-Tahiti ultra-enthusiast graphics board. With a nominal maximum board power of 375W it doesn't seem like dual Tahiti XT2s can be fitted to the thing, but AMD neatly sidestepped that particular limit with the HD 6990, using an OC-mode BIOS with a 450W cap to plough the NVIDIA GTX 590 under. Will AMD do the same to the GTX 690 with the New Zealand card?
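That 375W figure falls straight out of the PCI Express power delivery limits: 75W from the x16 slot plus 150W per 8-pin connector, which is why a dual 8-pin card nominally tops out there.

```python
PCIE_SLOT_W = 75      # PCIe x16 slot power limit per the CEM spec
EIGHT_PIN_W = 150     # per 8-pin PEG connector

board_power = PCIE_SLOT_W + 2 * EIGHT_PIN_W
print(board_power)    # 375
```

An OC-mode BIOS simply raises the card's own power cap past the nominal spec, relying on the connectors and PSU to carry more than their rated figure - the HD 6990 trick.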

AFDS 2012 Summary

The 2012 AMD Fusion Developer Summit was a better event than last year's, with great content and scheduling. The online AFDS-D and schedule builder tools are top-notch, although an iOS & Android app with a walking map of the conference rooms would have been a welcome touch. The provided breakfasts and lunches were good if not great, and there was bacon at least once. Several presentations overflowed their room capacity, and the demonstrations of some technology were sometimes hard to get to due to scheduling conflicts - so much great content. The presentations are now online for your enjoyment, too. Hopefully next year AMD can bust the magic 1,000-attendee mark (they hit around 800 this year) and have a single facility with room for all the content, keynotes and demonstrations they want to host. This may mean moving away from Microsoft and Bellevue, but maybe that's an opportunity to pick a locale with slightly warmer climes for better late-night activities. Maybe the schedule could be tweaked to keep all the keynotes and sessions on the same days, so all the content gets full attention; having shorter morning, early afternoon, and evening keynotes might facilitate this.