Product: AMD ATI Radeon HD5870
Company: AMD
Author: James 'caveman-jim' Prior
Editor: Charles 'Lupine' Oliver
Date: September 23rd, 2009
DirectCompute - DaHoff on the Job


DirectCompute is the new buzzword. This part of DirectX 11 lets developers run whatever additional hardware-accelerated workloads they need on the GPU - AI, physics, whatever. Codemasters' DiRT 2 preview at the AMD VISION launch showed to great effect how DirectCompute physics can improve game realism and immersion. Surface handling changes realistically: driving the Colin McRae rally car through water generates bow waves and leaves wakes, and the displaced water changes the handling of the cars that follow - less water means less drag. Simple and lifelike, it increases the playability of the game on DirectX 11 hardware.

AMD has a secret weapon in the DirectX fight - DaHoff! David Hoff is Director of AMD's Advanced Technology Initiatives team, inside the office of the Chief Technology Officer - one Eric Demers. Dave also used to work somewhere else ... I wonder if you can guess where?

Let's hear a little from the man himself:

Dave Hoff:

"When I saw AMD's public commitment at about this time last year to OpenCL and DX11 compute shader (now called Direct Compute), and having seen AMD's success with the Radeon HD 4800 series, I really wanted to join [AMD].

Things are a bit different for me here at AMD, working for SirEric in the office of CTO. I'm actually encouraged to charge after these kinds of new initiatives (my role is director of advanced technology initiatives).

One of the first things I did was meet with Havok, introduce them to the amazing engineering team I have here and explain that we could implement some of their code in OpenCL thereby enabling them to achieve acceleration on not just ours, but also Nvidia's GPUs. So we ventured into a quick little project to gauge the technical feasibility as well as if it was a good climate and team dynamics for our organizations to collaborate.

While we learned the answer to both, I can only report on the technical feasibility since we demonstrated Havok Cloth at GDC in March running in OpenCL on our Radeon HD 4890. In terms of productization, we're waiting for our OpenCL tools to complete conformance acceptance (they've been submitted to Khronos) and will likely need to get through some solid beta usage and up to a production state before an OpenCL-based Havok solution would be ready.

Then it's really up to Havok if they want to bring this to market. I'd like to see them do this particularly with their cloth product since game developers can incorporate cloth late in their development cycle and our OpenCL implementation is generally transparent to the Havok API.

And while there were some amazing software developers to jump in early and use the initial proprietary GPGPU programming models provided by both graphics companies, the adoption rate is going to really take off now that there are these new standards. As you heard last week at our launch event from Cyberlink, for example, they will obviously now consolidate and only go forward with programming in one API (in their case it seems to make sense to use Direct Compute).

I can't imagine any commercial software company who has tried a GPGPU programming model previously from either graphics company to not switch to OpenCL or Direct Compute. It's very easy to move from CUDA to either of these.

As you heard me describe [at the AMD VISION event], in the meantime, we've been particularly excited about what Pixelux can do. Their physics effects are amazingly realistic compared to anyone else. And their tools are great.

Their commitment to integrating with the free, open source Bullet Physics engine and doing OpenCL acceleration fits great with our commitment to OpenCL work on Bullet. Both Bullet Physics and Pixelux's DMM engine are already available and used in games and films, so developers can start right now and pick up the GPU acceleration as we roll that out.

On the other hand, as I think you've seen from the PhysX side of things, while they seem to talk about support for openness when they're backed into a corner, apparently in a recent driver update they've actually disabled PhysX running on their GPU if an ATI card is used for rendering in order to pressure users to use an all Nvidia configuration.

The contrast should be fairly stark here: we're intentionally enabling physics to run on all platforms - this is all about developer adoption. Of course we're confident enough in our ability to bring compelling new GPUs to market that we don't need to try to lock anyone in. As I mentioned last week, if the competition altered their drivers to not work with our Radeon HD 4800 series cards, I can't imagine them embracing our huge new leap with the HD 5800 series.

While it would be easy to convert PhysX from CUDA to OpenCL so it could run on our cards, and I've offered our assistance to do this, I can just imagine PhysX will remain that one app that doesn't ever become open.

As you may figure from my CUDA role, I was the guy responsible to get developer adoption. In addition to being a nut about SDK quality and following developers closely on the forums to initiate feature requests or critical fixes, I initiated the first ever consumer video transcode app with a partner using CUDA and delivered this to reviewers as part of the GTX280 launch, and I enabled developers to easily use notebook computers with CUDA-capable GPUs.

Of course, (in their brilliance - and why I left) folks over there abhorred this work I was doing to generate adoption since it didn't appear obvious enough that it would directly lead to Tesla sales ... (not only non-open, but even practically proprietary among their brands). At least they've eventually seen the light and seem to mention video transcoding now in about every breath ...

GPU Folding@home was also a project I initiated and ran at Nvidia. I also started Nv's Folding@home team (team W.A....) initially for my test suite of machines. I'm still sometimes surprised at the enthusiasm around this.

The engineer I was able to borrow to do the CUDA implementation at Nv is amazing. He did an entirely different implementation than previous. This had some good new algorithmic tricks and was one of the best utilizations of the G80 architecture's shared memory. If anything, it would likely do even better if they had more than the 16KB shared memory size on Nv GPUs.

That's where it will get fun going forward. For DX11 Direct Compute support (specifically CS_5), all devices going forward will have double the G80 shared memory, at 32KB. Also, Stanford finally has the new algorithm publicly available in a new molecular simulation package. So all ATI's new devices will basically be better at this since we added that shared memory for DX11. A good reason for anyone buying a new card to get a DX11 card.

Going forward, I'd expect the new algorithm to get ported over to OpenCL (which can take advantage of the 32KB local memories). I'd guess the porting will wait a little while longer until the OpenCL SDKs get a little more mature and optimized. We've just gotten our OpenCL implementation through official conformance verification.

So with the new HD5800 series and a decent optimizing OpenCL implementation, I expect some amazing PPD - new performance champs that will span our price line of GPUs.

I'm also excited to see how ultimately the OpenCL Folding implementation runs on CPUs. We've put a lot of work into our multi-core CPU implementation of the OpenCL compiler and run-time. As we get the OpenCL port of Folding done, as you mention, it will get Linux, Mac OS X and other OSes, but also other platforms that support OpenCL. Perhaps our CPU implementation will be an improvement as well."

Wow. Now there's a fellah who likes his job. At this point I'd like to extend my sincere thanks to Dave Hoff, Eric Demers and Dave Baumann, plus the many others I corresponded with, for their help. You guys rock!

Overall, I liked the comment from Mike Gamble, Crytek Licensing Manager regarding DirectX 11:

Mike Gamble:

"Free for use; no resources needed to make it work; increased fidelity is a win/win"

With CryEngine 3 capable of running in DirectX 9, 10 or 11 modes, Codemasters' EGO engine fully overhauled to use DirectX 11, and Unreal Engine 4 designed for DirectX 11, there are going to be a lot of titles supporting it.
