
GPGPU is one of those trendy buzzwords that get thrown around a lot these days ... and for good reason. With the latest GPUs, the potential for accelerating a wide range of algorithms (some would argue that there's some degree of parallelism hidden everywhere) is rather staggering. However, in order to tap into the parallel processing muscle of a GPU (and there's quite a bit of muscle there), proper tools are needed - and so we arrive at today's topic.
ATI Stream Computing
Few may recall that a while back, around the R520/R580 period, ATI was the first to promote the GPU as more than a pixel pushing machine - speaking of things like physics or general purpose processing on the GPU. Sadly, whilst sounding quite good, the outcome was less than spectacular: a Folding@Home GPU-accelerated client and a number of promises that didn't quite materialize during the cards' lifetimes. With the R6xx and subsequent parts, ATI's GPGPU initiative seemed to gain some slight traction, but it remained outside of the marketing forefront. It was aimed primarily at developers, and end-users knew little about the existence of the Stream SDK or Brook+, for example. Meanwhile, nVidia was trumpeting CUDA quite seriously, to such an extent that nowadays the perception is that nV is THE choice for GPGPU. Finally, ATI has decided to start promoting itself better in this field - enter the Stream SDK 1.3 update.
Stream SDK 1.3

The focus of this update is to bring ATI Stream Computing into the eye of the consumers, to give everyone who has a recent Radeon card (from R6xx onwards) the possibility to leverage it by including CAL (Compute Abstraction Layer) into the Catalyst driver suite (in the December release), to ease developers' jobs by improving the available toolset and to show the benefits of Stream computing through a free to use, consumer-interesting application: the AVIVO Video Transcoder. Let's go through these one at a time.

Fusion
As you may know, AMD has this “wacky” idea that the Future is Fusion. Instead of thinking about ways of supplanting the CPU with their GPUs, or claiming that GPUs are dead(ish) and that the future is in CPUs, they envision a complementary relationship between the two: there are certain tasks for which the CPU will be better suited, just like there are tasks where the GPU will rip. Fusion, as a corporate initiative, is about creating a coherent eco-system where each is assigned the job it's best at, with the tag-team providing the best results.

Moving Toward Open Standards
Whilst development tools for the CPU are rather ubiquitous (having years upon years of history does help there), tools for developing non-graphics applications on the GPU are still in their growth phase. This is illustrated by the fact that an industry standard is yet to emerge, with both AMD and nVidia offering proprietary solutions at this point in time. AMD's viewpoint is that the industry should focus on open standards that allow developing once for many target platforms, instead of having to use a custom tool for each. As such, Brook+ (and similar proprietary solutions) is regarded as a tool for those that want to start developing now, but going forward AMD expects OpenCL and DX11 (and beyond) Compute Shaders to be the preferred tools for development.


Now, most of this talk may not be all that important for the average guy who isn't that preoccupied with writing code. What he will undoubtedly care about is applications that can benefit from GPU acceleration. He may never download the Stream SDK itself, but he sure would appreciate it if his videos encoded faster or his perky presentations rendered more rapidly, for example. Luckily, AMD has finally decided to also push the benefits that typical users can experience, making Stream interesting not only for developers, but also for the masses.


AVIVO Video Converter
The mainstream “proof-of-concept” will be the updated AVIVO Video Converter. Now, this application has been with us for quite a while, but up until now it was solely a very optimized CPU-based SD-video encoder. Its December release takes the plunge and leverages HD4xxx cards in order to do up to 17X faster encoding (see the attached presentation for details), whilst also adding support for HD formats. Since it's also free, it's probably going to get a lot of people interested in what a GPU can bring to the table, beyond rendering Crysis or FarCry 2.


Mainstream Applications
If one wants to be more adventurous about his video work (and also has a few dollars to spend), in 1Q 2009 both Cyberlink and ArcSoft will have more advanced software packages on the market, both of which will take advantage of ATI Stream.


Other than those, you can already experience the benefits of GPU acceleration in applications from Adobe (Acrobat Reader, Photoshop CS4, After Effects CS4 or Flash 10; all of these are IHV agnostic, using either OpenGL or DirectX to leverage GPUs) or Microsoft (Vista, Expression Encoder, PowerPoint 2007, Silverlight; once again, all are IHV agnostic, using DirectX and not Brook+ or CUDA). If you didn't know about PowerPoint 2007 being GPU accelerated, don't worry: few do, as the setting is quite well hidden for some reason. Once again, we'll direct you to the attached presentation for more details.
Moving on to the developer side of things, the 1.3 update should bring a complete rewrite of the Brook+ runtime (see our Q&A with ATI's Stream Team at the end of this article). This is something we're awaiting with bated breath, as Brook+ hasn't provided the most enjoyable experience possible up until now, and there was a certain delta between it and the competition in terms of accessibility. Also, performance should go up, and support will be extended to more parts: the new Firestream 9270, as well as the HD4600/4550/4350. Speaking of the new Firestream card, it's a rather impressive beast, aimed primarily at the HPC (High Performance Computing) market.


In closing, we'll urge you to read the whole presentation AMD provided, ATI's Stream Computing Presentation (TEMPORARILY REMOVED), since it goes into further details and provides performance numbers, which are always interesting. Also, please go through our Q&A with ATI's Stream Team on the following page, for further details about some of the included topics.
The message AMD tries to convey, as we see it, would be this: we're here, we're focused on Stream computing, we know we were somewhat under the radar in this area but we're changing that, we're looking at adding value for our customers, as well as looking at ways to help make developers' lives easier. We're giving interested parties the proprietary tools they need to start developing now, but in the future we're completely behind industry-wide standards, as opposed to proprietary solutions. If you've ignored ATI's Stream up until now, we're giving you reasons not to do so any more.
AMD have shared that they should be providing us with an early build of the 8.12 Catalysts, which will implement all of the aforementioned goodies, in the coming weeks, so we'll get a chance to see for ourselves how it all materializes. We're quite interested in seeing how Brook+ will evolve, since there's still quite some time left until OpenCL and DX11 hit the market, and until then Brook+ is what one will use for GPGPU development on AMD GPUs. We're also keen to compare the new GPU-accelerated AVIVO with an optimized, multi-threaded, CPU-only encoder, with regards to both its speed and the quality that can be achieved.
Let's move on to the Q&A session, shall we?
Rage3D: Going by what Dave Nalasco said, the updated, GPU-accelerated AVIVO Video Converter will only work on HD4xxx cards upon introduction. Do you plan to expand support to HD3xxx/2xxx cards, at some point? Will you continue to optimize it going forward (the initial AVIVO seemed to stop in its tracks at some point)?
ATI Stream Team: At this point in time we are concentrating on the ATI Radeon HD 4000 series products. Transcoding heavily benefits from integer bitshift operations and the ATI Radeon HD 4000 Series architecture has 5x the integer bitshift performance of HD 2000 and HD 3000 products, so the performance enhancements under those architectures would be reduced.
We do have a software development roadmap for both the core AVT library (used by AVIVO Video Converter and 3rd party applications) and the AVIVO Video Converter UI, and we do expect further improvements in both performance and usability going forward.
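To illustrate why integer shifts show up so often in video work, here is a minimal sketch of our own (not AMD's code, and not a reflection of how the AVT library works internally) of a fixed-point RGB-to-luma conversion, a typical step in a video pipeline. The function names are hypothetical, and the weights 77/150/29 are a common 8-bit approximation of the BT.601 luma coefficients:

```python
def rgb_to_luma(r: int, g: int, b: int) -> int:
    """Approximate BT.601 luma using integer multiplies and one right shift.

    The weights 77, 150 and 29 sum to 256, so shifting right by 8 bits
    stands in for a division by 256 and no floating-point math is needed.
    """
    return (77 * r + 150 * g + 29 * b) >> 8


def frame_to_luma(pixels):
    """Convert a list of (r, g, b) tuples to a list of luma values.

    Every pixel is independent of the others, which is the kind of work
    that maps well onto many parallel GPU threads.
    """
    return [rgb_to_luma(r, g, b) for (r, g, b) in pixels]
```

On a GPU, each pixel would be handled by its own thread, so the throughput of the shift instruction directly affects how fast a whole frame is processed.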
Rage3D: Why are you allowing Cyberlink to be faster than AVIVO for HD conversion?
ATI Stream Team: Cyberlink PowerDirector is actually using the same AMD-provided AVT library that AVIVO Video Converter uses for transcoding support on ATI Radeon products. This being the case, the transcoding performance of PowerDirector and AVIVO is likely to be fairly similar when run on exactly the same input content and output targets. PowerDirector does, however, have other benefits, such as support for more file containers and multi-stream transcoding capabilities, all packaged in a full video-editing environment.
Rage3D: What are the specific enhancements to ArcSoft's SW that will show up going forward? (reference slide 16)
ATI Stream Team: ArcSoft will be shipping near-HD upscaling developed with ATI Stream tools, and will be using the ATI Stream runtime included in Catalyst starting with the 8.12 release; we keep on working closely with ArcSoft on other features, but we cannot disclose their roadmap.
Rage3D: Photoshop CS4: for the time being everything is OpenGL, correct? Do you intend to take the same route nV took with their proprietary CUDA accelerated plug-in, and have your own Stream-accelerated plug-in(s)?
ATI Stream Team: With regards to Adobe CS4, Premiere has plug-in capabilities, and this is the only place that CUDA is used, with a plug-in for GPU accelerated transcoding. The plug-in that applies in NVIDIA’s case is Elemental’s “RapiHD” solution, and it only operates on NVIDIA’s workstation products.
We are hoping to have more to discuss with regards to Adobe soon.
Rage3D: All acceleration features mentioned are achieved via DirectX? (reference Slide 18)
ATI Stream Team: Yes, all of the listed Microsoft apps are currently using DirectX for GPU acceleration.
Although the ATI Stream SDK pertains specifically to programming access outside of the traditional rendering APIs, we do consider any application that makes use of the GPU outside of pure rendering, be that to aid productivity in some way or to provide acceleration beyond 3D rendering, to fall under the “ATI Stream” banner.
Rage3D: Can you provide further details about the MS apps that are under development, and mentioned in the presentation? (reference Slide 12)
ATI Stream Team: We can’t talk about Microsoft applications under development.
Rage3D: Free, easy to use development tools is somewhat vague: CAL/IL is less user-friendly than the alternatives, since it's somewhat closer to ASM than a high-level language, and Brook+ still needs some growth. What avenue are you pursuing?
ATI Stream Team: Brook+ effectively provides high-level access to our GPUs, similar to CUDA, and is valid for many requirements at this point in time. Going forward we are obviously heavily involved in open/collaborative solutions, such as DirectX 10/11 Compute Shaders and OpenCL. Additionally, we are looking to provide an easy transition path from Brook+ to OpenCL, on which we will disclose more details at a future date.
We are making great strides in improving both the runtime and compiler for Brook+. In v1.3 of the ATI Stream SDK, the Brook+ runtime has gotten a complete rewrite from the ground up. This has resulted in a more consistent user experience and a higher-performance Brook+ environment. In the next release, we are continuing this effort to introduce access to more GPU features, as well as continuing to improve the developer experience. Moving forward to ATI Stream 2.0, we fully embrace and stand behind OpenCL, and are actively working to provide our customers access to an OpenCL programming environment on our products. Depending on customer demand, we are exploring various avenues for current Brook+ users when we introduce ATI Stream 2.0.
Rage3D: Can details about the Brook+ performance enhancements be provided?
ATI Stream Team: The bulk of the Brook+ performance enhancement will result from the complete bottom-to-top rewrite of the Brook+ runtime. In addition to simply a better runtime architecture, other aspects of the Brook+ runtime rewrite that affect performance include smarter GPU memory management and an improvement in stream size and stream count limits. Finer scatter granularities and the addition of 2D stream scatter support help improve performance for certain Brook+ applications.
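For readers unfamiliar with the term: in a scatter, each element writes its result to a computed output location, whereas in a gather each element reads from a computed input location. The following is a plain-Python sketch of the two access patterns only; it is not Brook+ syntax, and the function names and the `size`/`fill` parameters are our own, chosen for illustration:

```python
def gather(src, indices):
    """Gather: output element k READS from src[indices[k]]."""
    return [src[i] for i in indices]


def scatter(src, indices, size, fill=0):
    """Scatter: input element k WRITES to out[indices[k]].

    Positions that no element writes to keep the fill value. If two
    elements target the same index, the later one wins in this sketch.
    """
    out = [fill] * size
    for value, i in zip(src, indices):
        out[i] = value
    return out
```

For example, `gather([10, 20, 30], [2, 0, 1])` yields `[30, 10, 20]`, while `scatter([10, 20, 30], [2, 0, 1], 4)` yields `[20, 30, 10, 0]`.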
Rage3D: Will the Global Data Share be exposed to the programmer in this update?
ATI Stream Team: We think most people would be interested in the Local Data Share features of our RV7xx architecture. LDS was introduced in the RV7xx and is currently supported if you use the CAL API with IL. We are looking at ways to expose this functionality to the Brook+ programmer as well.
Rage3D: ETA for GPU accelerated math libraries, in the same vein with AMD's CPU accelerated Math libraries?
ATI Stream Team: ACML-GPU has been available for some time now. We will be releasing an update to ACML-GPU that will take advantage of our 7xx product family and will also support scaling across multiple GPUs, when appropriate. We expect this to be available in the next few weeks.
Rage3D: How do IGPs fit into the whole Stream Computing puzzle? Can one hope to leverage his HD3300 via the Stream SDK?
ATI Stream Team: Yes, the ATI Stream SDK is applicable to our DX10 IGP cores, including the ATI Radeon HD 3200 and 3300 in the 780G and 790GX chipsets. Although they are not yet “officially” supported, we are increasing support in a staged manner and IGPs are set to be included.
We hope you've enjoyed this window into the future of Stream Computing! Thanks again to ATI's Stream Team for helping us bring you this introduction to Stream Computing!