Alpha New Programming. Realtime Computer Graphics. Tools & Development.

The Future is Now: Compiling C++17 to the GPU (Vulkan/OpenCL/SPIR-V) (2016-03-28)

Update 9/21/2019:

The newest clang release officially brings experimental C++ support for OpenCL, superseding the experimental modifications discussed in this blog post! :)

Original 2016 experiment:

The release of SPIRV-LLVM and all the talk about Vulkan at GDC made me curious how far we actually are from compiling other languages, particularly full-featured C++, to the GPU. So far, using reasonably current versions of C++ on the GPU has been a privilege reserved for CUDA programmers specifically targeting NVIDIA GPUs. TL;DR: As it turns out, it is already possible today. See the command lines below, and the experimental compiler binaries at the bottom.

Sources
Interestingly, clang implements OpenCL with very few additions to the C99 front end, which in turn shares most of its logic with the C++ front end. It turns out that a minuscule set of changes allows the use of any C++ standard in combination with the OpenCL extensions. Merging in the OpenCL headers and SPIR-V code emission from Khronos' OpenCL C compiler, we can in fact already obtain an end-to-end solution compiling C++17 to SPIR-V. New commits can easily be merged in as clang continues to evolve. You can find my experimentally patched version of the current clang master branch here: https://github.com/tszirr/clang

Compiling this patched version of clang also requires an up-to-date port of SPIRV-LLVM to the LLVM master branch, which I did here: https://github.com/tszirr/SPIRV-LLVM

Usage
Now we can simply compile C++ kernels to SPIR-V bitcode using the following command line:

clang -cc1 -std=c++1y -triple spir-unknown-unknown -emit-spirv -x cl -o mykernels.cpp.sv mykernels.cpp

We can inspect the generated SPIR-V bitcode using the llvm-spirv tool:

llvm-spirv -to-text -o mykernels.cpp.sv.txt mykernels.cpp.sv
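To give an idea of what such a kernel source might look like, here is a small sketch. The KERNEL and GLOBAL macros are my own shorthand for this sketch, not part of any official header; the idea is that they would map to the OpenCL qualifiers under the patched clang and to nothing on the host, so the very same code can also be tested as plain C++:

```cpp
// Hypothetical C++ "kernel" source: templates work as usual.
// KERNEL/GLOBAL are assumptions of this sketch, expanding to the OpenCL
// qualifiers on the device and to nothing on the host.
#ifdef __OPENCL_VERSION__
  #define KERNEL kernel
  #define GLOBAL global
#else
  #define KERNEL
  #define GLOBAL
#endif

template <class T>
inline T lerp(T a, T b, T t) { return a + (b - a) * t; }

// Blends two float buffers element-wise; 'i' stands in for get_global_id(0).
KERNEL void blend(GLOBAL float* out, GLOBAL const float* a,
                  GLOBAL const float* b, float t, unsigned i)
{
    out[i] = lerp(a[i], b[i], t);
}
```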

Disclaimer
I successfully compiled a few templates, classes with constructors and destructors, and group-shared/global/generic/image memory accesses. Note, however, that this combination of languages is officially unsupported so far and only partly follows the OpenCL C language specification, in particular not the provisional OpenCL 2.1 C++ specification. I have been working with the clang/LLVM codebase for merely a day, so I am certainly no clang/LLVM expert and cannot assess the full extent of code paths where the C++ and C99 front ends diverge. I fixed one minor issue with C++ declarations involving memory spaces, but more might turn up elsewhere.

Binaries
Since building clang and LLVM takes quite some time, I compiled a minimal set of experimental x64 Windows binaries that can get you started: http://alphanew.net/releases/experimental/clang-oclxx-spirv.zip

Backwards Compatibility
(Note on legacy OpenCL/SPIR compatibility: For OpenCL implementations that only support the old LLVM 3.x-derived SPIR format, you can use SPIR-V as a stable intermediate format and feed it to the official LLVM 3.x-based SPIRV-LLVM tool for reverse conversion from SPIR-V to SPIR. Be aware that you will have to obtain that official binary elsewhere: the llvm-spirv binary provided in the above zip file is based on the current LLVM master branch and is therefore unsuitable for that purpose; it serves merely for debugging and disassembly.)

Recent Projects and Publications (2015-04-01)

I joined the KIT computer graphics group last December. More recent projects that have already been published can be found on cg.ivd.kit.edu/english/zirr. More to follow.

GPU Pro 5: Object-order Ray Tracing for Fully Dynamic Scenes (2014-01-23)

GPU Pro 5, the next addition to the GPU Pro book series, will feature an article that presents some of my work towards more game-friendly ray tracing. I just added a teaser entry to the official GPU Pro Blog, check it out here. The full article will be released as part of the book on March 25.

Torsion footage & Twitter (2011-06-26)

First of all, I have only very recently joined Twitter; feel free to become a follower if you're interested in my daily findings and annoyances regarding real-time graphics and other development.

In other news, I have finally taken time to capture some footage of a gameplay prototype that I was working on for several weeks about one year ago. The player is given the ability to warp space by placing two warp anchors anywhere in the world, warping the space in-between those anchors. Space warping also affects physics, which may be used to make objects roll off curved surfaces, bridge gaps etc. Like the previous demos, this prototype was built on top of my engine/framework/sandbox called breeze.

Best viewed in HD; click the title and switch to full screen on Vimeo.

Head over to the Torsion project page for more information.

Imagine ranked Best 64k Intro 2010 by 4sceners (2011-01-04)

Sorry for the long silence; it seems my studies are once again eating up way too much of my precious time. In fact, I have not been nearly as idle with my projects as it may seem; some of you might even have followed some of my more recent work spread across several German game development boards. I will update this site once I find the time.

For now I am happy to announce that we're once again featured in 4sceners' list of the best demo scene productions released throughout the last year, this time ranked 1st in the category of 64k Intros.

It was about this time last year that we actively started working on this production, and it is always nice to see that the effort put into it has left a lasting impression on people who know what lies behind such work. The collaboration definitely was great fun; great thanks go out to Christian Rösch (Code), Mark (Code) and Turri (Music).

Rendering Ice in Liquidiced (2010-08-29)

I published yet another short article, covering the ideas behind the rendering of ice in our brand-new demo "Liquidiced". Hope it helps. Enjoy.

Liquidiced - 64k Intro released at Evoke 2010 (2010-08-29)

Another four months of quiet development have passed, and we are very proud to announce the release of yet another demo collaboration: "Liquidiced", more high-end real-time computer graphics in less than 64 kilobytes, released at this year's Evoke demo party in Köln and ranked 3rd in the 64k Intro compo.

Best viewed in HD; click the title and switch to full screen on Vimeo.

Download: Liquidiced - 64k PC Intro, Executable
Download: Liquidiced - Unpacked PC Intro, Executable (for those experiencing malware warnings using the packed one)

Again, this was a collaboration, so more credits go to Christian Rösch (Code, GFX), Mark (Code), xTr1m (Code, Music), rip (GFX) and LPChip (Music).

Marching Cubes Optimization (2010-08-26)

I just published another short article, this time covering two simple ways of optimizing a naive implementation of marching cubes mesh generation: eliminating redundant density function evaluations, and merging the loose triangles output by the marching cubes algorithm into one unified indexed mesh. Check it out.

Fun statically linking against the C runtime library (2010-08-17)

This week, I was asked for a custom build of breezEngine statically linked against the C runtime library. Statically linking against the CRT is not a nice thing to do, which is probably why dynamically linking against the CRT has been the default since MSVC 7.0.

The major trouble with a static CRT stems from the fact that every self-contained module of your program (e.g. an EXE or DLL file) instantiates its own private copy of the CRT, including its own private heap. The moment modules start sharing objects allocated on one of these heaps across module boundaries, it becomes essential that any shared object be de-allocated on the very heap it was originally allocated on.

Things get even worse when it comes to sharing more complex objects (e.g. STL strings or containers). These objects may well appear to be stack-allocated; however, memory allocation on some heap inevitably occurs inside them, and may do so any time you invoke one of their methods.

Fortunately, I have been aware of these issues from the start. Still, circumventing the pitfalls described can be tedious at times, which is why I am going to share a few tricks on the matter in this entry.

The first thing to notice is that most of the problems arise from header-only / header-centric libraries, as header code is always instantiated inside the module that actually uses the library, thereby inheriting both that module's private CRT implementation and its heap. Some code might even get inlined, blurring the line between your own module and the foreign library. This is why cross-module STL container sharing fares so badly in a static CRT environment, almost inevitably leading to crashes whenever memory that was internally allocated by container code instantiated in one module is freed in another module (e.g. to make way for more memory to be allocated; see vector reallocation).

In consequence, many sources suggest that you never share complex types across module boundaries at all, which would indeed be the simplest and most elegant way to avoid these issues without further caution or thought. However, this simple rule puts harsh limits on the functionality and simplicity of your library, as there is no longer a way of sharing element collections, strings and so on without the extensive use of pre-allocated buffers.

Another possible solution is to write your own non-header-based container library, as code that is compiled into your library and subsequently only accessed via import/export linkage will always use the same heap, namely the one created for the module it was compiled into. This is not really feasible for larger container libraries, yet the observation is still relevant:

A third way to fix STL-related issues is to use allocators. STL allocators provide a not-too-complicated way of supplying your own source of memory to most STL objects. This makes it possible to acquire (and release) memory via an internal function that has been firmly compiled and linked into one of your modules, always accessing the same private heap regardless of the module in which the container is actually used.
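A minimal sketch of such an allocator, assuming two hypothetical functions my_alloc/my_free that stand in for allocation routines compiled into (and exported from) one specific module; here they are simply backed by malloc/free for illustration:

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical module-local allocation functions; in a real build these
// would be exported from one module so all callers hit the same heap.
void* my_alloc(std::size_t n) { return std::malloc(n); }
void  my_free(void* p)        { std::free(p); }

// STL allocator routing all memory traffic through the functions above.
template <class T>
struct module_allocator {
    using value_type = T;
    module_allocator() = default;
    template <class U> module_allocator(const module_allocator<U>&) {}
    T* allocate(std::size_t n) { return static_cast<T*>(my_alloc(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) { my_free(p); }
};
template <class T, class U>
bool operator==(const module_allocator<T>&, const module_allocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const module_allocator<T>&, const module_allocator<U>&) { return false; }

// A vector that is safe to share, since it never touches the caller's CRT heap.
using shared_vector = std::vector<int, module_allocator<int>>;
```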

Having dealt with STL containers, there are more issues to solve, such as the creation and deletion of custom-type objects scattered across several modules. One simple and elegant solution is the abstract factory pattern: once again, both creation and deletion are firmly compiled and linked into one module, and calls to these functions replace constructing or destroying objects manually. This is also the pattern that COM follows.
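A minimal sketch of that factory approach; Widget, CreateWidget and DestroyWidget are illustrative names, not actual breezEngine interfaces. In a real build, the two free functions would carry dllexport in the library and dllimport in clients, so allocation and deallocation always happen on the library's heap:

```cpp
// Abstract interface visible to clients.
struct Widget {
    virtual int id() const = 0;
    virtual ~Widget() {}
};

// Creation and destruction, both compiled into the library module.
Widget* CreateWidget(int id);
void    DestroyWidget(Widget* w);

// --- inside the library module ---
namespace {
struct WidgetImpl : Widget {
    int id_;
    explicit WidgetImpl(int id) : id_(id) {}
    int id() const override { return id_; }
};
}

Widget* CreateWidget(int id)     { return new WidgetImpl(id); }
void    DestroyWidget(Widget* w) { delete w; } // frees on the library's heap
```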

Another solution involves locally overloading operator new and operator delete inside often-shared classes. These overloads ensure that calling new and delete for the corresponding classes acquires and releases memory via the specified overloads instead of the global operators; the overloads have to follow the same rules as the custom STL allocators discussed earlier. It is also possible to put these overloads into a common base class, making sharing as easy as deriving from that class.
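A sketch of such a common base class; module_alloc/module_free are hypothetical stand-ins for functions firmly compiled into one module (again backed by malloc/free purely for illustration):

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical module-local allocation functions.
void* module_alloc(std::size_t n) { return std::malloc(n); }
void  module_free(void* p)        { std::free(p); }

// Deriving from this base makes new/delete for the derived class go through
// the owning module's heap, regardless of which module performs the call.
struct ModuleHeapObject {
    static void* operator new(std::size_t n) { return module_alloc(n); }
    static void  operator delete(void* p)    { module_free(p); }
    virtual ~ModuleHeapObject() {}
};

// Sharing is now as easy as deriving from the base class.
struct SharedThing : ModuleHeapObject {
    int value = 0;
};
```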

One final remark concerns virtual methods. As the compiler has no way of knowing at compile time which method will be called in the end, a virtual method called at run time always dispatches to a method instantiated in the module in which the corresponding object was originally created. Thus, code inside virtual methods will always use the same private heap that was used at construction of the object in question.

Curiously enough, this is why it is possible to use STL exceptions in a static CRT DLL environment, although these exceptions are commonly implemented using a standard STL string without any additional allocation care. Because the standard exception classes' destructors are virtual, the message string is always freed on the same heap it was constructed on, regardless of the module boundaries crossed by exception handling.

Final thought: Avoid statically linking against the CRT whenever you can, as the effort, complexity and implications of supporting it are high. If you do, be prepared to build your own binaries of the libraries your project depends on, as static CRT binaries have been dropped from the binary packages of many libraries (building your own binaries may be necessary for many other reasons anyway, e.g. when tuning the performance of your project via the _SECURE_SCL define).

Further reading:
[KK's Blog] Dynamically linking with MSVCRT.DLL using Visual C++ 2005 ("History of the CRT" + Loads of most valuable CRT information)
[Stack Overflow] What's the differences between VirtualAlloc and HeapAlloc?
[MSDN] Heap: Pleasures and Pains
[MSDN] Comparing Memory Allocation Methods

How can it be? (2010-08-13)

How can it be that well-educated young men, having grown up in an edenic democracy; how can it be that prosperous human beings, having spent their lives in close-to-perfect safety and in the certainty that their physical integrity will remain untouched for as long as their lives may last; how can it be that bright and trained minds such as these might doubt the justness of the very conditions to which they owe their most effortless existence, for the sake of some who have come to fall under a cloud in favor of the apparently innocent? A society is only ever as just as its people. Once rights become alienable, they will be alienated.

Three not-so-cute anti-aliasing tricks (2010-04-07)

I just published an extensive article on how to fix various multi-sampling issues commonly related to inferred rendering techniques and post-processing effects in a DirectX-9-level feature environment. Read it here.

Imagine - 48k Intro released at Breakpoint 2010, ranked 1st! (2010-04-04)

Finally, after more than four months of quiet development, we are proud to announce the release of our next demo collaboration: "Imagine", high-end real-time computer graphics in less than 64 kilobytes (actually it's 48k), exclusively released at this year's Breakpoint demo party in Bingen and ranked 1st in the 64k Intro compo.

Best viewed in HD; click the title and switch to full screen on Vimeo.

Download: Imagine - 48k PC Intro, Executable
Download: Imagine - Unpacked PC Intro, Executable (for those experiencing malware warnings using the packed one)

As this was a collaboration, huge credits go to Christian Rösch (Code), Mark (Code) and Turri (Music).

RAR ranked Best Invitation Demo 2009 by 4sceners (2010-01-04)

Nice surprise: released at the Evoke demo party 2009 in Köln and ranked 4th of 8 in the PC Demo competition, the success of the first breezEngine technology-based project "RAR - Devmania Invitation 2009" is not yet over:

For more information and video footage, see the second post below this one.

Devmania Proceedings & Demos (2009-10-09)

Devmania 2009 is over: two days of most interesting presentations and conversation. My deepest gratitude goes to the Devmania organization team for saving this year's annual convention from abrupt cancellation, or even worse. Unfortunately, I did not take any photos, but maybe you can get a glimpse of what was going on there by following the banner link below and taking a look at the photos uploaded to their official website:

So now it's time to catch up with all the new material introduced in preparation for my big Devmania presentation of the breezEngine in its current state. To avoid writing the same things over and over again, I just uploaded the original presentation as well as an English translation for the non-German readers of this blog:

Download (German original): breezEngine Devmania 2009 [~2MB]
Download (English translation): breezEngine Devmania 2009 [~2MB]

Besides, I just released the first public tech demo incorporating physics and minor gameplay, a mix of jump'n'run and puzzle elements. The demo was built in less than a week, more or less in parallel to the first prototype implementation of the new physics module; therefore it is neither balanced nor really finished, yet it already shows off some quite nice visuals and dynamics.


For those of you interested and matching the ridiculously high system requirements (Shader Model 3.0, lots of VRAM and fill rate, i.e. quite a decent graphics card, plus the latest DirectX runtime, PhysX System Software and VC 2005 runtime; for download and installation links, either click or see the READ ME), there is now a public download available:

Download (for installation instructions, see READ ME): Bonny Nightmare [~1 MB]

Great things... (2009-08-02)

...were about to happen, you remember? So here you go, finally available to each and every one of you: the first breezEngine technology-based project ever to be officially finished. "RAR" ranked 4th of 8 in the PC Demo competition, exclusively released at the Evoke demo party 2009 in Köln:

Best viewed in HD; click the title and switch to full screen on Vimeo.

Download real-time demo: RAR - Devmania Invitation 2009 [18 MB]

Watch ambience capture: RAR presented live at Evoke 2009 on YouTube

Note that this demo is the result of a very close cooperation with Christian Rösch, who came up with the idea of using the breezEngine to create PC demos and who was the creative mind behind all but one scene in this particular demo - it should be pretty obvious which scene I am talking about. ;-)

A shader-driven rendering pipeline (2009-07-23)

Today's entry will once again focus on the rendering pipeline, following my short introduction to its basic functionality about a year ago.

As already stated in the title, the breezEngine rendering pipeline is fully shader-driven, allowing programmers to change the way the scene is rendered and processed simply by changing shader code. This enables programmers to introduce a great variety of entirely new shaders to the engine without any need for custom application-side integration code.

Communication between engine and shaders is handled by so-called effect binders. The engine offers a whole set of these classes, providing the data necessary to transform and render objects as well as to perform lighting, shadowing and processing. In addition, effect binders handle the creation of temporary, permanent and persistent render targets, automatically setting, swapping and scaling them at the request of the given shader. Effect binders even allow for dynamic flow control when rendering multiple passes, repeating or skipping passes according to the context in which the shader is used.

This all sounds rather abstract, so here is some example code:

// Enables additive blending
pass AdditiveBlending < bool bSkip = true; >
{
    AlphaBlendEnable = true;
    BlendOp = ADD;
    SrcBlend = ONE;
    DestBlend = ONE;
}

// Light pass
pass LightDP < string Type = "Main"; string LightTypes = "Directional,Point";
               bool bRepeat = true; string PostPassState = "AdditiveBlending"; >
{
    VertexShader = compile vs_2_0 RenderPhongVS();
    PixelShader = compile ps_2_0 RenderPhongPS(GetLightingPermutation(LIGHT_DIRECTIONAL, LIGHT_POINT));
}

This snippet, taken from the engine's default Phong shader, makes use of quite a few of the features described above. The first pass is only used as a state block; the bSkip annotation tells the effect binder responsible for dynamic flow control to skip this pass during normal rendering. The second pass is one of the passes that actually perform lighting, each providing one permutation for the different (pre-sorted) possible light type combinations. The Type annotation specifies that this pass is to be applied during the main (shading) stage of the current frame; there are also a pre stage (depth and additional data) as well as several processing stages. The LightTypes annotation specifies order and type of the lights that may be applied to this pass, with bRepeat stating that the pass may be repeated several times if more lights fitting this permutation follow. Finally, the PostPassState annotation names the state-block pass defined earlier: once a lighting pass has been applied, render states are changed to enable additive blending of the passes to follow.

Here's another extract taken from the tonemapping effect:

// Average luminance
pass LogLuminance < string Destination0 = "LuminanceTexture";
                    float ScaleX = 0.25f; float ScaleY = 0.25f; >
{
    VertexShader = compile vs_3_0 Prototype::RenderScreenVS();
    PixelShader = compile ps_3_0 RenderDownscaledLogLuminancePS(g_screenSampler, g_fScreenResolution);
}
pass AvgLuminance < string Destination0 = "LuminanceTexture";
                    float ScaleX = 0.25f; float ScaleY = 0.25f;
                    int ResolutionX = 4;
                    bool bRepeat = true; >
{
    VertexShader = compile vs_2_0 Prototype::RenderScreenVS();
    PixelShader = compile ps_2_0 RenderDownscaledLuminancePS(g_luminanceSampler, g_fLuminanceResolution);
}
pass ExpLuminance < string Destination0 = "AdaptedLuminanceTexture"; int ResolutionX = 1; int ResolutionY = 1; >
{
    VertexShader = compile vs_2_0 Prototype::RenderScreenVS();
    PixelShader = compile ps_2_0 RenderDownscaledLuminancePS(g_luminanceSampler, g_fLuminanceResolution, true);
}

This snippet shows off some of the post-processing features provided by the effect binders. The first pass specifies a custom temporary texture to render the average logarithmic luminance to, at the same time requesting the engine to scale the render target down to 1/4. The second pass further averages the logarithmic luminance rendered by the previous pass, repeating the down-scaling in steps of 1/4 until an x resolution of 4 is reached. Note that as this is a processing effect, render targets are automatically swapped unless explicitly specified otherwise. The third pass performs one more step of averaging, writing the exponential of the result into a different custom persistent render target of resolution 1x1 (persistent, because the result needs to be blended with the luminance value of the previous frame to simulate eye adaptation). Of course, the 0 in Destination0 implies that it is possible to use multiple render targets at once.

The creation of new render targets to perform averaging, blurring and similar operations on is as easy as pie:

// Luminance texture
Texture g_luminanceTexture : LuminanceTexture <
    string Type = "Temporary";
    string Format = "R32F";
>;
// Adapted luminance texture
Texture g_adaptedLuminanceTexture : AdaptedLuminanceTexture <
    string Type = "Persistent";
    string Format = "R32F";
>;

In that way, the whole rendering pipeline may be customized simply by changing shader code:

// Depth texture
Texture g_sceneDepthTexture : SceneDepthTexture <
    string Type = "Permanent";
    string Format = "R32F";
    string DefaultIn = "Pre";
    string FinalIn = "Pre";
    bool bClear = true;
    bool bClearDepth = true;
    float4 ClearColor = 2.0e16f;
>;

// Scene texture
Texture g_sceneTexture : SceneTexture <
    string Type = "Permanent";
    string Format = "R16G16B16A16F"; // HDR
#ifndef SCREEN_PROCESSING
    string DefaultIn = "Main,Processing";
    string FinalIn = "Main,Processing";
#else
    string DefaultIn = "Main";
    string FinalIn = "Main";
#endif
    string DefaultSlot = 0; // Allows for MRT
    bool bClear = true;
    float4 ClearColor = float4(0.0f, 0.0f, 0.0f, 1.0f);
>;

The DefaultIn and FinalIn annotations also explain why it is not always necessary to specify custom render targets: DefaultIn specifies the stages in which the render target is used whenever no explicit destination is given. FinalIn specifies the stages during which the render target is promoted onto the screen or onto other objects (e.g. when rendering reflections), provided that one of the specified stages is the last to be rendered.

One more important feature of the shader-driven design is the possibility to define render queues inside the shader framework:

// Solid renderables
RenderQueue g_solidRenderQueue : SolidRenderQueue <
    unsigned int Layer = 0;
    bool bDefault = true;
>;

// Canvas renderables
RenderQueue g_canvasRenderQueue : CanvasRenderQueue <
    unsigned int Layer = 1;
    bool bPrePass = false;
    bool bDepthSort = true;
>;

// Alpha renderables
RenderQueue g_alphaRenderQueue : AlphaRenderQueue <
    unsigned int Layer = 2;
    bool bPrePass = false;
    bool bDepthSort = true;
>;

Render queues specify a layer number that influences the order in which the queues are rendered (similar to CSS's z-index, haha), as well as certain flags, such as switching off specific stages for a particular queue, or enabling depth sorting for alpha-transparent objects. A shader may then specify a render queue inside its technique annotations:

technique SM_2_0 < string RenderQueue = "AlphaRenderQueue"; >
Shadows, HDRR, More Footage (2009-07-17)

Another four months! An incredible amount has happened since my last update in February, not least my finally having finished school. Lots of new features have arrived throughout the last months, including soft shadows, high dynamic range lighting, support for multi-sampled anti-aliasing (a tough one, still fighting a few of the typical issues here) and a completely new shader library. In addition, lots of bug fixing, refactoring and testing has been done, making the engine more usable than ever.

I won't go into the details today, instead I will just try to please you with another video showcasing the already well-known Amsterdam TechDemo in a completely new light:

Stay tuned, great things are about to happen in the very near future. (And by the way, thanks for all the nice feedback!) Wonder what Devmania is? Check it out here. And sorry for the shaky free-hand cam...

New Footage (2009-03-02)

Wow, another four months without an update. Here we go with some nice new and shiny images taken from my most recent testing application:

The demo was developed in exactly one week, inspired by my uncle's idea of putting up towers that are connected by ropes all over Amsterdam, transporting people anywhere solely by means of gravity, and thereby solving the city's traffic problems. Besides being a quite innovative concept, this idea proved to be a great opportunity to evaluate the engine's current capabilities and workflow. There's even a video online, showing some footage taken from this testing application:

SSAO, shadows, refactoring (2008-10-30)

Another two months without an update - high time to skim through the recent subversion commits (loads of commits) and cover some of the more interesting changes in this "blog":

Shortly after the previous entry, I finished my experimental implementation of Screen Space Ambient Occlusion. Blurring the occlusion buffer turned out to be much harder than expected, as the random noise generated by the ambient occlusion algorithm turned into nasty clumpy artifacts the moment I tried applying a Gaussian blur. After lots of experimenting, I ended up using a 12-sample Poisson disc to blur the occlusion buffer, which resulted in an ok-ish image with only some barely noticeable smooth random patterns left. Finally, I combined the occlusion buffer with the original scene:
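For illustration, here is a CPU-side sketch of such a 12-tap Poisson-disc blur; on the GPU this would of course be a pixel shader sampling the occlusion buffer, and the offsets below are generic Poisson-disc sample points, not the ones actually used in the engine:

```cpp
#include <cstddef>

struct Vec2 { float x, y; };

// Illustrative 12-sample Poisson-disc offsets in the unit disc.
static const Vec2 kPoisson12[12] = {
    {-0.326f, -0.406f}, {-0.840f, -0.074f}, {-0.696f,  0.457f},
    {-0.203f,  0.621f}, { 0.962f, -0.195f}, { 0.473f, -0.480f},
    { 0.519f,  0.767f}, { 0.185f, -0.893f}, { 0.507f,  0.064f},
    { 0.896f,  0.412f}, {-0.322f, -0.933f}, {-0.792f, -0.598f},
};

// Averages 12 taps around (x, y) in a w*h single-channel occlusion buffer,
// clamping samples to the buffer edges.
float poissonBlur(const float* occ, int w, int h, int x, int y, float radius)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < 12; ++i) {
        int sx = x + static_cast<int>(kPoisson12[i].x * radius);
        int sy = y + static_cast<int>(kPoisson12[i].y * radius);
        if (sx < 0) sx = 0;
        if (sx >= w) sx = w - 1;
        if (sy < 0) sy = 0;
        if (sy >= h) sy = h - 1;
        sum += occ[sy * w + sx];
    }
    return sum / 12.0f;
}
```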

Besides doing lots of architectural reorganization and refactoring, I also started to implement shadow mapping. Up to now, directional light (sun) shadows are the only shadows available, as high-quality long-range shadows seemed the most demanding to me; getting them right clears the way for other light types' shadows as well. The current implementation uses a technique known as PSSM (parallel-split shadow maps), orthographically rendering the scene into one of three different shadow splits depending on the viewer's distance (visualized in the left screenshot). Filtering & softness are still missing:

Yet by far the largest number of commits throughout these last two months was dedicated to the API. These changes include both generalization (e.g. making lights available in processing effects) and simplification (making overly nested methods more accessible). Lots of classes were renamed according to their final responsibilities, making the API much more understandable and intuitive. Although some methods are still missing, the basic class hierarchy may now be considered close to final.

Rendering pipeline #1 (2008-09-04)

Today's entry will focus on the key features of the engine's now close-to-final rendering pipeline. Throughout the last months, the rendering pipeline has grown a lot - in fact, it has by far outgrown my original plans.

My original intention was to implement a simple class taking over the management of scene elements that would frequently be required throughout the whole rendering process of a scene. This concept included collections of lights, renderables and perspectives, a central interface providing all the information needed to render a typical 3D object. The concept worked out pretty well. Soon, I had my first lit objects on screen, which can still be seen on the breezEngine project page.

Following this rather basic functionality, I started implementing a post-processing framework. This processing pipeline basically consisted of a list of processing effects being applied to the fully drawn scene, one after the other. The processing pipeline also provided depth, scene and screen targets any effect could write to and read from. In addition, it implemented the swapping mechanism necessary to allow for chaining of several effects. Of course, I also ran into the mysterious pixel shifting issues that almost certainly occur whenever people start implementing post-processing for the first time. Fortunately, there is this great article by Simon Brown on the net, explaining all about these issues.
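The swapping mechanism that chains effects can be sketched in a few lines of simplified host code; Target and Effect are hypothetical stand-ins for the engine's actual types, not its real API:

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a render target.
struct Target { int id; };

using Effect = std::function<void(const Target&, Target&)>;

// Applies each effect in order, ping-ponging between two targets;
// returns the target that holds the final image.
Target* applyChain(const std::vector<Effect>& effects, Target* read, Target* write)
{
    for (const Effect& fx : effects) {
        fx(*read, *write);        // effect samples 'read', renders into 'write'
        std::swap(read, write);   // the next effect consumes the previous output
    }
    return read;                  // after the final swap, 'read' holds the result
}
```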

Next, I introduced intermediate render targets into the processing pipeline, enabling shaders to define additional textures of arbitrary dimensions to write their temporary output to, allowing for blurring and downsampling without any need for additional engine code. The result of these efforts can be seen on the breezEngine project page as well as in the second entry below this one.

Afterwards, I realized that the concept of intermediate texture targets had even more potential than the actual implementation made use of. The basic idea was to generalize the possibility of defining additional render targets for all effects, moreover introducing the possibility to share these intermediate target textures among all effects. This led to the distinction between "temporary" and "permanent" render targets, the former only existing throughout the execution of the corresponding shader code, the latter existing throughout the rendering process of the whole scene. With this functionality implemented, it is not only possible to add pre-processing effects preparing scene-wide textures such as ambient occlusion, but it is also possible to change the whole process of rendering. For example, by introducing additional render targets, it is now possible to also render positions, normals and material IDs, allowing for the implementation of deferred shading only by changing shader code. In the end, I even removed all of the predefined render targets except for the depth buffers (and the back buffer, naturally), which led to a pretty neat design.

Lastly, the obligatory screen shots of my first attempt implementing Screen Space Ambient Occlusion:

I might also cover some of the theory behind this technique in another entry.