last updated 3/28/2016 | 386555 views


3/28/2016: The Future is Now: Compiling C++17 to the GPU (Vulkan/OpenCL/SPIR-V)

Update 9/21/2019:

The newest clang release officially brings experimental C++ support for OpenCL, superseding the experimental modifications discussed in this blog post! :)

Original 2016 experiment:

The release of SPIRV-LLVM and all the talk about Vulkan at GDC made me curious as to how far we actually are from compiling other languages, particularly full-featured C++, to the GPU. So far, using somewhat current versions of C++ on the GPU has been a privilege reserved for CUDA programmers specifically targeting NVIDIA GPUs. TLDR: As it turns out, it is already possible today. See the command lines in the middle of this post and the experimental compiler binaries at the bottom.

Sources
Interestingly, clang implements OpenCL with very few additions to the C99 front end, which in turn shares most of its logic with the C++ front end. As it turns out, a minuscule set of changes allows the use of any C++ standard in combination with the OpenCL extensions. Merging in the OpenCL headers and SPIR-V code emission from Khronos' OpenCL C Compiler, we can in fact already obtain an end-to-end solution compiling C++17 to SPIR-V. New commits can easily be merged in as clang continues to evolve. You can find my experimentally patched version of the current clang master branch here: https://github.com/tszirr/clang

Compilation of this patched version of clang will also require an up-to-date port of LLVM-SPIRV to the LLVM master branch, which I did here: https://github.com/tszirr/SPIRV-LLVM

Usage
Now, we can simply compile C++ kernels to SPIR-V bit code using the following command line:

clang -cc1 -std=c++1y -triple spir-unknown-unknown -emit-spirv -x cl -o mykernels.cpp.sv mykernels.cpp

We can investigate the generated SPIR-V bit code using the LLVM-SPIRV tool:

llvm-spirv -to-text -o mykernels.cpp.sv.txt mykernels.cpp.sv

Disclaimer
I successfully compiled a few templates, classes with constructors and destructors, and group-shared/global/generic/image memory accesses. Note, however, that this combination of languages is so far officially unsupported and will only partly follow the OpenCL C language specification, and in particular does not follow the provisional OpenCL 2.1 C++ specification. I have been in contact with the clang/LLVM codebase for merely a day, so I am certainly no clang/LLVM expert and cannot assess the full extent of code paths where the C++ and C99 front ends diverge. I fixed one minor issue for C++ declarations involving memory spaces, but more might turn up elsewhere.

Binaries
Since building clang and LLVM takes quite some time, I compiled a minimal set of experimental x64 Windows binaries that can get you started: http://alphanew.net/releases/experimental/clang-oclxx-spirv.zip

Backwards Compatibility
(Note on legacy OpenCL/SPIR compatibility: For OpenCL implementations only supporting the old LLVM 3.x-derived SPIR format, you can use SPIR-V as a stable intermediate format that can then be fed to the official LLVM 3.x-based SPIRV-LLVM tool for reverse conversion from SPIR-V to SPIR. Be aware that you will have to obtain that official binary elsewhere: the llvm-spirv binary provided in the above zip file is based on the current LLVM master branch and is therefore unsuitable for that purpose. It merely serves debugging and disassembling purposes.)

4/1/2015: Recent Projects and Publications

I joined the KIT computer graphics group last December. More recent projects that have already been published can be found on cg.ivd.kit.edu/english/zirr. More to follow.

1/23/2014: GPU Pro 5: Object-order Ray Tracing for Fully Dynamic Scenes

GPU Pro 5, the next addition to the GPU Pro book series, will feature an article that presents some of my work towards more game-friendly ray tracing. I just added a teaser blog entry to the official GPU Pro Blog, check it out here. The full article will be released as part of the book on March 25.

6/26/2011: Torsion footage & Twitter

First of all, I have only very recently joined the Twitter network. Feel free to become a follower if you're interested in my daily findings and annoyances regarding real-time graphics and other development.

In other news, I have finally taken time to capture some footage of a gameplay prototype that I was working on for several weeks about one year ago. The player is given the ability to warp space by placing two warp anchors anywhere in the world, warping the space in-between those anchors. Space warping also affects physics, which may be used to make objects roll off curved surfaces, bridge gaps etc. Like the previous demos, this prototype was built on top of my engine/framework/sandbox called breeze.

Best viewed in HD, click the title and switch to full screen on Vimeo.

Head over to the Torsion project page for more information.

1/4/2011: Imagine ranked Best 64k Intro 2010 by 4sceners

Sorry for the long silence, it seems my studies are once again eating up way too much of my precious time. In fact, I have not been as idle in regard to my projects as it may seem; some of you might even have followed some of my more recent work spread throughout several German game development boards. I will update this site once I can find the time for it.

For now I am happy to announce that we're once again featured in 4sceners' list of the best demo scene productions released throughout the last year, this time ranked 1st in the category of 64k Intros.

It has been about this time last year that we actively started working on this production and it's always nice to see that the effort put into it has made at least some kind of longer-lasting impression on other people who know what lies behind such work. The collaboration definitely was great fun, great thanks go out to Christian Rösch (Code), Mark (Code) and Turri (Music).

8/29/2010: Rendering Ice in Liquidiced

So I published yet another short article, covering the ideas behind the rendering of ice in our brand-new demo "liquidiced". Hope it helps. Enjoy.

8/29/2010: Liquidiced - 64k Intro released at Evoke 2010

So another 4 months of quiet development have passed, and we are very proud to announce the release of yet another demo collaboration: "Liquidiced", more high-end real-time computer graphics in less than 64 kilobytes, released at this year's Evoke Demo Party in Köln and ranked 3rd in the 64k Intro Compo.

Best viewed in HD, click the title and switch to full screen on Vimeo.

Download: Liquidiced - 64k PC Intro, Executable
Download: Liquidiced - Unpacked PC Intro, Executable (for those experiencing malware warnings using the packed one)

Again, this was a collaboration, so more credits go to Christian Rösch (Code, GFX), Mark (Code), xTr1m (Code, Music), rip (GFX) and LPChip (Music).

8/26/2010: Marching Cubes Optimization

I just published another short article, this time covering two simple ways of optimizing a naive implementation of marching cubes mesh generation: eliminating redundant density function evaluations, and merging the loose triangles output by the marching cubes algorithm into one unified indexed mesh. Check it out.
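To give a rough idea of the second optimization, here is a minimal sketch of merging a triangle soup into an indexed mesh. It is not taken from the article: the names Vec3, IndexedMesh and weld are made up for illustration, and the sketch assumes that shared vertices come out of marching cubes with bitwise identical coordinates (and without NaNs). If that does not hold, keying on the grid edge that produced the vertex would be the more robust choice.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical minimal types, for illustration only.
struct Vec3 { float x, y, z; };

struct IndexedMesh {
    std::vector<Vec3> vertices;          // unique vertex positions
    std::vector<std::uint32_t> indices;  // three indices per triangle
};

// Merges a triangle soup (three consecutive vertices per triangle) into an
// indexed mesh by reusing the index of any vertex with exactly equal position.
IndexedMesh weld(const std::vector<Vec3>& soup) {
    IndexedMesh mesh;
    std::map<std::tuple<float, float, float>, std::uint32_t> seen;
    for (const Vec3& v : soup) {
        auto key = std::make_tuple(v.x, v.y, v.z);
        auto it = seen.find(key);
        if (it == seen.end()) {
            // First time this position shows up: append it and remember its index.
            auto index = static_cast<std::uint32_t>(mesh.vertices.size());
            it = seen.emplace(key, index).first;
            mesh.vertices.push_back(v);
        }
        mesh.indices.push_back(it->second);
    }
    return mesh;
}
```

Calling weld on two triangles that share an edge turns six loose vertices into four unique ones plus six indices.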

8/17/2010: Fun statically linking against the C runtime library

This week, I was asked for a custom build of breezEngine that is statically linked against the C runtime library. Statically linking against the CRT is not a nice thing to do, which probably is why dynamically linking against the CRT has been set as default since MSVC 7.0.

The major trouble that comes with a static CRT is due to the fact that any self-contained module of your program (e.g. an EXE or DLL file) will instantiate its own private copy of the CRT, including its own private heap. The moment modules start sharing objects allocated on one of these heaps across module boundaries, it becomes essential that any shared object be de-allocated on the very heap that it was originally allocated on.

Things get even worse when it comes to sharing more complex objects (e.g. STL strings or containers). These objects may very well appear to have been stack-allocated, but memory allocation on some heap is inevitably going to occur inside them, and might do so any time you invoke one of their methods.

Fortunately, I have been aware of these issues from the start. Still, circumventing the pitfalls described can be tedious from time to time, which is why I am going to share a few tricks on that matter in this entry.

The first thing to notice is that most of the problems arise from header-only / header-centric libraries, as header code is always instantiated inside the module that is actually using the library, therefore inheriting both the module's private CRT implementation and heap. Some code might even get inlined, blurring the lines between your own module and the foreign library. This is why cross-module STL container sharing works so badly in a static CRT environment, almost inevitably leading to crashes whenever memory that has been internally allocated by the container code instantiated in one module is freed in another module (e.g. to make way for more memory to be allocated, see vector reallocation).

In consequence, many sources suggest that you never share complex types across module boundaries at all, which would indeed be the simplest and most elegant way to overcome these issues without further caution and thinking. However, this simple rule places harsh limits on the functionality and simplicity of your library, as there no longer is a way of sharing element collections, strings and so on without the extensive use of pre-allocated buffers.

Another possible solution would be to write your own non-header-based container library, as code that is compiled into your library and subsequently only accessed via import/export linkage will always use the same heap, namely the one created for the module it was compiled into. This is not really feasible for larger container libraries, yet the observations made here are still relevant:

The third possibility to fix issues concerning the STL is to make use of allocators. STL allocators provide a not-too-complicated way of providing your own source of memory to most of the STL objects, thus it is possible to acquire (and release) memory using an internal function that has been firmly compiled and linked into one of your modules, always accessing the same private heap, regardless of the module in which the container is actually used.
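A minimal sketch of such an allocator might look as follows. The names module_alloc, module_free, ModuleAllocator and shared_vector are hypothetical; in a real project the two functions would be compiled into exactly one module and exported from it (e.g. via __declspec(dllexport)), whereas here everything lives in one translation unit so that the example stays self-contained. The live-block counter exists for demonstration only, and overflow checking of the requested size is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

static std::size_t g_liveBlocks = 0;  // demo bookkeeping only

// Stand-ins for functions compiled into and exported from ONE module.
// Every module calls these, so all memory comes from that module's heap.
void* module_alloc(std::size_t bytes) {
    void* p = std::malloc(bytes);
    if (!p) throw std::bad_alloc();
    ++g_liveBlocks;
    return p;
}

void module_free(void* p) {
    if (p) {
        --g_liveBlocks;
        std::free(p);
    }
}

// Minimal C++11 allocator that forwards to the two functions above.
template <class T>
struct ModuleAllocator {
    using value_type = T;

    ModuleAllocator() = default;
    template <class U>
    ModuleAllocator(const ModuleAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(module_alloc(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { module_free(p); }
};

// The allocator is stateless, so all instances compare equal.
template <class T, class U>
bool operator==(const ModuleAllocator<T>&, const ModuleAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const ModuleAllocator<T>&, const ModuleAllocator<U>&) noexcept { return false; }

template <class T>
using shared_vector = std::vector<T, ModuleAllocator<T>>;

// Returns the number of blocks still allocated after the vector is gone.
std::size_t demo_shared_vector() {
    {
        shared_vector<int> v;
        for (int i = 0; i < 1000; ++i)
            v.push_back(i);  // every reallocation goes through module_alloc/module_free
        assert(g_liveBlocks >= 1);
    }
    return g_liveBlocks;
}
```

With this in place, a shared_vector<int> can grow in any module while the memory is always acquired and released by the same pair of functions; demo_shared_vector() can be called from any main() to try it out.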

Having overcome STL container problems, there are more issues to be solved, such as creation and deletion of custom-type objects scattered across several modules. One simple and elegant solution would be to make use of the abstract factory pattern, once again firmly compiling and linking both creation and deletion into one module, subsequently making calls to these functions instead of constructing or destroying objects manually. This is also the pattern that COM follows.
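As a rough illustration, a factory pair of this kind could be sketched as follows. Widget, WidgetImpl, create_widget, destroy_widget and live_widgets are made-up names; the part marked as module-private would live in a single module's source file, with the two functions exported from it. The unique_ptr deleter is merely a convenience, so that client code cannot forget to route destruction back through the exporting module.

```cpp
#include <cassert>
#include <memory>

// Public interface as seen by all modules. The destructor is protected,
// so client code cannot accidentally call delete on a Widget itself.
class Widget {
public:
    virtual int value() const = 0;
protected:
    ~Widget() = default;
};

// --- the following would live in ONE module's .cpp file ---
namespace {
int g_liveWidgets = 0;  // demo bookkeeping only

class WidgetImpl final : public Widget {
public:
    explicit WidgetImpl(int v) : value_(v) { ++g_liveWidgets; }
    ~WidgetImpl() { --g_liveWidgets; }
    int value() const override { return value_; }
private:
    int value_;
};
}  // namespace

// Hypothetical exported pair: both new and delete are compiled into the
// same module and therefore always use the same private heap.
Widget* create_widget(int v) { return new WidgetImpl(v); }
void destroy_widget(Widget* w) { delete static_cast<WidgetImpl*>(w); }
int live_widgets() { return g_liveWidgets; }
// --- end of module-private part ---

// Client-side convenience: a smart pointer that calls destroy_widget.
struct WidgetDeleter {
    void operator()(Widget* w) const { destroy_widget(w); }
};
using WidgetPtr = std::unique_ptr<Widget, WidgetDeleter>;

int demo_factory() {
    WidgetPtr w(create_widget(42));
    assert(live_widgets() == 1);
    return w->value();  // the widget is destroyed via destroy_widget on return
}
```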

Another solution involves LOCALLY overloading operator new and operator delete inside often-shared classes. Overloading these will ensure that calling new and delete for the corresponding classes will acquire and release memory using the specified overloads instead of the global operators. In this case, the overloads have to follow the same rules as the custom STL allocators discussed earlier. It is also possible to put these overloads into a common base class, making sharing as easy as deriving from this class.
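The base class variant could be sketched like this. Shared, Particle, shared_heap_alloc and shared_heap_free are hypothetical names; as with the allocators, the two heap functions are assumed to be compiled into and exported from one module, while the counter only serves the demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

static int g_sharedLive = 0;  // demo bookkeeping only

// Stand-ins for functions compiled into and exported from ONE module.
void* shared_heap_alloc(std::size_t bytes) {
    void* p = std::malloc(bytes);
    if (!p) throw std::bad_alloc();
    ++g_sharedLive;
    return p;
}

void shared_heap_free(void* p) {
    if (p) {
        --g_sharedLive;
        std::free(p);
    }
}

// Common base class: deriving from it routes new/delete of the derived
// class through the shared heap functions instead of the global operators.
struct Shared {
    static void* operator new(std::size_t bytes) { return shared_heap_alloc(bytes); }
    static void operator delete(void* p) noexcept { shared_heap_free(p); }
    static void* operator new[](std::size_t bytes) { return shared_heap_alloc(bytes); }
    static void operator delete[](void* p) noexcept { shared_heap_free(p); }
};

struct Particle : Shared {
    float x = 0.f, y = 0.f, z = 0.f;
};

// Returns the number of live shared allocations after cleanup.
int demo_shared_base() {
    Particle* p = new Particle;  // resolves to Shared::operator new
    assert(g_sharedLive == 1);
    delete p;                    // resolves to Shared::operator delete
    return g_sharedLive;
}
```

Note that the operators themselves may well be inlined into the calling module; what matters is that they forward to functions that are not.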

One final remark to be made concerns virtual methods. As, at compile time, the compiler has no way of knowing which method is going to be called in the end, a virtual method called at run-time has to lead to a method instantiated in the module that the corresponding object has originally been created in, thus, code inside virtual methods will always use the same private heap that was used on construction of the object in question.

Curiously enough, this is the reason why it is possible to use STL exceptions in a static CRT DLL environment, although these exceptions are commonly implemented using a standard STL string without any additional allocation care. Due to the fact that the standard exception classes' destructors are virtual, the message string is always freed on the same heap as it has been constructed on, regardless of module boundaries crossed by exception handling.
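The underlying language rule can be observed even inside a single program: when an object is deleted through a pointer to a base class with a virtual destructor, both the destructor and the deallocation function are chosen according to the dynamic type of the object, not according to the code at the call site. The following sketch uses made-up classes Base and Derived rather than the standard exception classes, and it cannot show actual module boundaries. It only demonstrates the dispatch that, on common ABIs, leads back into the code that belongs to the object's vtable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

static int g_derivedDeletes = 0;  // demo bookkeeping only

struct Base {
    virtual ~Base() {}
    virtual const char* what() const { return "base"; }
};

struct Derived : Base {
    std::string message;  // allocates internally, like an exception message

    explicit Derived(const char* m) : message(m) {}
    const char* what() const override { return message.c_str(); }

    // Class-local operators standing in for "the creating module's heap".
    static void* operator new(std::size_t bytes) {
        void* p = std::malloc(bytes);
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void operator delete(void* p) noexcept {
        ++g_derivedDeletes;
        std::free(p);
    }
};

// Returns how often Derived::operator delete ran during this call.
int demo_virtual_delete() {
    const int before = g_derivedDeletes;
    Base* b = new Derived("a message long enough to require a heap allocation");
    // The static type is Base, yet the virtual destructor makes this run
    // ~Derived (freeing the string) and then Derived::operator delete.
    delete b;
    return g_derivedDeletes - before;
}
```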

Final thought: Avoid statically linking against the CRT when you can afford it, as the effort, complexity and implications of supporting it are high. Also, if you do, be prepared to build your own binaries of the libraries that your project depends on, as static CRT binaries have been dropped from the binary packages of many libraries (however, building your own binaries may also be necessary for many other reasons, e.g. when tuning the performance of your project via the _SECURE_SCL define).

Further reading:
[KK's Blog] Dynamically linking with MSVCRT.DLL using Visual C++ 2005 ("History of the CRT" + Loads of most valuable CRT information)
[Stack Overflow] What's the differences between VirtualAlloc and HeapAlloc?
[MSDN] Heap: Pleasures and Pains
[MSDN] Comparing Memory Allocation Methods

8/13/2010: How can it be?

How can it be that well-educated young men, having grown up in an edenic democracy; how can it be that prosperous human beings, having spent their lives in close-to-perfect safety and in the certainty that their physical integrity be untouched for as long as their lives may last; how can it be that bright and trained minds as such might doubt the justness of these very conditions they themselves owe their most effortless existence to for some who have come to fall under a cloud in favor of the apparently innocent? A society is only ever as just as its people. Once rights become alienable, they will be alienated.

4/7/2010: Three not-so-cute anti-aliasing tricks

I just published an extensive article on how to fix various multi-sampling issues that are commonly related to inferred rendering techniques and post-processing effects in a DirectX-9-level feature environment. Read it here.

4/4/2010: Imagine - 48k Intro released at Breakpoint 2010, ranked 1st!

Finally, after more than 4 months of quiet development, we are proud to announce the release of our next demo collaboration: "Imagine", high-end real-time computer graphics in less than 64 kilobytes (actually it's 48k), exclusively released at this year's Breakpoint Demo Party in Bingen and ranked 1st in the 64k Intro Compo.

Best viewed in HD, click the title and switch to full screen on Vimeo.

Download: Imagine - 48k PC Intro, Executable
Download: Imagine - Unpacked PC Intro, Executable (for those experiencing malware warnings using the packed one)

As this was a collaboration, huge credits go to Christian Rösch (Code), Mark (Code) and Turri (Music).

1/4/2010: RAR ranked Best Invitation Demo 2009 by 4sceners

Nice surprise - released at the Evoke demo party 2009 in Köln and ranked 4th of 8 in the PC Demo competition, the success of the first breezEngine technology-based project "RAR - Devmania Invitation 2009" is not yet over:

For more information and video footage, see the second post below this one.

10/9/2009: Devmania Proceedings & Demos

Devmania 2009 is over, two days of most interesting presentations and conversation, and my deepest gratitude goes to the Devmania organization team, having saved the annual convention from brusque interruption or even worse for this year. Unfortunately, I did not take any photos, but maybe you can get a glimpse of what was going on back there by following the banner link below and taking a look at the photos uploaded on their official website:

So now it's time to catch up with all the new stuff introduced in preparation to my big Devmania presentation of the breezEngine in its current state. To avoid writing the same things over and over again, I just uploaded the original presentation as well as an English translation for the non-German readers of this blog:

Download (German original): breezEngine Devmania 2009 [~2MB]
Download (English translation): breezEngine Devmania 2009 [~2MB]

Besides, I just released the first public tech demo incorporating physics and minor game play, a mix of jump'n'run and puzzle elements. The demo was built in less than a week, more or less in parallel to the first prototype implementation of the new physics module, therefore it is neither balanced nor really finished, yet it already shows off some quite nice visuals and dynamics.


For those of you interested and matching the ridiculously high system requirements (Shader Model 3.0, lots of VRAM and fill rate, meaning quite a decent graphics card + latest DirectX Runtime, PhysX System Software, VC 2005 Runtime, for download and installation links, either click or see READ ME), there now is a public download available:

Download (for installation instructions, see READ ME): Bonny Nightmare [~1 MB]

8/2/2009: Great things...

...were about to happen, you remember? So here you go, finally available to each and every one of you, the first breezEngine technology-based project ever that has officially been finished: "RAR", ranked 4th of 8 in the PC Demo competition, exclusively released at the Evoke demo party 2009 in Köln:

Best viewed in HD, click the title and switch to full screen on Vimeo.

Download real-time demo: RAR - Devmania Invitation 2009 [18 MB]

Watch ambience capture: RAR presented live at Evoke 2009 on YouTube

Note that this demo is the result of a very close cooperation with Christian Rösch, who came up with the idea of using the breezEngine to create PC Demos and who was the creative mind behind all but one scene in this particular demo - it should be pretty obvious which scene I am talking about. ;-)


© 2024 Tobias Zirr. All rights reserved.