Fluid simulation (DX11/DirectCompute)
Improved version that has no more artifacts and is 70% faster.
The simulation now looks more like smoke, and the smoke bounces off the walls.
Instead of visualizing the velocity, an advected fire reaction coordinate is rendered.
The speedup comes from vectorizing some of the compute kernels, especially the Jacobi iterations.
The thread group size for all kernels is now 256, either 16x4x4 or 16x16x1.
Rendering was made faster by sampling a scalar volume instead of a float4 volume.
The space bar switches to showing the velocity magnitude instead of the advected smoke.
All simulation now uses second-order MacCormack advection, including limiters.


Here is another volume simulator, this time simulating an incompressible fluid by solving the Navier-Stokes differential equations. The simulation runs in a 200x200x200 voxel box.
The calculations use a well-known scheme: velocity advection, a Jacobi pressure solve, and making the velocity divergence-free by subtracting the gradient of the pressure.
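As an illustration of the pressure step, here is a minimal 2D sketch in Python. It is not the DirectCompute code of the demo. It assumes a staggered grid with unit spacing and zero pressure outside the box, and the function names (`divergence`, `jacobi_pressure`, `project`) are hypothetical.

```python
def divergence(u, v, n):
    """Divergence per cell; u and v live on the right and top cell faces."""
    div = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            div[i][j] = (u[i][j] - u[i - 1][j]) + (v[i][j] - v[i][j - 1])
    return div


def jacobi_pressure(div, n, iterations):
    """Jacobi iterations for laplacian(p) = div, with p = 0 on the border ring."""
    p = [[0.0] * (n + 2) for _ in range(n + 2)]
    for _ in range(iterations):
        new_p = [[0.0] * (n + 2) for _ in range(n + 2)]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                neighbours = p[i - 1][j] + p[i + 1][j] + p[i][j - 1] + p[i][j + 1]
                new_p[i][j] = (neighbours - div[i][j]) / 4.0
        p = new_p
    return p


def project(u, v, n, iterations=300):
    """Subtract the pressure gradient in place so the velocity becomes divergence-free."""
    p = jacobi_pressure(divergence(u, v, n), n, iterations)
    for i in range(n + 1):
        for j in range(n + 1):
            u[i][j] -= p[i + 1][j] - p[i][j]
            v[i][j] -= p[i][j + 1] - p[i][j]
```

Each Jacobi sweep only reads the previous pressure field, which is why this kind of solver maps well to one GPU thread per voxel. The number of iterations trades accuracy against speed.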
The advection is the so-called semi-Lagrangian scheme. A more accurate solver uses the second-order MacCormack technique, and the simulation makes use of the latter. However, it makes the simulation unstable and introduces artifacts. Limiting the generated extremes can fix this. Unfortunately I was not able to get this working, so the simulation runs without limiters. Still, the result is some visually interesting turbulent behavior.
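To make the difference between the two advection schemes concrete, here is a small 1D sketch in Python. It is an illustration under simplifying assumptions (periodic grid, constant velocity `u`, unit cell size), not the shader code of the demo. The limiter shown is one common choice: clamp the corrected value to the range of the samples used by the backward trace.

```python
import math


def sample(field, x):
    """Linearly interpolate a periodic 1D field at fractional position x."""
    n = len(field)
    i = math.floor(x)
    t = x - i
    return (1.0 - t) * field[i % n] + t * field[(i + 1) % n]


def semi_lagrangian(field, u, dt):
    """Trace each cell back along the velocity and sample the old field there."""
    return [sample(field, i - u * dt) for i in range(len(field))]


def maccormack(field, u, dt, limit=True):
    """Second-order MacCormack step built from two semi-Lagrangian passes."""
    n = len(field)
    forward = semi_lagrangian(field, u, dt)      # predictor
    backward = semi_lagrangian(forward, -u, dt)  # advect the prediction back
    result = []
    for i in range(n):
        # Correct the prediction by half the round-trip error.
        value = forward[i] + 0.5 * (field[i] - backward[i])
        if limit:
            # Clamp to the two samples the backward trace interpolated between.
            j = math.floor(i - u * dt)
            lo = min(field[j % n], field[(j + 1) % n])
            hi = max(field[j % n], field[(j + 1) % n])
            value = min(max(value, lo), hi)
        result.append(value)
    return result
```

On a step-shaped field, the unlimited version overshoots and undershoots next to the edges, which is the kind of extreme that shows up as artifacts. With `limit=True` the result stays within the range of the input.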
The magnitude of the velocity vectors is visualized. For the 3D rendering, a simple maximum projection is used: rays are shot through the volume, searching for the maximum speed along each ray. The speed is then mapped to a color by linear interpolation.
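The rendering idea can be sketched in a few lines of Python. This is a simplified illustration, assuming axis-aligned rays along z (an orthographic view) and a two-color ramp. The demo itself supports arbitrary rotation, and the function names and colors here are hypothetical.

```python
def max_projection(volume):
    """For every (x, y) pixel, take the maximum value along the z axis.

    volume is indexed as volume[z][y][x].
    """
    depth = len(volume)
    height = len(volume[0])
    width = len(volume[0][0])
    return [
        [max(volume[z][y][x] for z in range(depth)) for x in range(width)]
        for y in range(height)
    ]


def speed_to_color(speed, max_speed, cold=(0, 0, 255), hot=(255, 255, 0)):
    """Linearly interpolate between two RGB colors based on speed."""
    t = min(max(speed / max_speed, 0.0), 1.0)
    return tuple(round(c + t * (h - c)) for c, h in zip(cold, hot))
```

Usage: `image = max_projection(speed_volume)` followed by `speed_to_color(value, peak)` for each pixel value.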
Controls:
- mouse left drags a source on a plane through the center of the volume and parallel to the screen
- mouse right to rotate the volume
- mouse wheel to zoom in and out
- space bar to toggle between MacCormack and Semi-Lagrangian simulation
______________________________________________________________________
26 Dec 2009 / Jan Vlietinck
3D waves simulator (DX11)

A long time ago (16 years) I wrote a 2D wave simulator, based on finite differencing the wave equation.
Now, in a similar spirit, I've written a wave simulator that simulates volumetric 3D waves. These could be sound or electromagnetic waves. The simulation grid is now 400 x 400 x 400, or 64 million voxels. The simulation equation becomes slightly more complex but is still fairly simple, namely:
p(x,y,z) = 2*p'(x,y,z) - p''(x,y,z)
         + c*( p'(x-1,y,z) + p'(x+1,y,z) + p'(x,y-1,z) + p'(x,y+1,z)
             + p'(x,y,z-1) + p'(x,y,z+1) - 6*p'(x,y,z) )
Here p(x,y,z) is the pressure (for sound waves) at time t, p' the pressure at time t-dt, and p'' the pressure at time t-2*dt.
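The update rule above translates almost directly into code. Below is a minimal scalar sketch in Python on a small cubic grid. It is an illustration, not the DirectCompute kernel. The fixed zero boundary is an assumption, and so is the usual reading of c as (wave speed * dt / grid spacing)^2, which the text above does not spell out. With that reading, c should stay at or below 1/3 for the explicit scheme to remain stable in 3D.

```python
def wave_step(p1, p2, c):
    """One time step of the 3D wave equation by finite differences.

    p1 is the pressure at t-dt, p2 the pressure at t-2*dt, both indexed
    as [x][y][z] on an n*n*n grid. Returns the pressure at time t.
    Boundary voxels are kept at zero.
    """
    n = len(p1)
    p = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for x in range(1, n - 1):
        for y in range(1, n - 1):
            for z in range(1, n - 1):
                laplacian = (
                    p1[x - 1][y][z] + p1[x + 1][y][z]
                    + p1[x][y - 1][z] + p1[x][y + 1][z]
                    + p1[x][y][z - 1] + p1[x][y][z + 1]
                    - 6.0 * p1[x][y][z]
                )
                p[x][y][z] = 2.0 * p1[x][y][z] - p2[x][y][z] + c * laplacian
    return p
```

To advance the simulation, rotate the buffers after each step: the new field becomes p', and the old p' becomes p''.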
There is a scalar and a vectorized implementation.
On a HD 5870 the scalar simulation speed is about 119 frames per second.
The vectorized version runs at 257 frames per second, corresponding to 16.5 GigaVoxels/s.
Only one slice of the simulation is visualized; moving the mouse vertically shows other slices.
The S key toggles between the scalar and vectorized mode.
Pressing the space bar cycles through other wave source constellations, namely two separated sources, a 4-source swirl and a 2-source dipole.
Pressing the 'Enter' key cycles the slice viewing direction through the 3 main axes (scalar mode only).
______________________________________________________________________
23 Oct 2009 / Jan Vlietinck
DX11 DirectCompute Julia 4D
Download DX11 Julia 4D fractal generator V1.4

Here is another fractal generator, this time rendering 4-dimensional quaternion Julia fractals.
It continuously morphs the shape and colors of the fractal.
The shader code is a port of the original Cg version written by Keenan Crane.
The program tries to use a DX11 compute shader 5. If this is not possible, as on DX10 hardware, a pixel shader 4 is used instead. This enables the code to run on any DX11 or DX10 GPU.
The fractal can be rotated with the mouse to see it in all its 3D glory. Zoom with the mouse wheel.
The morphing can be toggled with the space bar, for better inspection.
Fractal detail can be increased and decreased with the +/- keys of the numeric pad.
Self-shadowing can be toggled with the S key.
The P key switches between the pixel and compute shader (if available).
ALT + Enter switches between windowed and full screen.
______________________________________________________________________
11 Oct 2009 / Jan Vlietinck
DX11 DirectCompute Mandelbrot and Julia viewer
Download DX11 Mandelbrot and Julia viewer V1.8
Download DX11 feature level DX10 version

Here is a quite fast Mandelbrot and Julia viewer, making use of DX11 and the DirectCompute API.
The software detects whether your GPU supports doubles; if not, it runs using only floats.
The set is calculated with up to 1024 iterations. The horsepower of DX11 GPUs enables real-time panning and zooming, even at high resolution.
A scalar and a vectorized version of the computation are included.
Both generate the same output. The vectorized version was made after the scalar calculation showed suboptimal performance on the ATI HD 5870. The vectorization is done by calculating 2x2 pixels at once.
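The relation between the scalar and the 2x2 version can be sketched in Python. This is only an illustration of the idea, not the shader source that ships with the download. The function names are hypothetical, and Python's `complex` type stands in for the float pairs a shader would use. The 2x2 variant advances four neighbouring pixels in lockstep, the way four vector lanes would, and produces the same counts as the scalar loop.

```python
def escape_count(c, max_iter=1024):
    """Scalar version: iterations until |z| exceeds 2, or max_iter if it never does."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def escape_count_2x2(x0, y0, step, max_iter=1024):
    """Vector-style version: iterate a 2x2 block of pixels together."""
    cs = [complex(x0 + dx * step, y0 + dy * step) for dy in (0, 1) for dx in (0, 1)]
    zs = [0j] * 4
    counts = [max_iter] * 4
    active = [True] * 4
    for i in range(max_iter):
        if not any(active):
            break  # all four lanes have escaped
        for k in range(4):
            if active[k]:
                if abs(zs[k]) > 2.0:
                    counts[k] = i
                    active[k] = False
                else:
                    zs[k] = zs[k] * zs[k] + cs[k]
    return counts
```

The block keeps iterating until its slowest pixel has escaped, so the gain on real hardware comes from filling the vector units, not from doing less work.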
Compared to the scalar version it runs twice as fast on this GPU, at over 1.9 TFLOP/s.
In doubles mode, performance is less than 400 GFLOP/s.
A GTX 480 is about half as fast in float mode and about a quarter of the speed in double mode, compared to a HD 5870.
Full source code is included.
Note that no drawing code was needed. It is possible to write directly to the backbuffer from the compute shader.
Key controls
Space bar : Toggles between Mandelbrot and Julia
M key : Toggles between 1024 and 2048 maximum iterations
A/Z keys : Cycle colors
V key : Vector calculations (only used for floats, not much effect for doubles)
S key : Scalar calculations
F key : Float calculations
D key : Double calculations
E key : Toggles between two different double versions
The first version is a straight conversion of the float version. However, ATI is currently buggy and slow with this version. The second version does less loop unrolling and works OK on ATI. The first version is about 1/3 faster (at least currently on Nvidia).
Mouse
Move : Pan
Move + SHIFT : In Julia mode, moves the base point around to get a morphing fractal
Drag left and right : Zoom in / out
Pressing the space bar switches to Julia calculation. It takes the point in the center of the Mandelbrot view as the Julia base point.
Holding the SHIFT key while moving the mouse moves the Julia base point, which results in an animation of the Julia set changing shape.
Pressing the space bar again switches back to Mandelbrot calculation. With a more deeply zoomed-in Mandelbrot view, the Julia set changes shape more gradually.
DX10 support
To support DX10 GPUs, the DX11 feature level DX10 version should be used.
It will try to use compute shader 4 instead of compute shader 5.
This requires an additional pass with a pixel shader to copy the compute shader output to the screen.
In case there is no support for compute shaders, as is currently the case with Nvidia on
In this case only the scalar version of the algorithm can be used.
On a GTX280 the computational throughput is 1/4 of that of a HD 5870. This is to be expected, as the former has around 600 GFLOP/s, whereas the latter has over 4 times more.
______________________________________________________________________
7 Oct 2009 / Jan Vlietinck
Fast software renderer

Here is a rather fast software renderer engine demo called FQuake.
It renders a level of the original Quake game.
The special thing about this renderer is that it runs purely on the CPU, without using the GPU. It is a port of a renderer I originally wrote back in 1997, which ran on ARM processors and was based on a reverse-engineered format of the PC game data. In contrast to those original versions, this version does texture mapping in software with bilinear interpolation instead of point sampling. I wrote this software to learn about SSE and how fast it can be, and I used the knowledge to write a software version of my volume rendering engine at Agfa.
This demo engine is highly optimized, making use of multiple threads and SSE code.
Perspective-correct, bilinear texture mapping runs at 650 Mpix/s on a 3.2 GHz Core 2 Quad.
A 64-bit version is also included, running 15% faster at 750 Mpix/s.
For comparison with GPU rendering, a DX10 version is included. On a GTX280 GPU, rendering is about six times faster than native FQuake CPU rendering. A DX10 WARP software rendering version is also included. This CPU rendering version runs about 5 times slower than native FQuake CPU rendering. A DX9 version is included as well, enabling comparison with the SwiftShader software renderer, which has a rendering speed similar to WARP.
Looking at CPU usage, one can see that utilization is only between 80 and 90%. The cause seems to be the copying of the rendered image from CPU memory to GPU memory by the graphics card drivers. After upgrading to a HD5870 graphics card I noticed that the rendering speed is now 800 Mpix/s, as this card seems to have a more efficient way of copying than the GTX280.
The engine uses an algorithm that ensures zero overdraw.
At 2560x1600 resolution the engine runs at between 120 and 160 frames per second, depending only slightly on scene complexity. This corresponds to a texture and pixel fill rate of between 500 and 650 Mpix/s.
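The fill rate follows directly from resolution and frame rate when every pixel is drawn exactly once (zero overdraw). A quick check in Python, with a hypothetical helper name:

```python
def fill_rate_mpix(width, height, fps):
    """Pixels drawn per second, in Mpix/s, assuming each pixel is drawn once per frame."""
    return width * height * fps / 1e6


low = fill_rate_mpix(2560, 1600, 120)   # about 492 Mpix/s
high = fill_rate_mpix(2560, 1600, 160)  # about 655 Mpix/s
```

This matches the quoted range of roughly 500 to 650 Mpix/s.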
The mapped texture consists of two layers, a material map and a light map. Though normally you would do this with multitexturing, the engine does it with single texturing. To make this possible, an LRU texture cache is maintained, compositing the material and light texture maps on the fly as needed.
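The caching idea can be sketched as follows in Python. This is a simplified illustration, not the FQuake implementation: the class name is hypothetical, textures are flat lists of 8-bit intensities, and the material and light map are assumed to have the same size, whereas in a real engine the light map is typically much coarser and has to be scaled up.

```python
from collections import OrderedDict


class SurfaceCache:
    """LRU cache of pre-lit textures, keyed by (material id, light map id)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, material_id, light_id, materials, lightmaps):
        key = (material_id, light_id)
        if key in self.entries:
            # Cache hit: mark as most recently used.
            self.entries.move_to_end(key)
            return self.entries[key]
        # Cache miss: composite material and light map into a single texture.
        lit = [
            m * l // 255
            for m, l in zip(materials[material_id], lightmaps[light_id])
        ]
        self.entries[key] = lit
        if len(self.entries) > self.capacity:
            # Evict the least recently used composite.
            self.entries.popitem(last=False)
        return lit
```

The inner rendering loop then only ever samples one texture per pixel, and the compositing cost is paid once per visible surface instead of once per pixel.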
To make optimal use of all CPU cores, the screen is split up according to the number of cores. The split positions are continuously moved to adapt to the scene complexity, so that all cores are maximally loaded.
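One way such adaptive splitting could work is sketched below in Python. This is an assumption about the approach, not the actual FQuake code: the screen is assumed to be cut into horizontal strips, one per core, and the measured render time of each strip in the previous frame is used to estimate a cost per row. The function name `rebalance` is hypothetical.

```python
def rebalance(bounds, times):
    """Move strip boundaries so every strip gets an equal share of the work.

    bounds: row boundaries [0, b1, ..., height], one strip per core.
    times:  render time of each strip in the previous frame.
    Each strip is assumed to have at least one row.
    """
    # Estimate a per-row cost by spreading each strip's time over its rows.
    row_cost = []
    for k, t in enumerate(times):
        rows = bounds[k + 1] - bounds[k]
        row_cost += [t / rows] * rows
    target = sum(times) / len(times)  # work each core should get
    new_bounds = [0]
    accumulated = 0.0
    for row, cost in enumerate(row_cost):
        accumulated += cost
        # Place a boundary each time another core's share has been filled.
        if len(new_bounds) < len(times) and accumulated >= target * len(new_bounds):
            new_bounds.append(row + 1)
    new_bounds.append(bounds[-1])
    return new_bounds
```

For example, with four 128-row strips that took 1, 4, 2 and 1 time units, the expensive second strip shrinks and the cheap outer strips grow. Repeating this every frame lets the boundaries track the scene as the camera moves.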
You can fly through the scene, clicking the left and right mouse buttons for forward and backward movement. Holding the middle button together with left or right quadruples the speed.
You can also switch between bilinear and point sampling by pressing the space bar.
The image is displayed via DirectDraw. For some reason, rendering can be slow on systems with dual screens. To get normal rendering speed you may have to disable one of the screens. For graphics cards with PCIe 1.x, the rendering speed will be limited to 500 Mpix/s by the 2 GB/s graphics bus.
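The 500 Mpix/s ceiling is consistent with a 2 GB/s bus if each rendered pixel takes 4 bytes (a 32-bit frame buffer, which is an assumption here). A quick check in Python, with a hypothetical helper name:

```python
def bus_limit_mpix(bus_bytes_per_second, bytes_per_pixel=4):
    """Upper bound on pixels per second that can be copied over the bus, in Mpix/s."""
    return bus_bytes_per_second / bytes_per_pixel / 1e6


limit = bus_limit_mpix(2e9)  # 500.0 Mpix/s for a 2 GB/s bus
```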
Enjoy,
Jan
______________________________________________________________________
20 Jul 2009 jvlietinck <at> gmail <dot> com