VL.Nvidia.CUDA

A package that makes it possible to run CUDA code on Nvidia cards in vvvv gamma.

RadixSort for GPU Buffers

Part of the Gaussian splatting implementation for vvvv gamma in VL.Fuse.
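For readers new to the technique, here is a minimal CPU reference of a least-significant-digit (LSD) radix sort in Python. It sorts 32-bit unsigned keys and returns each element's original index alongside it, which is the key/index pair a GPU sort of a buffer usually produces. It only illustrates the algorithm. It is not the package’s CUDA code, and `radix_sort_indices` is a hypothetical name.

```python
def radix_sort_indices(keys, bits_per_pass=8):
    """Stable LSD radix sort of 32-bit unsigned keys.

    Returns (sorted_keys, indices), where indices[i] is the original
    position of sorted_keys[i]. bits_per_pass must divide 32.
    """
    radix = 1 << bits_per_pass
    mask = radix - 1
    indices = list(range(len(keys)))
    for shift in range(0, 32, bits_per_pass):
        # Histogram of the current digit, visiting elements in current order.
        counts = [0] * radix
        for i in indices:
            counts[(keys[i] >> shift) & mask] += 1
        # Exclusive prefix sum: start offset of each digit bucket.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]
        # Stable scatter: equal digits keep their relative order.
        out = [0] * len(indices)
        for i in indices:
            d = (keys[i] >> shift) & mask
            out[offsets[d]] = i
            offsets[d] += 1
        indices = out
    return [keys[i] for i in indices], indices
```

For example, `radix_sort_indices([5, 3, 7, 3])` returns `([3, 3, 5, 7], [1, 3, 0, 2])`. Because each pass is stable, four 8-bit passes give a full 32-bit sort; GPU radix sorts generally parallelize the same histogram, prefix-sum and scatter steps across many threads.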
Commissioned by @motzi.

GPU BCn DDS Compression

This is probably the fastest way to get from a 3D rendering to a BCn-compressed DDS file. It hands the GPU memory of a texture directly to CUDA, compresses it with Nvidia Texture Tools 3 (NVTT 3), downloads the compressed bytes to the CPU, and saves them to disk.

Available compression formats: BC1, BC3, BC4, BC5, BC6u, BC6s, BC7.
Commissioned by Refik Anadol Studio.
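As a rough guide to how much data is downloaded and written per texture: every BCn format stores a fixed number of bytes per 4×4 pixel block, 8 bytes for BC1 and BC4 and 16 bytes for BC3, BC5, BC6 (unsigned and signed) and BC7. The sketch below applies that rule to the top mip level only, ignoring the DDS header and any mip chain; `BLOCK_BYTES` and `bcn_size` are illustrative names, not part of the package.

```python
# Bytes per 4x4 block for the formats listed above (BC6u/BC6s = BC6H unsigned/signed).
BLOCK_BYTES = {
    "BC1": 8, "BC4": 8,
    "BC3": 16, "BC5": 16, "BC6u": 16, "BC6s": 16, "BC7": 16,
}

def bcn_size(width, height, fmt):
    """Compressed payload size in bytes of one image (no header, no mips)."""
    blocks_x = (width + 3) // 4   # partial blocks at the edges count as whole blocks
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * BLOCK_BYTES[fmt]
```

For a 1920×1080 frame, `bcn_size(1920, 1080, "BC7")` gives 2,073,600 bytes and BC1 gives 1,036,800 bytes, compared with 8,294,400 bytes for uncompressed 8-bit RGBA, i.e. 4:1 and 8:1.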

The help patch is still my development test patch. If someone has an idea for a better one, please post a patch here. It would be super helpful…

Based on CUDA 12.8, so don’t forget to update your GPU driver!

Let me know if the package runs as is (besides updating the drivers). I wasn’t sure what to include to make it portable without Nvidia installers.

That might come in handy when you need to sort alpha-blended particles by depth, thanks!

The data doesn’t seem to be sorted: the index buffer isn’t changed. I’m running driver 572.70. Is there a CUDA download as well, or is it included?

I think you need to install the CUDA SDK first.

This should not be necessary. Let me know what kind of error you see in the console. The 3 GB CUDA toolkit install should only be needed by developers who write CUDA code and need to compile it.

I think it needed cudart64_12.dll. I have now changed it to link the CUDA runtime statically, so it should no longer have this dependency. Please try again with the new NuGet package, version 0.1.0-beta1 or later.

That works, thanks! Is there any way to work with Fuse structured buffers? And what is the TextureFXGraph node for?

Yes, you just need to make a float, int, or unsigned int buffer containing the value you want to sort by, i.e. the sort key.
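Radix sorts work on unsigned integer bits, so float and signed int keys are commonly remapped to unsigned 32-bit values whose order matches the numeric order. The thread does not say whether the package does exactly this internally; the sketch below shows the standard bit trick in Python, and `float_key` / `int_key` are hypothetical helper names.

```python
import struct

def float_key(x):
    """Map a 32-bit float to a uint32 that sorts in the same order (NaN not handled)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Negative floats: flip all bits, so larger magnitudes come first.
    # Non-negative floats: set the sign bit, so they sort after all negatives.
    return bits ^ 0xFFFFFFFF if bits & 0x80000000 else bits | 0x80000000

def int_key(n):
    """Map a signed 32-bit int to a uint32 with the same order."""
    # Two's-complement bits with the sign bit flipped: INT_MIN -> 0, INT_MAX -> 0xFFFFFFFF.
    return (n & 0xFFFFFFFF) ^ 0x80000000
```

For back-to-front drawing of alpha-blended particles, sorting on `float_key(-depth)` puts the farthest particle first, assuming depth grows with distance from the camera.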

Then, if you need to, use the ReorderBuffer node to reorder the original buffer. Alternatively, use the sorted indices (the former positions of the elements) directly in the Fuse draw node.
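Reordering with sorted indices is a gather: output element i is the original element at `indices[i]`. Here is a minimal Python sketch of that operation, with a hypothetical `reorder` helper and made-up particle data; it illustrates the idea and is not the ReorderBuffer node’s implementation.

```python
def reorder(buffer, indices):
    """Gather: out[i] = buffer[indices[i]]."""
    return [buffer[i] for i in indices]

# Hypothetical particle records and their depth sort keys.
particles = ["a", "b", "c", "d"]
depth = [2.0, 0.5, 3.0, 1.0]
# Stand-in for the GPU sort: original indices ordered by ascending key.
indices = sorted(range(len(depth)), key=lambda i: depth[i])
print(reorder(particles, indices))  # ['b', 'd', 'a', 'c']
```

Using the indices directly in the draw node performs the same gather per instance inside the shader, which avoids writing a reordered copy of the buffer.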

@texone integrated that in the Gaussian Splatting patch. Have a look there for details.

Oh, that’s just a leftover from the package I used as a template. Ignore it.

Hey, will the source be public at some point?

Didn’t realise he had updated the splatting patch, wonderful! I have quite a lot of splats to play with :)

Added BCn DDS compression on the GPU. See original post.

Hi, nice addition! It seems to be blazing fast. One issue noted:

Also, a few notes:
Success seems to be 1 if you feed textures one by one; it would probably be better to have some sort of OnSuccess as a bang or channel… Or, not sure, put the compressor in a delegate?

The second problem is that FileTexture seems to be very slow when you read a lot of files… It looks like it tries to cache them. Is there a known, more direct way to load images into a texture in Stride (or should I use Skia instead)?

Yes, TextureReader can be used for direct disk-to-VRAM transfer; that is what should be used for DDS loading. The compressed texture is then decompressed on the GPU.

The actual compression happens in the draw operation, after the update. This makes sure it gets the latest texture content, after rendering.

I’ll think about an observable out.

Thanks for the feedback.

New version 0.2.2 is up. It can be exported with standard vvvv settings and is no longer a preview version, as everything seems to work fine.

Hi, I am just curious: would an integration with cuFFT be feasible?

Yes, definitely possible. For what purpose or application would you use it? If the data is on the GPU, it isn’t so easy to get it back to the CPU…

Thanks! I am just entertaining the idea for audio visualization purposes.
I would be happy to have the FFT data directly on the GPU and share it with my shaders (assuming I could solve uploading multiple audio signals directly to the GPU to be processed by cuFFT…).
Since I already run a setup with multiple parallel FFTs on the CPU that I then feed into my shaders, it sounds to me like this route would ease my CPU bottleneck a bit? Just thinking out loud.
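cuFFT can run many same-sized transforms in one call through its batched plans (the `batch` argument of `cufftPlan1d` or `cufftPlanMany`), which matches the "many parallel FFTs" case if the audio channels are packed one window of N samples after another. The Python sketch below only illustrates that packed layout and what each batch entry computes, using a naive DFT from the standard library. It is a CPU reference, not cuFFT, and `batched_dft` / `bin_frequency` are hypothetical names.

```python
import cmath

def batched_dft(packed, n):
    """Naive real-input DFT over consecutive windows of n samples.

    packed holds channel 0 in [0, n), channel 1 in [n, 2n), and so on.
    Returns one list of n // 2 + 1 complex bins per channel.
    """
    if len(packed) % n != 0:
        raise ValueError("buffer must hold whole windows")
    spectra = []
    for start in range(0, len(packed), n):
        window = packed[start:start + n]
        # Only bins 0..n/2 are kept: for real input the rest are complex conjugates.
        bins = [
            sum(window[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)
        ]
        spectra.append(bins)
    return spectra

def bin_frequency(k, n, sample_rate):
    """Frequency in Hz of bin k for an n-point transform."""
    return k * sample_rate / n
```

At 48 kHz, a 1024-sample window gives a bin spacing of `bin_frequency(1, 1024, 48000)` = 46.875 Hz. Interleaved channel data can instead be described with strides in `cufftPlanMany` rather than repacked.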

I’m also interested in having an FFT calculated and used on the GPU. That would solve some limitations of FFT processing for large spread counts and for multiple FFT instances from the same audio source.

Any plan to open this to the masses?

Thanks
