VL.PythonNET and AI workflows like StreamDiffusion in vvvv gamma

Until there is a dedicated website, here is the blog post I would otherwise have published, shared here for visibility:

StreamDiffusion Performance on RTX 5090: Real-Time AI with TensorRT & CUDA 12.8

The first performance tests of StreamDiffusion on the RTX 5090 show an impressive 30%+ performance boost compared to the RTX 4090.

In the fastest test case (img2img mode, 1-step, 512x512 resolution, TensorRT acceleration), the sd-turbo model achieves well over 100 FPS on the RTX 5090, making it one of the fastest real-time AI diffusion implementations available.
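For reference, a minimal sketch of that configuration using the upstream StreamDiffusion Python API (not the vvvv/VL.PythonNET patch itself) might look like the following; the model id, t_index_list, prompt and file/engine paths are assumptions, not the exact benchmark settings:

```python
# Sketch of the fastest test case: sd-turbo, img2img, 512x512,
# one denoising step, TensorRT acceleration.
import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline
from diffusers.utils import load_image

from streamdiffusion import StreamDiffusion
from streamdiffusion.image_utils import postprocess_image
from streamdiffusion.acceleration.tensorrt import accelerate_with_tensorrt

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/sd-turbo").to(
    device=torch.device("cuda"), dtype=torch.float16
)

# A single entry in t_index_list gives one denoising step per frame;
# turbo models need no classifier-free guidance, hence cfg_type="none".
stream = StreamDiffusion(
    pipe,
    t_index_list=[35],
    torch_dtype=torch.float16,
    cfg_type="none",
)

# Tiny VAE keeps the decode step cheap at 512x512.
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(
    device=pipe.device, dtype=pipe.dtype
)

# Build (or load cached) TensorRT engines for the UNet and VAE.
stream = accelerate_with_tensorrt(stream, "engines", max_batch_size=2)

stream.prepare(prompt="a portrait, photorealistic, studio lighting")

frame = load_image("input.png").resize((512, 512))

# Warm-up pass, then the real-time img2img loop: one step per incoming frame.
stream(frame)
x_output = stream(frame)
postprocess_image(x_output, output_type="pil")[0].save("output.png")
```

The first run builds the TensorRT engines, which can take several minutes; afterwards they are loaded from the engine directory. In vvvv gamma the same pipeline is driven through VL.PythonNET instead of a standalone Python script.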

sdxl-turbo at 1024x1024: High-Resolution, Smooth Performance

The real advantage of this performance boost is the ability to run the sdxl-turbo model—which is significantly more expensive to run but produces higher-quality images—at much higher resolutions.

Thanks to CUDA 12.8 and TensorRT acceleration, the RTX 5090 achieves a smooth and interactive 23 FPS at 1024x1024 resolution, a 4x increase in pixel count compared to 512x512.

This most likely makes the StreamDiffusion integration in VL.PythonNET for vvvv gamma the only implementation currently capable of running sdxl-turbo with TensorRT at 1024x1024 in real time on the RTX 5090.

🚀 Key Takeaways:
100+ FPS with sd-turbo (512x512, TensorRT, img2img, RTX 5090)
23 FPS with sdxl-turbo (1024x1024, TensorRT, img2img, RTX 5090)
30%+ speed improvement on the new NVIDIA RTX 50 series (compared to the RTX 4090)
CUDA 12.8 & TensorRT 10.8 acceleration for next-gen real-time AI

Stay tuned for further optimizations and benchmarks!
