Until there is a dedicated website, here is a blog article I would have published for visibility:
StreamDiffusion Performance on RTX 5090: Real-Time AI with TensorRT & CUDA 12.8
The first tests of StreamDiffusion on the RTX 5090 show an impressive performance boost of more than 30% compared to the RTX 4090.
In the fastest test case (img2img mode, 1-step, 512x512 resolution, TensorRT acceleration) the sd-turbo model achieves well over 100 FPS on the RTX 5090, making it one of the fastest real-time AI diffusion implementations available.
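Frame rates like these can be checked with a small timing loop. The following is a minimal sketch, not the benchmark code used for the numbers above: `measure_fps` and `dummy_step` are hypothetical names, and `dummy_step` only sleeps for about 5 ms as a stand-in for one img2img call. For real GPU work the step function would have to block until the frame is finished (in PyTorch, for example, by calling `torch.cuda.synchronize()`), otherwise the loop only times the kernel launches.

```python
import time
from typing import Callable


def measure_fps(step: Callable[[], object], warmup: int = 10, iters: int = 100) -> float:
    """Return the average number of `step` calls per second over `iters` timed calls."""
    if iters <= 0:
        raise ValueError("iters must be positive")
    # Warm-up calls are not timed, so one-off costs (lazy initialization,
    # cache fills, engine loading) do not skew the result.
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    elapsed = time.perf_counter() - start
    return iters / elapsed


def dummy_step() -> None:
    # Hypothetical stand-in for one img2img call: just sleeps for about 5 ms.
    time.sleep(0.005)


if __name__ == "__main__":
    print(f"{measure_fps(dummy_step, warmup=2, iters=20):.1f} FPS")
```

Averaging over many iterations after a warm-up phase gives a steadier figure than timing a single frame, which matters when individual frames take only a few milliseconds.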
sdxl-turbo at 1024x1024: High-Resolution, Smooth Performance
The real advantage of this performance boost is the ability to run the sdxl-turbo model, which is significantly more expensive to compute but produces higher-quality images, at much higher resolutions.
Thanks to CUDA 12.8 and TensorRT acceleration, the RTX 5090 achieves a smooth and interactive 23 FPS at 1024x1024 resolution, a 4x increase in pixel count compared to 512x512.
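The pixel-count comparison can be worked through in a few lines. This sketch uses only the figures quoted in this article; the helper names are made up for illustration, and 100 FPS is used as a lower bound for sd-turbo because the measured value is only given as "well over 100".

```python
def pixels(width: int, height: int) -> int:
    """Number of pixels in one frame."""
    return width * height


def megapixels_per_second(width: int, height: int, fps: float) -> float:
    """Pixel throughput in millions of pixels per second."""
    return pixels(width, height) * fps / 1e6


def frame_time_ms(fps: float) -> float:
    """Time budget per frame in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    ratio = pixels(1024, 1024) / pixels(512, 512)
    print(f"pixel ratio 1024x1024 vs 512x512: {ratio:.1f}x")  # 4.0x

    # sd-turbo at 512x512, using 100 FPS as a lower bound
    print(f"sd-turbo:   {megapixels_per_second(512, 512, 100):.1f} MPix/s, "
          f"{frame_time_ms(100):.1f} ms/frame")  # 26.2 MPix/s, 10.0 ms/frame

    # sdxl-turbo at 1024x1024, 23 FPS as reported
    print(f"sdxl-turbo: {megapixels_per_second(1024, 1024, 23):.1f} MPix/s, "
          f"{frame_time_ms(23):.1f} ms/frame")  # 24.1 MPix/s, 43.5 ms/frame
```

Measured in pixels per second, sdxl-turbo at 1024x1024 (about 24 MPix/s) lands close to the 26 MPix/s floor for sd-turbo at 512x512, even though each frame takes roughly 43 ms instead of 10 ms or less.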
This most likely makes the StreamDiffusion implementation in VL.PythonNET for vvvv gamma the only one capable of running sdxl-turbo with TensorRT at 1024x1024 in real time on the RTX 5090.
🚀 Key Takeaways:
✔ 100+ FPS with sd-turbo (512x512, TensorRT, img2img, RTX 5090)
✔ 23 FPS with sdxl-turbo (1024x1024, TensorRT, img2img, RTX 5090)
✔ 30%+ speed improvement over the RTX 4090 on the new NVIDIA RTX 50 series
✔ CUDA 12.8 & TensorRT 10.8 acceleration for next-gen real-time AI
Stay tuned for further optimizations and benchmarks!