VL.PythonNET and AI workflows like StreamDiffusion in vvvv gamma

ah sure, yes, easy. It wasn't meant to take up space here for that purpose. I will remove it, since it's not directly related to VL.PythonNET.

thanks <3

Here are some recent updates:

VL.DepthAnything:

  • This new package utilizes a highly optimized TensorRT version of DepthAnythingV2 with direct GPU transfer, achieving interactive frame rates for real-time depth estimation from a live camera input.
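
The VL package runs a TensorRT engine and keeps frames on the GPU; as a slower plain-PyTorch reference for what the model itself does, here is a minimal sketch using the Hugging Face depth-estimation pipeline (the checkpoint id is an assumption, any Depth-Anything-V2 checkpoint works the same way):

```python
from PIL import Image
from transformers import pipeline

# Plain PyTorch reference path (not the TensorRT engine used by VL.DepthAnything).
depth_estimator = pipeline(
    "depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf"
)
result = depth_estimator(Image.open("frame.png"))
result["depth"].save("frame_depth.png")  # per-pixel depth as a grayscale image
```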


VL.StreamDiffusion Updates:

  • New VAEEncode Node: This node encodes images into latent space, allowing smooth interpolation between two images for seamless transitions. Static image performance is also improved, as the latent result can be cached directly in the patch.

  • Seed Interpolation for Initial Noise: Now, the initial noise can be smoothly interpolated between two seeds, enabling “seed-traveling” in StreamDiffusion.
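
Both items above boil down to a few lines of tensor math. Here is a minimal Python sketch of the two ideas, assuming the diffusers AutoencoderKL from the sd-turbo checkpoint; the actual node implementations may differ:

```python
import torch
from diffusers import AutoencoderKL

# VAE from the sd-turbo checkpoint (assumption; any SD-style VAE behaves the same).
vae = AutoencoderKL.from_pretrained("stabilityai/sd-turbo", subfolder="vae").to(
    "cuda", torch.float16
)

def encode(image):  # image: (1, 3, H, W) tensor scaled to [-1, 1]
    with torch.no_grad():
        return vae.encode(image).latent_dist.sample() * vae.config.scaling_factor

def lerp_latents(a, b, t):
    # Linear blend in latent space; caching a and b makes static images essentially free.
    return (1.0 - t) * a + t * b

def slerp_noise(seed_a, seed_b, t, shape):
    # "Seed-traveling": spherically interpolate the initial noise of two seeds.
    a = torch.randn(shape, generator=torch.Generator().manual_seed(seed_a))
    b = torch.randn(shape, generator=torch.Generator().manual_seed(seed_b))
    omega = torch.acos(torch.clamp((a / a.norm() * (b / b.norm())).sum(), -1.0, 1.0))
    return (torch.sin((1.0 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)
```

Driving t from 0 to 1 and feeding the result into the diffusion step gives the smooth image-to-image and seed-to-seed transitions described above.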

14 Likes

Until there is a dedicated website, here is the blog article I would have published there, posted here for visibility:

StreamDiffusion Performance on RTX 5090: Real-Time AI with TensorRT & CUDA 12.8

The first performance tests of StreamDiffusion on the RTX 5090 show an impressive 30%+ performance boost compared to the RTX 4090.

In the fastest test case (img2img mode, 1-step, 512x512 resolution, TensorRT acceleration), the sd-turbo model achieves well over 100 FPS on the RTX 5090, making it one of the fastest real-time AI diffusion implementations available:
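
For orientation, this is roughly what such a single-step img2img configuration looks like with the upstream StreamDiffusion Python API; the engine cache path and the exact t_index value are assumptions, and the VL nodes wrap all of this inside the patch:

```python
import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline
from streamdiffusion import StreamDiffusion
from streamdiffusion.acceleration.tensorrt import accelerate_with_tensorrt

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/sd-turbo").to(
    device=torch.device("cuda"), dtype=torch.float16
)

# One entry in t_index_list = one denoising step, the fastest img2img setting.
stream = StreamDiffusion(pipe, t_index_list=[32], torch_dtype=torch.float16,
                         width=512, height=512)
# TinyVAE trades a little quality for a large decode speedup.
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(
    device=pipe.device, dtype=pipe.dtype
)
# Build (or load cached) TensorRT engines for the UNet and VAE.
stream = accelerate_with_tensorrt(stream, "engines", max_batch_size=2)
stream.prepare(prompt="a cinematic portrait, studio light")
```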

sdxl-turbo at 1024x1024: High-Resolution, Smooth Performance

The real advantage of this performance boost is the ability to run the sdxl-turbo model—which is significantly more expensive to run but produces higher-quality images—at much higher resolutions.

Thanks to CUDA 12.8 and TensorRT acceleration, the RTX 5090 achieves a smooth and interactive 23 FPS at 1024x1024 resolution, a 4x increase in pixel count compared to 512x512.

This most likely makes the StreamDiffusion implementation in VL.PythonNET for vvvv gamma the only one capable of running sdxl-turbo with TensorRT at 1024x1024 in real time on the RTX 5090.
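
For comparison, a single sdxl-turbo img2img pass at 1024x1024 looks like this in plain diffusers; this sketch uses a file instead of a camera frame and leaves out the streaming and TensorRT parts, so it only shows the model and resolution settings:

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

init = load_image("input.png").resize((1024, 1024))
# Turbo models run without CFG; num_inference_steps * strength must be >= 1.
out = pipe("a neon-lit street at night", image=init,
           num_inference_steps=2, strength=0.5, guidance_scale=0.0).images[0]
out.save("output_1024.png")
```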

🚀 Key Takeaways:

  • 100+ FPS with sd-turbo (512x512, TensorRT, img2img, RTX 5090)
  • 23 FPS with sdxl-turbo (1024x1024, TensorRT, img2img, RTX 5090)
  • 30%+ speed improvement on the new NVIDIA 50 series
  • CUDA 12.8 & TensorRT 10.8 acceleration for next-gen real-time AI

Stay tuned for further optimizations and benchmarks!

21 Likes

Small update:

VL.StreamDiffusion now supports LoRA inputs, even multiple.
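
For context, stacking multiple LoRAs looks like this with the standard diffusers LoRA API; the file names and adapter names are placeholders, and the VL nodes may wire this differently:

```python
import torch
from diffusers import AutoPipelineForImage2Image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16
).to("cuda")

# Load two LoRAs under distinct adapter names (paths are placeholders).
pipe.load_lora_weights("loras", weight_name="style.safetensors", adapter_name="style")
pipe.load_lora_weights("loras", weight_name="detail.safetensors", adapter_name="detail")
# Activate both at once with individual blend weights.
pipe.set_adapters(["style", "detail"], adapter_weights=[0.8, 0.5])
```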

SDXL-Turbo supports T2I-Adapter and ControlNet++ Union with exceptional quality:
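
As a rough reference for the depth-conditioned case, this is how a T2I-Adapter is combined with an SDXL-class model in plain diffusers; the adapter checkpoint below is an assumption, and ControlNet++ Union follows a similar pattern via a ControlNet pipeline:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter

# Depth T2I-Adapter for SDXL (checkpoint id is an assumption).
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-depth-midas-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/sdxl-turbo", adapter=adapter, torch_dtype=torch.float16, variant="fp16"
).to("cuda")

depth_map = Image.open("depth.png")  # e.g. the output of VL.DepthAnything
image = pipe("a futuristic city at night", image=depth_map,
             num_inference_steps=2, guidance_scale=0.0,
             adapter_conditioning_scale=0.9).images[0]
```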

Depth Anything 3 with TensorRT acceleration and dynamic resolution:

Depth Anything 3 small:

Depth Anything 3 base:

12 Likes

Any plans to update to https://streamdiffusionv2.github.io/? Thanks.

@alexqbit While waiting for @tonfilm to give a definitive answer, I have to say that I looked into this a couple of months ago, and from what I understand from reading the paper, it barely runs on an H100. However, I would be really impressed if that's not the case :D

2 Likes

Currently not. I had a look at it, and while the temporal consistency is better with video/world models, the trade-off between output quality/resolution and FPS is not very interesting at the moment, especially on consumer GPUs. However, if anyone has the budget and needs this for a particular reason, please get in contact.

There are more promising developments in other areas, stay tuned for more. ;)

3 Likes