VL.PythonNET and AI workflows like StreamDiffusion in vvvv gamma

ah sure, yes, easy. It wasn't meant to take up space here for that purpose. I will remove it, since it's not directly related to VL.PythonNET.

thanks <3

Here are some recent updates:

VL.DepthAnything:

  • This new package utilizes a highly optimized TensorRT version of DepthAnythingV2 with direct GPU transfer, achieving interactive frame rates for real-time depth estimation from a live camera input.
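
The VL package runs a TensorRT engine and keeps frames on the GPU; as a slower plain-PyTorch reference for what the model itself does, here is a minimal sketch using the Hugging Face depth-estimation pipeline (the checkpoint id is an assumption, any Depth-Anything-V2 checkpoint works the same way):

```python
from PIL import Image
from transformers import pipeline

# Plain PyTorch reference path (not the TensorRT engine used by VL.DepthAnything).
depth_estimator = pipeline(
    "depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf"
)
result = depth_estimator(Image.open("frame.png"))
result["depth"].save("frame_depth.png")  # per-pixel depth as a grayscale image
```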


VL.StreamDiffusion Updates:

  • New VAEEncode Node: This node encodes images into latent space, allowing smooth interpolation between two images for seamless transitions. Static image performance is also improved, as the latent result can be cached directly in the patch.

  • Seed Interpolation for Initial Noise: Now, the initial noise can be smoothly interpolated between two seeds, enabling “seed-traveling” in StreamDiffusion.
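
Both items above boil down to a few lines of tensor math. Here is a minimal Python sketch of the two ideas, assuming the diffusers AutoencoderKL from the sd-turbo checkpoint; the actual node implementations may differ:

```python
import torch
from diffusers import AutoencoderKL

# VAE from the sd-turbo checkpoint (assumption; any SD-style VAE behaves the same).
vae = AutoencoderKL.from_pretrained("stabilityai/sd-turbo", subfolder="vae").to(
    "cuda", torch.float16
)

def encode(image):  # image: (1, 3, H, W) tensor scaled to [-1, 1]
    with torch.no_grad():
        return vae.encode(image).latent_dist.sample() * vae.config.scaling_factor

def lerp_latents(a, b, t):
    # Linear blend in latent space; caching a and b makes static images essentially free.
    return (1.0 - t) * a + t * b

def slerp_noise(seed_a, seed_b, t, shape):
    # "Seed-traveling": spherically interpolate the initial noise of two seeds.
    a = torch.randn(shape, generator=torch.Generator().manual_seed(seed_a))
    b = torch.randn(shape, generator=torch.Generator().manual_seed(seed_b))
    omega = torch.acos(torch.clamp((a / a.norm() * (b / b.norm())).sum(), -1.0, 1.0))
    return (torch.sin((1.0 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)
```

Driving t from 0 to 1 and feeding the result into the diffusion step gives the smooth image-to-image and seed-to-seed transitions described above.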

14 Likes

Until there is a dedicated website, here is the blog article I would have published there, posted here for visibility:

StreamDiffusion Performance on RTX 5090: Real-Time AI with TensorRT & CUDA 12.8

The first performance tests of StreamDiffusion on the RTX 5090 show an impressive 30%+ performance boost compared to the RTX 4090.

In the fastest test case (img2img mode, 1-step, 512x512 resolution, TensorRT acceleration), the sd-turbo model achieves well over 100 FPS on the RTX 5090, making it one of the fastest real-time AI diffusion implementations available:
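
For orientation, this is roughly what such a single-step img2img configuration looks like with the upstream StreamDiffusion Python API; the engine cache path and the exact t_index value are assumptions, and the VL nodes wrap all of this inside the patch:

```python
import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline
from streamdiffusion import StreamDiffusion
from streamdiffusion.acceleration.tensorrt import accelerate_with_tensorrt

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/sd-turbo").to(
    device=torch.device("cuda"), dtype=torch.float16
)

# One entry in t_index_list = one denoising step, the fastest img2img setting.
stream = StreamDiffusion(pipe, t_index_list=[32], torch_dtype=torch.float16,
                         width=512, height=512)
# TinyVAE trades a little quality for a large decode speedup.
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(
    device=pipe.device, dtype=pipe.dtype
)
# Build (or load cached) TensorRT engines for the UNet and VAE.
stream = accelerate_with_tensorrt(stream, "engines", max_batch_size=2)
stream.prepare(prompt="a cinematic portrait, studio light")
```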

sdxl-turbo at 1024x1024: High-Resolution, Smooth Performance

The real advantage of this performance boost is the ability to run the sdxl-turbo model—which is significantly more expensive to run but produces higher-quality images—at much higher resolutions.

Thanks to CUDA 12.8 and TensorRT acceleration, the RTX 5090 achieves a smooth and interactive 23 FPS at 1024x1024 resolution, a 4x increase in pixel count compared to 512x512.

This most likely makes the StreamDiffusion implementation in VL.PythonNET for vvvv gamma the only one capable of running sdxl-turbo with TensorRT at 1024x1024 in real time on the RTX 5090.
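
For comparison, a single sdxl-turbo img2img pass at 1024x1024 looks like this in plain diffusers; this sketch uses a file instead of a camera frame and leaves out the streaming and TensorRT parts, so it only shows the model and resolution settings:

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

init = load_image("input.png").resize((1024, 1024))
# Turbo models run without CFG; num_inference_steps * strength must be >= 1.
out = pipe("a neon-lit street at night", image=init,
           num_inference_steps=2, strength=0.5, guidance_scale=0.0).images[0]
out.save("output_1024.png")
```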

🚀 Key Takeaways:

  • 100+ FPS with sd-turbo (512x512, TensorRT, img2img, RTX 5090)
  • 23 FPS with sdxl-turbo (1024x1024, TensorRT, img2img, RTX 5090)
  • 30%+ speed improvement on the new NVIDIA 50 series
  • CUDA 12.8 & TensorRT 10.8 acceleration for next-gen real-time AI

Stay tuned for further optimizations and benchmarks!

21 Likes

Small update:

VL.StreamDiffusion now supports LoRA inputs, even multiple.
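
For context, stacking multiple LoRAs looks like this with the standard diffusers LoRA API; the file names and adapter names are placeholders, and the VL nodes may wire this differently:

```python
import torch
from diffusers import AutoPipelineForImage2Image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16
).to("cuda")

# Load two LoRAs under distinct adapter names (paths are placeholders).
pipe.load_lora_weights("loras", weight_name="style.safetensors", adapter_name="style")
pipe.load_lora_weights("loras", weight_name="detail.safetensors", adapter_name="detail")
# Activate both at once with individual blend weights.
pipe.set_adapters(["style", "detail"], adapter_weights=[0.8, 0.5])
```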

SDXL-Turbo supports T2I-Adapter and ControlNet++ Union with exceptional quality:
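
As a rough reference for the depth-conditioned case, this is how a T2I-Adapter is combined with an SDXL-class model in plain diffusers; the adapter checkpoint below is an assumption, and ControlNet++ Union follows a similar pattern via a ControlNet pipeline:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter

# Depth T2I-Adapter for SDXL (checkpoint id is an assumption).
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-depth-midas-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/sdxl-turbo", adapter=adapter, torch_dtype=torch.float16, variant="fp16"
).to("cuda")

depth_map = Image.open("depth.png")  # e.g. the output of VL.DepthAnything
image = pipe("a futuristic city at night", image=depth_map,
             num_inference_steps=2, guidance_scale=0.0,
             adapter_conditioning_scale=0.9).images[0]
```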

Depth Anything 3 with TensorRT acceleration and dynamic resolution:

Depth Anything 3 small:

Depth Anything 3 base:

12 Likes

Any plans to update to https://streamdiffusionv2.github.io/? Thanks.

@alexqbit While waiting for @tonfilm to give a definitive answer, I have to say that I looked into this a couple of months ago, and from what I understand from reading the paper, it barely runs on an H100. However, I would be really impressed if that's not the case :D

2 Likes

Currently not. I had a look at it, and while the temporal consistency is better with video/world models, the trade-off between output quality/resolution and FPS is not very interesting at the moment, especially on consumer GPUs. However, if anyone has the budget and needs this for a particular reason, please get in contact.

There are more promising developments in other areas, stay tuned for more. ;)

3 Likes