Hello! I’m running into trouble with the FFT node of VL.Audio. For a window size N, it appears that rather than running every frame on the N latest samples, it waits until it has received 2N new samples. This means that for a window size of 4096 at a 48 kHz sample rate, it only updates about 6 times per second (48k/(2*4096) ~= 5.86). This makes it difficult to use for visualizers, in particular for performing more advanced analysis and capturing bass notes correctly. For instance, say you run with a window size of 512 samples. That gives roughly a 47 Hz update rate, but on the other hand, the lowest frequency bin is at about 94 Hz and the next at 188 Hz, which gives very low resolution in the most important registers.
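For reference, the trade-off is easy to put into numbers; here is a quick Python sketch (the 2N-samples-per-update behaviour is simply what I observed above):

```python
# Sanity check of the numbers above: for an FFT window of N samples at
# sample rate fs, bin spacing is fs / N, and if the node only recomputes
# after 2N fresh samples, the update rate is fs / (2 * N).
fs = 48_000

for n in (4096, 512):
    bin_hz = fs / n            # frequency resolution per bin
    updates = fs / (2 * n)     # update rate under the observed 2N behaviour
    print(f"N={n}: bin spacing {bin_hz:.1f} Hz, ~{updates:.1f} updates/s")
```

So you can have fine bins or fast updates, but with this behaviour not both at once.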
Is there a workaround for this, or have I configured my audio drivers incorrectly? I haven’t dug properly into the VL.Audio source code, so I’m not sure if there’s a quick fix. Otherwise I’m considering using an external FFT library, feeding it the samples from the audio frames. That should hopefully not add much overhead or latency, since everything is already on the CPU.
After poking around a bit, I see that the issue lies in the implementation of CircularBuffer: the FFT calculation step only runs whenever the buffer wraps around. My question is then how to neatly sync this to render frames rather than to the audio pipeline (which I assume is decoupled from the frame rendering pipeline). Would using a getter for the FFTOut field be a sensible way to run the CalcFFT function (taking the N latest values from the circular buffer)? I’m going to set up the development environment to try it, but thought I’d check in the meantime whether it’s in line with vvvv’s style of handling things.
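To illustrate what I mean by the getter approach, here is a minimal Python/numpy sketch; the names (`push`, `fft_out`) are mine for illustration, not the actual VL.Audio API:

```python
import numpy as np

class FrameSyncedFFT:
    """Sketch of the idea: the audio thread only appends samples, and the
    FFT is computed lazily when the output is read, i.e. once per render
    frame. Names here are illustrative, not the actual VL.Audio types."""

    def __init__(self, n):
        self.n = n
        self.buf = np.zeros(2 * n)   # ring buffer, oversized so the N latest always exist
        self.write = 0
        self.dirty = True
        self._fft = np.zeros(n // 2 + 1)

    def push(self, samples):         # called from the audio callback
        for s in samples:
            self.buf[self.write] = s
            self.write = (self.write + 1) % len(self.buf)
        self.dirty = True

    @property
    def fft_out(self):               # called from the render loop
        if self.dirty:
            # unroll the ring so the N latest samples are in time order
            idx = (self.write - self.n + np.arange(self.n)) % len(self.buf)
            windowed = self.buf[idx] * np.hanning(self.n)
            self._fft = np.abs(np.fft.rfft(windowed))
            self.dirty = False
        return self._fft
```

The dirty flag means reading the output twice in one frame costs nothing, and no FFT work happens at all while nobody is looking at the result.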
Here we are lucky: the lowest band at 512 bins (1024 samples) is half of what you calculated, so about 47 Hz, and the next one is around 94 Hz. So usually you just put 512 as the bin count and get the best trade-off between time and frequency resolution. I usually use that and move on to the next problem.
What you are suggesting is a moving window that updates on each sample buffer from the audio card, potentially reusing samples that contributed to the previous FFT calculation. This is of course possible, but it needs more computing power. You would get better resolution in the bass, but you lose resolution in the time domain because each window averages over more time, so consecutive FFT frames tend to be “less different” from each other.
But it could definitely be an option for the FFT node, so that every user can decide what they want. If you want to try it, that would be great. Just keep in mind that you need to feed the FFT the samples in the right order so that the windowing function works correctly. The read position might be somewhere in the middle of the circular buffer, so you have to copy the samples over to the FFT input in up to two blocks.
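In case it helps, the reorder step could look roughly like this (a Python/numpy sketch; a contiguous read of a ring buffer wraps around at most once, hence at most two slices):

```python
import numpy as np

def latest_in_order(ring, write_pos, n):
    """Copy the n most recent samples out of a ring buffer in time order.
    A contiguous read of a ring wraps at most once, so at most two
    slices are needed before windowing."""
    start = (write_pos - n) % len(ring)
    if start + n <= len(ring):
        return ring[start:start + n].copy()          # one block, no wrap
    first = ring[start:]                             # tail of the buffer
    second = ring[:(start + n) % len(ring)]          # wrapped head
    return np.concatenate([first, second])

# The window function must see the samples in time order, otherwise the
# taper lands on the wrong samples and the spectrum comes out wrong.
ring = np.arange(16.0)
chunk = latest_in_order(ring, write_pos=4, n=8)      # samples 12..15 then 0..3
windowed = chunk * np.hanning(8)
```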
It’s a draft currently, as I’m looking at how exactly to add it to the library. I’m thinking adding it as a second node and leaving the old behaviour as-is would be a pragmatic non-breaking way to do it. Have a look if you’re interested.
I tried TouchDesigner for a bit to compare it, but I find that while it’s really fast to work with in some aspects, you can only put together the pre-built components in so many ways. For instance, pretty much everything I see coming out of TouchDesigner looks very nice but it’s all either point clouds, particles or displaced lines (I should not throw too much shade though, I’ve used it for, like, 10hrs). vvvv beats it handily in flexibility, you really can do anything with it.
HOWEVER, while I love the base vvvv editing experience, many libraries have a million tiny snags that make the whole system frustrating to use. I ran into this when I tried TD’s frequency analysis, which instantly produced really nice results thanks to its very high FFT resolution (by the way, TD has a similar computation time for 8192 bins). Similarly, the video file player hitches when you change playback speed, which in TD just works™ (that’s what I’m going to look at next). So I’m trying to iron out all these little kinks to make the whole workflow smoother. (</rant> regarding my feelings on vvvv vs. TD)
Interesting, I am also just getting back to FFT stuff in vvvv. The frequency bins being distributed somewhat unfortunately (not many on the low end) is a general problem.
I am looking into implementing Mel scaling to get a more linear distribution of values (linear in the human hearing sense).
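For reference, one common variant of the mel mapping (the HTK formula) is easy to sketch in Python; the band count and frequency range below are just example values:

```python
import numpy as np

# One common mel formula (the HTK variant): mel = 2595 * log10(1 + f / 700).
# Remapping FFT bins onto equally spaced mel points spends more of the
# output resolution on the low end, which matches how we hear.
def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# e.g. 64 mel-spaced band centres between 0 Hz and Nyquist at 48 kHz
centres = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(24_000.0), 64))
```

The band centres come out densely packed in the bass and progressively wider towards the top, which is exactly the redistribution the thread is after.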
There is a LinToLog node, but it is not built very well since it converts the FFT Spreadbuilder into a Spread, which gets recreated every frame.
There is also the conversion from actual FFT values (representing power) to how we perceive them (loudness), but I have found a good formula for that using Log10.
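For reference, the usual decibel conversion looks like this (a small Python sketch; use 20·log10 instead if your FFT values are magnitudes rather than power, and the floor value here is just an arbitrary clamp to avoid log of zero):

```python
import numpy as np

# Standard power-to-decibel conversion: L = 10 * log10(P / P_ref).
# A small floor avoids taking the log of zero on silent bins; the
# reference only shifts the curve, so 1.0 is fine for a visualizer.
def power_to_db(power, floor=1e-12, ref=1.0):
    return 10.0 * np.log10(np.maximum(power, floor) / ref)

power_to_db(np.array([1.0, 0.1, 0.0]))   # 0 dB, -10 dB, clamped near -120 dB
```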
I’ll post some of the progress once it’s ready and maybe it can make its way into the library.
@seltzdesign I have experimented quite a lot lately with different tools and libs, and now I am building a custom one on top of NWaves.
Since I am not an audio engineer, I am mostly trying things out and attempting to interpret papers I find on the internet as well as on the DSP Stack Exchange.
However, a good trade-off I found, inspired by the way most people deal with FFT especially for speech recognition, was to decimate my signal in order to get better resolution in the frequencies I am most interested in (low frequencies, basically).
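To illustrate the decimation idea with a rough Python/numpy sketch (the moving-average low-pass here is only a stand-in for a proper anti-aliasing filter, and the test tone is an arbitrary example):

```python
import numpy as np

fs, n, d = 48_000, 512, 4            # sample rate, FFT size, decimation factor

# Decimating by d lowers the effective sample rate to fs/d, so the same
# 512-point FFT now covers only 0..6 kHz but with 4x finer bins:
# fs/(d*n) ~= 23 Hz instead of fs/n ~= 94 Hz. A real implementation needs
# a proper anti-aliasing low-pass before the downsampling; the crude
# moving average below is only a stand-in for this sketch.
t = np.arange(4 * n * d) / fs
x = np.sin(2 * np.pi * 93.75 * t)            # a low-frequency test tone

lowpassed = np.convolve(x, np.ones(d) / d, mode="same")
decimated = lowpassed[::d]                   # effective rate fs/d = 12 kHz

spectrum = np.abs(np.fft.rfft(decimated[:n] * np.hanning(n)))
peak_hz = np.argmax(spectrum) * (fs / d) / n
```

The cost is that everything above fs/(2d) is gone, so in practice you would run this analysis alongside, not instead of, the full-rate FFT.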
Generally speaking, analyzing low frequencies in real time is a common problem, and this comes from the Fourier transform itself.
More or less, you can have better precision in the time domain or in the frequency domain, but not both. That’s the reason many people suggest the STFT instead, as it trades these off in an acceptable way.
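A minimal Python/numpy sketch of the STFT idea, with window and hop sizes picked to match the numbers discussed earlier in the thread:

```python
import numpy as np

# STFT: keep a long window (good frequency resolution) but slide it by a
# small hop (good update rate). With n = 4096 and hop = 512 at 48 kHz you
# keep ~12 Hz bins while producing ~94 frames per second, instead of ~6
# updates per second when waiting for a whole new window each time.
def stft(x, n=4096, hop=512):
    w = np.hanning(n)
    frames = [np.abs(np.fft.rfft(x[i:i + n] * w))
              for i in range(0, len(x) - n + 1, hop)]
    return np.array(frames)

fs = 48_000
x = np.sin(2 * np.pi * 60.0 * np.arange(fs) / fs)   # one second of a 60 Hz tone
spec = stft(x)                                       # shape: (frames, n // 2 + 1)
```

The overlapping frames are of course correlated with each other, which is exactly the “less different from each other” effect mentioned above.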
The mel scale can definitely represent the spectrum in a better and more understandable way, and log compression is also a nice method to spread the data more evenly across the bands.
As soon as I have something better and more robust in my hands I’ll get back; this thread is very interesting and I would like to keep it alive.
There’s now a multithreaded implementation on the PR (Threaded & render-synced FFT by grufkork · Pull Request #34 · vvvv/VL.Audio · GitHub) implementing an STFT (as far as I understand). The multithreading opens the door to other, more expensive computations such as zero-padding or multi-sample-rate analysis, which partly addresses the issues mentioned. The current commit works well, but there’s ongoing discussion about exactly how to balance CPU load against latency (the current implementation is about as CPU-light as it gets, but adds one render frame of latency, ~17 ms at 60 FPS).
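To illustrate the zero-padding mentioned above with a small Python/numpy sketch (the tone frequency and pad length are arbitrary example values):

```python
import numpy as np

# Zero-padding: padding an N-sample window to M > N samples before the FFT
# interpolates the spectrum onto a finer grid (fs/M bin spacing). It does
# not add real frequency resolution, but peaks that fall between the
# original bins become much easier to locate, which helps with bass notes.
fs, n, m = 48_000, 512, 4096

t = np.arange(n) / fs
x = np.sin(2 * np.pi * 120.0 * t) * np.hanning(n)   # 120 Hz sits between coarse bins

coarse_hz = np.argmax(np.abs(np.fft.rfft(x))) * fs / n     # ~94 Hz bin grid
peak_hz = np.argmax(np.abs(np.fft.rfft(x, n=m))) * fs / m  # numpy zero-pads to m
```

With the coarse grid the 120 Hz tone is reported a whole bin away, while the zero-padded spectrum lands within one fine bin of it.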
All additions that make things “just work” are, I think, very important for getting new users on board, because if the things you expect to “just work” don’t, it’s easy to bounce off when you need to find workarounds for basic things. Performant Mel scaling is exactly such a thing that makes the whole experience smoother. Looking forward to it!
The current script by @tonfilm that I am evaluating takes around 10 µs on my machine, which doesn’t even have a very recent CPU (i9-11900K). So yeah, negligible performance cost.
We are still evaluating one little thing about it, which might be off. Otherwise it’s great and remaps the FFT spectrum to a much more intuitive scale.
Looking forward to the other FFT improvements from the PR!