We are working on a project that involves a massive amount of computation over a large collection of point clouds, and we need a solution for splitting that computation across as many cores as possible. We have 5 machines with dual 8-core Xeons each, for a total of 160 logical processors at 3 GHz.
Now, I am aware of posts like this one here, Ultimate way of sharing data among instances - general - Forum, as well as similar ones, and we have tried different scenarios with moderate success. The most promising seems to be the ZeroMQ approach, but it too has its drawbacks, one of them being the pain in the B of implementing it all manually, and the fact that there is no sync option.
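To give an idea of the plumbing involved, the pattern we tried boils down to something like this (a minimal cppzmq sketch; the PUSH/PULL pattern, port and chunk size are illustrative, not our actual setup):

```cpp
#include <zmq.hpp>   // cppzmq
#include <vector>

int main()
{
    zmq::context_t ctx(1);

    // One distributor PUSHes chunks of points; each worker instance PULLs.
    // ZeroMQ round-robins the messages across connected workers, but there
    // is no built-in frame sync: a barrier has to be hand-rolled on top,
    // e.g. workers reporting "done" on a second PUSH/PULL pair.
    zmq::socket_t sender(ctx, zmq::socket_type::push);
    sender.bind("tcp://*:5555");

    std::vector<float> chunk(4096 * 3);  // xyz triplets for one chunk
    sender.send(zmq::buffer(chunk), zmq::send_flags::none);  // fire-and-forget for the sketch
}
```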
I was reading this introduction in The Gray Book; I quote:
" You want to offload parts of your patch to separate threads
Large patches can become computationally expensive and vvvv does not allow you to use the full power of your PC by being inherently single-threaded. Using vl you can define regions of your program that you want to run asynchronously to the main patch, thus using multiple CPUs in parallel."
Well yeah, that's precisely what we need! The question is: how do we do that?
The point clouds are algorithmically generated, and their movement (intricate algorithms as well) creates long trails; that is the computational problem I was talking about. The data then goes to a renderer.
I would like to split those point clouds into chunks and distribute them over as many threads as possible.
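Conceptually I mean something like this (a plain C++ sketch; Point and moveChunk are hypothetical stand-ins for our data layout and per-point algorithm, and in vl this would presumably map to async regions rather than raw threads):

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

struct Point { float x, y, z; };

// Split one cloud into contiguous chunks and move each chunk on its own
// thread; the join() loop is the per-frame barrier before rendering.
void MoveCloudParallel(std::vector<Point>& cloud, unsigned threadCount,
                       void (*moveChunk)(Point*, std::size_t))
{
    std::vector<std::thread> pool;
    const std::size_t chunk = (cloud.size() + threadCount - 1) / threadCount;

    for (unsigned t = 0; t < threadCount; ++t)
    {
        const std::size_t begin = t * chunk;
        if (begin >= cloud.size()) break;
        const std::size_t count = std::min(chunk, cloud.size() - begin);
        pool.emplace_back(moveChunk, cloud.data() + begin, count);
    }
    for (auto& th : pool) th.join();
}
```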
How important is it that the particles are moved via CPU? Is it a very complex algorithm?
I think with this large a number of points, no CPU would give you great frame rates. Also, if you do it on the CPU, you have to upload the whole data set to the GPU every frame.
The ideal solution for this scenario is to upload the data once to the GPU and then animate it with compute shaders… but that depends on what exactly you want to do with the points.
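The "upload once, animate on the GPU" idea boils down to something like this (CUDA notation purely for illustration; in vvvv it would be an HLSL compute shader via the DX11 pack, and pos/vel are assumed fields):

```cpp
// Positions live permanently in GPU memory; per frame there is only a
// kernel launch, no CPU->GPU upload of the point data itself.
__global__ void AnimatePoints(float3* pos, const float3* vel,
                              int count, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;          // ignore threads past the last point
    pos[i].x += vel[i].x * dt;
    pos[i].y += vel[i].y * dt;
    pos[i].z += vel[i].z * dt;
}
```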
Sure, look, we are aware of this, and we ported everything we could to the GPU (compute shaders, as you said: creation, animation etc. all happen on the GPUs), which is a bunch of Titan XPs that eat that amount for breakfast and ask for dessert afterwards. However, we have reached a scenario where it is not possible to use the GPUs, for two reasons. The first is that the calculation of the trails gets corrupted after a certain threshold; the thread group size seems to be the culprit there, but maybe I'm wrong. The end of the queue gets mangled, and the longer the trails, the bigger the problem.
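For what it's worth, the classic way this kind of tail corruption happens is in the dispatch arithmetic; here is a self-contained CUDA sketch of the bug and the fix (our actual shaders are HLSL, where it is the same arithmetic with [numthreads(64,1,1)] and SV_DispatchThreadID; StepTrail is a stand-in for the real trail update):

```cpp
__global__ void StepTrail(float* trail, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;   // without this, the last group writes past the end
    trail[i] += 1.0f;         // stand-in for the real trail update
}

void Dispatch(float* trail, int count)
{
    const int kGroupSize = 64;
    // BUG: 'count / kGroupSize' silently drops the last partial group
    // whenever count is not a multiple of 64, leaving the tail of every
    // trail stale. FIX: round the group count up and bounds-check in
    // the kernel, as above.
    const int groups = (count + kGroupSize - 1) / kGroupSize;
    StepTrail<<<groups, kGroupSize>>>(trail, count);
}
```

No idea if that is exactly what bites us, but it matches the symptom of the queue end going bad first.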
The second reason is that, with different point clouds coming from different GPUs, we are forced to do a readback of that data in order to send it over and render everything in one place. But if you have a suggestion here, I am going to make your portrait in dough, vegetables or chewing gum!
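What the readback amounts to right now, sketched in CUDA notation again (in vvvv the readback node does the D3D11 staging-buffer equivalent; the names and the separate copy stream are assumptions):

```cpp
#include <cuda_runtime.h>

// 'pinnedHost' must come from cudaHostAlloc, otherwise the copy cannot
// overlap with the next frame's kernels on the default stream.
void QueueReadback(const float* devCloud, float* pinnedHost,
                   size_t bytes, cudaStream_t copyStream)
{
    // Async on its own stream: this frame's compute keeps running while
    // last frame's cloud trickles back to the host, at one frame latency.
    cudaMemcpyAsync(pinnedHost, devCloud, bytes,
                    cudaMemcpyDeviceToHost, copyStream);
}
```

In CUDA land there is also cudaMemcpyPeerAsync for direct GPU-to-GPU copies that skip the host entirely; I have not found an equivalent exposed in the DX11 pack, which is exactly why I'm asking.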