I’m about to take another pass at a challenge I encountered back in 2022: sending OSC to many (64) remote clients over UDP.
(I am talking about individual messages specific to each client, so the server/port-range option added to OSCClient since then doesn’t help, since it is for sending the same messages to all destinations.)
Initially, when I naively put my OSCClients in a ForEach loop slicing the list of IPs, I remember it took about a minute after opening for the patch to become responsive; I was surprised that initializing the connections would take so long.
I ended up using a modified version of OSCClient that allowed sharing a single common socket across all OSCClients; I can’t remember whether someone shared that with me in the chat or guided me to this hack. As far as I remember it was better, but initialization was still surprisingly long.
I am not an expert in networking and socket mumbo-jumbo, so:
- is this single shared socket still the smartest approach?
- were there any changes to the default OSC nodes since 2022 that would make this hack irrelevant now?
- any other ideas or best practices on how to initialize communication with so many clients?
- is it “normal”/typical that initialization took so long in my initial ForEach approach?
I don’t have access to all the devices for now, so it is hard to run empirical tests in full conditions, but I would like to prepare as much as I can in advance with the smartest and bestest approach.
For reference, this is the inside of the standard OSCClient:
If you want to send UDP to many machines, you can use the broadcast address: for a typical /24 network it is the network IP with .255 at the end, for example 192.168.0.255.
So if all machines are in 192.168.0.xxx, they all receive the packets on the specified port; there’s no need to send to each one with multiple packets, and you don’t need to manage the receiver IPs.
There are of course details, e.g. the switch needs to allow broadcasts. But usually it just works.
It just becomes a bit more complicated if the clients need to send information back.
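To make the broadcast idea concrete, here is a minimal Python sketch (not vvvv internals; the subnet and port are assumptions for illustration). The key detail is that the OS rejects sends to a broadcast address unless `SO_BROADCAST` is set:

```python
import socket

def make_broadcast_socket() -> socket.socket:
    """Create a UDP socket that is allowed to send to a broadcast address.

    Without SO_BROADCAST, sendto() to an address like x.x.x.255 is
    rejected by the operating system.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock

# One send reaches every listener on the /24 subnet (addresses assumed):
# make_broadcast_socket().sendto(b"/lfo 0.5", ("192.168.0.255", 9000))
```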
Using a single socket is totally fine and often faster than using multiple sockets.
Note that you only need to bind the socket to a specific port if you want to receive packets on that port. So clients could bind to a specific port to listen to the packets from the server.
Messages need to be specific to each client, so I cannot broadcast, otherwise this would have been the easy route indeed…
And they do need to send info back as well.
> Using a single socket is totally fine and often faster than using multiple sockets.
Ok thanks for confirming this, so it means what I ended up using was definitely a better solution.
Does it make any sense, though, that initializing all the OSCClients (I remember it was really bad using the default nodes, with each one having its own socket) took so much time and froze the patch for about a minute?
Is it the opening of the sockets that took so much time? (Since UDP is a connectionless protocol, to me UDP itself shouldn’t add any overhead.)
Or is there something fishy to look for somewhere else?
I can’t remember if the single socket version was that much faster.
I will probably try and patch a test sending to 64 different ports locally, I guess it could approximate the real-life scenario?
Do you have 64 clients on the network or on a single machine?
If I understand you correctly, the problem is that you need to address messages to different clients. Maybe something was overlooked at the architecture stage? After all, if you use broadcast packets, then to address a particular client you just need to add the necessary prefix to the OSC address and filter the data on the client side. You might even like the idea of doing a simple route on the client: if it receives a prefix with an IP address, it checks whether that address matches the machine’s address, and if it doesn’t match, it drops the message; an address without a prefix is accepted ‘as is’. This way backwards compatibility is preserved.
I don’t know, maybe OSC really is what you needed, but there is NetMQ for such scenarios. I’ll note separately, though, that I didn’t really understand your basic architecture or the boot lag.
would be great if you could demonstrate this in a simple example.
from your initial question this wasn’t clear. can you elaborate a bit on what you’re sending to the individual targets, what you’re expecting back, and how you want to handle what you receive back?
So I finally arranged a little test patch to try and reproduce this strange long init time when trying to communicate with numerous OSC clients over UDP.
I don’t have the 64 devices I used to talk to with me, so I set up the patch to send to localhost on 256 different ports.
I tried the two methods mentioned earlier:
method 1: using a regular OSCClient in a ForEach loop for the 256 different destinations (each OSCClient opens and uses its own socket, so 256 in total)
method 2: a modified OSCClient_UdpSocketInput that allows sharing one single socket across the 256 senders.
Each method is enclosed in a ManageProcess so I can enable/disable each one separately and compare performance.
You’ll notice that with either one, when you press F9 it takes about 2 to 5 seconds (at least on my machine here, with an i9 CPU) before the dummy LFO begins running.
Even with both sending methods disabled, I noticed that just having the 256 OSCServers ready for reception produced a significant (almost as long) lag before the patch became responsive.
In my past project (again, it wasn’t local communication, and it was back in 2022, so a much older vvvv release, an older computer, etc.) the time was more like 30 seconds or more (enough to fear vvvv had crashed at each launch, until it came to life).
Here, a 5 sec lag is totally tolerable in my case, but I’m still curious about what/why it would take so long?
Is there a rational/normal explanation to this or am I putting my finger on something fishy?
I’m trying to avoid any “way worse than that” when I’m in the context of the full real setup again.
@joreg to answer your questions (sorry I missed that)
I have one master computer sending 2 floats at mainloop rate (approx. 100 Hz) to 64 clients (the floats are different for each client).
Each client reports back at a lower rate (approx. 30 Hz) its current state (2 floats), so we can monitor the delta between the latest order and the current state. They all target the same port, so I just use one single OSCServer for reception.
We also have occasional (rare) messages for utility/config.
But that’s it, it’s super lightweight.
@sunep @yar
I am afraid we won’t have the time budget to reflash the 64 hardware boards and change protocol… Also, I don’t have long-term experience with embedded MQTT libraries and would not feel comfortable implementing them on the fly for this project without a long test drive.
Plus, for latency-first considerations, I’d rather bet on OSC/UDP’s lower overhead (…although probably very negligible).
I know there could have been other strategies (broadcasting the full list and each client picking up the right index, going Art-Net, etc.), but at the end of the day, for the sake of readability/flexibility, we went the 1-to-1 communication route, given the (imo) super lightweight bandwidth and straightforward routing needs.
All in all, all we have is basically 2 floats to and from each of the 64 movers at approx. 100 fps, which I believe is very light in total…
Especially if I mutualize only one socket for sending to all 64 destinations, I think this is very little work for the socket, given the very small message size and the (still) reasonable destination list, no?
If I were constantly swapping between 64 different sockets for sending, my understanding is that it would be worse and not worth the multi-socket strategy.
So at the end of the day, I’m just trying to understand why I see this apparent “init time” in my patch and just want to ensure there is not any good practice I’m missing in this given setup and/or some flaw it might reveal in the OSC nodes.
If you tell me this ~5 s delay is “normal” for the initializations happening under the hood, and/or that there’s nothing more I can really optimize, then I will totally live with it, but at least I’d sleep better 🤘
I know what you mean, yes.
But back then you could have skipped the OSC protocol and concentrated on pure UDP. Choosing technologies carefully at the beginning can save time, nerves and even money. Now, legacy is what legacy is.
This is most likely due to the datagram (receiver) code. For example, it allocates 10k of memory asynchronously at startup. I haven’t looked into it yet, but there’s a problem somewhere.
Thanks for taking the time for this investigation!!
Have we really spotted the right suspect yet, though? 🤓
if I delete ToOSCMessages in “AAAA” from your modified patch, I still get the same freeze after every F9 with your TEST1 enabled. That would mean it isn’t related to OSC but to the socket itself? (i.e. raw UDP would show the same behaviour)
also, surprisingly, if I switch to TEST2, I get the freeze after the first F9, but any subsequent F9 shows no noticeable freeze. Can this be explained?
for the receiver: have only one OSCServer node and only the OSCReceiver in the loop. now make all remote devices send to that one OSCServer’s port; you’re distinguishing their messages by osc-address already anyway, so there’s no need for them to each send to a different port.
if i’m not missing anything, your scenario now runs on 2 sockets, and pressing F9 shouldn’t cause much of a freeze anymore.
more details: when, in your original patch, instead of pressing F9 you press F8/F5, you’ll notice that the freeze actually happens on F8, and it is essentially the UDP receiver that takes so long to be taken down. the UDP nodes have been candidates for a rewrite for a while, so this will be part of that then.
and still OSC: obviously you shouldn’t need a custom version there, the OSC nodes should handle your scenario out-of-the-box. the idea is that every Send… node should simply take an optional target ip/port as input…
also: the OSCClient node has an optional receiver built in (the one i made you delete above from your custom version). instead of an extra receiver port, you could have your remote devices answer back to the socket they received the message from. then your whole scenario would run on a single socket.
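The “answer back to the socket the message came from” pattern can be sketched in plain Python (loopback addresses and payloads are assumptions): the client learns the server’s address from `recvfrom()` and replies to exactly that address, so the server side needs only one socket for both directions.

```python
import socket

# Server side: one socket, bound so it can also receive replies.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))  # port 0 = any free port
server.settimeout(2)

# Client side (would normally run on a remote device).
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.bind(("127.0.0.1", 0))
client.settimeout(2)

# server -> client: the client learns the sender's address from recvfrom()
server.sendto(b"/motor/pos 0.5 0.7", client.getsockname())
msg, sender_addr = client.recvfrom(1024)

# client -> server: reply to exactly where the message came from,
# so no extra receiver port is needed on the server.
client.sendto(b"/motor/state 0.49 0.71", sender_addr)
reply, _ = server.recvfrom(1024)

server.close()
client.close()
```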
Thanks @joreg for taking the time for a closer look
I confirm this eliminates the freeze! Thanks for the pointer.
Would be great to have these 2 nodes optionally enabled/disabled when they are not needed indeed.
Since the Data output is optional, is there a way to know if the user has or has not enabled it, so we could enable/disable Receiver+ToOSCMessages, maybe via a ManageProcess?
I agree this is the way in my real-life scenario and that’s what I had in place already.
In the test patch here I kept multiple receive ports on purpose to better understand where the culprit was.
Indeed!
That would definitely make the default nodes more universal
Good point
(although, since I’m already down to only two sockets now, I guess this is already a minimal enough setup for my scenario).
But your point indeed shows that sometimes we might want to mutualize sockets across the senders and receivers.
That’s actually something I can do now with the modified OSCClient from my initial patch, which has this UdpSocket input. If I modified OSCServer in a similar fashion, I could indeed feed both my Sender and Receiver chains the same UdpSocket to share.
Maybe an optional input on both these nodes for an external UdpSocket, overriding the default one, would make sense then?