And then we arrive at the problem of reliable UDP. UDP is not reliable: datagrams may never arrive at their destination, and those that do arrive may arrive out of order. This becomes a real problem when a message is larger than the maximum datagram size (around 64k), which easily happens with, for instance, camera depth maps or 3D face mesh data. Such messages have to be cut into chunks before sending, and put back together at the receiving end.
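To make that concrete, here is a minimal sketch in Python of how a message could be cut into self-describing datagrams. The header layout (a type byte, a message id, a chunk index, and a total chunk count), the field sizes, and the names are all mine, purely for illustration, not the project's actual wire format:

```python
import struct

# Hypothetical wire format: a type byte (chunk / heartbeat / ack),
# a message id, a chunk index, and the message's total chunk count.
CHUNK, HEARTBEAT, ACK = 0, 1, 2
HEADER = struct.Struct("!BIHH")       # network byte order

CHUNK_PAYLOAD = 1200 - HEADER.size    # keep each datagram well below the MTU

def split_message(msg_id: int, data: bytes) -> list[bytes]:
    """Cut one large message into self-describing chunk datagrams."""
    parts = [data[i:i + CHUNK_PAYLOAD]
             for i in range(0, len(data), CHUNK_PAYLOAD)]
    return [HEADER.pack(CHUNK, msg_id, index, len(parts)) + part
            for index, part in enumerate(parts)]
```

Because every chunk carries its own index and the total count, the receiver can slot it straight into the right position of its assembly buffer, whatever order the chunks arrive in.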
So, to get back some reliability (a publisher sends a message to update a data structure, the subscribers update accordingly), we need a strategy that breaks the data up into chunks and reassembles them in the right order at the other end. Missing chunks need to be retransmitted, and it should all go as fast as possible.
There are many such strategies in use across a whole world of different protocols. After a few attempts at more complicated and unclear solutions, I'm currently exploring the following, relatively simple one:
1. The publisher sends up to N chunks, followed by a heartbeat message.
2. The subscriber receives the chunks and places them in its assembly buffer in the correct order.
3. When the subscriber receives a heartbeat, it sends back an acknowledgement of all chunks received since the previous heartbeat.
4. The publisher receives this acknowledgement, removes the acknowledged chunks from its send list, adds the next few, and sends N more chunks, followed by another heartbeat.
5. If the publisher did not receive an acknowledgement within a certain time, it retransmits the exact same N chunks.

This continues (a code sketch of the exchange follows the list) until:
The entire message is successfully sent – Now the subscriber is updated, yay.
The publisher starts sending another message – The current message is aborted, the subscriber drops whatever it has assembled so far and remains clueless about that update, but eagerly awaits the new one.
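In code, the publisher side of that loop could look like the following sketch, building on the header sketch above. WINDOW (the N above) and ACK_TIMEOUT are made-up values, and the whole thing is an illustration rather than the actual implementation:

```python
import socket
import struct

WINDOW = 16          # N: chunks per heartbeat round (made-up value)
ACK_TIMEOUT = 0.05   # seconds to wait for an ack before retransmitting

def publish(sock: socket.socket, addr, msg_id: int,
            datagrams: list[bytes]) -> None:
    """Send one message's chunk datagrams, WINDOW at a time, until acked."""
    pending = dict(enumerate(datagrams))    # index -> datagram still unacked
    sock.settimeout(ACK_TIMEOUT)
    while pending:
        # Send the next WINDOW unacknowledged chunks, then a heartbeat.
        for index in sorted(pending)[:WINDOW]:
            sock.sendto(pending[index], addr)
        sock.sendto(HEADER.pack(HEARTBEAT, msg_id, 0, 0), addr)
        try:
            reply, _ = sock.recvfrom(65535)
        except socket.timeout:
            continue                        # no ack in time: resend the window
        kind, ack_id, _, count = HEADER.unpack_from(reply)
        if kind != ACK or ack_id != msg_id:
            continue
        # The ack body lists the chunk indices received since the last
        # heartbeat; drop those from the send list.
        for index in struct.unpack_from(f"!{count}H", reply, HEADER.size):
            pending.pop(index, None)
```

If an acknowledgement gets lost, the publisher simply times out and resends the same window; the subscriber then re-acknowledges the duplicates, so nothing stalls.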
Currently, the publisher does this in a parallel task per subscriber. Again, the bookkeeping of which chunks have been sent where is mostly local to each task, and only the source data buffer is immutably shared between the tasks. The subscriber processes any incoming chunk or heartbeat from the socket in a single task.
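That single subscriber task could look like this sketch, again under the assumed wire format from the sketches above; on_message is a hypothetical callback that gets each fully assembled message:

```python
import socket
import struct

def subscribe(sock: socket.socket, on_message) -> None:
    """Single receive task: slot chunks into the assembly buffer and
    acknowledge everything received since the previous heartbeat."""
    current = None                    # message id currently being assembled
    assembly: dict[int, bytes] = {}   # chunk index -> payload
    fresh: set[int] = set()           # indices received since last heartbeat
    total = 0
    while True:
        datagram, addr = sock.recvfrom(65535)
        kind, msg_id, index, count = HEADER.unpack_from(datagram)
        if kind == CHUNK:
            if msg_id != current:
                # The publisher moved on to a new message: drop the old,
                # half-assembled one and start over.
                current, assembly, fresh, total = msg_id, {}, set(), count
            assembly[index] = datagram[HEADER.size:]
            fresh.add(index)
        elif kind == HEARTBEAT:
            indices = sorted(fresh)
            ack = (HEADER.pack(ACK, msg_id, 0, len(indices))
                   + struct.pack(f"!{len(indices)}H", *indices))
            sock.sendto(ack, addr)
            fresh.clear()
            # Deliver once all chunks are in (checked per heartbeat to
            # keep the sketch simple).
            if total and len(assembly) == total:
                on_message(b"".join(assembly[i] for i in range(total)))
                assembly.clear()
                total = 0             # don't deliver the same message twice
```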
Again, the code is very straightforward and devoid of complicated structures or polling loops, yet should in theory be very efficient.