Opened 5 years ago
Last modified 5 years ago
#3845 assigned enhancement
Handle low bandwidth in VR meetings
| Reported by: | Tom Goddard | Owned by: | Tom Goddard |
|---|---|---|---|
| Priority: | moderate | Milestone: | |
| Component: | VR | Version: | |
| Keywords: | | Cc: | phil.cruz@… |
| Blocked By: | | Blocking: | |
| Notify when closed: | | Platform: | all |
| Project: | ChimeraX | | |
Description
Want VR meetings to work for participants with lower bandwidth network connections and also handle changes of bandwidth in the middle of a meeting.
Today Phil Cruz, Victor Kramer and I had a VR meeting looking at the SARS-CoV-2 spike and binding proteins. It worked smoothly for about 20 minutes, and then Phil's bandwidth seemed to drop: his hand and head positions, as seen by me and Victor, updated slowly, and he did not see us moving. We were talking at the same time over Zoom audio, and his audio was often garbled. Phil left the meeting using the meeting close command and then tried to rejoin. It took about 2 minutes to receive the 25 Mbyte session, while the first time he joined it took about 10 seconds. After he saw the molecules, it appears he saw the changes Victor and I made as we looked at different binding proteins only a few minutes after we made them. During this time I did not see Phil's head or hands, but Victor did see them. I got error messages from debugging code that checks messages for null bytes; the error was in the debug code itself (there were no null bytes), and it may be why I did not see his head and hands.
It appears that messages were being sent to Phil faster than his network connection could carry them. Since buffering is unlimited, the backlog of commands sent to Phil just grew and grew. His upload speed was probably even slower than his download speed, since we had trouble understanding his Zoom audio while he heard us fine. So his meeting messages updating head and hand positions 80-90 times per second were probably even more backlogged, and Victor and I saw very little motion. (I could sometimes see model moves Phil had made even though I could not see his head and hands.)
After Phil rejoined, the 2 minutes during which he was receiving the session probably buffered up 2 minutes' worth of Victor's and my hand and head positions. I never saw my ChimeraX (the meeting host) block, so the scene must have been queued for sending in a fraction of a second, and subsequent command messages to Phil also must not have blocked my ChimeraX.
I should measure the bandwidth used sending head and hand positions per person with the default update rate of every frame.
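A back-of-the-envelope sketch of that measurement, using the per-message size (about 840 bytes) and update rate (90 per second) reported in comment:5. The function names are hypothetical, and `download_bytes_per_sec` assumes the host relays every other participant's stream to each participant at the full rate.

```python
def stream_bytes_per_sec(message_bytes, updates_per_sec):
    """Bandwidth of one participant's head/hand position stream."""
    return message_bytes * updates_per_sec


def download_bytes_per_sec(message_bytes, updates_per_sec, participants):
    """Bandwidth one participant receives, assuming (hypothetically) that
    every other participant's stream is relayed to it at the full rate."""
    return stream_bytes_per_sec(message_bytes, updates_per_sec) * (participants - 1)


# Figures measured in comment:5: ~840-byte messages at 90 updates/sec.
per_person = stream_bytes_per_sec(840, 90)         # 75600 bytes/sec
per_person_mbits = per_person * 8 / 1e6            # about 0.6 Mbit/sec
four_person = download_bytes_per_sec(840, 90, 4)   # 226800 bytes/sec each
```

Under this assumption each participant's download grows linearly with the number of other participants, which is why encoding size alone does not solve the low-bandwidth case.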
What seems to be needed is for the meeting host ChimeraX to detect when the messages it sends to a participant are backlogged, and for participants to detect when the messages they send to the host are backlogged.

One method would be to not use unlimited buffering. Then writing messages would block, which would freeze ChimeraX unless the writing were done in a separate thread. The thread could have an input queue, and it could drop inessential messages (like hand/head positions) if the queue grows to span more than a second. Participant messages to the host could use a similar approach. I am not sure what performance implications limiting the socket buffering might have.

Another approach would be to give messages time stamps. Participants would include in their outgoing messages the stamp of the last message they received and processed. If the host sees that a participant is far behind in processing the messages sent to it, the host reduces the messages it sends. Likewise, a host message to a participant would identify the last message the host received from that participant. If the participant sees that the last message received by the host is old, it can reduce its outgoing messages.
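A minimal sketch of the first method (a writer thread with an input queue), assuming a hypothetical blocking `write_blocking` function supplied by the caller. Rather than measuring the span of the whole queue, this version drops an inessential message when it reaches the front of the queue more than `max_age` seconds after it was queued, which has a similar effect. Nothing here is ChimeraX's actual meeting code.

```python
import queue
import threading
import time


class QueuedSender:
    """Hypothetical sketch: write messages from a background thread and drop
    inessential ones (e.g. head/hand positions) that have waited too long."""

    def __init__(self, write_blocking, max_age=1.0):
        self._write = write_blocking   # assumed blocking write, e.g. to a socket
        self._max_age = max_age        # seconds an inessential message may wait
        self._queue = queue.Queue()
        self.dropped = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, data, essential=True):
        """Called from the graphics thread; never blocks."""
        self._queue.put((time.monotonic(), essential, data))

    def close(self):
        """Finish writing queued messages, then stop the thread."""
        self._queue.put(None)          # sentinel
        self._thread.join()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:
                return
            queued_at, essential, data = item
            if not essential and time.monotonic() - queued_at > self._max_age:
                self.dropped += 1      # stale position update: skip it
                continue
            self._write(data)          # may block on a slow connection
```

Commands are always written in order; only stale position updates are skipped, so a slow link delays commands but never loses them.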
A related problem is when the network is fast enough to deliver all the messages but a participant can't process them fast enough. Currently the code reads every available message before drawing a frame. If processing is too slow, it can end up never drawing a frame, or drawing one only very infrequently, which causes the VR display to freeze and black out. I observed this yesterday using drag model mode and put in a fix where only one motion command message is processed per frame drawn on the participant's machine; the other available motion command messages are dropped.
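A sketch of that per-frame filtering, assuming (the ticket does not say) that the one motion message kept is the newest and that each motion message carries an absolute position, so older ones are superseded. `messages_for_frame` and the `is_motion` predicate are hypothetical names.

```python
def messages_for_frame(available, is_motion):
    """Pick which queued messages to handle before drawing the next frame.

    Non-motion messages are all kept, in order. Of the motion messages only
    the newest is kept; the rest are dropped. This assumes each motion
    message carries an absolute position, so older ones are superseded.
    """
    newest_motion = None
    for i, msg in enumerate(available):
        if is_motion(msg):
            newest_motion = i
    return [msg for i, msg in enumerate(available)
            if not is_motion(msg) or i == newest_motion]
```

If motion messages were instead incremental moves, dropping them would lose motion, and they would need to be combined rather than discarded.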
Change History (5)
comment:1 by , 5 years ago
comment:2 by , 5 years ago
If rotations are sent as quaternions and positions as binary float values, the two hand positions and the head position at 7 floats each would take 3*7*4 = 84 bytes. Adding an id, and not resending the name and color unless they change, messages of about 120 bytes should be possible, about 7 times smaller than the current message size. Currently the code sends just a text Python representation. For binary values it would be better to use msgpack. We would need a way to handle both formats so older ChimeraX versions could meet with newer ones.
Even with messages 7 times smaller, bandwidth will still grow in proportion to the number of participants as more join, so the number of messages sent still needs to be controlled to handle low bandwidth. Ideally both should be implemented: an efficient message encoding, and code to reduce the number of messages if too many buffer up.
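A sketch of the 84-byte pose payload using Python's standard `struct` module instead of msgpack, to show the arithmetic. The layout (a uint16 id, then head, left hand, right hand, each as quaternion w, x, y, z followed by translation x, y, z, all little-endian float32) is a hypothetical choice, as are the function names; poses are the 3x4 rows of rotation | translation seen in the example message in comment:5.

```python
import math
import struct

# Hypothetical layout: uint16 id + 3 poses x 7 float32 = 2 + 84 = 86 bytes.
POSE_FORMAT = '<H' + '7f' * 3


def matrix_to_quaternion(m):
    """Convert the rotation part of a 3x4 pose to a quaternion (w, x, y, z)."""
    r = [row[:3] for row in m]
    trace = r[0][0] + r[1][1] + r[2][2]
    if trace > 0:
        s = 2.0 * math.sqrt(trace + 1.0)
        return (0.25 * s, (r[2][1] - r[1][2]) / s,
                (r[0][2] - r[2][0]) / s, (r[1][0] - r[0][1]) / s)
    if r[0][0] > r[1][1] and r[0][0] > r[2][2]:
        s = 2.0 * math.sqrt(1.0 + r[0][0] - r[1][1] - r[2][2])
        return ((r[2][1] - r[1][2]) / s, 0.25 * s,
                (r[0][1] + r[1][0]) / s, (r[0][2] + r[2][0]) / s)
    if r[1][1] > r[2][2]:
        s = 2.0 * math.sqrt(1.0 + r[1][1] - r[0][0] - r[2][2])
        return ((r[0][2] - r[2][0]) / s, (r[0][1] + r[1][0]) / s,
                0.25 * s, (r[1][2] + r[2][1]) / s)
    s = 2.0 * math.sqrt(1.0 + r[2][2] - r[0][0] - r[1][1])
    return ((r[1][0] - r[0][1]) / s, (r[0][2] + r[2][0]) / s,
            (r[1][2] + r[2][1]) / s, 0.25 * s)


def pack_positions(participant_id, head, hands):
    """Pack head and exactly two hand poses into 86 bytes."""
    values = [participant_id]
    for m in [head] + list(hands):
        values.extend(matrix_to_quaternion(m))
        values.extend(row[3] for row in m)   # translation column
    return struct.pack(POSE_FORMAT, *values)


def unpack_positions(data):
    """Return (participant_id, [7-tuple per pose: w, x, y, z, tx, ty, tz])."""
    fields = struct.unpack(POSE_FORMAT, data)
    floats = fields[1:]
    return fields[0], [floats[i:i + 7] for i in range(0, len(floats), 7)]
```

For mixing old and new ChimeraX versions, one possibility (not something the ticket specifies) is to treat bytes payloads as the binary format and keep the existing Python-literal text for everything else.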
comment:3 by , 5 years ago
I added some code so the meeting will not send head/hand or motion command update messages if the socket has more than 50 Kbytes that have not been sent yet. This stops the socket from getting ever more backlogged when bandwidth is inadequate. But tests showed a surprisingly large lag of 3 seconds. 50 Kbytes should be about 60 messages (800 bytes each), which should be less than a second's worth since the test was trying to send 90 updates per second (the Oculus Rift frame rate). But the connection was slowed to only 50 Kbytes/sec, so really a 1 second lag should be expected (50 Kbytes drained at 50 Kbytes/sec). I suspect there is additional buffering that is not counted when I ask the QTcpSocket how many bytes have yet to be written. I am not totally sure, because I slowed the connection with an ugly hack, socket.setReadBufferSize(512), which makes the read buffer so small that it slows down transmission.
Ideally, if the bandwidth is enough to receive 50 of 90 messages per second, there should be no lag, just roughly half as many updates per second, which should barely be noticeable. To achieve that, the sender needs to drop enough messages that the bytes/sec sent stay below the bandwidth, so there is no backlog on the socket. This should be achievable because the sender knows how many bytes/sec it has been sending so far, so it can drop messages to target a slightly lower bandwidth. But as the connection bandwidth falls or rises, the sender needs to adapt quickly enough (within seconds) to avoid a backlog and large lag, and also to avoid a brief slowdown causing a longer period of throttled transmission than necessary. This is going to be tricky because the right adaptation depends on how rapidly the bandwidth fluctuates and how big the fluctuations are. But it should be possible to handle common cases reasonably.
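A minimal sketch of the check described here. `send_update` and `drain_seconds` are hypothetical names; the only socket calls assumed are `QTcpSocket.bytesToWrite()` and `write()`, so a fake object with those two methods can stand in for testing. `drain_seconds` spells out the expected-lag arithmetic: 50 Kbytes queued on a 50 Kbytes/sec link takes 1 second to drain, so an observed 3 second lag is consistent with the suspicion of buffering that `bytesToWrite()` does not count.

```python
MAX_UNSENT_BYTES = 50_000  # threshold described in this comment


def send_update(socket, data, essential):
    """Write data unless it is a droppable update and the socket is backlogged.

    `socket` is assumed to behave like a QTcpSocket: bytesToWrite() returns
    the bytes queued but not yet written, and write() queues more bytes.
    Returns True if the data was queued for sending.
    """
    if not essential and socket.bytesToWrite() > MAX_UNSENT_BYTES:
        return False  # drop head/hand or motion update rather than grow backlog
    socket.write(data)
    return True


def drain_seconds(unsent_bytes, bytes_per_sec):
    """Lag added by a full send buffer: time to empty it at the link rate."""
    return unsent_bytes / bytes_per_sec
```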
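One way to "drop messages to target a slightly lower bandwidth" is a token bucket, sketched below. This is a swapped-in standard technique, not the ticket's code; `RateTargetDropper` is a hypothetical name, and where `target_bytes_per_sec` comes from (some estimate slightly below the measured link rate) is the hard part discussed above and is not shown. Adapting to bandwidth changes amounts to updating `rate`.

```python
import time


class RateTargetDropper:
    """Token-bucket sketch: drop droppable messages so the average send
    rate stays near target_bytes_per_sec."""

    def __init__(self, target_bytes_per_sec, burst_bytes, clock=time.monotonic):
        self.rate = target_bytes_per_sec  # may be updated as bandwidth changes
        self.capacity = burst_bytes       # largest burst allowed after idling
        self.tokens = burst_bytes
        self.clock = clock
        self.last = clock()

    def allow(self, nbytes, essential=False):
        """Return True if a message of nbytes should be sent now."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if essential or self.tokens >= nbytes:
            # Essential messages (e.g. commands) always go out and may push the
            # balance negative, which delays later droppable updates.
            self.tokens -= nbytes
            return True
        return False
```

With 1000-byte updates offered at 90 per second and a 50 Kbytes/sec target, roughly 50 per second get through, the "half the updates, no lag" behavior described above.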
I considered another scheme where messages contain a message number or time stamp, plus the number of the last message received, and the hub and participants use this information to see how much lag is present. But it is complex because the lag and bandwidth in one direction probably do not match those in the reverse direction. I think the scheme described above, which does not require message numbers or time stamps, is a better approach to try next.
Actually, it needs more thought. The problem is that the hidden buffering of several seconds will prevent me from quickly seeing a slowdown. The buffering will keep absorbing new messages, hiding the slowdown until it eventually fills and I see the bandwidth drop; by then we are already lagging 3 seconds or more before getting the first clue that we no longer have enough bandwidth. It looks like message numbers or time stamps may be necessary to account for hidden buffering that is outside my control. Maybe I should target a simpler goal: handle constant low bandwidth, and beyond that only worry about very infrequent bandwidth changes.
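A sketch of the message-number scheme, with a hypothetical dict message format and class name. Each side numbers its outgoing messages and reports the highest number it has processed from the peer. The age of the oldest unacknowledged message reveals a backlog even when it is hidden in buffers outside the sender's control, but it also includes the delay of the reply path, which is the direction-mixing complication noted above. It also only updates when the peer sends something, so a quiet peer would need some periodic message.

```python
import time


class LagTracker:
    """Hypothetical sketch of numbered messages with acknowledgements."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.next_seq = 0
        self.sent_at = {}         # seq -> send time, unacknowledged only
        self.last_acked = -1      # highest of our seqs the peer has processed
        self.last_processed = -1  # highest peer seq we have processed

    def stamp(self, payload):
        """Wrap an outgoing payload with sequence and acknowledgement numbers."""
        seq = self.next_seq
        self.next_seq += 1
        self.sent_at[seq] = self.clock()
        return {'seq': seq, 'ack': self.last_processed, 'data': payload}

    def receive(self, message):
        """Record an incoming message once processed; return its payload."""
        self.last_processed = max(self.last_processed, message['seq'])
        for seq in range(self.last_acked + 1, message['ack'] + 1):
            self.sent_at.pop(seq, None)
        self.last_acked = max(self.last_acked, message['ack'])
        return message['data']

    def unacknowledged(self):
        return self.next_seq - self.last_acked - 1

    def oldest_unacked_age(self):
        """Seconds since the oldest unacknowledged message was sent (0 if none)."""
        oldest = self.last_acked + 1
        if oldest >= self.next_seq:
            return 0.0
        return self.clock() - self.sent_at[oldest]
```

A sender could stop sending droppable updates while `oldest_unacked_age()` exceeds some threshold (a second, say) and resume once acknowledgements catch up.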
comment:4 by , 5 years ago
Since adapting to bandwidth is quite tricky, and I am currently using about 5 times the number of bytes needed to transmit head/hand position updates, maybe I should switch to the easier problem of transmitting smaller messages. This will allow using lower bandwidth, but perversely, if the bandwidth is too low, the backlog and lag will be 5 times greater because the fixed buffering will hold 5 times more messages.
comment:5 by , 5 years ago
Non-VR meetings use very low bandwidth, sending messages only when the scene is moved, a command is executed, or the mouse hovers to show the pointer, so most of the time no messages are being sent.
For VR meetings, the continuous streaming of hand and head positions could be reduced by a large factor by dropping any motion smaller than, say, 0.5 centimeter displacement and 1 degree rotation. Participants often keep their head and hands relatively still, so 90 updates per second is overkill.
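A sketch of that motion threshold on 3x4 poses (rotation | translation rows, as in the example message in comment:5). The function names are hypothetical, and translations are assumed to be in meters so that 0.005 means 0.5 cm. Comparing against the last pose actually sent, rather than the previous frame, keeps slow drifts from being ignored forever.

```python
import math


def rotation_angle_degrees(r1, r2):
    """Angle of the relative rotation between two 3x3 rotation matrices.

    trace(r1^T r2) equals the elementwise sum of r1 * r2, and a rotation by
    angle theta has trace 1 + 2 cos(theta).
    """
    trace = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding error
    return math.degrees(math.acos(cos_theta))


def should_send_pose(last_sent, current, min_shift=0.005, min_angle=1.0):
    """True if a pose moved at least min_shift (assumed meters) or rotated at
    least min_angle degrees relative to the last pose actually sent."""
    shift = math.dist([row[3] for row in last_sent], [row[3] for row in current])
    angle = rotation_angle_degrees([row[:3] for row in last_sent],
                                   [row[:3] for row in current])
    return shift >= min_shift or angle >= min_angle
```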
For reducing message size, ticket #3958 discusses using msgpack instead of Python literals to serialize the messages.
Timing shows that in a two-person meeting a participant receives 0.61 Mbits/sec (76 Kbytes/sec) and gets 90 messages per second (the other participant is using an Oculus Rift at 90 frames/sec), each message about 840 bytes long. Here is an example message. Message size could be cut about 3x by using reduced precision for hand/head positions, though the rotation matrices would then need to be reorthogonalized.
{'name': 'Tom', 'color': (135, 206, 235, 255), 'vr head': ((0.8388863205909729, 0.05549866333603859, 0.5414698719978333, -0.4748609662055969), (-0.01791265420615673, 0.9970642924308777, -0.0744437724351883, 0.790226936340332), (-0.5440118312835693, 0.05275070294737816, 0.8374177813529968, 1.0211575031280518)), 'vr hands': [((-0.965267539024353, -0.05128280445933342, 0.2561977207660675, -0.8715004920959473), (0.15383650362491608, 0.6810212731361389, 0.715923011302948, 0.7660024762153625), (-0.21119074523448944, 0.7304666042327881, -0.6494799852371216, 0.8116732835769653)), ((-0.662699282169342, -0.5801301598548889, 0.4735884666442871, -0.8111420273780823), (-0.15432025492191315, 0.7245994806289673, 0.6716710925102234, 0.7688000202178955), (-0.7328180074691772, 0.3720293641090393, -0.5697181820869446, 0.8447613716125488))], 'id': 0}
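A sketch of the reduced-precision idea applied to the 'vr head' pose above: the sender rounds each entry, and the receiver restores an orthonormal rotation with Gram-Schmidt, rebuilding the third row as the cross product of the first two (valid for a proper rotation). The 4-decimal precision and the function names are assumptions for illustration.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(a):
    n = math.sqrt(_dot(a, a))
    return [x / n for x in a]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def reduce_precision(pose, digits=4):
    """Sender side: round each entry of a 3x4 pose for a shorter text encoding."""
    return tuple(tuple(round(v, digits) for v in row) for row in pose)


def repair_pose(pose):
    """Receiver side: re-orthonormalize the rounded 3x3 rotation rows.

    Row 0 is normalized, row 1 has its row-0 component removed and is
    normalized, and row 2 is row0 x row1 (assumes a right-handed rotation).
    Translations are kept as received.
    """
    x = _normalize(pose[0][:3])
    r1 = pose[1][:3]
    y = _normalize([b - _dot(r1, x) * a for a, b in zip(x, r1)])
    z = _cross(x, y)
    return tuple(tuple(rot) + (row[3],) for rot, row in zip((x, y, z), pose))
```

For the head pose above, rounding to 4 decimals shortens its text form to well under half its original length, while the repaired rotation stays within about 1e-4 of the original entries.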