

To Delay or not to Delay, that is the (Wireless Audio Latency) question…

This article attempts to detail the situations in which latency matters in a digital wireless audio link used for real time applications, in particular the live performance environment. To add some context, in my previous life my primary responsibility was improving audio quality in Bluetooth. I was frustrated by the occasional claim that, because the average human ear cannot really hear sound above 10kHz, there was no real point in trying to achieve a 20kHz bandwidth. As such, the idea of sampling at 96kHz or 192kHz was simply a source of comedy value. Happily, common sense broke out, and the “I can’t hear it so it doesn’t matter” brigade moved on. However, the net effect was to stall Bluetooth’s acceptance as a reputable audio transport medium, and much energy was burned in winning the debate.

Moving on to a new era where Bluetooth LE Audio is becoming more prevalent, I’m starting to hear the same thing regarding latency, e.g. “it’s spec-man-ship” or “10 milliseconds (ms) here or there doesn’t really matter” and, at the other end of the spectrum, “anything over 0ms won’t be tolerated”. Given that latency is measurable, as opposed to the subjective nature of audio quality, I just wanted to spend some time outlining where delays are introduced in a wireless high quality audio link, and to consider what different users may be able to work with, notice or find debilitating. All being well, this may go some way towards avoiding several wasted years of pushing rocks uphill again.

There are several use cases where audio latency has a greater or lesser effect on the enjoyment of  the experience. Some are outlined below: 

• Real world audio / In person conversations (using hearing instruments) 

• Phone calls / video calls 

• Gaming 

• Live performance (musicians / DJs / theatre) 

• Music production (multichannel recording, editing, mixing, processing etc.)

• Watching TV / movies / lip sync (extending into surround sound synchronisation)

• Multi-person online performance (jamming online / virtual choirs)

Each application will have a range of tolerances depending on the user and how tuned in they are  and, broadly speaking, this is related to competence and usage. A professionally trained vocalist who  performs live concerts 300+ days / year will have a greater sensitivity to latency when compared to  an occasional Karaoke singer.

Listening to music as an end-user over Classic Bluetooth with its 200+ms latency is acceptable as there is no reference point. However, simply put, this same amount of latency is a huge problem for live performance musicians*. Ironically, musicians are trained to deal with some level of latency, but only at amounts caused by natural acoustics. The speed of sound varies with temperature, and physical obstacles cause absorption or reflection, but generally on a stage or coming from a speaker it can be taken as around 342 meters/second. This can cause a natural delay between performers of 10ms if they are about 3.4m apart. As a musician on a stage with several band members, there are multiple levels of latency to deal with, caused by different instruments, frequency ranges, and where those musicians are positioned in relation to each other. Still, many performers state that even a 10ms delay is detectable, specifically if it is coming from the instrument that is held in their own hands (which would not normally have noticeable latency).
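
As a quick sanity check on the acoustic figures, here is a minimal Python sketch that turns a distance on stage into a delay. It assumes the 342 meters/second figure quoted above; the function name is purely illustrative.

```python
# Speed of sound figure quoted in the article (meters per second).
SPEED_OF_SOUND_M_PER_S = 342.0


def acoustic_delay_ms(distance_m: float,
                      speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Time for sound to cover distance_m, in milliseconds."""
    return distance_m / speed_m_per_s * 1000.0


# Two performers about 3.4 m apart hear each other roughly 10 ms late.
print(round(acoustic_delay_ms(3.4), 1))  # 9.9
```

In other words, every extra meter between performers costs a little under 3ms.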

According to an AES paper on latency (AES E-Library, aes2.org), a subjective listening test was conducted to determine how objectionable various amounts of latency are for performers in live monitoring scenarios. It was shown that the audibility of latency depends on both the type of instrument and the monitoring environment (wedges vs in-ear monitors, or IEMs). The experiment showed that the acceptable amount of latency can range from 42ms down to less than 1.4ms. The results were grouped as: vocals <3ms, drums <6ms, piano <10ms, guitars <12ms, keyboards <20ms.
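
To make those groupings easier to play with, the sketch below stores them in a small Python lookup and reports which instrument groups a given monitoring latency would satisfy. The thresholds are the ones listed above; the function name and the 5ms example value are hypothetical, and real tolerance will vary by performer.

```python
# Upper bounds (ms) for acceptable latency, as grouped in the article.
ACCEPTABLE_LATENCY_MS = {
    "vocals": 3.0,
    "drums": 6.0,
    "piano": 10.0,
    "guitars": 12.0,
    "keyboards": 20.0,
}


def instruments_within_budget(latency_ms: float) -> list[str]:
    """Return the instrument groups whose threshold exceeds latency_ms."""
    return [name for name, limit in ACCEPTABLE_LATENCY_MS.items()
            if latency_ms < limit]


# A hypothetical 5 ms monitoring path would suit everyone except vocalists.
print(instruments_within_budget(5.0))
# ['drums', 'piano', 'guitars', 'keyboards']
```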

The audio chain (without a wireless link) can have a total latency of between 10 and 20ms. This is for a one-way link, and we’ll come back to that point. The latency is largely made up of:

• Data convertors – typically between analog and digital domains or frequency and time  domains. 

• Signal processing – many different digital signal processing algorithms work on blocks of  samples, rather than one sample at a time, which requires buffers that are processed as a  single unit. If the block size isn’t consistent across a chain of processing modules, then  further buffering is needed. 

• Transferring data between systems – this could be between different software algorithms, different chips within a single product, between different products or between different locations over a network. Whether a wired or wireless data transfer is used, it requires buffers; transfers typically happen in blocks and can require retransmissions. This can be further compounded when the data size of the radio transfers does not match the typical size of the audio blocks, which requires further buffering to deal with the mismatch.

• Clock mismatch – in some systems the transfer of data requires crossing clock domains  which can require additional buffers to handle the slightly different rates and the processing  required to deal with it. Fortunately, some professional systems are designed to run from a  common clock to avoid this problem. 
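
To put a number on the block-processing point in the list above: the first sample in a block has to wait until the last sample of that block has arrived, so a block-based stage adds at least one block duration of delay. The Python sketch below works that out; the block sizes and the 48kHz rate are illustrative assumptions, and it deliberately ignores compute time, retransmissions and any rebuffering between mismatched block sizes.

```python
def block_duration_ms(block_size_samples: int, sample_rate_hz: int) -> float:
    """Minimum delay added by waiting for one full block of samples."""
    return block_size_samples * 1000.0 / sample_rate_hz


# Hypothetical block sizes at an assumed 48 kHz sample rate.
for block_size in (48, 128, 480):
    delay = block_duration_ms(block_size, 48_000)
    print(f"{block_size} samples -> {delay:.2f} ms")
```

A few stages like this in series is all it takes to reach the 10 to 20ms range.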

Now let’s dive into where latencies occur in an RF link. For this article I’ll look at Bluetooth LE and UWB as possible RF options. (If the product designer has the luxury of designing their own protocol and using discrete components, then that’s a different topic and not covered here.) For an off-the-shelf Bluetooth LE SoC based design, the lowest latency appears to be ~19ms using a standard protocol and LC3+. Now we have a one-way link (mic or IEM) that is, at best, 29ms (10ms audio chain + 19ms RF). Double that (mic and IEM) and we’ve got a compounding issue, where the resulting 58ms is way beyond the 42ms cited by the AES paper.

Things are improved by using a proprietary RF protocol that sits on top of BLE, such as LiveOnAir from Virscient (https://www.virscient.com/). Using the LiveOnAir BLE link with LC3+ can get the RF part down from 19ms to ~12ms. Further improvements can be found when using Skylark, a low latency audio solution from Audio Codecs. This now delivers an RF figure of around 3.5ms. The entire production link is now 13.5ms, and a round trip of around 27ms is achievable. The beauty is the ability to use off-the-shelf SoCs and enjoy the cost benefits. In this case the BLE SoC was Nordic’s nRF5340, which has an open architecture to enable LiveOnAir and an ARM based processor which hosted Skylark.

It’s probably worth diving into the differences between LC3+ and Skylark, which had contrasting levels of system performance, i.e. 12ms vs 3.5ms. This is largely due to the 88 samples used during processing by Skylark, which results in an encode / decode processing delay of 1.8ms. In addition, as the algorithm is designed for RF applications it is intrinsically tolerant of bit errors, and thus negates the need to add FEC to the RF link. Finally, aside from the latency metrics, Skylark delivers 24-bit audio sampled at 48kHz.
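
The 1.8ms figure follows directly from the frame size. The short Python sketch below does the arithmetic, assuming the 88 samples are taken at the 48kHz rate mentioned above (the function name is illustrative).

```python
def frame_duration_ms(frame_samples: int, sample_rate_hz: int) -> float:
    """Duration of one codec frame, in milliseconds."""
    return frame_samples * 1000.0 / sample_rate_hz


# 88 samples at an assumed 48 kHz sample rate.
print(round(frame_duration_ms(88, 48_000), 2))  # 1.83
```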

Let’s move on from BLE SoCs, which require bit rate efficiencies to achieve audio links, to UWB. When I first heard about UWB it almost seemed to have magical properties, i.e. super low latencies and enough bandwidth to support Linear PCM audio at 24-bit, 96kHz. As an RF solution, it’s getting attention as more radio vendors start to supply platforms and a rising number of device manufacturers support it, e.g. Samsung, Google and Apple. Currently UWB is largely used for location and access control. However, due to the frequencies UWB works at, it is very susceptible to drop-out due to body-blocking and detuning. This is OK for non-real-time applications, or use cases that can rely on reflections for a signal to arrive. But for audio, and specifically real-time, mission critical applications where glitching is a cardinal sin (or wide-open spaces where there are no reflections), UWB was not an option. That was until AntennaWare’s BodyWave™ UWB antenna directly addressed the body-blocking with patented techniques which can add up to an extra 20dB of gain and thus ensure body-blocking isn’t an issue.
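
To give a feel for what “enough bandwidth to support Linear PCM” means, the Python sketch below computes the raw bit rate of uncompressed PCM audio. It covers only the audio payload; packet headers, retransmissions and the actual throughput of any given radio are left out, and the function name is illustrative.

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bit_depth: int,
                      channels: int = 1) -> float:
    """Raw Linear PCM payload rate in kbit/s (no protocol overhead)."""
    return sample_rate_hz * bit_depth * channels / 1000.0


print(pcm_bit_rate_kbps(96_000, 24))              # 2304.0 (mono)
print(pcm_bit_rate_kbps(96_000, 24, channels=2))  # 4608.0 (stereo)
print(pcm_bit_rate_kbps(48_000, 24))              # 1152.0 (mono, 48 kHz)
```

That is why a narrower link needs a codec and a wider one can consider going without.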

| RF Protocol + Codec | Channels | RF Latency (one way) |
|---|---|---|
| BT LE / LC3+ | Mono / Stereo | 19ms |
| BT LE / LiveOnAir & LC3+ | Mono / Stereo | 12ms |
| BT LE / LiveOnAir / Skylark | Mono / Stereo | 3.5ms |
| UWB / BodyWave / LiveOnAir | Mono / Stereo | 2ms |
| UWB / BodyWave / LiveOnAir / Skylark | Surround Sound | 3.5ms** |

** Purely a working hypothesis
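
The budget arithmetic used earlier (audio chain + RF, then doubled for mic and IEM) can be applied to each measured RF figure. The Python sketch below does that, assuming the best-case 10ms audio chain and using the 42ms upper bound from the AES paper as the yardstick. The hypothetical surround sound figure is left out, and only the 29ms / 58ms and 13.5ms / 27ms results are quoted in the text; the others simply follow from the same sums.

```python
# Best-case audio chain latency (ms); the article quotes a 10-20 ms range.
AUDIO_CHAIN_MS = 10.0
# Upper end of the acceptable range reported in the AES paper (ms).
AES_UPPER_BOUND_MS = 42.0

# One-way RF latency figures (ms) for each protocol / codec combination.
RF_LATENCY_MS = {
    "BT LE / LC3+": 19.0,
    "BT LE / LiveOnAir & LC3+": 12.0,
    "BT LE / LiveOnAir / Skylark": 3.5,
    "UWB / BodyWave / LiveOnAir": 2.0,
}


def one_way_ms(rf_ms: float, chain_ms: float = AUDIO_CHAIN_MS) -> float:
    """Audio chain plus RF link for a single direction (mic or IEM)."""
    return chain_ms + rf_ms


def round_trip_ms(rf_ms: float, chain_ms: float = AUDIO_CHAIN_MS) -> float:
    """Mic path plus IEM path, assuming both use the same link."""
    return 2 * one_way_ms(rf_ms, chain_ms)


for link, rf_ms in RF_LATENCY_MS.items():
    total = round_trip_ms(rf_ms)
    verdict = "within" if total <= AES_UPPER_BOUND_MS else "beyond"
    print(f"{link}: one-way {one_way_ms(rf_ms):.1f} ms, "
          f"round trip {total:.1f} ms ({verdict} 42 ms)")
```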

Now that UWB is becoming a realistic option for audio, the latencies are measurable below 3ms with Linear PCM from audio input to audio output. If we go back to the live performance setup, the entire chain is now close to 10ms, which equates to roughly 10 feet of distance. It’s not really for me to say whether that’s good enough; only the performer can make that statement. But what I can say is that cables for mics may be removed and performers can protect their hearing with cost-effective IEMs. If we’re able to crack one of the most demanding use cases, live performance for musicians, then all being well the gamers and DJs can enjoy wireless audio connectivity without a hindering latency.

*Shamelessly copied from “Why can’t musicians jam with each other online without latency or other issues?” by Caleb Dolister (Medium): Latency is a problem for musicians. For any readers that are unfamiliar with how time is calculated in music, speed is interpreted as a number of beats per minute (bpm), called a tempo. 60bpm = 1 beat per second, 120bpm = 2 beats per second, and so on. If the tempo of a song is 120bpm, this equates to 500ms between beats (1sec=1000ms, .5sec=500ms). At a distance of 20′, there is a natural latency of 18ms causing that 500ms/120bpm to feel like 518ms/115bpm. In layman’s terms, it feels like the other player is performing at a slower speed even if they are not. To compensate for this natural occurrence of latency, larger bands are led by conductors so that there is a visual representation of time.
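
For anyone who wants to try other tempos or distances, the tempo arithmetic in that passage can be sketched in a few lines of Python. The function names are illustrative, and the model is the simple one used in the quote: the delay is added to the gap between beats.

```python
def beat_interval_ms(tempo_bpm: float) -> float:
    """Time between beats at the given tempo, in milliseconds."""
    return 60_000.0 / tempo_bpm


def perceived_tempo_bpm(tempo_bpm: float, delay_ms: float) -> float:
    """Tempo that the stretched beat interval would correspond to."""
    return 60_000.0 / (beat_interval_ms(tempo_bpm) + delay_ms)


print(beat_interval_ms(120))                        # 500.0
print(round(perceived_tempo_bpm(120, 18), 1))       # 115.8
```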

Aside from plagiarising Caleb Dolister’s content (and currently owing him a bottle of Bushmills Whiskey), this writer would also like to thank Gary Spittle of Sonical for his help and guidance.

Bluetooth Wireless Audio, Innovations in Audio, and More in audioXpress December 2024 | audioXpress