To Delay or not to Delay, that is the (Wireless Audio Latency) question…

This article attempts to detail the situations in which latency matters in a digital wireless audio link used for real-time applications – in particular, the live performance environment. To add some context: in my previous life, my primary responsibility was improving audio quality in Bluetooth. I was frustrated by the occasional claim that, because the average human ear cannot really hear sound above 10kHz, there was no real point in trying to achieve a 20kHz bandwidth. As such, the idea of 96kHz or 192kHz sampling was simply a source of comedy value. Happily, common sense broke out, and the “I can’t hear it so it doesn’t matter” brigade moved on. However, the net effect was to stall Bluetooth’s acceptance as a reputable audio transport medium, and much energy was burned in winning the debate.

Moving on to a new era where Bluetooth LE Audio is becoming more prevalent, I’m starting to hear the same thing regarding latency, e.g. “it’s spec-man-ship” or “10 milliseconds (ms) here or there doesn’t really matter” and, at the other end of the spectrum, “anything over 0ms won’t be tolerated”. Given that latency is measurable, as opposed to the subjective nature of audio quality, I want to spend some time outlining where delays are introduced in a high-quality wireless audio link, and to consider what different users may be able to work with, notice or find debilitating. All being well, this may go some way towards avoiding several wasted years of pushing rocks uphill again.

There are several use cases where audio latency has a greater or lesser effect on the enjoyment of the experience. Some are outlined below:

• Real world audio / In person conversations (using hearing instruments)

• Phone calls / video calls

• Gaming

• Live performance (musicians / DJs / theatre)

• Music production (multichannel recording, editing, mixing, processing etc.)

• Watching TV / movies / lip sync (extending into surround sound synchronisation)

• Multi-person online performance (jamming online / virtual choirs)

Each application will have a range of tolerances depending on the user and how tuned in they are and, broadly speaking, this is related to competence and usage. A professionally trained vocalist who performs live concerts 300+ days / year will have a greater sensitivity to latency when compared to an occasional Karaoke singer.

Listening to music as an end-user over Classic Bluetooth with its 200+ms latency is acceptable as there is no reference point. Simply put, however, this same amount of latency is a huge problem for live performance musicians*. Ironically, musicians are trained to deal with some level of latency, but only at amounts caused by natural acoustics. The speed of sound depends on conditions such as temperature, and physical obstacles cause absorption or reflection, but generally, on a stage or coming from a speaker, sound travels at around 342 meters/second. This can cause a natural delay between performers of 10ms if they are about 3.4m apart. As a musician on a stage with several band members, there are multiple levels of latency to deal with, caused by different instruments, frequency ranges, and where those musicians are positioned in relation to each other. Still, many performers state that even a 10ms delay is detectable, specifically if it is coming from the instrument that is held in their own hands (which would not normally have noticeable latency).
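The distance-to-delay arithmetic above can be sketched in a few lines of Python. The 342 m/s figure is the one quoted in this article; the function name is purely illustrative.

```python
# Speed of sound as quoted in the article; the real value varies with temperature.
SPEED_OF_SOUND_M_PER_S = 342.0


def acoustic_delay_ms(distance_m: float,
                      speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Time in milliseconds for sound to travel distance_m through air."""
    return distance_m / speed_m_per_s * 1000.0


if __name__ == "__main__":
    # Distances between performers on a stage (illustrative values).
    for distance in (1.0, 3.4, 6.0):
        print(f"{distance:.1f} m -> {acoustic_delay_ms(distance):.1f} ms")
```

At 3.4m this gives a delay just under 10ms, in line with the figure above.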

According to an AES paper on latency (AES E-Library, aes2.org), a subjective listening test was conducted to determine how objectionable various amounts of latency are for performers in live monitoring scenarios. It was shown that the audibility of latency is dependent on both the type of instrument and the monitoring environment (wedges vs IEMs). This experiment showed that the acceptable amount of latency can range from as much as 42ms down to less than 1.4ms. The grouping of those results was Vocal <3ms, Drums <6ms, Piano <10ms, Guitars <12ms, Keyboards <20ms.
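As a rough illustration, the grouped limits above can be held in a small lookup table and checked against a measured latency. This is only a sketch: the limits are the grouped figures quoted in this article, and the function names are made up for the example.

```python
# Grouped upper limits (ms) as summarised in the article from the AES listening test.
AES_LATENCY_LIMIT_MS = {
    "vocal": 3.0,
    "drums": 6.0,
    "piano": 10.0,
    "guitar": 12.0,
    "keyboards": 20.0,
}


def is_latency_acceptable(instrument: str, latency_ms: float) -> bool:
    """True if latency_ms is strictly below the grouped limit for the instrument."""
    return latency_ms < AES_LATENCY_LIMIT_MS[instrument.lower()]


def instruments_within_limit(latency_ms: float) -> list:
    """Instruments whose grouped limit is not exceeded by latency_ms."""
    return [name for name, limit in AES_LATENCY_LIMIT_MS.items()
            if latency_ms < limit]


if __name__ == "__main__":
    print(instruments_within_limit(11.0))
```

With a monitoring latency of 11ms, only the guitar and keyboard groups remain within their limits.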

The audio chain (without a wireless link) can have a total latency of between 10 and 20ms. This is for a one-way link, a point we’ll come back to. This latency is largely made up of:

• Data convertors – typically between analog and digital domains or frequency and time domains.

• Signal processing – many different digital signal processing algorithms work on blocks of samples, rather than one sample at a time, which requires buffers that are processed as a single unit. If the block size isn’t consistent across a chain of processing modules, then further buffering is needed.

• Transferring data between systems – this could be between different software algorithms, different chips within a single product, between different products or between different locations over a network. Whether a wired or wireless data transfer is used, it requires buffers, and transfers typically happen in blocks and can require retransmissions. This can be further compounded when the data size of the radio transfers does not match the typical size of the audio blocks, which requires further buffering to deal with the mismatch.

• Clock mismatch – in some systems the transfer of data requires crossing clock domains which can require additional buffers to handle the slightly different rates and the processing required to deal with it. Fortunately, some professional systems are designed to run from a common clock to avoid this problem.
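To make the block-size mismatch point concrete, here is a minimal simulation of a FIFO sitting between a stage that produces audio in one block size and a stage that consumes it in another. The block sizes, the 48kHz sample rate and the function names are all assumptions chosen for illustration, not figures from any particular product.

```python
def peak_fifo_fill(producer_block: int, consumer_block: int,
                   n_blocks: int = 1000) -> int:
    """Peak number of samples held in a FIFO between two block-based stages.

    The producer writes producer_block samples at a time; the consumer
    removes consumer_block samples whenever enough are available.
    """
    fill = 0
    peak = 0
    for _ in range(n_blocks):
        fill += producer_block
        peak = max(peak, fill)
        while fill >= consumer_block:
            fill -= consumer_block
    return peak


def samples_to_ms(samples: int, sample_rate_hz: int) -> float:
    """Convert a sample count into milliseconds of audio."""
    return samples / sample_rate_hz * 1000.0


if __name__ == "__main__":
    rate = 48_000  # assumed sample rate
    for producer, consumer in ((64, 64), (64, 96)):
        peak = peak_fifo_fill(producer, consumer)
        print(f"{producer} -> {consumer} samples: peak buffer {peak} samples "
              f"({samples_to_ms(peak, rate):.2f} ms)")
```

With matched 64-sample blocks the buffer never holds more than one block; with a 64-to-96 mismatch it peaks at 128 samples, which is the extra buffering the bullet above refers to.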

Now let’s dive into where latencies occur in an RF link. For this article I’ll look at Bluetooth LE and UWB as possible RF options. (If the product designer has the luxury of designing their own protocol and using discrete components, then that’s a different topic and not covered here.) For an off-the-shelf Bluetooth LE SOC-based design, the lowest latency appears to be ~19ms using a standard protocol and LC3+. Adding this to the 10ms best case for the normal audio chain gives a one-way link (Mic or IEM) of, at best, 29ms. Double that (Mic and IEM) and we’ve got a compounding issue where the 58ms is way beyond the 42ms cited by the AES paper.

Things are improved by using a proprietary RF protocol that sits on top of BLE, e.g. LiveOnAir from Virscient (https://www.virscient.com/). Using the LiveOnAir BLE link with LC3+ can get the RF part down from 19ms to ~12ms. Further improvements can be found when using Skylark, a low latency audio solution from Audio Codecs. This now delivers an RF figure of around 3.5ms. The entire production link is now 13.5ms and a round trip of around 27ms is achievable. The beauty of this is the ability to use off-the-shelf SOCs and enjoy the cost benefits. In this case the BLE SOC was Nordic’s nRF5340, which has an open architecture to enable LiveOnAir and an ARM-based processor which hosted Skylark.

It’s probably worth diving into the differences between LC3+ and Skylark, which had contrasting levels of system performance, i.e. 12ms vs 3.5ms. This is largely due to the 88-sample blocks Skylark uses during processing, which at 48kHz results in an encode / decode processing delay of 1.8ms. In addition, as the algorithm is designed for RF applications, it is intrinsically tolerant to bit errors and thus negates the need to have additional FEC added to the RF link. Finally, aside from the latency metrics, Skylark delivers 24-bit audio sampled at 48kHz.
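The 1.8ms figure follows directly from the block length and the sample rate. A quick check, assuming the 88 samples are taken at the 48kHz rate mentioned above (the function name is illustrative):

```python
def frame_duration_ms(frame_samples: int, sample_rate_hz: int) -> float:
    """Duration of one codec frame in milliseconds."""
    return frame_samples / sample_rate_hz * 1000.0


if __name__ == "__main__":
    # 88-sample frame at 48 kHz, the figures quoted for Skylark in the article.
    print(f"{frame_duration_ms(88, 48_000):.2f} ms")
```

This comes out at roughly 1.83ms, consistent with the quoted processing delay.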

Moving on from the BLE SOCs, which require bit rate efficiencies to achieve audio links, to UWB. When I first heard about UWB it almost had magical properties, i.e. super low latencies and enough bandwidth to support Linear PCM audio at 24-bit, 96kHz. As an RF solution, it’s getting attention as more radio vendors start to supply platforms, and there’s a rising number of device manufacturers supporting it, e.g. Samsung, Google and Apple. Currently UWB is largely used for location and access control. However, due to the frequencies UWB works at, it is very susceptible to drop-outs due to body-blocking and detuning. This is OK for non-real-time applications or use cases that can rely on reflections for a signal to arrive. But for real-time, mission-critical audio applications where glitching is a cardinal sin (or in wide open spaces where there are no reflections), UWB was not an option. That was until AntennaWare’s BodyWave™ UWB antenna directly addressed the body-blocking with patented techniques which can add up to an extra 20dB of gain and thus ensure body-blocking isn’t an issue.
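To put the bandwidth point in numbers, the raw bit rate of uncompressed Linear PCM is simply sample rate × bit depth × channels. The sketch below uses the formats mentioned in this article; the 2 Mbit/s constant is the raw over-the-air rate of the Bluetooth LE 2M PHY, included only as a reference point, and the function name is illustrative.

```python
BLE_2M_PHY_RAW_MBPS = 2.0  # raw PHY rate, before any protocol overhead


def pcm_bit_rate_mbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Raw bit rate of uncompressed Linear PCM in Mbit/s."""
    return sample_rate_hz * bit_depth * channels / 1_000_000


if __name__ == "__main__":
    formats = {
        "24-bit / 48 kHz stereo": (48_000, 24, 2),
        "24-bit / 96 kHz stereo": (96_000, 24, 2),
    }
    for label, (rate, depth, chans) in formats.items():
        mbps = pcm_bit_rate_mbps(rate, depth, chans)
        fits = "fits" if mbps < BLE_2M_PHY_RAW_MBPS else "does not fit"
        print(f"{label}: {mbps:.3f} Mbit/s ({fits} in a raw 2 Mbit/s PHY)")
```

Even before protocol overhead, 24-bit stereo PCM at either rate exceeds 2 Mbit/s, which is why a BLE link depends on a codec while a wider-band radio can carry the PCM directly.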

RF Protocol + Codec	Channels	RF Latency (1 way)
BT LE / LC3+	Mono / Stereo	19ms
BT LE / LiveOnAir & LC3+	Mono / Stereo	12ms
BT LE / LiveOnAir / Skylark	Mono / Stereo	3.5ms
UWB / BodyWave / LiveOnAir	Mono / Stereo	2ms
UWB / BodyWave / LiveOnAir / Skylark	Surround Sound	3.5ms**

** Purely a working hypothesis
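The table can be turned into a simple latency budget. The sketch below adds each RF figure to the 10ms best-case audio chain used earlier in this article, doubles it for a Mic plus IEM round trip, and compares the result with the 42ms upper figure from the AES paper. The hypothetical surround sound row is left out, and the names are illustrative.

```python
AUDIO_CHAIN_MS = 10.0      # best-case non-RF audio chain figure used in the article
AES_UPPER_LIMIT_MS = 42.0  # largest acceptable latency reported in the AES paper

# One-way RF latency figures (ms) taken from the table above.
RF_LATENCY_MS = {
    "BT LE / LC3+": 19.0,
    "BT LE / LiveOnAir & LC3+": 12.0,
    "BT LE / LiveOnAir / Skylark": 3.5,
    "UWB / BodyWave / LiveOnAir": 2.0,
}


def one_way_ms(rf_ms: float, chain_ms: float = AUDIO_CHAIN_MS) -> float:
    """Audio chain plus RF link for a single direction (Mic or IEM)."""
    return chain_ms + rf_ms


def round_trip_ms(rf_ms: float, chain_ms: float = AUDIO_CHAIN_MS) -> float:
    """Mic link plus IEM link, assuming both directions have the same latency."""
    return 2.0 * one_way_ms(rf_ms, chain_ms)


if __name__ == "__main__":
    for name, rf_ms in RF_LATENCY_MS.items():
        total = round_trip_ms(rf_ms)
        verdict = "within" if total <= AES_UPPER_LIMIT_MS else "beyond"
        print(f"{name}: one-way {one_way_ms(rf_ms):.1f} ms, "
              f"round trip {total:.1f} ms ({verdict} 42 ms)")
```

Assuming the same latency in both directions is a simplification; a real Mic link and IEM link may well differ.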

Now that UWB is becoming a realistic option for audio, the latencies are measurable below 3ms with Linear PCM from audio input to audio output. If we go back to the live performance setup, the entire chain is now close to 10ms, which equates to roughly 3.4m (about 11 feet) of distance. It’s not really for me to say whether that’s good enough; only the performer can make that statement. But what I can say is that cables for Mics may be removed and performers can protect their hearing with cost-effective IEMs. If we’re able to crack one of the most demanding use cases in live performance for musicians, then all being well the gamers and DJs can enjoy wireless audio connectivity without a hindering latency.

*Shamelessly copied from “Why can’t musicians jam with each other online without latency or other issues?” by Caleb Dolister (Medium): Latency is a problem for musicians. For any readers that are unfamiliar with how time is calculated in music, speed is interpreted as a number of beats per minute (bpm), called a tempo. 60bpm = 1 beat per second, 120bpm = 2 beats per second, and so on. If the tempo of a song is 120bpm, this equates to 500ms between beats (1sec=1000ms, .5sec=500ms). At a distance of 20′, there is a natural latency of 18ms causing that 500ms/120bpm to feel like 518ms/115bpm. In layman’s terms, it feels like the other player is performing at a slower speed even if they are not. To compensate for this natural occurrence of latency, larger bands are led by conductors so that there is a visual representation of time.
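The tempo arithmetic in this footnote can be reproduced as follows. The sketch follows the footnote’s own simplification of adding the latency to every beat interval; the function names are illustrative.

```python
def beat_interval_ms(bpm: float) -> float:
    """Time between beats in milliseconds for a given tempo."""
    return 60_000.0 / bpm


def perceived_bpm(bpm: float, latency_ms: float) -> float:
    """Tempo that corresponds to the beat interval stretched by latency_ms."""
    return 60_000.0 / (beat_interval_ms(bpm) + latency_ms)


if __name__ == "__main__":
    tempo = 120.0
    latency = 18.0  # natural latency at about 20 feet, as quoted in the footnote
    print(f"{beat_interval_ms(tempo):.0f} ms between beats at {tempo:.0f} bpm")
    print(f"feels like {perceived_bpm(tempo, latency):.1f} bpm with {latency:.0f} ms added")
```

For 120bpm and 18ms this gives a 518ms interval, a little under 116bpm.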

Aside from plagiarising Caleb Dolister’s content (and currently owing him a bottle of Bushmills Whiskey), this writer would also like to thank Gary Spittle of Sonical for his help and guidance.

Bluetooth Wireless Audio, Innovations in Audio, and More in audioXpress December 2024 | audioXpress