How to fine-tune ALSA to minimize the varying delay/gap when segueing (~50-150 ms)

From Rivendell Wiki

Q: How can I reduce the transition/segue delay?

To clarify our use/environment: I'm referring to the next audio event (from its CUT START marker) starting instantly at the current event's SEGUE START marker. When I last checked out RD six months ago (DB and audio on the playback PC, using an M-Audio sound card and onboard sound), there was a varying delay/gap when segueing, perhaps on the order of 50-150 ms. This may not be an issue for some stations, but it can sound loose/disjointed in our format, and would presumably introduce cumulative timing errors when cleanly filling local windows in network feeds, etc.

A: Ah, ALSA. There are some tuning knobs in /etc/rd.conf that could significantly improve this; specifically, see the 'PeriodSize=' parameter in the [Alsa] section. That file ships with a very conservative number for this setting so that we don't get xruns on setups with poor realtime performance. Lowering it will reduce playout latency at the cost of demanding better RT performance. At some point as you lower it, you'll start getting xruns (often audible, but they can also be seen in the syslog); you generally want this set at a value just above that point. On well-tuned hardware with a preemptive kernel and an ICE2712-based card, I've been able to get this as low as 64 frames/period.
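
As a rough illustration of what lowering 'PeriodSize=' buys, the sketch below converts a period size in PCM frames into the amount of time one period represents. The function name and the 48 kHz sample rate are assumptions made for the example (they are not taken from rd.conf), and 1024 is just an arbitrary larger value for comparison. Total playout latency also depends on how many periods are buffered, so treat these figures as a per-period lower bound rather than a measurement.

```python
def period_latency_ms(period_size_frames: int, sample_rate_hz: int = 48000) -> float:
    """Return the duration of one period of audio, in milliseconds.

    sample_rate_hz defaults to 48000, which is an assumption for this sketch.
    """
    return 1000.0 * period_size_frames / sample_rate_hz


# Compare a few candidate 'PeriodSize=' values (illustrative only).
for frames in (64, 256, 1024):
    print(f"PeriodSize={frames}: {period_latency_ms(frames):.2f} ms per period")
```

At 48 kHz this works out to roughly 1.33 ms per period for 64 frames, against roughly 21.33 ms for 1024 frames.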

Follow-up comment: It would be good to eliminate all segue delay in RD for all audio devices, or at least for the ASI and Axia/Logitek AoIP drivers (I have not tested this).

A: *All* is not a realistic goal: CPUs are inherently sequential devices, so no pure software solution will be able to achieve this. A more reasonable goal would be to reduce it below the level of human audibility; research I've seen puts this threshold at about 5 ms. An ASI setup can certainly come close to or beat that standard (I've never formally measured it). Axia AoIP *is* ALSA, so those constraints apply, although there you will be limited to a minimum 'PeriodSize=' value of 256 due to the design of the LiveWire protocol (240 PCM frames/packet for standard streams). As for the Logitek stuff, I have no idea -- I've yet to see even a theoretical design of how that's supposed to work.
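
To see how the 5 ms figure and the 240-frame packet relate, here is a small sketch that turns a latency budget into a frame count and back. It assumes a 48 kHz sample rate, and both helper names are made up for the example.

```python
import math


def max_period_frames(budget_ms: float, sample_rate_hz: int = 48000) -> int:
    """Largest whole number of PCM frames that fits inside budget_ms."""
    return math.floor(budget_ms * sample_rate_hz / 1000)


def frames_to_ms(frames: int, sample_rate_hz: int = 48000) -> float:
    """Duration of the given number of PCM frames, in milliseconds."""
    return 1000 * frames / sample_rate_hz


print(max_period_frames(5))          # frames that fit in a 5 ms budget
print(frames_to_ms(240))             # duration of one 240-frame packet
print(round(frames_to_ms(256), 2))   # duration of a 256-frame period
```

Under that assumption a 240-frame packet lasts exactly 5 ms, so a 256-frame period (about 5.33 ms) sits just above the audibility figure quoted above.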

Q: Can you not reduce the latency to zero with a true real time system?

A: Not even there. In any DSP system, some amount of buffering needs to take place, something like:

	read in data
	operate on it (mix, normalize, whatever)
	write out data

Even if your 'buffer' is just a single frame, there will still be latency, by virtue of the fact that a *sequence* of discrete operations (each of which consumes a non-zero amount of time) needs to be carried out on it. Even high-end A-to-D and D-to-A converters that operate on single frames this way have measurable latencies (on the order of µs rather than ms, but still greater than zero).
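
The three steps above can be sketched as a block-processing loop. This is a toy illustration only, not Rivendell's actual code; `process_stream` and the `operate` callback are hypothetical names. The point is that no output frame can be written until its whole block has been read and operated on, and that the three steps still run one after another even when `block_size` is 1.

```python
def process_stream(samples, block_size, operate):
    """Process samples block by block: read, operate, write."""
    output = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]   # read in data
        block = [operate(s) for s in block]         # operate on it
        output.extend(block)                        # write out data
    return output


# Halve the level of a short run of samples, two frames at a time.
print(process_stream([0.2, 0.4, -0.6, 0.8, 1.0], 2, lambda s: s * 0.5))
```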

Bottom line: 'zero latency' ain't possible. The question instead becomes: What is the maximum latency that we can tolerate before it becomes perceptible?

Q: How does the parameter PeriodQuantity affect the latency and the xruns? What does it do?

A: That value sets the buffer size (in PCM frames) which the DSP engine inside caed(8) will use when processing audio (both capture and playout). The larger the buffer size, the longer the latency. Think of it in terms of granularity -- smaller buffers mean finer-grained control over things like playout starts. There's a danger with smaller buffers though -- as the size shrinks, the number of times per second that the code needs to execute that processing increases. All OSes have a limit beyond which they will not be able to schedule that processing reliably. When that happens, the result is xruns -- the system can't keep up, so data is lost, resulting in audible clicks, pops and other nasty artifacts.
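
A quick back-of-the-envelope sketch of that trade-off: for a given buffer size, how often must the processing run, and how long does it have to finish each time? The 48 kHz sample rate and the function names are assumptions for illustration, and the model ignores everything else the scheduler has to do.

```python
def runs_per_second(buffer_frames: int, sample_rate_hz: int = 48000) -> float:
    """How many times per second a buffer of this size must be processed."""
    return sample_rate_hz / buffer_frames


def deadline_ms(buffer_frames: int, sample_rate_hz: int = 48000) -> float:
    """Time available to process one buffer before the next one is due."""
    return 1000 * buffer_frames / sample_rate_hz


for frames in (1024, 256, 64):
    print(f"{frames:5d} frames: {runs_per_second(frames):7.1f} runs/s, "
          f"{deadline_ms(frames):6.2f} ms to finish each run")
```

Shrinking the buffer from 1024 to 64 frames multiplies the number of runs per second by 16 and cuts the time allowed for each run by the same factor, which is where a general-purpose scheduler starts missing deadlines.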

The distinctive feature of the 'hard' realtime OSes that Cowboy likes to talk about is that they offer a *guarantee* that a certain process can be scheduled within a certain period of time. It's not necessarily that it's *fast*, but rather the fact that it's *predictable* ('deterministic' is the buzzword used by software engineers) that makes a hard RTOS so valuable for implementing certain types of solutions.

Linux is not a hard RTOS (nor is Windows, OS X or any other 'general purpose' OS). The best it can do is 'soft' RT, which is basically a 'best effort' sort of thing; even that requires using a preemptive kernel and careful system tuning. This is why ASI cards can deliver significantly better performance here: each of those cards actually has a separate computer processor onboard, running a true hard RTOS, that is totally dedicated to processing audio. As a result, the card's internal DSP core can operate with very small mix buffers, hence giving very good (low) latency. That's also one of the reasons why those cards are so expensive.

In the ALSA/JACK realm, we're not so fortunate; we end up having to emulate all that using the main computer CPU. Linux is not hard RT; that means larger buffers, hence more latency.

IMPORTANT NOTE: The bottom line on all of this is something I've said many times over the years: the JACK/ALSA drivers can be very useful for many roles in Rivendell, but for an RDAirPlay system that you're putting on the air, buy an ASI card. You won't regret it.

Answers by:

Frederick F. Gleason, Jr.

Chief Developer

Paravel Systems
