Getting sound out of it

A fantasy console with no sound is half a console. PICO-8 carts ship their music and effects right there in the cart data, so once carts were running on the device, the next job was getting that audio out of the chip and into a speaker. The path is a PCM5102A I2S DAC hanging off the H7's SAI peripheral, fed by DMA. Three separate things went wrong on the way, and each one was the kind of bug that produces total silence with no error to chase.

The wiring

The PCM5102A is a cheap, good little DAC on a breakout board. It speaks I2S, so it needs two clocks and a data line:

| Signal | Pin | SAI function | Nucleo |
|---|---|---|---|
| BCK (bit clock) | PE5 | SAI1_SCK_A | CN9 18 |
| LRCK (word clock) | PE4 | SAI1_FS_A | CN9 16 |
| DIN (data) | PE6 | SAI1_SD_A | CN9 20 |

The master clock, MCLK, is deliberately not wired. The board ties its SCK header pin to ground, which tells the onboard PLL to generate its own clocks from BCK. One less wire and one less clock to get exactly right.

The SAI runs as a master transmitter driving the DAC, 16-bit stereo (a 32-bit frame, 16 bits per channel), fed from a DMA ring. The clock tree lands the sample rate at about 44.118 kHz, derived from PLL2. Not exactly 44.1, but the error is about 0.04%, which no one can hear.
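The clock numbers are easy to sanity-check. The sketch below takes the rounded 44,118 Hz figure from the text as the actual rate (the exact PLL2 dividers are not given here, so that value is an approximation) and works out the relative error, the implied bit clock for a 32-bit frame, and the pitch offset in cents. The function names are made up for illustration.

```rust
/// Nominal CD sample rate in Hz.
const TARGET_HZ: f64 = 44_100.0;
/// Approximate rate the clock tree lands on (rounded figure from the text).
const ACTUAL_HZ: f64 = 44_118.0;
/// 16-bit stereo: two 16-bit slots per frame.
const BITS_PER_FRAME: f64 = 32.0;

/// Relative error of the actual rate against the target, in percent.
fn rate_error_percent(actual_hz: f64, target_hz: f64) -> f64 {
    (actual_hz - target_hz) / target_hz * 100.0
}

/// Bit clock frequency implied by the sample rate and frame size.
fn bit_clock_hz(sample_rate_hz: f64, bits_per_frame: f64) -> f64 {
    sample_rate_hz * bits_per_frame
}

/// Pitch offset, in cents, from playing at `actual_hz` instead of `target_hz`.
fn pitch_offset_cents(actual_hz: f64, target_hz: f64) -> f64 {
    1200.0 * (actual_hz / target_hz).log2()
}

fn main() {
    println!("rate error: {:.3} %", rate_error_percent(ACTUAL_HZ, TARGET_HZ));
    println!("bit clock:  {:.0} Hz", bit_clock_hz(ACTUAL_HZ, BITS_PER_FRAME));
    println!("pitch:      {:+.2} cents", pitch_offset_cents(ACTUAL_HZ, TARGET_HZ));
}
```

The pitch offset comes out at well under one cent, far below what is usually considered audible.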

Gotcha one: the mute pin

A0 was just a hardcoded 440 Hz square wave, the "is anything alive" test. I wired it all up, flashed it, and got nothing. No tone, no noise, nothing.
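For reference, a tone test like that needs very little code. This is a minimal sketch, not the firmware's actual generator: it assumes an interleaved 16-bit stereo buffer and an integer phase accumulator, and the name `fill_square` and the amplitude are invented for the example.

```rust
/// Fill an interleaved stereo buffer (L, R, L, R, ...) with a square wave.
///
/// `phase` runs from 0 up to `sample_rate_hz` and advances by `freq_hz` per
/// frame, so one wrap of the accumulator is exactly one period of the tone.
/// Returns the phase to pass into the next call, which keeps the tone
/// continuous across buffer refills.
fn fill_square(
    buf: &mut [i16],
    mut phase: u32,
    freq_hz: u32,
    sample_rate_hz: u32,
    amplitude: i16,
) -> u32 {
    for frame in buf.chunks_exact_mut(2) {
        // First half of the period is high, second half is low.
        let sample = if phase < sample_rate_hz / 2 { amplitude } else { -amplitude };
        frame[0] = sample; // left
        frame[1] = sample; // right
        phase += freq_hz;
        if phase >= sample_rate_hz {
            phase -= sample_rate_hz;
        }
    }
    phase
}

fn main() {
    // One second of audio at the rate quoted in the text.
    let mut buf = vec![0i16; 2 * 44_118];
    let next_phase = fill_square(&mut buf, 0, 440, 44_118, 8_000);
    println!("first frame: {:?}, next phase: {}", &buf[..2], next_phase);
}
```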

The PCM5102A has an XSMT pin: soft mute, active low. Pull it high and the DAC plays. Pull it low (or leave it floating) and it stays muted. I had assumed, from other boards, that the breakout pulled it high on-board. This particular board does not. It sat there correctly clocked and correctly fed, dutifully muting everything, until I ran a wire from XSMT to 3.3V. There is also a solder bridge on the back that ties XSMT low if you close it. Leave it open.

That is a whole evening for one jumper wire. The DAC was never broken.

Gotcha two: the data cache

Remember the caches I turned on to speed up Lua? They came back to bite me here.

The DMA ring buffer lives in AXI SRAM, which is cached. The CPU writes samples through the data cache, and the DMA engine reads from actual RAM. With no cache maintenance in between, freshly written samples can sit in dirty cache lines while the DMA streams whatever stale data is still in RAM, so the CPU's view and the DMA's view of the same buffer drift apart. On the H7 this is the classic DMA-plus-D-cache coherency trap, and the embassy SAI driver does not do the maintenance for you.

The fix is to carve out a small piece of memory that the cache leaves alone. I set up one MPU region covering the DMA ring, marked Normal, Non-cacheable, 2 KB, aligned to a 2 KB boundary (so the ring buffer is declared with align(2048) to fit it exactly). Outside that region the data cache stays on and everything else keeps its speedup. The ordering matters: configure the MPU region, barrier, then enable the caches, so the cache comes up already knowing the ring is off-limits.
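The size and alignment constraints can be checked on a host machine before touching hardware. The sketch below is an illustration, not the firmware's code: it assumes the Armv7-M MPU register layout (RASR with SIZE in bits 5:1, TEX in 21:19, AP in 26:24, XN in bit 28), and the `DmaRing` element type, the helper names, and the choice of full-access, execute-never permissions are all assumptions.

```rust
use core::mem::{align_of, size_of};

/// Ring sized and aligned to match a 2 KB MPU region exactly.
/// The element type and length are assumptions for this sketch.
#[allow(dead_code)]
#[repr(C, align(2048))]
struct DmaRing([u32; 512]);

/// RASR SIZE field for a power-of-two region, where bytes = 2^(SIZE + 1).
/// The smallest region the Armv7-M MPU supports is 32 bytes.
fn rasr_size_field(region_bytes: u32) -> Option<u32> {
    if region_bytes.is_power_of_two() && region_bytes >= 32 {
        Some(region_bytes.trailing_zeros() - 1)
    } else {
        None
    }
}

/// The region base address must be a multiple of the region size.
fn base_is_valid(base: u32, region_bytes: u32) -> bool {
    base % region_bytes == 0
}

/// RASR value for a Normal, non-cacheable, read/write, execute-never region.
fn rasr_normal_noncacheable(region_bytes: u32) -> Option<u32> {
    const ENABLE: u32 = 1 << 0;
    const TEX_001: u32 = 0b001 << 19; // TEX=001 with C=0, B=0: Normal, non-cacheable
    const AP_FULL: u32 = 0b011 << 24; // privileged and unprivileged read/write
    const XN: u32 = 1 << 28; // never execute from the audio ring
    let size = rasr_size_field(region_bytes)?;
    Some(XN | AP_FULL | TEX_001 | (size << 1) | ENABLE)
}

fn main() {
    println!("ring size {} B, align {} B", size_of::<DmaRing>(), align_of::<DmaRing>());
    println!("RASR = {:#010x}", rasr_normal_noncacheable(2048).unwrap());
    println!("0x2400_0400 usable as base: {}", base_is_valid(0x2400_0400, 2048));
}
```

On the device, the base address of the ring goes into RBAR and this value into RASR before the caches are enabled, matching the ordering described above.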

Gotcha three: the executor that starved

With the tone clean, I moved to real synthesis: rendering a cart's actual music through fcsynth, my port of the PICO-8 sound model. It worked, until a cart started doing real work in its draw loop, and then the audio fell apart into stuttering and silence.

This one is about how embassy schedules. Everything was running on the thread executor, cooperatively, single core. When a Lua cart spent a long time inside _draw, it did not yield, and the audio refill task simply did not get to run. The DMA ring drained faster than it was being topped up and underran to silence.
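A rough budget shows how little slack there is. This sketch assumes the ring fills the whole 2 KB region and holds 4-byte frames (two 16-bit channels); neither is stated outright, so treat the result as an upper bound. With half-buffer refills the real margin is smaller still.

```rust
/// Milliseconds of audio a completely full ring holds.
fn ring_duration_ms(ring_bytes: u32, bytes_per_frame: u32, sample_rate_hz: u32) -> f64 {
    let frames = ring_bytes / bytes_per_frame;
    frames as f64 * 1000.0 / sample_rate_hz as f64
}

/// True if a task that blocks for `blocked_ms` without yielding outlasts the ring.
fn underruns(blocked_ms: f64, ring_ms: f64) -> bool {
    blocked_ms > ring_ms
}

fn main() {
    // Assumed: 2048-byte ring, 4 bytes per stereo frame, ~44.118 kHz.
    let ring_ms = ring_duration_ms(2048, 4, 44_118);
    println!("full ring holds {:.2} ms of audio", ring_ms);

    // A _draw call that eats a whole 60 fps frame (~16.7 ms) is already too long.
    for blocked_ms in [5.0, 16.7, 33.3] {
        println!("blocked {:>5.1} ms -> underrun: {}", blocked_ms, underruns(blocked_ms, ring_ms));
    }
}
```

Under those assumptions a full ring lasts less than one 60 fps frame, so a single slow `_draw` is enough to drain it.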

Audio cannot wait its turn. The fix was to give it a turn it cannot be denied: a separate high-priority interrupt executor at priority P6, with the audio task spawned onto it. Now the audio refill preempts the present loop. Even with a cart hammering the CPU mid-frame, the refill fires on its interrupt, tops up the ring, and gets out. After that, the report I was after:

A3 done. fcsynth plays the Solais music on-device. 0 underruns with the full four-voice mix.

The synth itself, about 22 KB of state, lives in the D2 SRAM in a dedicated .d2_bss section, because the AXI bank is full of framebuffers.

The synth, briefly

fcsynth is a no_std library with no allocation: fixed arrays for the sound and music regions, fixed delay lines for reverb. On the host build it uses the normal float methods; on the bare-metal build it pulls powf, floor, and friends from libm through a small trait. It implements the parts of PICO-8's audio that make it sound like PICO-8: the seven per-note effects (slide, vibrato, pitch drop, fade in, fade out, and two arpeggios), custom instruments, the per-SFX filter bits (buzz, noise, detune, reverb, low-pass dampening), and music as chained patterns of four channels each.
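The float trait might look something like the sketch below. This is not fcsynth's actual trait: the names `FloatMath`, `m_powf`, and `m_floor` are invented, and selecting on `target_os = "none"` is just one way to tell bare metal from a host build. The pitch formula assumes PICO-8 pitch 33 corresponds to 440 Hz.

```rust
/// Float operations the synth needs, so one code path builds on host and bare metal.
trait FloatMath {
    fn m_powf(self, exp: f32) -> f32;
    fn m_floor(self) -> f32;
}

/// Host build: the standard library's inherent float methods.
#[cfg(not(target_os = "none"))]
impl FloatMath for f32 {
    fn m_powf(self, exp: f32) -> f32 {
        self.powf(exp)
    }
    fn m_floor(self) -> f32 {
        self.floor()
    }
}

/// Bare-metal build: the same operations from the `libm` crate.
#[cfg(target_os = "none")]
impl FloatMath for f32 {
    fn m_powf(self, exp: f32) -> f32 {
        libm::powf(self, exp)
    }
    fn m_floor(self) -> f32 {
        libm::floorf(self)
    }
}

/// Equal-tempered frequency for a pitch index (assumes pitch 33 = 440 Hz).
fn note_to_hz(pitch: u8) -> f32 {
    440.0 * 2.0f32.m_powf((pitch as f32 - 33.0) / 12.0)
}

/// Wrap an oscillator phase into [0, 1).
fn wrap_phase(phase: f32) -> f32 {
    phase - phase.m_floor()
}

fn main() {
    println!("pitch 33 -> {:.2} Hz", note_to_hz(33));
    println!("pitch 45 -> {:.2} Hz", note_to_hz(45));
    println!("wrap(1.25) = {}", wrap_phase(1.25));
}
```

Giving the trait methods their own names avoids any ambiguity with the inherent `f32` methods, which exist only when `std` is linked.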

One thing I tried and threw away

PICO-8's square and pulse waves have hard edges, and on some carts the attacks "spit" a little. I built an anti-click envelope: a roughly 1.5 ms ramp on each note, a short release tail, oscillator state carried across retriggers. By the numbers it was a clear win. A ten-second capture of a title theme went from about 22 transient spikes down to 8, and the peak came down too.
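To give a sense of scale, here is a minimal sketch of the attack half of such an envelope. It assumes a linear ramp, which the text does not specify, and leaves out the release tail and the carried oscillator state; the function names are hypothetical.

```rust
/// Number of samples covered by a ramp of `ramp_ms` at the given sample rate.
fn ramp_samples(ramp_ms: f32, sample_rate_hz: f32) -> u32 {
    (ramp_ms * sample_rate_hz / 1000.0).round() as u32
}

/// Linear attack gain: 0.0 at the first sample of a note, 1.0 once the ramp is over.
fn attack_gain(sample_index: u32, ramp: u32) -> f32 {
    if ramp == 0 || sample_index >= ramp {
        1.0
    } else {
        sample_index as f32 / ramp as f32
    }
}

/// Scale the start of a note in place by the attack ramp.
fn apply_attack(samples: &mut [i16], ramp: u32) {
    for (n, s) in samples.iter_mut().enumerate() {
        *s = (*s as f32 * attack_gain(n as u32, ramp)) as i16;
    }
}

fn main() {
    let ramp = ramp_samples(1.5, 44_118.0);
    println!("a 1.5 ms ramp is {} samples", ramp);

    // The hard leading edge of a square wave, before and after the ramp.
    let mut note = [8_000i16; 100];
    apply_attack(&mut note, ramp);
    println!("first {}, mid-ramp {}, after ramp {}", note[0], note[33], note[99]);
}
```

At a 440 Hz tone, a ramp of that length covers more than half of the first period of the wave.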

It sounded worse. The ramp softened the attacks just enough to sand off the character that makes PICO-8 audio feel sharp and immediate. The measurement said better, my ears said worse, and on something whose entire point is how it sounds, the ears win. I reverted it. If I revisit it, it will be a much shorter ramp, no soft clip, no release tail, and I will A/B it by ear before it goes anywhere near a commit. Metrics are not the judge here. The speaker is.