Web Audio API Deep Dive: How Browser Audio Really Works


In my previous post on Tone.js, I knew it used the Web Audio API internally, but I hadn't really dug into how it actually works. I wanted to peel back the layer of abstraction that Tone.js provides, so I went ahead and explored the Web Audio API directly.

The Limitations of HTML5 Audio

The simplest way to handle sound in the browser is the <audio> element. It handles playback, stop, pause, and seeking just fine. But there are things it simply cannot do:

  • No access to frequency or waveform data
  • No real-time effects (reverb, filters)
  • No 3D spatial audio
  • No precise timing control

The relationship between <audio> and the Web Audio API is similar to that of <img> and <canvas>. For simple playback, <audio> is enough, but if you need to analyze or manipulate sound, you need the Web Audio API.

AudioContext: Where Everything Begins

The entry point to the Web Audio API is AudioContext. Every audio node is created and connected within a single AudioContext.

const audioCtx = new AudioContext();

Key Properties

Property | Description
sampleRate | Audio sample rate (typically 44100 Hz or 48000 Hz)
currentTime | Monotonically increasing time (in seconds) since context creation; cannot be paused
state | One of "suspended", "running", "closed"
destination | Final output (speakers/headphones)

It's best to create a single AudioContext and reuse it. Browsers limit the number of simultaneously active AudioContexts.
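One common way to enforce this is a lazy singleton: create the context on the first request and hand out the same instance afterwards. A minimal sketch, where makeLazySingleton is an illustrative helper (not part of the Web Audio API):

```typescript
// Lazy singleton: the factory runs exactly once, on first request.
function makeLazySingleton<T>(create: () => T): () => T {
  let instance: T | undefined;
  return () => {
    if (instance === undefined) {
      instance = create();
    }
    return instance;
  };
}

// In a browser you would write:
//   const getAudioCtx = makeLazySingleton(() => new AudioContext());
// Every call to getAudioCtx() then returns the same context.
```

Deferring creation also plays nicely with the autoplay policy below, since you can call the getter inside the first user gesture.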

Autoplay Policy

AudioContext starts in a "suspended" state by default. Sound cannot play without a user gesture (e.g., a click).

const audioCtx = new AudioContext(); // state: "suspended"
 
document.getElementById("play")?.addEventListener("click", () => {
  if (audioCtx.state === "suspended") {
    audioCtx.resume();
  }
  // Now sound can be played
});

OfflineAudioContext

Used when you need pre-rendering rather than real-time playback. Output goes to an AudioBuffer instead of the speakers. This is useful, for example, when exporting reverb-processed audio to a file.

// Render a 44.1kHz stereo 10-second buffer
const offlineCtx = new OfflineAudioContext(2, 44100 * 10, 44100);
 
// After setting up nodes...
const renderedBuffer = await offlineCtx.startRendering();

The Audio Graph: Connecting Nodes

The Web Audio API is a modular routing system. AudioNodes form a directed acyclic graph (DAG). The basic flow looks like this:

Source → Processing → Destination

const audioCtx = new AudioContext();
const oscillator = audioCtx.createOscillator();
const gainNode = audioCtx.createGain();
 
oscillator.connect(gainNode);
gainNode.connect(audioCtx.destination);
 
oscillator.start();

A key point here is that audio processing runs on a separate audio rendering thread, isolated from the main thread. DOM manipulation doesn't interfere with audio, and audio processing doesn't block the UI.

AudioNode Types

Source Nodes (Sound Generation)

OscillatorNode — Generates periodic waveforms. The basic building block of a synthesizer.

const osc = audioCtx.createOscillator();
osc.type = "sawtooth"; // sine, square, sawtooth, triangle
osc.frequency.value = 440; // A4 note
osc.connect(audioCtx.destination);
osc.start();
osc.stop(audioCtx.currentTime + 1); // Stop after 1 second

One gotcha I ran into: after calling stop(), you cannot start the same node again. You must create a new one. It's a "fire and forget" pattern. This works because node creation cost is extremely low.

AudioBufferSourceNode — Plays audio files loaded into memory.

const response = await fetch("/samples/drum.wav");
const arrayBuffer = await response.arrayBuffer();
const audioBuffer = await audioCtx.decodeAudioData(arrayBuffer);
 
const source = audioCtx.createBufferSource();
source.buffer = audioBuffer;
source.loop = true;
source.playbackRate.value = 1.5; // 1.5x speed
source.connect(audioCtx.destination);
source.start();

The AudioBuffer itself can be reused, but like OscillatorNode, an AudioBufferSourceNode must be discarded after a single use.

MediaElementAudioSourceNode — Connects an HTML <audio> element to the Web Audio graph. This lets you apply effects to an existing <audio> tag.

const audioEl = document.querySelector("audio") as HTMLAudioElement;
const source = audioCtx.createMediaElementSource(audioEl);
source.connect(audioCtx.destination);

Processing Nodes (Sound Manipulation)

Node | Role | Key Parameters
GainNode | Volume control | gain (default 1.0)
BiquadFilterNode | Filter (lowpass, highpass, and 6 others) | frequency, Q, type
ConvolverNode | Convolution reverb | buffer (impulse response)
DelayNode | Time delay | delayTime
DynamicsCompressorNode | Dynamic compression | threshold, ratio, attack
WaveShaperNode | Nonlinear distortion | curve, oversample
StereoPannerNode | Left/right panning | pan (-1.0 to 1.0)
PannerNode | 3D spatial audio | positionX/Y/Z, panningModel
AnalyserNode | Frequency/waveform analysis (for visualization) | fftSize, smoothingTimeConstant

Here's an example of building an effect chain:

const source = audioCtx.createBufferSource();
const filter = audioCtx.createBiquadFilter();
const gain = audioCtx.createGain();
const compressor = audioCtx.createDynamicsCompressor();
 
filter.type = "lowpass";
filter.frequency.value = 1000;
gain.gain.value = 0.8;
 
source.connect(filter);
filter.connect(gain);
gain.connect(compressor);
compressor.connect(audioCtx.destination);
 
source.buffer = audioBuffer;
source.start();

This is exactly what Tone.js's .chain() method abstracts away.
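As a rough idea of what such a helper does, here is a hand-rolled chain function. Connectable is a hypothetical structural interface; real AudioNodes satisfy it because they all expose connect:

```typescript
// Any object with a connect method qualifies; every AudioNode does.
interface Connectable {
  connect(destination: Connectable): unknown;
}

// Connect each node to the next: chain(a, b, c) wires a -> b -> c.
function chain(...nodes: Connectable[]): void {
  for (let i = 0; i < nodes.length - 1; i++) {
    nodes[i].connect(nodes[i + 1]);
  }
}

// Usage in a browser:
//   chain(source, filter, gain, compressor, audioCtx.destination);
```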

AudioParam: Time-Based Automation

AudioParam is one of the most powerful features of the Web Audio API. It lets you change audio parameters over time with sample-accurate precision.

Why It's More Precise Than setTimeout

setTimeout runs on the main thread's event loop, so its callbacks can be delayed by tens of milliseconds when the thread is busy. AudioParam automation is evaluated on the audio rendering thread, giving sample-accurate timing that the event loop cannot match.
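The standard way to combine the two clocks is a lookahead scheduler: a coarse setTimeout loop wakes up periodically and schedules, on the audio clock, every event that falls within a short window. The pure scheduling arithmetic can be sketched like this (nextEventTimes is an illustrative helper, not a Web Audio API):

```typescript
// Return the event times that fall inside [from, from + lookahead),
// for events spaced `interval` seconds apart starting at `origin`.
function nextEventTimes(
  origin: number,    // audio-clock time of the first event (seconds)
  interval: number,  // seconds between events (e.g. 60 / bpm)
  from: number,      // current audioCtx.currentTime
  lookahead: number  // scheduling window, e.g. 0.1s
): number[] {
  const times: number[] = [];
  // Index of the first event at or after `from`
  let i = Math.max(0, Math.ceil((from - origin) / interval));
  for (let t = origin + i * interval; t < from + lookahead; i++, t = origin + i * interval) {
    times.push(t);
  }
  return times;
}

// Each returned time would be handed to osc.start(t) or
// param.setValueAtTime(v, t); the audio thread handles the precision.
```

Even if the setTimeout loop fires late, events inside the window were already placed on the audio clock, so playback stays tight.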

Automation Methods

const gain = audioCtx.createGain();
 
// Set value immediately at a specific time
gain.gain.setValueAtTime(0, audioCtx.currentTime);
 
// Gradually change linearly (fade in)
gain.gain.linearRampToValueAtTime(1, audioCtx.currentTime + 2);
 
// Change exponentially (more natural decay)
gain.gain.exponentialRampToValueAtTime(0.01, audioCtx.currentTime + 4);
// Note: target value must not be 0. Use a value close to 0 (0.01) instead.

Method | Description
setValueAtTime(value, time) | Set value immediately at a specific time
linearRampToValueAtTime(value, endTime) | Linear interpolation
exponentialRampToValueAtTime(value, endTime) | Exponential interpolation (target ≠ 0)
setTargetAtTime(target, startTime, timeConstant) | Approach target with exponential decay
setValueCurveAtTime(values, startTime, duration) | Follow a custom curve
cancelScheduledValues(cancelTime) | Cancel events after the specified time

An important thing I learned: when scheduled events exist, directly assigning to .value can conflict with the existing automation and produce unexpected results. It is safer to cancel pending events before setting a new value, or to stick with one automation style instead of mixing direct assignment and scheduled ramps.
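When you do need to override a parameter that may have pending automation, one defensive pattern is to cancel the schedule first and then re-anchor the value on the timeline. A sketch, where ParamLike is a hypothetical structural subset of AudioParam (a real gain parameter satisfies it):

```typescript
// Structural subset of AudioParam; a GainNode's gain param matches it.
interface ParamLike {
  cancelScheduledValues(cancelTime: number): void;
  setValueAtTime(value: number, startTime: number): void;
}

// Drop all pending automation events, then pin the new value at `now`.
function setParamNow(param: ParamLike, value: number, now: number): void {
  param.cancelScheduledValues(now);
  param.setValueAtTime(value, now);
}

// Usage in a browser:
//   setParamNow(gainNode.gain, 0.5, audioCtx.currentTime);
```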

Practical Pattern: ADSR Envelope

You can implement the Attack-Decay-Sustain-Release envelope — a fundamental of synthesizers — directly with AudioParam.

function triggerNote(
  osc: OscillatorNode,
  gain: GainNode,
  now: number
) {
  // Attack: ramp quickly from 0 to 1
  gain.gain.setValueAtTime(0, now);
  gain.gain.linearRampToValueAtTime(1, now + 0.02);
 
  // Decay + Sustain: decay from 1 to 0.3
  gain.gain.linearRampToValueAtTime(0.3, now + 0.1);
 
  // Release: from 0.3 to 0
  gain.gain.linearRampToValueAtTime(0, now + 0.5);
 
  osc.start(now);
  osc.stop(now + 0.5);
}

This is the pattern that Tone.js's Synth automates internally.

Audio Loading and Memory

fetch + decodeAudioData is the standard pattern.

async function loadAudio(url: string): Promise<AudioBuffer> {
  const response = await fetch(url);
  const arrayBuffer = await response.arrayBuffer();
  return audioCtx.decodeAudioData(arrayBuffer);
}
 
const kickBuffer = await loadAudio("/samples/kick.wav");

Decoded audio is stored in RAM as uncompressed 32-bit PCM. To give you a sense of the size:

44.1kHz stereo, 1 minute = 44100 × 2 channels × 4 bytes × 60 seconds ≈ 21MB

Even if the original MP3 is only 1MB, it can balloon to ~20MB after decoding. When loading multiple large files, memory fills up fast, so it's important to cache decoded AudioBuffers to avoid redundant decoding.
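One way to get that caching is to memoize the loader by URL, storing the Promise itself so that concurrent requests for the same file share a single fetch and decode. A sketch (memoizeByUrl is an illustrative helper; the loadAudio function above would be the real loader):

```typescript
// Cache the Promise, not just the result: concurrent calls for the
// same URL share one fetch + decode instead of racing each other.
function memoizeByUrl<T>(load: (url: string) => Promise<T>): (url: string) => Promise<T> {
  const cache = new Map<string, Promise<T>>();
  return (url: string) => {
    let pending = cache.get(url);
    if (pending === undefined) {
      pending = load(url);
      cache.set(url, pending);
    }
    return pending;
  };
}

// In a real app: const loadAudioCached = memoizeByUrl(loadAudio);
```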

Visualization with AnalyserNode

AnalyserNode passes audio through without modification while extracting frequency/waveform data.

Frequency Bar Chart

const analyser = audioCtx.createAnalyser();
analyser.fftSize = 256;
const bufferLength = analyser.frequencyBinCount; // fftSize / 2 = 128
const dataArray = new Uint8Array(bufferLength);
 
source.connect(analyser);
analyser.connect(audioCtx.destination);
 
function draw(ctx: CanvasRenderingContext2D, width: number, height: number) {
  requestAnimationFrame(() => draw(ctx, width, height));
 
  analyser.getByteFrequencyData(dataArray); // Range: 0~255
 
  ctx.fillStyle = "#000";
  ctx.fillRect(0, 0, width, height);
 
  const barWidth = width / bufferLength;
 
  for (let i = 0; i < bufferLength; i++) {
    const barHeight = (dataArray[i] / 255) * height;
    ctx.fillStyle = `hsl(${(i / bufferLength) * 360}, 80%, 50%)`;
    ctx.fillRect(i * barWidth, height - barHeight, barWidth - 1, barHeight);
  }
}

Waveform (Oscilloscope)

const analyser = audioCtx.createAnalyser();
analyser.fftSize = 2048;
const bufferLength = analyser.frequencyBinCount;
const dataArray = new Uint8Array(bufferLength);
 
function drawWaveform(ctx: CanvasRenderingContext2D, width: number, height: number) {
  requestAnimationFrame(() => drawWaveform(ctx, width, height));
 
  analyser.getByteTimeDomainData(dataArray); // Range: 0~255 (128 = silence)
 
  ctx.fillStyle = "#000";
  ctx.fillRect(0, 0, width, height);
  ctx.lineWidth = 2;
  ctx.strokeStyle = "#0f0";
  ctx.beginPath();
 
  const sliceWidth = width / bufferLength;
 
  for (let i = 0; i < bufferLength; i++) {
    const v = dataArray[i] / 128.0;
    const y = (v * height) / 2;
    if (i === 0) ctx.moveTo(0, y);
    else ctx.lineTo(i * sliceWidth, y);
  }
 
  ctx.stroke();
}

Resolution varies with fftSize. For frequency bars, a small value like 256 works well, while waveforms benefit from 2048 or higher.
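To relate bins to actual frequencies: each bin spans sampleRate / fftSize Hz, so bin i starts at i times that width. A quick sketch of the arithmetic (binFrequency is an illustrative helper):

```typescript
// Frequency (Hz) at the start of bin `i` for a given FFT configuration.
// The bins span 0 Hz up to the Nyquist frequency (sampleRate / 2).
function binFrequency(i: number, sampleRate: number, fftSize: number): number {
  return (i * sampleRate) / fftSize;
}

// With fftSize = 256 at 44100 Hz, each of the 128 bins is about 172 Hz
// wide, which is why small fftSize values suit chunky frequency bars.
```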

AudioWorklet: Custom Audio Processing

Previously, ScriptProcessorNode was used for custom audio processing, but it ran on the main thread, causing UI blocking and audio glitches. AudioWorklet replaces it.

AudioWorklet runs custom JS code directly on the audio rendering thread. That avoids main-thread round-trips in the processing path and helps reduce glitch risk, although buffering and render quanta still apply.

Implementation

Write a separate worklet processor file:

// white-noise-processor.ts
class WhiteNoiseProcessor extends AudioWorkletProcessor {
  process(
    _inputs: Float32Array[][],
    outputs: Float32Array[][],
    _parameters: Record<string, Float32Array>
  ): boolean {
    const output = outputs[0];
 
    for (const channel of output) {
      for (let i = 0; i < channel.length; i++) {
        channel[i] = Math.random() * 2 - 1;
      }
    }
 
    return true; // true = keep running
  }
}
 
registerProcessor("white-noise-processor", WhiteNoiseProcessor);

Register and use the worklet from the main thread:

await audioCtx.audioWorklet.addModule("/white-noise-processor.js");
 
const noiseNode = new AudioWorkletNode(audioCtx, "white-noise-processor");
noiseNode.connect(audioCtx.destination);

Parameter Definitions

class GainProcessor extends AudioWorkletProcessor {
  static get parameterDescriptors() {
    return [
      {
        name: "customGain",
        defaultValue: 1.0,
        minValue: 0,
        maxValue: 2,
        automationRate: "a-rate", // Sample-level precision
      },
    ];
  }
 
  process(
    inputs: Float32Array[][],
    outputs: Float32Array[][],
    parameters: Record<string, Float32Array>
  ): boolean {
    const input = inputs[0];
    const output = outputs[0];
    const gainValues = parameters.customGain;
 
    for (let ch = 0; ch < output.length; ch++) {
      for (let i = 0; i < output[ch].length; i++) {
        // a-rate: gainValues.length === 128 (different value per sample)
        // k-rate: gainValues.length === 1 (same value for entire block)
        const gain = gainValues.length > 1 ? gainValues[i] : gainValues[0];
        output[ch][i] = input[ch][i] * gain;
      }
    }
 
    return true;
  }
}
 
registerProcessor("gain-processor", GainProcessor);

You can automate the parameters from the main thread:

const gainParam = gainNode.parameters.get("customGain");
gainParam?.linearRampToValueAtTime(0, audioCtx.currentTime + 2);

Inside worklets, garbage collection can cause audio glitches, so it's important to minimize object allocation and reuse buffers.

Spatial Audio

StereoPannerNode (Simple Left/Right Panning)

const panner = audioCtx.createStereoPanner();
panner.pan.value = -1; // -1 = left, 0 = center, 1 = right
 
source.connect(panner);
panner.connect(audioCtx.destination);
 
// Pan from left to right over 2 seconds
panner.pan.linearRampToValueAtTime(1, audioCtx.currentTime + 2);

PannerNode (3D Spatial Audio)

Used in games and VR/AR experiences.

const panner = audioCtx.createPanner();
panner.panningModel = "HRTF"; // More realistic (higher CPU). "equalpower" is lighter
panner.distanceModel = "inverse";
panner.refDistance = 1;
panner.maxDistance = 100;
panner.rolloffFactor = 1;
 
// Set 3D position of the sound source
panner.positionX.value = 5;
panner.positionY.value = 0;
panner.positionZ.value = -3;
 
// Listener position (one per AudioContext)
const listener = audioCtx.listener;
listener.positionX.value = 0;
listener.positionY.value = 0;
listener.positionZ.value = 0;
 
source.connect(panner);
panner.connect(audioCtx.destination);

The HRTF (head-related transfer function) model simulates how sound arrives differently at each ear depending on direction, creating a realistic 3D effect over headphones. However, it's CPU-intensive, so be cautious when applying it to many sound sources.
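To make the distance parameters concrete, here is a sketch of how the "inverse" distance model attenuates, following the formula the spec describes (inverseDistanceGain is an illustrative helper; the clamp keeps the gain at 1 for sources closer than refDistance):

```typescript
// Gain for the "inverse" distance model: 1.0 at refDistance,
// falling off as the source moves away, scaled by rolloffFactor.
function inverseDistanceGain(
  distance: number,
  refDistance: number,
  rolloffFactor: number
): number {
  const d = Math.max(distance, refDistance); // no boost closer than refDistance
  return refDistance / (refDistance + rolloffFactor * (d - refDistance));
}

// inverseDistanceGain(1, 1, 1) = 1 (at the reference distance)
// inverseDistanceGain(3, 1, 1) = 1/3 (twice the reference distance away)
```

Raising rolloffFactor makes sources fade faster with distance; distanceModel also accepts "linear" and "exponential", which use different curves.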

Performance Considerations

1) Source nodes are single-use

OscillatorNode and AudioBufferSourceNode cannot be reused after stop(). For repeated playback, you need to create a new node each time. Node creation cost is extremely low, so this isn't a performance issue.

2) Know which nodes are CPU-intensive

ConvolverNode (convolution reverb) and PannerNode in HRTF mode consume the most CPU. On mobile, it's best to limit the use of these nodes.

3) Manage AudioBuffer memory

Decoded audio sits in RAM as uncompressed PCM. A single 1-minute stereo file is ~20MB. Release references to unused buffers so they become eligible for GC.

4) AudioWorklet requires HTTPS

Due to the Secure Context requirement, you need to use localhost or set up HTTPS for local development.

Raw Web Audio API vs Tone.js

Area | Raw Web Audio API | Tone.js
Timing | Seconds (currentTime) | Musical notation ("4n", "8n")
Transport control | None | Transport (start/stop/loop)
Instruments | OscillatorNode + manual envelope | Synth, FMSynth, etc. + built-in ADSR
Effects | Manual individual node connections | Pre-built effects + .chain()
Polyphony | Manual voice management | PolySynth auto-allocation
Autoplay | Manual resume() | Tone.start()

When Raw API is the better fit:

  • Simple audio tasks (notification sounds, sound effects)
  • When bundle size matters
  • When you need maximum control without a framework
  • For learning purposes

When Tone.js is the better fit:

  • Musical applications (sequencers, drum machines)
  • Complex scheduling and Transport control
  • Diverse instruments and effect chains

Since Tone.js uses native Web Audio nodes under the hood, its own performance overhead is minimal.

Conclusion

Working with the Web Audio API directly gave me a real appreciation for how much Tone.js abstracts away. At the same time, understanding the raw API made Tone.js's behavior much more predictable.

Here are the key takeaways:

  • AudioParam automation is the real strength of the Web Audio API. It provides sample-accurate timing control that setTimeout simply cannot match
  • Source nodes are single-use — once you call stop, you need to create a new one
  • AudioWorklet replaced ScriptProcessorNode and runs custom DSP directly on the audio thread
  • Always be mindful of decoded AudioBuffer memory size (10-20x larger than the compressed MP3)

For simple sound effects or notification sounds, the raw API is more than enough. If you need musical features, Tone.js will save you significant development time.

Thanks for reading.
