Fire Sound: Spectral Bandwidth Extension

The setting

A physically based flame simulator reproduces visual fire behavior convincingly but doesn't directly produce sound: time- stepping a 3D combustion simulation at audio sample rates is impractically expensive, and small-scale combustion noise comes from thermo-acoustics that those simulators don't resolve at typical resolutions (Chadwick & James 2011, §1).

The paper's workaround (§§2.2, 3) is a simplified analytical sound model. Combustion noise can be derived from a wave equation forced by the rate of change of the heat release rate (Crighton et al. 1992; equation 2 in the paper). Under the premixed-flame assumption (Strahle 1972), the heat release happens essentially at the moving flame front, so the volume integral over the combustion region can be rewritten as a surface integral over that front. Ignoring propagation delays, 1/r distance attenuation, and overall scaling constants (all fixed multiplicative factors at a chosen listener position), the radiated pressure reduces to p(t) = d/dt ∫_S(t) u(x, t) · n(x, t) dS (equation 6), the time derivative of the velocity flux integrated over the moving flame front. This is also proportional to the time derivative of the total heat release rate.

The flame solver runs at hundreds of steps per second (360 Hz for the paper's examples). After interpolating its pressure output up to audio sample rate, the resulting signal is band- limited to roughly the simulation's Nyquist (≈ 180 Hz): a slow-envelope rumble that captures the flame's overall rate of activity but contains essentially no audible structure above that. Real fire is broadband (audible content out to several kHz), so the simulated signal alone sounds dull rather than crackling. The job of spectral bandwidth extension (Algorithm 1, §4 of the paper) is to fill in the missing mid and high frequencies in a way that sounds plausible and stays time- locked to the simulated bursts.

Why “partially” physically based

The low-frequency input is physical: it comes from the §3 sound model derived above. The high-frequency content added here is statistical: random-phase noise whose power-spectral-density (PSD) slope and time-varying amplitude are chosen to match measured fire spectra, not derived from a wave- level model of small-scale flame turbulence. The empirical justification is that real fire spectra are well approximated by a power law P(f) ∝ f^−α over a wide band:

Clavin & Siggia (1991) derived P(f) ∝ f^−5/2 (i.e. α = 2.5) for turbulent premixed flames from theoretical considerations.
Rajaram & Lieuwen (2009) measured combustion-noise spectra directly and reported exponents in the range α ∈ [2.1, 3.4].

So the high-frequency content has the right spectral shape and (because it is gain-modulated by the simulated envelope) the right burst timing, but the individual high-frequency samples are random draws, not predictions of a finer-resolution simulation. Two flame simulations with very different small-scale combustion structure but the same envelope will sound similar at high frequency.

How the algorithm works

One noise field is built up front for the whole clip; a windowed loop then matches and blends it into the input.

Envelope extraction. Apply a zero-phase order-2 Butterworth lowpass at f_c (default 180 Hz) to the input p(t), producing p_L(t). The absolute value |p_L(t)| is the slow amplitude envelope that will gain-modulate the high-frequency noise.
Power-law noise field (built once). Construct a complex spectrum N(f) = f^−α/2 · e^iφ(f) with phases φ(f) drawn uniformly from [0, 2π). The exponent −α/2 on magnitude gives the desired −α on PSD. Multiply N(f) by the high-pass blend filter blend₂(f) to remove its low-frequency content (which would compete with the input), then inverse-FFT to get a time-domain noise signal n(t). The same n(t) is reused across all windows; this is what makes overlap-added windows blend without phase seams.
Per-window match. For each triangular analysis window w_i, compute the FFT of the windowed signal Y(f) = FFT(p · w_i) and the FFT of the envelope-shaped noise Y_N(f) = FFT(|p_L| · w_i · n). Then split: Y_L(f) = blend₁(f) Y(f) is the lowpass branch and Y_N′(f) = blend₂(f) Y_N(f) is the highpass branch. Solve for the positive scalar β that makes the Gaussian-weighted energy of Y_L + β Y_N′ equal to that of the unfiltered windowed signal Y:

∫ |Y_L(f) + β Y_N′(f)|² g(f) df = ∫ |Y(f)|² g(f) df

where g(f) is a Gaussian weighting centered on f_c with standard deviation fit_width / 3. Expanding gives a quadratic aβ² + bβ + c = 0 whose coefficients are integrals of bilinear forms in Y_L, Y_N′, and Y; the implementation evaluates each integral with the trapezoidal rule and picks the positive root via Vieta's identity.

Blend & overlap-add. Inverse-FFT the blended spectrum Y_L + β Y_N′, take its real part, and overlap-add into the running output. The triangular window with stride halfWidth satisfies the constant-overlap-add identity (sums to 1), so the in-band content is reconstructed from window to window without amplitude modulation (modulo the half-amplitude effect noted below).

What the controls do

α: the power-law PSD exponent. Larger values produce a darker high-frequency tail (more rolloff, less crackle); smaller values brighten it. Default 2.5; the physically motivated range from the cited measurements is roughly [2.0, 3.5].
cutoff f_c: the centre of the crossover. Above f_c the output increasingly comes from noise; below f_c it increasingly comes from the simulated input. Slide up if you trust your simulated signal further into the audible band; slide down to replace more of it with noise.
window half-width (samples): the analysis window covers 2×halfWidth+1 samples and the loop strides by halfWidth (50 % overlap). Smaller windows let β track fast envelope changes (more time resolution) at the cost of a coarser per-window frequency resolution.
Gaussian fit width (Hz): the full width of the matching kernel; its standard deviation is one-third of this value. A narrow kernel matches the spectra essentially at f_c only; a wider kernel averages over a broader band of the spectrum.
blend half-width (Hz): the linear ramp between blend₁ and blend₂ spans 2×blend_half_width, centred on f_c. Smaller values give a sharper crossover; larger values give a gentler one.
noise amplitude: an extra global multiplier on the noise contribution after the β match. 0 disables the high-frequency content entirely (you hear only the lowpassed input, at half amplitude); 1 is the paper's default; values above 1 push the high frequencies louder than the matching solver chose on its own.
RNG seed: integer seed for the shared PCG32 random-phase generator. The same seed produces the same phase array in both this JavaScript port and the Python reference, so the two implementations agree on the extended waveform up to FFT round-off (~10⁻⁶ on the synthetic validate.html test).
match Matlab reference (checkbox): if ticked, every per-window FFT runs at the full signal length (N_FFT = next_pow2(L)), bit-for-bit matching the published Matlab code. Off (the default) uses smaller per-window FFTs (next_pow2(8×halfWidth), typically 4096–8192) for ~100× speedup at the cost of evaluating the matching integral on a coarser frequency grid.

Things to notice

With the Synthetic burst preset, set noise amplitude = 0 and play the extended audio: you hear the in-band tones at half their original amplitude (a consequence of the algorithm's one-sided spectrum filtering; see Caveats). Set it back to 1 and the tones gain a fire-like crackle whose loudness tracks the burst envelope.
Drag α from 1.5 to 4.0 and watch the amber dashed line in the spectrum plot tilt steeper. The audible change is from crispy / hissy (small α, brighter spectrum) to dull / muffled (large α, darker spectrum).
Drag cutoff f_c from 60 Hz up to 500 Hz and watch the green “blended target” line in the spectrum plot rotate around the moving yellow cutoff line. At low cutoffs almost the entire output is noise; at high cutoffs you keep more of the simulated content.
Pick the Flame Jet example and watch the per-window β trace below the playback panel: it spikes upward during loud combustion bursts and falls back between them, exactly the behaviour that ties the synthesised high-frequency content to the simulated envelope.
Toggle match Matlab reference on and off after loading one of the bundled signals. The audible difference is usually below the noise floor; the perceptual identity is good evidence that the small-FFT approximation is safe.

Caveats and things this model is not

The Matlab implementation applies a one-sided lowpass to the two-sided FFT. The negative-frequency mirror is zeroed, so taking the real part of the IFFT halves the in-band magnitude (a 6.02 dB drop). The released code masks this by peak-normalising the output before saving WAV; this port preserves the underlying behaviour, and the Tier 3 test suite asserts the −6.02 dB drop. The Download buttons here also peak-normalise the WAV they save.
The high-frequency content is a statistical model, not a prediction of small-scale physics. Comparing two simulations' high-frequency content tells you whether their envelopes differ, not whether their flame structures differ. For research that cares about the high-end physics (e.g. analysis of turbulent flame noise), use a higher-resolution simulation or a wave-level combustion-noise model rather than this synthesis step.
The matching β is a single scalar per window. If the input has a strong feature very close to f_c in some windows and not others, β may swing sharply between adjacent windows; the per-window β trace lets you see that directly.
Defaults match the paper's example call: f_c = 180 Hz, α = 2.5, halfWidth = 500 samples (about 11.3 ms at 44.1 kHz), Gaussian fit width = 30 Hz, blend half-width = 15 Hz, noise amplitude = 1, seed = 42. The Reset button restores these.

References

Chadwick, J. N., and James, D. L. (2011). Animating Fire with Sound. ACM Transactions on Graphics (SIGGRAPH 2011), 30(4). cs.cornell.edu/projects/Sound/fire · paper PDF
Clavin, P., and Siggia, E. D. (1991). Turbulent premixed flames and sound generation. Combustion Science and Technology, 78(1–3), 147–155. (theoretical f^−5/2 derivation)
Rajaram, R., and Lieuwen, T. (2009). Acoustic radiation from turbulent premixed flames. Journal of Fluid Mechanics, 637, 357–385. (empirical exponents α ∈ [2.1, 3.4])
Zölzer, U., and Amatriain, X., eds. (2002). DAFX: Digital Audio Effects. Wiley. (general reference for frequency-domain noise synthesis and the spectral-blend approach.)

Fire Sound: Spectral Bandwidth Extension

Example signal

Algorithm parameters

Input spectrum & power-law extension

Playback

Original

Extended

Per-window β (noise scaling)

Spectrograms

Input