FilmStrip — Theory of Operation
Film audio is engineered for a specific environment — a calibrated theatrical space at 85 dB SPL — that has nothing in common with headphones, laptop speakers, or a living room couch. This page explains the full signal chain FilmStrip uses to bridge that gap, and the decades of audio engineering history behind every step.
The audio embedded in a modern Blu-ray is the product of seventy years of competing standards, corporate rivalries, and engineering breakthroughs. Understanding the format tells you a great deal about why the audio sounds the way it does — and why it needs processing to work outside a cinema.
Al Jolson's improvised ad libs in The Jazz Singer marked the end of the silent era. The technology — Warner's Vitaphone — recorded sound onto large lacquer discs synchronized to the projector. It was cumbersome and fragile, but it worked well enough to make silence commercially untenable within three years.
By the early 1930s, optical sound tracks — a photographed waveform printed along the edge of the 35mm film — had replaced disc-based systems. That optical mono track, essentially unchanged in principle, remained the universal delivery format for decades. A projector lamp, a slit, and a photocell. The frequency response was narrow, the dynamic range limited, and the whole thing was exactly one channel.
Lisztomania (1975) was the first theatrical film to use Dolby's matrix-encoded stereo format, followed by the full four-channel LCRS system on A Star Is Born (1976). The engineering trick: encode Left, Center, Right, and Surround information onto just two optical tracks on the existing 35mm print using phase relationships, then decode in the theater. No new film format required.
Star Wars (1977) is the landmark most people cite — and for good reason. The combination of Ben Burtt's sound design and Dolby's format made the theater experience radically different from anything that came before. But the audio was still matrix-encoded, meaning the channels were mathematically derived from two tracks rather than discrete. The surround channel, in particular, was mono and limited in bandwidth.
Batman Returns (1992) was the first wide theatrical release with Dolby Digital, also known by its codec name AC-3. The six discrete audio channels — Left, Right, Center, Left Surround, Right Surround, and a dedicated subwoofer channel designated LFE (Low Frequency Effects) — were encoded digitally and printed between the sprocket holes of the 35mm film. Literally: the digital data occupied the space between the perforations.
The "5.1" naming convention comes from this: five full-range channels plus one band-limited LFE channel. The .1 channel is only 120 Hz wide and exists specifically for bass effects — explosions, impacts, rumbles — where extreme SPL is needed but directional information is irrelevant.
Steven Spielberg was among the early investors in Digital Theater Systems and deliberately chose DTS for Jurassic Park (1993) as its commercial debut. Unlike Dolby Digital — which encoded audio directly onto the film — DTS stored the audio on a separate CD-ROM that theaters received alongside the print. A timecode track on the optical strip kept the two synchronized. The separate disc allowed significantly higher bit rates and, arguably, better audio quality for the era.
Sony's SDDS launched the same year on In the Line of Fire and Last Action Hero. SDDS used a different approach entirely: compressed digital audio on both outer edges of the 35mm print, supporting up to eight discrete channels — five across the screen, two surrounds, and a subwoofer.
For roughly a decade, major studio prints carried all three digital formats simultaneously: Dolby Digital between the sprocket holes, a DTS timecode stripe, and SDDS data on the outer edges — along with the original analog optical track as a fallback. A projector would try each format in priority order and drop back to the next if it failed. It was redundancy by industry committee, and it worked.
Blu-ray (and briefly HD DVD) introduced lossless audio to home video. Dolby TrueHD and DTS-HD Master Audio are both mathematically lossless — they produce bit-for-bit identical output to the studio master. Previous home formats (DVD's Dolby Digital and DTS) were lossy, discarding audio information to fit in the available data budget.
This matters for FilmStrip: a Blu-ray rip in MKV format will typically contain a TrueHD or DTS-HD Master Audio track at the source sample rate and channel count of the original master. That's usually 48 kHz and 7.1 channels — eight discrete channels at studio quality, compressed losslessly. When FilmStrip extracts and downmixes this to 44.1 kHz stereo WAV, it's working with the best possible source material.
Pixar's Brave (June 2012) was the first feature mixed entirely for Dolby Atmos, premiering at the Dolby Theatre in Los Angeles. Atmos represents a fundamental shift in how cinema audio is authored: rather than assigning sounds to fixed channels, a mixer positions sounds as objects in three-dimensional space with X/Y/Z coordinates and metadata. The theatrical renderer interprets those coordinates and routes audio to whatever speaker configuration the specific room has — up to 64 channels.
On Blu-ray, Atmos is delivered as a TrueHD track with an additional metadata layer for height information. A compatible receiver unpacks the objects; an incompatible player simply decodes the core TrueHD 7.1 mix. For FilmStrip's purposes, an Atmos track is a TrueHD track at 48 kHz, and the downmix is handled identically.
Film audio is mixed in a room calibrated to a specific playback level. When you take that mix out of its intended environment, the dynamic range that was designed to create dramatic impact becomes an obstacle to simple comprehension.
SMPTE RP200, the standard for theatrical sound calibration, specifies that pink noise at −20 dBFS should measure 85 dB SPL (C-weighted, slow) at the primary listening position — every channel, independently calibrated to the same level. This is the reference level at which theatrical mixers do their work.
With 20 dB of headroom above that reference, a theatrical system can produce 105 dB peak SPL per channel — louder than a rock concert, louder than a jet at 300 feet. The dynamic range between a whispered line of dialogue and a full-scale action sequence is typically 20–25 dB. By design.
Home theater receivers are typically calibrated to 79–82 dB SPL at the listening position — 3 to 6 dB below theatrical reference. Headphone and laptop listening happens at 70–78 dB or lower. That gap isn't just about volume: the human ear's frequency and dynamic sensitivity changes with absolute playback level (the Fletcher-Munson effect), which means the perceived dynamic range shrinks as you turn down.
At 78 dB, a scene's dynamic range effectively compresses by about 7 dB. What read as a whisper and a shout in the theater becomes a murmur and a raised voice at home. The problem compounds with ambient noise — traffic, HVAC, household activity — masking the quiet end of the range entirely.
Film dialogue lives in the center channel, which in a 5.1 or 7.1 mix carries the primary voice frequencies (roughly 200 Hz – 4 kHz). Small speakers, earbuds, and laptop transducers have poor response in this midrange band. Music and effects tend to occupy lower and higher frequencies where small speakers often perform better, or at least differently, making dialogue feel recessed even when it's not.
The theatrical mix assumes you can hear at −31 LUFS because your room is quiet, your speakers are large, and you're calibrated. None of those assumptions hold on headphones during your commute.
A typical theatrical film integrates around −27 LUFS, which is Netflix's delivery standard. Spotify normalizes music to −14 LUFS. Apple Music targets −16 LUFS. That difference, up to 13 LU between a film mix and a streaming pop track, corresponds to a factor of roughly 4.5× in signal amplitude (about 2.5× in perceived loudness, using the rule of thumb that 10 dB sounds twice as loud). The film is simply much quieter by design, intended for a much louder room.
Loudness normalization in FilmStrip bridges this gap: it brings the integrated loudness of the extracted audio up to a target that matches your listening context.
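The size of that gap is a plain decibel conversion. The following is a minimal Python sketch of the arithmetic; the function names are illustrative and are not part of FilmStrip.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB (or LU) to a linear amplitude ratio."""
    return 10 ** (db / 20)


def db_to_power_ratio(db: float) -> float:
    """Convert a level difference in dB (or LU) to a linear power ratio."""
    return 10 ** (db / 10)


# Integrated loudness figures quoted in the text above.
film_lufs = -27.0
music_lufs = -14.0

gap_lu = music_lufs - film_lufs
print(f"gap: {gap_lu:.0f} LU")
print(f"amplitude ratio: {db_to_amplitude_ratio(gap_lu):.2f}x")
print(f"power ratio: {db_to_power_ratio(gap_lu):.1f}x")
```

The amplitude ratio is the number that matters for a static gain stage: it is the factor every sample would be multiplied by to close the gap.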
Extracting a 7.1 audio track from an MKV and rendering it as stereo requires folding eight channels into two. The math is defined by international standards and is not lossless in any perceptual sense — but it is principled.
A 7.1 mix contains eight discrete channels arranged around the listener. Each has a specific intended acoustic position and a specific role in the mix:
| Channel | Abbreviation | Position | Primary Content |
|---|---|---|---|
| Front Left | FL | Front-left speaker | Music (left), sound effects, ambience |
| Front Right | FR | Front-right speaker | Music (right), sound effects, ambience |
| Center | FC | Center screen speaker | Dialogue — almost exclusively |
| LFE | LFE | Subwoofer (any location) | Bass effects: explosions, impacts (≤120 Hz) |
| Back Left | BL | Rear-left speaker | Ambient surround, rear effects |
| Back Right | BR | Rear-right speaker | Ambient surround, rear effects |
| Side Left | SL | Side-left speaker | Wide surround, room tone, passing sounds |
| Side Right | SR | Side-right speaker | Wide surround, room tone, passing sounds |
FilmStrip uses a LoRo (Left-Only/Right-Only) downmix, the standard defined by ITU-R BS.775 and ATSC A/52. It is called "Left-Only/Right-Only" because the output channels are independently derived — there is no phase encoding between them. The result plays correctly on any stereo system without special decoding.
The alternative is LtRt (Left Total/Right Total), which uses phase manipulation to encode surround information in a way that a Dolby Pro Logic decoder can later extract. LtRt produces a wider apparent soundstage on a compatible system but degrades on a simple stereo playback. For a file you intend to listen to on headphones or a stereo player, LoRo is correct.
Implementation note: FilmStrip passes aformat=channel_layouts=stereo to FFmpeg, which applies a standard stereo downmix consistent with the LoRo convention. The coefficients shown below reflect that convention; the exact internal values are determined by FFmpeg's downmix implementation.
The 0.707 coefficient (−3.01 dB) for the center channel is a compromise. The center signal is summed into both outputs, so at full gain (1.0) it would play from two speakers at once and its total acoustic power would double (+3 dB) relative to the original single center speaker, pushing dialogue forward of everything else. If mixed at 0.5 (−6 dB), dialogue becomes noticeably recessed.
The −3 dB value is the equal-power point: 0.707² + 0.707² ≈ 1, so the power the center contributes across the two outputs adds up to what it carried on its own in the multichannel mix. This keeps the front soundstage balanced while keeping the center channel audible and centered.
The side channels get an additional −3 dB reduction (0.5, or −6 dB total) because they carry primarily ambient content — room tone, diffuse reverb, passing sounds. Folding them in at full surround level would fill the stereo image with diffuse content and reduce its clarity. The reduced gain preserves a sense of space without congesting the mix.
Why LFE is dropped: The LFE channel is band-limited to approximately 120 Hz and carries content specifically mixed for a subwoofer running at cinema reference levels. Adding it to the stereo output at any significant gain would cause uncontrolled low-frequency energy on systems that cannot handle it — small speakers, earbuds, laptop transducers — risking distortion or damage. ITU explicitly specifies that LFE should not be included in a stereo downmix. The full-range channels already contain the low-frequency content of the music and ambient effects; only the extreme bass effects are lost, and they aren't reproducible on typical listening hardware anyway.
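The coefficients discussed above can be written out as a per-sample fold-down. This is a sketch under stated assumptions, not FFmpeg's actual matrix: the center (0.707) and side (0.5) gains come from the text, the back-surround gain of 0.707 is inferred from the phrase "an additional −3 dB" for the sides, and FFmpeg's own downmix may use different values or normalize the result.

```python
import math

CENTER_GAIN = math.sqrt(0.5)  # 0.707 (-3 dB), as described in the text
BACK_GAIN = math.sqrt(0.5)    # ASSUMPTION: -3 dB, inferred from the side-channel wording
SIDE_GAIN = 0.5               # -6 dB, as described in the text
LFE_GAIN = 0.0                # LFE is dropped from the stereo output


def downmix_sample(fl, fr, fc, lfe, bl, br, sl, sr):
    """Fold one 7.1 sample frame into a LoRo stereo pair (left, right)."""
    left = fl + CENTER_GAIN * fc + BACK_GAIN * bl + SIDE_GAIN * sl + LFE_GAIN * lfe
    right = fr + CENTER_GAIN * fc + BACK_GAIN * br + SIDE_GAIN * sr + LFE_GAIN * lfe
    return left, right


# A center-only signal lands equally in both outputs at -3 dB each.
left, right = downmix_sample(0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0)
print(left, right, left ** 2 + right ** 2)
```

Because several channels are summed into each output, the result can exceed full scale on loud passages, which is why a limiter follows the downmix in the processing chain.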
After the downmix, the result is a serviceable, intelligible stereo mix. It won't replicate the spatial experience of the theatrical presentation, but it prioritizes the most important element, dialogue, and preserves the music and primary effects in a form that translates well to headphones.
Each step runs in sequence, writing intermediate results to a temporary directory. The source file is never modified.
Dashed steps are optional. Dialog Guard (5.1/7.1 sources only) normalizes the center channel before the downmix, in the same ffmpeg pass. Level Riding runs on the full multichannel signal before the stereo downmix. Decode, optional level riding, resample, downmix, and peak limiting all run in a single ffmpeg pass (pass 1), producing a temporary WAV. Loudness analysis and normalization are a separate two-pass EBU R128 process that only runs if enabled.
ffmpeg decodes the selected audio stream. If Dialog Guard is enabled on a 5.1 or 7.1 source, it first splits out the center channel, normalizes it independently, and reassembles the multichannel signal before the downmix. If level riding is enabled, dynaudnorm runs next on the full multichannel signal. The filter chain then resamples to 44.1 kHz, downmixes to stereo via aformat=channel_layouts=stereo, and runs a brick-wall limiter (alimiter=limit=0.99:attack=5:release=50:level=false) to catch any sample peaks that channel summation pushes past full scale. The result is a temporary 24-bit PCM WAV file.
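A rough sketch of how the simple (non Dialog Guard) extraction pass could be assembled is shown here. The aformat and alimiter strings are taken from the text; the use of aresample=44100 for the resampling step, the pcm_s24le codec name for the 24-bit output, and the helper names and argument layout are assumptions made for illustration, not FilmStrip's actual code.

```python
def build_pass1_filters(level_riding=None):
    """Return the -af chain for the extraction pass.

    level_riding is None (disabled) or a (p, m) tuple for dynaudnorm.
    """
    filters = []
    if level_riding is not None:
        p, m = level_riding
        # Level riding runs on the multichannel signal, before the downmix.
        filters.append(f"dynaudnorm=p={p}:m={m}:g=31")
    filters.append("aresample=44100")  # ASSUMPTION: resampling filter choice
    filters.append("aformat=channel_layouts=stereo")
    filters.append("alimiter=limit=0.99:attack=5:release=50:level=false")
    return ",".join(filters)


def build_pass1_command(src, stream_index, out_wav, level_riding=None):
    """Return an ffmpeg argument list that writes a 24-bit PCM WAV."""
    return [
        "ffmpeg", "-i", src,
        "-map", f"0:a:{stream_index}",
        "-af", build_pass1_filters(level_riding),
        "-c:a", "pcm_s24le",
        out_wav,
    ]


print(" ".join(build_pass1_command("movie.mkv", 0, "tmp.wav", level_riding=(0.68, 7.3))))
```

Keeping every filter in one chain means the audio is decoded once and never written to disk between stages.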
24-bit depth is chosen rather than 16-bit to preserve headroom for subsequent processing steps. Loudness normalization may apply a gain of several dB; 24 bits provides 144 dB of dynamic range so there is no meaningful resolution loss at any gain setting.
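The 144 dB figure follows from the usual rule of about 6.02 dB per bit for linear PCM. A small sketch of that calculation (the function name is illustrative):

```python
import math


def pcm_dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of linear PCM: ratio of full scale to one LSB."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 24):
    print(f"{bits}-bit: {pcm_dynamic_range_db(bits):.1f} dB")
```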
For 5.1 and 7.1 sources, Dialog Guard targets the center channel (FC, index 2 in both layouts) before the stereo downmix. It uses ffmpeg's channelsplit to separate channels, applies dynaudnorm=p=0.88:m=5:g=15 to the center channel alone, then reassembles with amerge. With dynaudnorm's default 500 ms frames, the smaller window (g=15 ≈ 7.5 s vs. level riding's g=31 ≈ 15.5 s) reacts more quickly to brief quiet passages. The maximum gain of 5× (~14 dB) is enough to rescue quiet dialog without over-amplifying. Dialog Guard has no effect on stereo or mono sources.
When enabled, dynaudnorm runs on the full multichannel signal as part of Pass 1, after Dialog Guard (if active) and before the stereo downmix. The filter analyzes audio in short frames using a Gaussian-weighted sliding window, computes per-frame gain, and applies it with smooth transitions. Using both Dialog Guard and Level Riding together is intentional: Dialog Guard ensures the center channel is at a consistent level; Level Riding then evens out the full multichannel mix, so the signal that reaches the downmix is already leveled.
The loudness normalization process requires two ffmpeg passes. The first pass runs ffmpeg with the loudnorm filter in analysis mode and sends the audio output to /dev/null, so no audio is written. At the end of the pass, the filter prints a JSON block to stderr containing the measured integrated loudness (LUFS), true peak (dBTP), loudness range (LRA), and threshold values.
This analysis runs against the already-downmixed and (optionally) level-ridden WAV from steps 1–5 — so the loudness measurement reflects the final dynamics of the audio, not the original theatrical mix.
The second pass reads the measured values from Pass 1 and applies a linear gain to bring the integrated loudness to the target LUFS. With linear=true, the normalization is a single static gain value — the entire file is scaled up or down by the same amount, with no dynamic processing. This preserves the dynamic character of the audio (including the level riding, if it was applied) while adjusting where that dynamic range sits relative to full scale.
The output of this pass is the final WAV file. For M4A export, that WAV is then encoded with AAC at the selected bitrate as a separate final step.
Why level riding before loudness normalization? The order is intentional. Level riding reduces the dynamic range of the audio. Loudness normalization then measures the resulting signal and sets the overall level. If the order were reversed (normalize first, then ride levels), the final output level would be unpredictable, because dynaudnorm's gain changes would shift the integrated loudness away from the target. With the current order, the loudness measurement already includes whatever level riding did to the dynamics, so the output lands on the target LUFS, subject only to the true-peak ceiling.
Before automated systems existed, a broadcast engineer sat at a console and manually rode the fader — turning down peaks, pulling up quiet passages — to keep audio levels consistent for home listeners. The goal was the same as it is today: tame the dynamic range of content mixed for a different context. dynaudnorm is a modern, principled version of that process.
The filter divides the audio into short frames, 500 milliseconds by default. For each frame, it computes the peak magnitude. This is not RMS (average energy) but sample peak: the highest instantaneous sample level in the frame. The gain needed to bring that peak to the target level p is calculated: gain = p / peak.
If the result exceeds the maximum allowed gain m, the gain is clamped to m. This prevents very quiet frames (near-silence, room noise) from being boosted so aggressively that the noise floor becomes intrusive.
A raw per-frame gain would cause audible pumping — volume jumping every half-second in sync with the analysis window. dynaudnorm prevents this by computing a Gaussian-weighted average across a neighborhood of frames (the window size g) and using the smoothed gain value instead of the raw one.
The Gaussian weighting means frames near the current position in time have more influence than frames far away. With the default window of 31 frames (about 15.5 seconds in total, centered on the current frame, so roughly 7.5 seconds each of lookahead and lookbehind), gain transitions are spread over several seconds and are imperceptible as discrete steps. The filter effectively "knows" what's coming and begins adjusting early.
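The per-frame gain, the clamp to m, and the Gaussian smoothing described above can be sketched in a few lines of Python. This is a simplified model for illustration only: the sigma choice (window size divided by 6) and the edge handling are assumptions, and the real dynaudnorm filter includes further stages not modeled here.

```python
import math


def frame_gains(peaks, p=0.95, m=10.0):
    """Raw gain per frame: p / peak, clamped to the maximum gain m."""
    return [min(p / peak, m) if peak > 0 else m for peak in peaks]


def gaussian_weights(size, sigma=None):
    """Normalized Gaussian weights for an odd-sized window."""
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    sigma = size / 6.0 if sigma is None else sigma  # ASSUMPTION: illustrative sigma
    half = size // 2
    weights = [math.exp(-((i - half) ** 2) / (2 * sigma ** 2)) for i in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_gains(gains, size=31):
    """Gaussian-weighted average of neighboring frame gains.

    Frames past either end are treated as copies of the first or last frame.
    """
    weights = gaussian_weights(size)
    half = size // 2
    last = len(gains) - 1
    smoothed = []
    for i in range(len(gains)):
        acc = 0.0
        for k, w in enumerate(weights):
            j = min(max(i + k - half, 0), last)
            acc += w * gains[j]
        smoothed.append(acc)
    return smoothed


# A quiet scene (peak 0.1) followed by a loud one (peak 1.0), 40 frames each.
raw = frame_gains([0.1] * 40 + [1.0] * 40, p=0.95, m=10.0)
smooth = smooth_gains(raw, size=31)
print(raw[39], raw[40])        # abrupt step in the raw gain
print(smooth[39], smooth[40])  # gradual transition after smoothing
```

The raw gain jumps at the scene boundary, while the smoothed gain starts falling several frames before the loud scene begins, which is the lookahead behavior the text describes.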
p Parameter
The peak target controls how aggressively loud frames are attenuated. A value of p = 0.95 means the loudest frame in each window will be brought to 95% of full scale: very gentle, leaving most of the original dynamic range intact. At p = 0.55, loud frames are pulled down to 55% of full scale, a reduction of about 5.2 dB, which is noticeably more aggressive.
Lower p values push down loud peaks more, narrowing the distance between the loudest and quietest passages in the processed audio. Combined with higher m, this is what produces the leveling effect.
m Parameter
This is the parameter that actually closes dynamic range rather than just attenuating peaks. m sets the maximum gain that can be applied to a frame. At m = 1.0 (the original FilmStrip default), gain can never exceed 1.0: the filter can only attenuate loud frames, never boost quiet ones. Quiet dialogue stays quiet.
At m = 10.0, the filter can amplify a quiet frame by up to 10× (roughly +20 dB). Scenes where a character whispers, or where the only content is ambient room tone, get boosted significantly — narrowing the perceived gap between quiet and loud. This is the effective dynamic range compression that makes film audio comfortable to listen to without a home theater.
Why the original m = 1.0 caused the problem: With only downward normalization, dynaudnorm behaves like a soft limiter. It reduces peaks that exceed the target but does nothing for frames below it. A scene with dialogue peaking at −30 dBFS would remain at −30 dBFS even at the highest aggressiveness setting, because bringing that frame up to the p = 0.55 target would take a gain of about 17× (0.55 / 0.0316), far above the m = 1.0 ceiling. The loudest frames got quieter; the quietest frames didn't change; the perceived dynamic range was barely reduced. Raising m unlocks the upward half of the leveling process.
The aggressiveness slider maps linearly to the p and m parameters. The Gaussian window size g is fixed at 31 frames across all settings.
| Level | p (peak target) | m (max gain) | Max attenuation | Max boost | Character |
|---|---|---|---|---|---|
| 1 | 0.95 | 2.0× | −0.4 dB | +6.0 dB | Barely perceptible. Takes the sharpest edges off very loud peaks. |
| 2 | 0.91 | 2.9× | −0.9 dB | +9.2 dB | Gentle. Loud transients roll off; quiet passages begin to come up. |
| 3 | 0.86 | 3.8× | −1.3 dB | +11.6 dB | Light leveling. Noticeable on wide-range content. |
| 4 | 0.82 | 4.7× | −1.7 dB | +13.4 dB | Moderate. Action-dialogue gap starts to close. |
| 5 | 0.77 | 5.6× | −2.3 dB | +14.9 dB | Medium. Clear leveling; content sounds more consistent. |
| 6 | 0.73 | 6.4× | −2.8 dB | +16.1 dB | Assertive. Dynamic range significantly compressed. |
| 7 | 0.68 | 7.3× | −3.4 dB | +17.3 dB | Heavy. Quiet scenes boosted substantially; peaks pulled down hard. |
| 8 | 0.64 | 8.2× | −3.9 dB | +18.3 dB | Aggressive. Noticeable on most content; ambient noise floor rises. |
| 9 | 0.59 | 9.1× | −4.6 dB | +19.2 dB | Very aggressive. Quiet dialogue and action scenes approach similar level. |
| 10 | 0.55 | 10.0× | −5.2 dB | +20.0 dB | Maximum. Up to 20 dB of upward gain; dynamic range heavily compressed. |
At level 10, the gain window is 31 frames × 500 ms = 15.5 s of smoothing. Transitions are slow and unlikely to cause audible pumping on typical film content.
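The linear mapping from the aggressiveness slider to p and m can be reproduced with a simple interpolation between the endpoints in the table (level 1: p = 0.95, m = 2.0; level 10: p = 0.55, m = 10.0). This is a sketch; the function names are illustrative and FilmStrip's own rounding may differ slightly.

```python
import math


def level_riding_params(level: int):
    """Map aggressiveness 1..10 to dynaudnorm (p, m) by linear interpolation."""
    if not 1 <= level <= 10:
        raise ValueError("level must be between 1 and 10")
    t = (level - 1) / 9
    p = 0.95 + t * (0.55 - 0.95)
    m = 2.0 + t * (10.0 - 2.0)
    return p, m


def ratio_to_db(ratio: float) -> float:
    """Convert a linear amplitude ratio to decibels."""
    return 20 * math.log10(ratio)


for level in range(1, 11):
    p, m = level_riding_params(level)
    print(f"{level:2d}  p={p:.2f}  m={m:.1f}x  "
          f"atten={ratio_to_db(p):+.1f} dB  boost={ratio_to_db(m):+.1f} dB")
```

The attenuation and boost columns are just p and m expressed in decibels, so they can be regenerated whenever the endpoints change.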
Film dialog lives almost exclusively in the center channel — a deliberate mixing decision that dates to the earliest days of surround sound. When a 5.1 or 7.1 mix is downmixed to stereo, the center channel is attenuated by −3 dB before being summed into both output channels. If a scene's dialog was already mixed quietly relative to the action, that attenuation can push it below the threshold of comfortable comprehension. Dialog Guard addresses this at the source, before the downmix happens.
In any standard 5.1 or 7.1 theatrical mix, the center channel (FC) carries dialog with essentially no other content. Explosions are in the LFE. Music is spread across FL and FR. Ambience and effects fill BL, BR, SL, and SR. The center is reserved for voices — and in the channel layout for both 5.1 and 7.1, FC is always index 2, making it unambiguously targetable regardless of the specific surround configuration.
Full-mix normalization tools like dynaudnorm with a wide window can miss brief quiet passages — a character whispering for a few seconds in an otherwise loud scene, for instance. The window average is dominated by the loud content surrounding the passage, so the gain assigned to that window remains low. The whisper stays quiet. Targeting the center channel independently, with a shorter window, allows these brief dips to be caught and corrected before they get buried further by the downmix.
When Dialog Guard is enabled on a 5.1 or 7.1 source, the entire extraction pass switches from a simple -af filter chain to a -filter_complex graph that processes channels as parallel streams:
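One way such a graph could be assembled is sketched here in Python. The channelsplit, dynaudnorm, and amerge filters and the dynaudnorm settings come from the text; the stream specifier [0:a:0], the pad labels such as [FCn] and [guarded], and the helper itself are hypothetical. Sources that use a side-surround 5.1 layout would need different channel names.

```python
# Channel order for the two layouts Dialog Guard handles; FC is index 2 in both.
LAYOUTS = {
    6: ("5.1", ["FL", "FR", "FC", "LFE", "BL", "BR"]),
    8: ("7.1", ["FL", "FR", "FC", "LFE", "BL", "BR", "SL", "SR"]),
}


def dialog_guard_graph(channel_count, p=0.88, m=5, g=15):
    """Build a filter_complex string that normalizes only the center channel.

    Returns None for channel counts without a dedicated center channel,
    signalling that the plain -af chain should be used instead.
    """
    if channel_count not in LAYOUTS:
        return None
    layout, names = LAYOUTS[channel_count]
    split_pads = "".join(f"[{name}]" for name in names)
    # Same channel order on the way back in, with FC replaced by its normalized copy.
    merge_pads = "".join("[FCn]" if name == "FC" else f"[{name}]" for name in names)
    return (
        f"[0:a:0]channelsplit=channel_layout={layout}{split_pads};"
        f"[FC]dynaudnorm=p={p}:m={m}:g={g}[FCn];"
        f"{merge_pads}amerge=inputs={channel_count}[guarded]"
    )


print(dialog_guard_graph(6))
print(dialog_guard_graph(2))
```

Returning None for stereo and mono inputs mirrors the fallback behavior described for sources without a center channel.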
The reassembled multichannel signal then continues through the normal filter chain: optional level riding, resample to 44.1 kHz, stereo downmix, and peak limiting. The center channel enters the downmix already normalized — the −3 dB attenuation in the downmix matrix is applied to a signal that has already been leveled, rather than to one that may have been quietly mixed to begin with.
The Gaussian window of 15 frames covers approximately 7.5 seconds at the default 500 ms frame length, about half the span of the 31-frame window used by level riding. This shorter window allows the filter to react to brief quiet passages that a wider window would smooth over. The tradeoff is slightly more audible gain transitions, but on a mono channel carrying dialog the effect is rarely perceptible.
The maximum gain of 5× (~14 dB) is applied only when the center channel's peak is very low — near-silence or extremely quiet dialog. In practice, the filter boosts passages by a few dB in most cases. The ceiling prevents the gain from getting large enough to turn ambient room tone or background noise on the center channel into audible hiss during pauses in dialog.
Dialog Guard only activates when the source track has exactly 6 or 8 channels. For stereo sources — films that were distributed as pre-downmixed stereo, or tracks that have already been processed — there is no dedicated center channel to target and the setting has no effect. The processing path falls back to the standard -af filter chain.
Using Dialog Guard alongside Level Riding is the intended combination. Dialog Guard normalizes the center channel (dialog) specifically, at a fast window optimized for short passages. Level Riding then normalizes the full multichannel mix, also before the downmix, at a slower window that smooths transitions across longer scenes. They operate at different stages, on different signals, for different purposes.
For most of the digital audio era, engineers chased loudness by bricking signals against 0 dBFS with limiters. The result was audio that was measurably distorted, perceptually fatiguing, and impossible to compare across sources. EBU R128 changed the unit of measurement from peak level to perceived loudness — and eventually ended the war.
The problem began with CD mastering in the 1990s. Peak normalization ensured that no sample exceeded 0 dBFS, and for a brief period, that was sufficient. Then labels began competing for shelf presence: a song that sounded louder in a record store — even by a few dB — was perceived as more powerful, more professional, more worth buying.
Engineers responded by compressing dynamic range and limiting peaks with increasing aggression, allowing average levels to rise without exceeding 0 dBFS. A 1983 recording normalized to −14 dBFS peak might have an average level of −25 dBFS. By 2008, some releases averaged −9 dBFS — saturated, clipping, often visibly distorted in a waveform editor. Metallica's Death Magnetic (2008) became the canonical example of extreme loudness war damage: the album version famously measured worse on dynamic range tests than the Guitar Hero video game version of the same recording, which had been remixed under different constraints.
The problem was the measurement. Peak level tells you nothing about how loud something sounds to a human listener. Two signals can share the same peak level and differ by 10 dB in perceived loudness. A new measurement standard was needed.
The International Telecommunication Union published BS.1770 as a method for measuring the subjective loudness of audio programs. The core of the algorithm is K-weighting — a frequency filter that approximates how the human auditory system responds to sound at normal listening levels.
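K-weighting applies two stages of filtering before the mean-square measurement: a high-shelf filter that boosts frequencies above roughly 1.5 kHz by about 4 dB to model the acoustic effect of the head, and a high-pass filter (the RLB curve) that rolls off low frequencies.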
After K-weighting, the algorithm measures mean square energy across overlapping 400 ms gating blocks, applies a two-stage threshold to exclude silence and near-silence, and reports the result in LUFS — Loudness Units relative to Full Scale. LUFS and LKFS (the ITU acronym: Loudness, K-weighted, relative to Full Scale) are identical measurements; the naming convention differs between the ITU and EBU standards but the values are interchangeable.
The European Broadcasting Union's Recommendation R128, first published in 2010, applied the ITU-R BS.1770 measurement algorithm to broadcast and mandated a target of −23 LUFS integrated loudness for all content delivered to European broadcasters. For the first time, a regulatory body had replaced peak normalization with loudness normalization as the delivery standard.
The immediate effect was that a heavily compressed pop record and a lightly processed spoken-word program, if both mastered to −23 LUFS, would play at the same perceived loudness on broadcast. The incentive to compress was removed: you could not get louder on air than a competitor by bricking your track harder, because the broadcaster's normalizer would bring both to the same level.
Streaming platforms followed. Spotify, Apple Music, YouTube, and others all adopted loudness normalization with targets ranging from −14 to −23 LUFS depending on their playback philosophy. The loudness war, at least for the formats that carry normalization metadata, was over.
Accurate loudness normalization requires knowing the measured loudness before applying the correction — you cannot normalize in a single real-time pass without reading the entire file first. FilmStrip uses ffmpeg's loudnorm filter in two-pass mode:
ffmpeg processes the full audio file with the loudnorm filter in measurement mode, writing output to /dev/null. At the end, the filter reports the measured integrated loudness (I), true peak (TP), loudness range (LRA), and gating threshold.
For a two-hour film this pass takes time proportional to the duration — it is decoding and filtering the full audio at real time or faster. The measured values are captured from ffmpeg's stderr output and passed to the second pass.
The second pass provides the measured values back to the loudnorm filter along with the target LUFS. With linear=true, the filter computes a single static gain value: the difference between the measured integrated loudness and the target, accounting for true peak ceiling. That gain is applied uniformly across the entire file.
Linear mode preserves dynamics exactly. The output has the same waveform shape as the input — only louder or quieter by a fixed amount. This is ideal after level riding, which has already done the dynamic work.
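The hand-off between the two passes can be sketched as follows: pull the JSON block out of the first pass's stderr, then feed the measured values back into a second loudnorm invocation. The JSON keys and the measured_* options are loudnorm's; the helper names, the sample stderr text, and the LRA target of 11 are illustrative assumptions rather than FilmStrip's actual values.

```python
import json


def parse_loudnorm_stats(stderr_text: str) -> dict:
    """Extract the JSON block that loudnorm prints at the end of stderr."""
    start = stderr_text.rfind("{")
    end = stderr_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no loudnorm JSON block found")
    return json.loads(stderr_text[start:end + 1])


def second_pass_filter(stats: dict, target_i=-16.0, true_peak=-1.0, lra=11.0) -> str:
    """Build the loudnorm filter string for the linear second pass."""
    return (
        f"loudnorm=I={target_i}:TP={true_peak}:LRA={lra}"
        f":measured_I={stats['input_i']}"
        f":measured_TP={stats['input_tp']}"
        f":measured_LRA={stats['input_lra']}"
        f":measured_thresh={stats['input_thresh']}"
        f":offset={stats['target_offset']}"
        ":linear=true"
    )


# Made-up example of what the tail of a first-pass stderr might look like.
sample_stderr = """
[Parsed_loudnorm_0 @ 0x0000]
{
    "input_i" : "-27.61",
    "input_tp" : "-4.47",
    "input_lra" : "9.30",
    "input_thresh" : "-38.10",
    "output_i" : "-16.58",
    "output_tp" : "-1.00",
    "output_lra" : "8.10",
    "output_thresh" : "-27.20",
    "normalization_type" : "dynamic",
    "target_offset" : "0.58"
}
"""

stats = parse_loudnorm_stats(sample_stderr)
print(second_pass_filter(stats))
```

The values stay as strings throughout, so they are passed to the second pass exactly as the first pass reported them.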
The LUFS target determines how loud the output sits relative to other audio on your device. Lower numbers (more negative) are quieter; higher numbers (less negative) are louder. The right value depends on your listening context:
| Target | Context | Notes |
|---|---|---|
| −23 LUFS | Broadcast standard (EBU R128) | Matches European broadcast TV. Very conservative — feels quiet next to music. |
| −20 LUFS | Conservative film / podcast | Good for content that will sit alongside other film audio. |
| −18 LUFS | Conservative | Louder than broadcast, quieter than streaming music. Good for content that will sit alongside other film audio. |
| −16 LUFS | FilmStrip default (Apple Music / Tidal) | Matches streaming music normalization on Apple platforms. Dialogue feels natural in a playlist context. Works well on most devices. |
| −14 LUFS | Spotify standard | Loudest major streaming target. Well-suited for listening alongside music. |
If applying the required gain leaves the true peak below the ceiling (−1 dBTP by default), the normalization is a simple gain change. If the source peaks would exceed the ceiling at the target loudness, the gain is reduced slightly to keep the output within it, so the integrated loudness may land slightly below target on very dynamic material. This is correct behavior: it prevents digital clipping at the cost of fractionally missing the loudness target.
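The trade-off between the loudness target and the true-peak ceiling comes down to taking the smaller of two gains. This sketch models the arithmetic as the text describes it; it is not a reimplementation of loudnorm's internals, and the function name and example figures are illustrative.

```python
def linear_gain_db(measured_i, measured_tp, target_i=-16.0, ceiling_tp=-1.0):
    """Static gain in dB: as much as the target asks for, capped by the peak ceiling."""
    wanted = target_i - measured_i      # gain needed to reach the loudness target
    allowed = ceiling_tp - measured_tp  # gain available before the true peak hits the ceiling
    return min(wanted, allowed)


# Plenty of peak headroom: the full 11 dB is applied and the target is met.
print(linear_gain_db(measured_i=-27.0, measured_tp=-14.0))

# Limited headroom: the gain is capped at 5 dB and the output lands below target.
gain = linear_gain_db(measured_i=-27.0, measured_tp=-6.0)
print(gain, -27.0 + gain)
```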
A practical guide to every setting in FilmStrip and its effect on the output audio.
Uncompressed linear PCM. Every sample is stored exactly as computed — no encoding loss. 24 bits provides 144 dB of theoretical dynamic range. 44.1 kHz matches the CD standard and is the most common sample rate for music playback and audio editing.
File size: approximately 15 MB per minute at these settings. A two-hour film produces roughly 1.8 GB.
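The WAV size estimate is sample rate × bytes per sample × channels × duration. In the sketch below, the quoted figures correspond to binary units (MiB and GiB); in decimal units the same file is closer to 15.9 MB per minute. The function name is illustrative.

```python
def pcm_bytes_per_minute(sample_rate=44_100, bit_depth=24, channels=2):
    """Size of one minute of uncompressed PCM audio, ignoring the WAV header."""
    return sample_rate * (bit_depth // 8) * channels * 60


per_minute = pcm_bytes_per_minute()
two_hours = per_minute * 120

print(f"{per_minute / 2**20:.1f} MiB per minute")
print(f"{two_hours / 2**30:.1f} GiB for a two-hour film")
```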
Advanced Audio Coding at 128, 192, or 256 kbps. A lossy format: audio information is discarded to achieve compression. The psychoacoustic model discards components that are perceptually masked, meaning inaudible in the presence of other sounds. At 256 kbps, the artifacts are essentially inaudible on normal program material.
File size: approximately 1 MB/min (128 kbps) to 1.9 MB/min (256 kbps). A two-hour film is roughly 115–230 MB depending on bitrate.
| Setting | Parameter | Effect |
|---|---|---|
| Enabled | filter_complex + dynaudnorm on FC | For 5.1 and 7.1 source tracks, normalizes the center channel independently before the stereo downmix using a fast-reacting dynaudnorm (p=0.88:m=5:g=15). Silently skipped for stereo and mono sources — no effect on the output. Runs in the same ffmpeg pass as extraction with no additional processing time. |
| Setting | Parameter | Effect |
|---|---|---|
| Enabled | dynaudnorm filter | Adds the dynamic normalization filter to the extraction pass. Runs after Dialog Guard (if active) and before loudness normalization. Processing time is unchanged — it runs in the same pass as the extraction. |
| Aggressiveness 1–10 | p: 0.95→0.55, m: 2.0→10.0 | Controls both peak target and maximum gain simultaneously. Lower settings gently tame peaks. Higher settings both attenuate peaks and lift quiet passages, significantly reducing perceived dynamic range. See the full table in Section 05. |
| Setting | Parameter | Effect |
|---|---|---|
| Enabled | loudnorm two-pass | Adds two ffmpeg passes after extraction. Pass 1 analyzes the full file (takes additional time proportional to duration). Pass 2 applies a linear gain. Total processing time approximately doubles for a full-length film. |
| Target: −23 LUFS | I=-23 | Broadcast standard. Quieter than streaming music; appropriate if the output will play alongside other film audio or in a calibrated home theater context. |
| Target: −18 LUFS | I=-18 | A reasonable middle ground: louder than broadcast, quieter than Spotify. The output sits comfortably on most devices without feeling compressed relative to other audio. |
| Target: −14 LUFS | I=-14 | Matches Spotify's normalization target. If you listen to music on Spotify with normalization enabled, setting FilmStrip to −14 LUFS will make film audio play at the same perceived level as your music library. |
Default settings: Dialog Guard, level riding at aggressiveness 7, and loudness normalization at −16 LUFS are all enabled by default. This combination works well for headphone listening across virtually all films — dialog stays intelligible, dynamic range is reduced enough to follow without reaching for the volume knob, and the output sits at the same perceived level as streaming music. Adjust aggressiveness down to 5–6 if you want to preserve more cinematic dynamics, or up to 8–9 for unusually wide-range films like Dunkirk or 1917.
Not all audio codecs are equal inputs for FilmStrip. When you have a choice of which file to download, the codec embedded in the source determines whether processing introduces an extra lossy transcode or not. The four you'll encounter most often on Blu-ray rips are AAC, E-AC3, AC3, and DTS.
AAC is the same codec FilmStrip produces for M4A output. If you're exporting to M4A, selecting an AAC source track means no transcode at all for that format — the file stays within the AAC family the entire time. WAV output decodes and re-encodes regardless of source, so the advantage is most significant for M4A. When an AAC track is available, always prefer it.
Dolby Digital Plus (E-AC3) is the dominant format on modern Blu-ray rips. It supports up to 1.5 Mbps and carries full 7.1 surround. Quality is high at typical encode bitrates. The decode → process → re-encode chain introduces a single generation of lossy conversion, which is audibly transparent at 192+ kbps M4A output. E-AC3 at 640 kbps is an excellent source.
Dolby Digital (AC3) is the older 5.1 format with a ceiling of 640 kbps. It's perceptually lossier than E-AC3 at comparable bitrates, but in practice most AC3 tracks at 448–640 kbps sound fine after extraction. The extra generation loss is the same as E-AC3 — one decode and one re-encode. If AC3 is the only option, it will work well.
DTS-HD Master Audio carries lossless audio on Blu-ray, but MKV rips typically demux to lossy DTS Core at 1.5 Mbps. The decode → process → re-encode chain is identical to E-AC3 — FilmStrip produces the same output quality regardless. There is no reason to seek out DTS specifically; E-AC3 or AAC are better or equal in every case.
Practical note: When a file contains both AAC and E-AC3 tracks (common in remuxes that preserve all streams), select the AAC track in FilmStrip. You'll skip one decode/re-encode cycle for M4A output and get the same WAV quality for no extra cost.