
Silence detection never fires across multiple recording setups #45

@danmaby

Description


Hi there, great tool! I've been testing Jivetalking with a few different mic setups, and silence detection consistently reports "No silence detected" regardless of recording conditions. The derived noise floor always comes back as -Inf dBFS.

Setup

  • Rode PodMic through a Rodecaster Pro (tested with built-in processing both on and off)
  • MacBook built-in microphone recording directly into Audacity
  • All recordings exported as 48kHz mono FLAC
  • v0.2.2 on macOS (Apple Silicon)

What I tried

I ran about ten tests across both setups, each starting with 15-20 seconds of intentional silence (confirmed flat in Audacity's waveform view). Each one returned "No silence detected" and a noise floor of -Inf dBFS. Here's a typical result:

SILENCE DETECTION
  Threshold:      -81.5 dB (from -82.5 dB noise floor estimate)
  No silence detected

DERIVED MEASUREMENTS
  Noise Floor:    -Inf dBFS (from astats)

Following the README's advice to "start each recording with 10-15 seconds of silence" made no difference across any of the tests.

What I found

I dug into the code to try to understand what was going on. I'm not deeply familiar with the signal analysis internals, so I may have some of this wrong, but I was able to reproduce and test each step. There seem to be three things interacting.

1. Silence at the start of a recording is excluded by design

excludeFirstSeconds = 15.0 in analyzer.go rejects any silence candidate whose start time falls within the first 15 seconds. The comment says this skips "preamble before intentional room tone recording", which assumes people speak first and then go quiet. But the README tells you to do the opposite: start with silence. So if you follow the documented workflow, the silence region starts at t=0 and the entire candidate is discarded, even if it extends well past the 15-second mark. The exclusion checks the region's start time, not its contents.

I confirmed this by temporarily setting excludeFirstSeconds = 0.0 and rebuilding. A 4-minute recording with 16 seconds of silence at the start went from "No silence detected" to finding a speech region with 93% voicing, so the downstream detection clearly benefits when the exclusion is removed.
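To make the failure mode concrete, here is a minimal sketch of the exclusion as I understand it. The names (candidate, excluded) are mine, not from analyzer.go; only the excludeFirstSeconds constant and the start-time-only comparison reflect what I saw in the code.

```go
package main

import "fmt"

// Matches the constant in analyzer.go; everything else here is hypothetical.
const excludeFirstSeconds = 15.0

type candidate struct {
	start, duration float64 // seconds
}

// excluded mirrors my reading of the check: it looks only at the
// candidate's start time, so a region beginning at t=0 is discarded
// even when most of it lies past the 15-second mark.
func excluded(c candidate) bool {
	return c.start < excludeFirstSeconds
}

func main() {
	// Silence recorded per the README: starts at t=0, runs 16 seconds.
	c := candidate{start: 0.0, duration: 16.0}
	fmt.Println("excluded:", excluded(c)) // whole region rejected

	// A contents-aware check could instead trim the overlap and keep the tail.
	if end := c.start + c.duration; end > excludeFirstSeconds {
		fmt.Printf("usable tail: %.1fs starting at %.1fs\n",
			end-excludeFirstSeconds, excludeFirstSeconds)
	}
}
```

Trimming the overlap rather than rejecting on start time would let the documented workflow and the exclusion coexist.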

2. The search window is too small for shorter recordings

silenceSearchPercent = 15 limits the silence search to the first 15% of the recording. On a 1-minute test recording that works out to about 9.5 seconds, which falls entirely within the 15-second exclusion zone from issue 1. Even after moving the silence to 20 seconds into the recording, it was still outside the search window. This only stops being a problem with recordings of several minutes or longer. Podcast recordings are generally longer than a minute, of course, but the interaction seems worth flagging for shorter test or clip-length material.

3. Boundary transients between speech and silence cause rejection

When I worked around issues 1 and 2 by making a 4-minute recording with 20 seconds of speech followed by 15 seconds of silence, a candidate was finally found at 19.5s, but it scored 0.000 and was rejected:

SILENCE DETECTION
  Threshold:      -45.0 dB (from -46.0 dB noise floor estimate)
  Candidates:     1 evaluated

  #1: 16.6s at 19.5s
      Score:       0.000
      RMS Level:   -77.6 dBFS
      Peak Level:  -12.9 dBFS
      Crest:       64.7 dB

The RMS of -77.6 dBFS looks like genuine silence, but a single loud peak at -12.9 dBFS (probably a transient at the speech-to-silence boundary) pushes the crest factor to 64.7 dB. This triggers the isLikelyCrosstalk check (threshold 45 dB), which rejects the candidate before the scoring logic runs. The golden sub-region refinement that could potentially trim away the boundary transient only runs after scoring, so it never gets the chance.
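The crest figure in the report is reproducible by hand: in the log domain, crest factor is just peak minus RMS. A quick sketch (the function and threshold names are mine; the 45 dB cutoff is from the isLikelyCrosstalk check as I read it):

```go
package main

import "fmt"

// crestDB computes the crest factor in dB from peak and RMS levels
// given in dBFS; in the log domain it is simply their difference.
func crestDB(peakDBFS, rmsDBFS float64) float64 {
	return peakDBFS - rmsDBFS
}

// Hypothetical name for the 45 dB cutoff used by isLikelyCrosstalk.
const crosstalkCrestThreshold = 45.0

func main() {
	// Figures from the rejected candidate above.
	crest := crestDB(-12.9, -77.6)
	fmt.Printf("crest: %.1f dB\n", crest) // 64.7 dB, matching the report
	fmt.Println("rejected as crosstalk:", crest > crosstalkCrestThreshold)
}
```

So a single boundary transient is enough to fail the check no matter how quiet the rest of the region is, because the comparison uses the whole candidate's peak rather than a trimmed sub-region's.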

Summary

Following the README's recording instructions (silence at the start), silence detection cannot succeed because the exclusion window discards it. Working around that by placing silence later in the recording runs into the search window limit on shorter recordings, and boundary transients on longer ones. The net result is that no silence candidate survives to produce a noise profile across any of the setups I tested.
