Automatically trim silence from video with ffmpeg and python

I finally did it: I figured out a little process to automatically remove the silent parts from a video.

Let me show y'all the process and the two main scripts I use to accomplish this.

Process

  1. Use ffmpeg's silencedetect filter to report the silent sections of the video's audio
  2. Pipe that output through grep and awk to get it into the format I want
  3. Save the result to a text file
  4. Feed that text file to a Python script that cuts out the parts of the video with audio and saves a new version with the silence removed

Now, with the process laid out, let's look at the scripts doing the heavy lifting.

Scripts

Here is the script for generating the silence timestamp data:

#!/usr/bin/env sh

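# Path to the video to analyze
IN=$1
# Noise threshold in dB; anything quieter counts as silence
THRESH=$2
# Minimum length of a silent section, in seconds
DURATION=$3

# ffmpeg logs to stderr, so redirect it into the pipe; quote $IN so paths with spaces work
ffmpeg -hide_banner -vn -i "$IN" -af "silencedetect=n=${THRESH}dB:d=${DURATION}" -f null - 2>&1 | grep "silence_end" | awk '{print $5 " " $8}' > silence.txt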

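I'm passing in three arguments to this script:

  * IN – the file path to the video I want to analyze
  * THRESH – the noise threshold in dB; audio quieter than this counts as silence
  * DURATION – the minimum time in seconds a quiet stretch has to last to count as silence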

That leaves us with the actual ffmpeg command:

ffmpeg -hide_banner -vn -i "$IN" -af "silencedetect=n=${THRESH}dB:d=${DURATION}" -f null - 2>&1 | grep "silence_end" | awk '{print $5 " " $8}' > silence.txt
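If you would rather drive this step from Python, here is a rough sketch of the same command built as an argument list. The names `build_silencedetect_cmd` and `run_silencedetect` are placeholders I made up for this example, not part of the script above. Passing a list to `subprocess.run` means file paths with spaces need no extra quoting.

```python
import subprocess


def build_silencedetect_cmd(path, thresh_db, duration):
    """Build the ffmpeg argument list for silence detection.

    Hypothetical helper: mirrors the shell command above, one list
    element per argument.
    """
    audio_filter = "silencedetect=n={}dB:d={}".format(thresh_db, duration)
    return [
        "ffmpeg", "-hide_banner", "-vn",
        "-i", path,
        "-af", audio_filter,
        "-f", "null", "-",
    ]


def run_silencedetect(path, thresh_db, duration):
    """Run ffmpeg and return its log text (ffmpeg logs to stderr)."""
    cmd = build_silencedetect_cmd(path, thresh_db, duration)
    result = subprocess.run(
        cmd, stderr=subprocess.PIPE, text=True, errors="replace"
    )
    return result.stderr
```

The log text returned by `run_silencedetect` still needs the same filtering that grep and awk do in the shell version.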

The output of silencedetect looks like this:

[Image: silencedetect example output]
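To make the grep and awk step concrete, here is a small Python sketch that pulls the same two numbers out of a silencedetect log line. The sample line is illustrative (the address in the brackets is made up), and `parse_silence_end` is a name I picked for this example.

```python
import re

# Matches e.g. "... silence_end: 86.7141 | silence_duration: 5.29422"
SILENCE_END_RE = re.compile(
    r"silence_end:\s*(?P<end>-?\d+(?:\.\d+)?)\s*\|\s*"
    r"silence_duration:\s*(?P<duration>-?\d+(?:\.\d+)?)"
)


def parse_silence_end(line):
    """Return (end, duration) as floats, or None if the line has no match."""
    match = SILENCE_END_RE.search(line)
    if match is None:
        return None
    return float(match.group("end")), float(match.group("duration"))


# Illustrative log line; the pointer value is a placeholder
sample = "[silencedetect @ 0x55d0c0ffee00] silence_end: 86.7141 | silence_duration: 5.29422"
print(parse_silence_end(sample))
```

Lines that only mention silence_start return None, which is the same filtering that grep "silence_end" does.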

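After grep keeps only the silence_end lines and awk prints fields 5 and 8 (the end timestamp and the duration), the final output in silence.txt looks like this: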

86.7141 5.29422
108.398 5.57798
135.61 1.0805
165.077 1.06485
251.877 1.11594
283.377 5.21286
350.709 1.12472
362.749 1.24295
419.726 4.42077
467.997 5.4622
476.31 1.02338
546.918 1.35986

You might ask why I didn't grab the silence start timestamp. That's because the two numbers I grabbed are the ending timestamp and the duration, so if I subtract the duration from the ending timestamp, I get the starting timestamp!
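As a quick worked example of that subtraction, here it is in Python using the first line of the output above. The function name `silence_start` is just something I made up for illustration.

```python
def silence_start(end, duration):
    """Recover where a silent section began from its end and its length."""
    return end - duration


# First line of the output above: "86.7141 5.29422"
start = silence_start(86.7141, 5.29422)
print(round(start, 5))
```

So the first silent section starts at roughly 81.42 seconds into the video.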

So finally we get to the Python script that processes the timestamps. The script makes use of a Python library called moviepy. You should check it out!

#!/usr/bin/env python

import sys
# moviepy.editor is the moviepy 1.x import path
from moviepy.editor import VideoFileClip, concatenate_videoclips

# Input file path
file_in = sys.argv[1]
# Output file path
file_out = sys.argv[2]
# Silence timestamps
silence_file = sys.argv[3]

# Ease in duration between cuts
try:
    ease = float(sys.argv[4])
except IndexError:
    ease = 0.0

minimum_duration = 1.0

def main():
    # Number of clips generated so far
    count = 0
    # Start of the next clip, i.e. where the last silence ended (seconds)
    last = 0.0

    video = VideoFileClip(file_in)
    full_duration = video.duration
    clips = []

    with open(silence_file, "r", errors='replace') as in_handle:
        for line in in_handle:
            fields = line.split()
            # Skip blank or malformed lines
            if len(fields) != 2:
                continue

            # Each line holds "<silence end> <silence duration>"
            end, duration = float(fields[0]), float(fields[1])

            # The silence starts at end - duration; the clip runs up to there
            to = end - duration
            start = last
            clip_duration = to - start
            print("Clip Duration: {} seconds".format(clip_duration))

            # Clips shorter than one second don't seem to work
            if clip_duration < minimum_duration:
                continue

            # Skip silences that start too close to the end of the video
            if full_duration - to < minimum_duration:
                continue

            # Ease into the clip by starting slightly earlier
            if start > ease:
                start -= ease

            print("Clip {} (Start: {}, End: {})".format(count, start, to))
            clips.append(video.subclip(start, to))
            last = end
            count += 1

    # Keep whatever is left after the final silence
    if full_duration - last > minimum_duration:
        print("Clip {} (Start: {}, End: {})".format(count, last, 'EOF'))
        # Clamp at 0 so the start never goes negative
        clips.append(video.subclip(max(0.0, last - ease)))

    processed_video = concatenate_videoclips(clips)
    processed_video.write_videofile(
        file_out,
        fps=60,
        preset='ultrafast',
        codec='libx264'
    )

    video.close()


if __name__ == "__main__":
    main()

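Here I pass in 3 required arguments and 1 optional argument:

  * file_in – the path to the input video
  * file_out – the path to write the trimmed video to
  * silence_file – the text file of silence timestamps generated by the first script
  * ease (optional) – how many seconds to ease in before each cut; defaults to 0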

You will see there is a minimum_duration. That is because I found in testing that moviepy will crash when trying to write out a clip that is less than a second long. A few sanity checks use it to decide whether a clip should be extracted yet or not. That part is still very rough though.

The last variable holds the timestamp where the previous section of silence ended, which is where the next clip to be written out should start.

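The logic for writing out clips works like so:

  1. Read a line from the timestamp file and subtract the duration from the end timestamp to get where that silence starts
  2. Treat the span from last up to that point as the next clip
  3. Skip it if the clip is shorter than minimum_duration, or if the silence starts less than minimum_duration before the end of the video
  4. Otherwise pull the clip's start back by ease (when there is room), add the clip to the list, and set last to the end of the silence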

Last, I add the remainder of the video as the final clip, pass the list of clips to moviepy's concatenate_videoclips function to combine them into one video clip, and call that clip's write_videofile method to save the final output to the out path I passed into the script.
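If you want to play with the cutting logic without touching any video, here is a rough sketch of it as a pure function. It follows the same checks as my script, but `keep_intervals` is a name I made up for this example, and the 600 second video length is a made-up value.

```python
def keep_intervals(silences, full_duration, ease=0.0, minimum_duration=1.0):
    """Turn (silence_end, silence_duration) pairs into (start, end) clips to keep.

    An end of None means "until the end of the video".
    """
    intervals = []
    last = 0.0  # where the previous silence ended
    for end, duration in silences:
        to = end - duration  # where this silence begins
        start = last
        # Skip clips that are too short, or silences too close to the end
        if to - start < minimum_duration:
            continue
        if full_duration - to < minimum_duration:
            continue
        if start > ease:
            start -= ease
        intervals.append((start, to))
        last = end
    # Keep whatever is left after the final silence
    if full_duration - last > minimum_duration:
        intervals.append((max(0.0, last - ease), None))
    return intervals


# First two lines of the timestamp output above; 600 seconds is a made-up length
print(keep_intervals([(86.7141, 5.29422), (108.398, 5.57798)], 600.0))
```

Each (start, end) pair maps onto one subclip call, so this is a handy way to check the cut points before waiting on a full render.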

Tada! You got a new version of the video with the silent parts removed!

I will try to show a before and after video of the process soon.

#ffmpeg #python #videoediting