I want to combine a few clips together, with five seconds of intro text on each one.

Create overlay text in GIMP

I created some overlay text in GIMP, then exported it to PNG files. An example (note the transparency and drop shadow):

sample overlay png

Trim clips to length

Using the methods I've described in previous ffmpeg posts, I trimmed the clips, ensuring that there is at least five seconds of lead-in on each clip for the text.

ffmpeg -i 1-Tire-Squeal-Front.MOV -ss 1:01 -to 1:24 -c copy 1-Tire-Squeal-Front.trim.MOV

Overlay text on video clips

Then I overlaid the PNG on top of the video for five seconds, thanks to Google leading me to mark4o on Super User:

ffmpeg -i 1-Tire-Squeal-Front.trim.MOV -loop 1 -i 1-Tire-Squeal-Front.png -filter_complex "[1:v] fade=out:st=5:d=1:alpha=1 [ov]; [0:v][ov] overlay=0:0 [v]" -map "[v]" -map 0:a -codec:v libx264 -crf 21 -bf 2 -flags +cgop -pix_fmt yuv420p -codec:a aac -strict -2 -b:a 384k -ar 48000 -movflags faststart -to 0:23 1-Tire-Squeal-Front.title.mp4

I basically just used his example, except that I changed the fade duration and set the overlay offset to 0:0, since my title PNG is already 1920x1080. I also added all the codec settings for YouTube.

I did have trouble with -shortest clipping the last few seconds of video. I ended up dropping it and specifying the appropriate length with -to instead.
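One way to find the right -to value without guessing is to ask ffprobe for the clip's duration. This is a minimal sketch, assuming ffprobe is installed alongside ffmpeg; clip_duration is a hypothetical helper name, not part of the workflow above:

```shell
# Hypothetical helper: print the duration of a media file in seconds.
# -v error                 hides the banner and other log noise
# -show_entries            limits output to the container's duration field
# -of default=...          prints the bare value, with no "duration=" key
clip_duration() {
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}
```

The printed value is in seconds, which -to accepts directly, so something like -to "$(clip_duration 1-Tire-Squeal-Front.trim.MOV)" should stop the output where the trimmed clip ends.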

Overlay trailing text on video clip

I wanted ending text on the video as well, but I was having trouble getting two separate clips (each made like the above) to match exactly. Then I realized I could just use a slightly different filter:

ffmpeg -i 3-Bob.trim.MOV -loop 1 -i 3-Bob.png -loop 1 -i 4-Oh.png -filter_complex "[1:v] fade=out:st=5:d=1:alpha=1 [ov]; [2:v] fade=in:st=16:d=1:alpha=1 [oe]; [0:v][ov] overlay=0:0 [v]; [v][oe] overlay=0:0 [vf]" -map "[vf]" -map 0:a -codec:v libx264 -crf 21 -bf 2 -flags +cgop -pix_fmt yuv420p -codec:a aac -strict -2 -b:a 384k -ar 48000 -movflags faststart -to 21 4-Bob.title.mp4

This creates a third input stream, which is also a looped image. We create stream [oe] (overlay end), which does a fade-in at 16 seconds. We then take [v] and overlay [oe] on it, creating [vf]. We then map [vf] to our output.
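The filtergraph is easier to read when each chain sits on its own line. The sketch below rebuilds the same string piece by piece; build_title_filter is a hypothetical helper name, and it assumes the same input order as the command above (0 is the video, 1 is the intro PNG, 2 is the ending PNG):

```shell
# Hypothetical helper: print the two-overlay filtergraph.
#   $1 = time in seconds at which the intro overlay starts fading out
#   $2 = time in seconds at which the ending overlay starts fading in
build_title_filter() {
    # Each argument below is one chain; printf joins them with "; ".
    printf '%s; %s; %s; %s' \
        "[1:v] fade=out:st=$1:d=1:alpha=1 [ov]" \
        "[2:v] fade=in:st=$2:d=1:alpha=1 [oe]" \
        "[0:v][ov] overlay=0:0 [v]" \
        "[v][oe] overlay=0:0 [vf]"
}
```

Calling build_title_filter 5 16 prints the same filtergraph used above, so it can be passed as -filter_complex "$(build_title_filter 5 16)" and reused with different timings for other clips.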


The video is what I set out to make. It might have been nicer to do some audio fades, but I'll live.

Next time

Now that I'm done, I think it would have been better to have the overlays fade both in and out, and to figure out better scene transitions. Next time...
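For the fade-in-and-out idea, two fade filters can be chained on the same overlay stream, separated by a comma. This is an untested sketch, not something used for this video; overlay_fade_chain is a hypothetical helper name, and the one-second fade length is simply carried over from the commands above:

```shell
# Hypothetical helper: print a fade chain for an overlay image.
# The overlay fades in over the first second, then fades out again
# starting at $1 seconds. alpha=1 makes both fades act on transparency
# only, as in the commands above.
overlay_fade_chain() {
    printf 'fade=in:st=0:d=1:alpha=1,fade=out:st=%s:d=1:alpha=1' "$1"
}
```

In the filtergraph, the output of overlay_fade_chain 5 would take the place of the single fade in the "[1:v] ... [ov]" chain.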