This is a repackaged and lightly re-edited version of an updated talk I gave to MLUG back in 2013. The talk was about trying to extract the more interesting, non-racing portions of the broadcast.
It is presented as a historical look at what I was doing at the time. It is not particularly relevant today, as much of the technique described here has since been superseded.
Tour des Châteaux/Torres de Castellos
Automated filtering of Tour de France and Vuelta a España footage with Linux
Background
Tour de France
A multi-stage bicycle race primarily held in France, while also occasionally making passes through nearby countries. First organized in 1903 to increase sales of the newspaper L'Auto, it has been held annually since then, except during the two World Wars. As the Tour gained popularity it was lengthened to 21 days and its reach extended around the globe. Participation has expanded from a primarily French field, as riders from all over the world began to take part. (thanks, Wikipedia)
Vuelta a España
An annual multi-stage bicycle race primarily held in Spain, while also occasionally making passes through nearby countries. Inspired by the success of the Giro d'Italia and the Tour de France, the race was first organized in 1935. The Spanish Civil War and World War II prevented the race from being run in some of its early years; however, it has been held annually since 1955. (thanks, Wikipedia)
Broadcasting
Tour de France
French broadcasters France 2 and France 3 organise the Tour broadcast with equipment including:
- a staff of 300
- three helicopters
- two aircraft
- five motorcycles
- 35 other vehicles and trucks (edit suite, comms)
- 20 fixed cameras
- local Australian DVB broadcast by SBS television
Vuelta a España
Sadly, Wikipedia has next to no information on this, but we presume very similar requirements:
- two camera helicopters
- one aircraft for radio/comms
- three camera motorcycles
- local Australian DVB broadcast by SBS television
Impressions
Overall
- Each Tour de France live broadcast is typically 4 hours long
- I’m not amazingly interested in road cycling
- Too many shots of sweaty fit men on bicycles
- Usually good commentary, but often inane
- Scenery shots exist, but they are difficult to find
- On holiday, missed several days, hard to catch up
- 4 hour DVB SD files are 5.9GB, HD files 17GB (!)
- Vuelta is easier - limited live SBS broadcast … :(
Minor annoyances
Major annoyances
Items of interest
helicopter shots
Most interesting shots of scenery or buildings are from the helicopter-mounted cameras
- Broadcast audio tends to have some helicopter sound mixed in
- Engine turbine clearly audible during these shots
- External shots do not feature the tone; it comes from the cabin only
- Audio is much, much easier to process than video
Fast Fourier Transforms 101
Digital signal processing tool to convert a time-domain signal into a frequency-domain signal
- Used to be processor intensive, now not so much
- Many different tools available
- Chose to use sndfile-spectrogram due to simplicity
- Takes WAV input, outputs PNG
- Ultimately want to use Python's numpy for this step, but haven't got there yet
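In the pipeline, sndfile-spectrogram performs the transform. The sketch below shows roughly what a Python replacement would compute. It uses only the standard library; in practice `numpy.fft.rfft` would be the obvious choice. The sketch includes three pieces:

- a recursive radix-2 Cooley-Tukey FFT
- a helper that reports the strongest frequency
- a crude spectrogram built from non-overlapping frames

The function names are my own. Unlike sndfile-spectrogram, this applies no window function or log scaling.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("FFT length must be a power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest bin, ignoring DC and the mirrored half."""
    spectrum = fft(samples)
    half = len(spectrum) // 2
    peak = max(range(1, half), key=lambda k: abs(spectrum[k]))
    return peak * sample_rate / len(samples)


def spectrogram(samples, frame_size):
    """Magnitudes of the first frame_size // 2 bins for each whole frame.

    Each inner list is one time slice (one column of a spectrogram image).
    """
    starts = range(0, len(samples) - frame_size + 1, frame_size)
    return [[abs(c) for c in fft(samples[s:s + frame_size])[: frame_size // 2]]
            for s in starts]
```

For example, a 500 Hz sine sampled at 8000 Hz peaks at bin 64 of a 1024-point FFT, and 64 × 8000 / 1024 = 500 Hz. A steady tone like the helicopter's therefore shows up as the same bright bin in every time slice, which is the horizontal line the rest of the process looks for.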
“Boring” shot FFT
“Interesting” shot FFT
Prototype processing method
- mplayer to play/dump audio from DVB MPEG TS
- mp3splt to chop into small chunks
- mpg123 to convert each to PCM
- sndfile-spectrogram to convert to PNG
- netpbm suite to isolate tone
- Perl script glue to create an edit decision list (EDL)
- mencoder to convert MPEG to Xvid with EDL
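The Perl glue itself isn't shown here. The following is a hedged Python stand-in for that step. It assumes the per-chunk tone results arrive as a list of booleans in chunk order, with chunks 7 seconds long (the `-t 0.7.0` passed to mp3splt in the Commands section). Its output uses MPlayer's EDL line format, `start end action`, where action `0` means skip. `build_edl` is a hypothetical name.

```python
def build_edl(tone_flags, chunk_seconds):
    """Turn per-chunk tone detections into MPlayer EDL skip entries.

    Consecutive chunks without tone are merged into one skip range, so only
    the chunks with helicopter tone survive the mencoder pass.
    """
    lines = []
    start = None  # start time (s) of the current run of tone-less chunks
    for i, has_tone in enumerate(tone_flags):
        if not has_tone and start is None:
            start = i * chunk_seconds
        elif has_tone and start is not None:
            lines.append(f"{start:.1f} {i * chunk_seconds:.1f} 0")
            start = None
    if start is not None:  # the broadcast ended inside a skip range
        lines.append(f"{start:.1f} {len(tone_flags) * chunk_seconds:.1f} 0")
    return "\n".join(lines) + ("\n" if lines else "")
```

For example, `build_edl([False, False, True, True, False, True], 7)` keeps 14 to 28 s and 35 to 42 s, and skips everything else. Merging runs keeps the EDL short for a 4-hour broadcast, where most chunks are skipped.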
Image processing
- Use image processing tools to enhance tone line
- Convert to grey scale
- Normalise to reduce noise floor
- Convolve with a mask to identify the shape
- Crop to isolate area of interest
- Sort the pixel histogram for a simple shell-script-based decision
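The greyscale, normalise and crop steps correspond to `ppmtopgm`, `pnmnorm` and `pnmcut` in the tone-scanning command. Below is a rough pure-Python sketch of those three steps on an image stored as a list of rows. The luminance weights are the ones ppmtopgm documents. The normalisation only approximates pnmnorm's `-bpercent` behaviour, and the function names are my own.

```python
def to_grey(rgb_rows):
    """Convert rows of (r, g, b) tuples to grey using ppmtopgm's luminance weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_rows]


def normalise(grey_rows, black_percent=70.0, white_percent=1.0):
    """Approximation of `pnmnorm -bpercent 70` (assumption: not pnmnorm's exact rule).

    Pixels at or below the black_percent percentile become 0, the brightest
    white_percent become 255, and values in between are stretched linearly.
    Pushing most of the image to black leaves only strong features, such as
    the tone line, visible.
    """
    values = sorted(v for row in grey_rows for v in row)
    n = len(values)
    lo = values[min(n - 1, int(n * black_percent / 100))]
    hi = values[max(0, n - 1 - int(n * white_percent / 100))]
    span = max(hi - lo, 1)  # avoid dividing by zero on a flat image
    return [[min(255, max(0, round((v - lo) * 255 / span))) for v in row]
            for row in grey_rows]


def crop(rows, left, right, top, bottom):
    """Keep columns left..right and rows top..bottom, inclusive, as pnmcut does."""
    return [row[left:right + 1] for row in rows[top:bottom + 1]]
```

Cropping to a thin horizontal strip means only the frequency band where the helicopter tone sits is counted. Everything else in the spectrogram is ignored.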
Convolution 101
- Old (but simple) image processing technique
- Define a square odd-sided greyscale/colour bitmask
- Move the mask across the input image, centred on each pixel in turn
- Each output pixel value is the normalised sum of the overlapping original and mask pixels multiplied together
- The mask shape enhances similar shapes in the original
- Can be tricky to get mask sizes and values correct
- May not work with very noisy sources
sample input
A variety of different shapes - we’re looking for circles of a particular size
sample mask
Here is the convolution mask we’re going to use
convolution output
After running the mask across the image, the bright points indicate where the process found the circles we were after
output overlaid
We see that the bright points from the output match perfectly with the circles in the input image
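The sliding multiply-and-sum described in Convolution 101 fits in a few lines. The following is a minimal stdlib sketch, not the pnmconvol implementation. Pixels beyond the edge count as black, much like padding the image first. The sum is divided by the total mask weight so the output stays in the input's range. The mask is not flipped, so strictly this is cross-correlation; for symmetric masks such as circles or lines the result is the same. `convolve` is a hypothetical name.

```python
def convolve(image, mask):
    """Slide an odd-sided square mask over a 2-D greyscale image (list of rows)."""
    k = len(mask)
    if k % 2 == 0 or any(len(row) != k for row in mask):
        raise ValueError("mask must be square with an odd side length")
    r = k // 2  # distance from the mask centre to its edge
    # Normalise by total mask weight; fall back to 1 for zero-sum masks.
    total = sum(sum(row) for row in mask) or 1.0
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for my in range(k):
                for mx in range(k):
                    iy, ix = y + my - r, x + mx - r
                    if 0 <= iy < h and 0 <= ix < w:  # outside pixels count as 0
                        acc += image[iy][ix] * mask[my][mx]
            out[y][x] = acc / total
    return out
```

For example, take a 5×5 image containing a single "+" shape and convolve it with a "+"-shaped mask. The output reaches 1.0 only at the centre of the "+", while partly overlapping positions score lower (0.4 one pixel away). This is the same effect as the bright points in the circle example.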
Workflow
boring shot
- We get the FFT image
- Convert to greyscale
- Normalise the grey values
- Apply our convolution mask
- Crop to find the histogram of bright pixels
interesting shot
- We get the FFT image
- Convert to greyscale
- Normalise the grey values
- Apply our convolution mask
- Crop to find the histogram of bright pixels
histogram
A check to confirm that relying on pixel brightness is a good proxy for whether there is tone or not
| brightness | pixels (tone) | pixels (no tone) |
|---|---|---|
| 0 | 59 | 863 |
| 1 | 0 | 9 |
| 2 | 0 | 4 |
| 3 | 0 | 7 |
| 4 | 0 | 6 |
| 5 | 1 | 2 |
| : | : | : |
| : | : | : |
| 250 | 1 | 2 |
| 251 | 2 | 2 |
| 252 | 2 | 1 |
| 253 | 0 | 2 |
| 254 | 0 | 2 |
| 255 | 1712 | 390 |
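The final decision only uses the count of fully bright (255) pixels. The shell pipeline extracts that count with `ppmhist` and `sed`. A minimal Python equivalent is sketched below. The threshold is a hypothetical tuning parameter: any value above 390 and up to 1712 separates these two particular samples, but a usable value would need checking across many chunks.

```python
from collections import Counter


def bright_pixel_count(grey_rows, level=255):
    """Number of pixels at exactly `level` in the cropped strip."""
    return Counter(v for row in grey_rows for v in row)[level]


def has_tone(grey_rows, threshold):
    """Classify a chunk as containing helicopter tone (threshold is hypothetical)."""
    return bright_pixel_count(grey_rows) >= threshold
```

Counting only the top bucket works because normalisation and convolution push the tone line to full brightness. The table shows most pixels piling up at 0 or 255, with very little in between.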
categorisation
How long is a good period to chunk audio into?
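For a sense of scale, the mp3splt command below uses `-t 0.7.0`. I read this as mp3splt's `min.sec[.hundredths]` time format, which means 7-second chunks. A 4-hour broadcast then yields

$$N = \left\lceil \frac{4 \times 3600\,\mathrm{s}}{7\,\mathrm{s}} \right\rceil = \lceil 2057.14\ldots \rceil = 2058 \text{ chunks,}$$

each needing its own 640×480 spectrogram. Every keep/skip boundary in the edit also snaps to a 7-second step. Shorter chunks give finer cuts but more spectrograms to process and less audio per decision.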
Commands
dumping audio
```shell
mplayer -quiet -dumpaudio -dumpfile /dev/stdout source.mpg |
mp3splt -Q -k -t 0.7.0 -o @m3@s -d spool -
mpg123 -qw chunk.wav chunk.mp3
```
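Here mp3splt chops the compressed audio, and mpg123 decodes each chunk. If the whole soundtrack were decoded to a single WAV first, Python's standard `wave` module could do the same fixed-length split. The sketch below illustrates this; `split_wav` and the output naming pattern are hypothetical.

```python
import wave


def split_wav(path, chunk_seconds, out_pattern):
    """Split a WAV file into consecutive fixed-length chunks.

    out_pattern is a str.format pattern such as "spool/chunk{:05d}.wav"
    (a hypothetical naming scheme). Returns the written paths in order;
    the final chunk is shorter if the audio doesn't divide evenly.
    """
    written = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(params.framerate * chunk_seconds)
        index = 0
        while True:
            data = src.readframes(frames_per_chunk)
            if not data:
                break
            out_path = out_pattern.format(index)
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # frame count in the header is patched after writing
                dst.writeframes(data)
            written.append(out_path)
            index += 1
    return written
```

Usage would look like `split_wav("broadcast.wav", 7, "spool/chunk{:05d}.wav")`, with both file names illustrative. Splitting decoded PCM avoids MP3 frame boundaries, at the cost of a very large intermediate WAV for a 4-hour broadcast.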
tone scanning
```shell
sndfile-spectrogram --no-border chunk.wav \
    640 480 /dev/stdout |
pngtopnm |
ppmtopgm |
pnmnorm -bpercent 70 |
pnmpad -top 31 -bottom 31 -left 31 -right 31 |
pnmconvol mask.pgm |
pnmcut -left 31 -right 671 -top 309 -bottom 323 |
ppmhist -noheader -sort=rgb |
sed -ne '/255.*255.*255.*255/!b' -e 's/.* //' -e p
```
convert/extract
```shell
mencoder \
    -tskeepbroken \
    -edl extract.edl \
    -of avi \
    -o extract.avi \
    -oac mp3lame \
    -lameopts abr=128 \
    -audio-delay -0.100 \
    -ovc xvid \
    -xvidencopts bitrate=1200:max_bframes=0:vhq=4 \
    -vf scale=640:352 \
    -aspect 16:9 \
    source.mpg
```
Final thoughts
Tour de France
- Still too many shots of cyclists, but not up-close
- Depends heavily on edit technique and audio mix
- mplayer/mencoder are really finicky
- AVI format is really finicky for seeking
- Using image processing for audio processing, wtf
Vuelta a España
- Different country, naturally the helicopters have a different accent – needed to redo convolution mask
- More smear in the audio signature results in less accurate detection – more false positives due to the helicopter flying too near the camera motorcycles
- Less of the race broadcast by SBS – only 8 stages in full
- Highlights package tends to focus on actual race, less on scenery
Accents
- a French helicopter:
- a Spanish helicopter: