SpeechSegmenter

For the most part, LIUM_SpkDiarization does all the hard work for us, and does it much better than the old LIUM tools (don't even think about CMUseg or cont_fileseg!). For example, to segment an audio file in FLAC format:

flac -d $show.flac
sphinx_fe -argfile $hmmdir/feat.params -i $show.wav -o $show.mfc -mswav yes
nfr=`sphinx_cepview -f $show.mfc 2>&1 | perl -ne '/Total (\d+) frames/ && print $1-1'`
echo "$show 0 $nfr $show" > $show.in.ctl
java -jar ./LIUM_SpkDiarization-3.1.jar --fInputMask=%s.mfc --fInputDesc=sphinx --sInputMask=%s.in.ctl --sInputFormat=ctl --sOutputMask=%s.out.ctl --sOutputFormat=ctl $show
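
The steps above can be wrapped in a small shell function so that several shows can be processed in a loop. This is only a sketch under a few assumptions: the helper names last_frame, ctl_line and segment_show are made up for illustration, $hmmdir is expected to be set already, the jar is assumed to sit in the current directory and to be launched through java -jar, and sphinx_cepview is assumed to print a line of the form "Total N frames", which is what the Perl one-liner above matches.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of Sphinx or LIUM.

# Read sphinx_cepview output on stdin and print the index of the last
# frame (total frame count minus one). Prints nothing if no count is found.
last_frame() {
    sed -n 's/.*Total \([0-9][0-9]*\) frames.*/\1/p' | head -n 1 | awk '{ print $1 - 1 }'
}

# Print one control-file line covering the whole show: "show 0 nfr show".
ctl_line() {
    printf '%s 0 %s %s\n' "$1" "$2" "$1"
}

# Run the whole pipeline for one show. Expects $1.flac to exist and
# $hmmdir to point at the acoustic model directory.
segment_show() {
    show=$1
    flac -d "$show.flac" || return 1
    sphinx_fe -argfile "$hmmdir/feat.params" -i "$show.wav" -o "$show.mfc" -mswav yes || return 1
    nfr=$(sphinx_cepview -f "$show.mfc" 2>&1 | last_frame)
    # Stop if the frame count could not be parsed.
    [ -n "$nfr" ] || return 1
    ctl_line "$show" "$nfr" > "$show.in.ctl"
    java -jar ./LIUM_SpkDiarization-3.1.jar --fInputMask=%s.mfc --fInputDesc=sphinx \
        --sInputMask=%s.in.ctl --sInputFormat=ctl \
        --sOutputMask=%s.out.ctl --sOutputFormat=ctl "$show"
}
```

With the functions loaded, something like for f in *.flac; do segment_show "${f%.flac}"; done would process every FLAC file in the current directory.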

I'm going to add FLAC support to sphinx_fe so that we don't have these awful huge WAV intermediate files.

Here is also a little Perl script that converts the resulting $show.out.ctl file to a label file that Audacity understands, so you can inspect the segments:

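use strict;
use warnings;

# Each input line holds: show name, start frame, end frame, utterance ID.
while (<>) {
    chomp;
    next unless /\S/;    # skip blank lines
    my ($show, $sf, $ef, $uttid) = split;
    # Convert frame indices to seconds (100 frames per second).
    $sf /= 100;
    $ef /= 100;
    print "$sf\t$ef\t$uttid\n";
}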
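
If Perl is not handy, roughly the same conversion can be sketched as a shell function around awk. The name ctl2labels is hypothetical, and the optional frame-rate argument is an addition for illustration; it defaults to 100 frames per second to match the division by 100 in the script above. Unlike the Perl version, this sketch always prints times with two decimal places.

```shell
#!/bin/sh
# Sketch only: ctl2labels is a hypothetical helper name.
# Convert control-file lines ("show start_frame end_frame uttid") read from
# stdin into Audacity label lines ("start_sec<TAB>end_sec<TAB>label").
# $1 is an optional frame rate in frames per second (default 100).
ctl2labels() {
    awk -v rate="${1:-100}" \
        'NF >= 4 { printf "%.2f\t%.2f\t%s\n", $2 / rate, $3 / rate, $4 }'
}
```

For example, ctl2labels < $show.out.ctl > $show.labels.txt writes a file that can be loaded through Audacity's label import.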

In practice, however, we will output XML files for Transcriber, so that ManualTranscription can be done for some parts of the data. Those manual transcripts will be needed for benchmarking and tuning the system.
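
As a rough idea of what that might look like, the sketch below turns a segment control file into a bare-bones Transcriber file with one empty Turn per segment. It is an assumption about a minimal layout, not the project's actual output: the helper name ctl2trs is made up, the segments are assumed to be sorted by time and spaced at 100 frames per second, the audio name is assumed to need no XML escaping, and speaker and topic information is left out entirely.

```shell
#!/bin/sh
# Sketch only: ctl2trs is a hypothetical helper, and the layout below is an
# assumed minimal Transcriber file rather than the project's real output.
# Usage: ctl2trs audio_name < show.out.ctl > show.trs
ctl2trs() {
    awk -v audio="$1" -v rate=100 '
    NF >= 4 {
        # Remember start and end time (in seconds) of every segment.
        n++
        sf[n] = $2 / rate
        ef[n] = $3 / rate
    }
    END {
        if (n == 0) exit
        print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        print "<!DOCTYPE Trans SYSTEM \"trans-14.dtd\">"
        printf "<Trans audio_filename=\"%s\" version=\"1\">\n", audio
        print "<Episode>"
        # One section spanning from the first segment start to the last end.
        printf "<Section type=\"report\" startTime=\"%.2f\" endTime=\"%.2f\">\n", sf[1], ef[n]
        for (i = 1; i <= n; i++) {
            # One empty turn per segment, ready for manual transcription.
            printf "<Turn startTime=\"%.2f\" endTime=\"%.2f\">\n", sf[i], ef[i]
            printf "<Sync time=\"%.2f\"/>\n", sf[i]
            print "</Turn>"
        }
        print "</Section>"
        print "</Episode>"
        print "</Trans>"
    }'
}
```

Whether Transcriber accepts gaps between consecutive turns has not been checked here, so the result should be opened in Transcriber before relying on it.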

SpeechSegmenter (last edited 2010-02-18 16:32:12 by python-software-foundation)