[ale] Voxforge: An open source project that needs *your* help

Jesse Guardiani jesse at guardiani.us
Tue Apr 3 11:26:43 EDT 2007


Howdy folks,

I've recently started doing some research into speech recognition. As a
programmer, I've always been fascinated by the technology. It's a very
complicated and little-understood field with broad applications, from
cell phones (you probably already have speech recognition built into
your cell phone now in the form of a voice dialer) to IVRs, dictation
devices, and language translators.

I'm sure most of you have heard of CMU Sphinx. Perhaps some of you have
wondered why there isn't an Asterisk plugin for Sphinx, or an open
source desktop dictation system. (xvoice doesn't count, as it relies on a
closed source engine.) The explanation is rather simple. Sphinx and other
open source speech recognition engines are available and Free, but the
speech corpora (
http://www.voxforge.org/home/docs/faq/faq/what-is-a-speech-corpus-or-speech-corpora
) they rely on for training and accuracy are closed source and *not*
free. It takes a lot of time, effort, and speaker *diversity* to create an
adequate speech corpus, not to mention a certain amount of expertise.
However, the solution to this problem is quite simple. We need some
volunteers to click this link (
http://www.voxforge.org/home/submitspeech ) and follow the instructions.
Once we have enough open source transcribed audio data, a high-quality
open source speech corpus will naturally follow. I believe Voxforge
currently has 14 hours of open source transcribed speech data, or 10% of
their first goal ( http://www.voxforge.org/home/downloads/metrics ).

Voxforge currently targets an open source speech recognition engine that
might be unknown to you. It's called Julius (
http://julius.sourceforge.jp/en_index.php?q=en/index.html ). Julius
ships with only a Japanese acoustic model ( an acoustic model, or AM, is
basically a compiled speech corpus ). Voxforge, however, has already
released an English acoustic model for Julius, built from the still-sparse
audio data that they have been able to collect from volunteers and the
community. You can play with Julius and the Voxforge AM by downloading
the bundled "Quickstart" binary packages here (no need to compile or
install Julius; a binary is included in the download):
http://www.repository.voxforge1.org/downloads/Tags/Releases/0_1_1-build726/
We do not yet have enough voice data to support continuous speech
recognition, so the quickstart demos use the grammar-restricted
version of Julius known as Julian. Essentially, it works like a cell
phone voice dialer. You can say things like 'DIAL 1 2 3 4 5' or 'CALL
STEVE', but you can't say things like 'CALL 1 2 3 4 5' or 'DIAL STEVE'
because the grammar is restricted. This will change as we collect more
voice data for our model.
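
To make the restriction concrete, here is a toy sketch in Python. It is
not Julian's grammar format and not how Julian works internally. The two
rules, the DIGITS and NAMES sets, and the accepts() helper are all made
up for illustration, based only on the dialer phrases above.

```python
# Toy illustration of a restricted command grammar (hypothetical rules):
#   DIAL <digit> <digit> ...
#   CALL <name>
DIGITS = {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}
NAMES = {"STEVE"}  # assumed vocabulary, taken from the example phrase


def accepts(utterance: str) -> bool:
    """Return True if the word sequence matches one of the two rules."""
    words = utterance.upper().split()
    if len(words) < 2:
        return False
    command, args = words[0], words[1:]
    if command == "DIAL":
        # DIAL must be followed by one or more digits
        return all(word in DIGITS for word in args)
    if command == "CALL":
        # CALL must be followed by exactly one known name
        return len(args) == 1 and args[0] in NAMES
    return False


for phrase in ["DIAL 1 2 3 4 5", "CALL STEVE", "CALL 1 2 3 4 5", "DIAL STEVE"]:
    print(phrase, "->", "accepted" if accepts(phrase) else "rejected")
```

Anything the rules don't spell out is rejected outright, which is why
'DIAL STEVE' fails even though both words are in the vocabulary.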

Last night I submitted my first dialect coverage prompt with my own
voice. It's a relatively painless process and takes about an hour and a
half of your time if you're overly slow and cautious like me. All you
need is Audacity, a quiet room, the ability to follow instructions, and
a decent headset mic (desktop mics aren't recommended as changing
speaker proximity can introduce undesirable variation in the
recordings). I bought mine, a Logitech stereo headset, from Walmart a
few days ago for about $20. I know that's rather expensive, but it's
worth it for the increased quality. You also need a decent sound card
with a clean, low-static microphone input. Most computers ship with one
from the factory. You don't even have to use Linux. Audacity runs just
fine on win32 and we don't care which operating system creates the
audio. I intend to continue submitting one or two prompts each night for
the next few weeks. I also intend to ask my wife to submit a few prompts.
I hope you'll do the same. If we can recruit a reasonable
number of volunteers then we can achieve our goal quickly.

Thanks!


-- 
Jesse Guardiani
Programmer/Sys Admin
jesse at guardiani.us



