Re: [Corpora-List] free language models?

To: http://dummy.us.eu.org/robert
Subject: Re: [Corpora-List] free language models?
From: http://www.attbi.com/~Jonathan_Young
Date: Mon, 16 Dec 2002 17:52:46 -0500
Date: Mon, 16 Dec 2002 15:14:31 -0500

Robert -

I have the Sphinx tools here at present, and I have built several LMs from
text data I have accumulated over the years.  *However*, the most important
issue for an LM is agreement with the pronunciation lexicon: every word in
the LM vocabulary needs an entry in the lexicon.  You also need to decide
whether you want a domain-independent LM or one specific to a domain, and
whether tokenization matters to you.  Do you want to allow dictation, where
"comma" and "period" (or "full stop" in UK English) attach to the word to
the left, and "period"/"full stop" carries additional capitalization rules?
Also, do you want US or UK English?  How contemporary do you want the
lexicon to be?  (Should "e-commerce" be in the vocabulary?)
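
A rough sketch of how that LM/lexicon agreement could be checked, in Python.
It assumes the LM is in ARPA text format (with a "\1-grams:" section) and the
lexicon has one "WORD PHONE PHONE ..." entry per line, with alternate
pronunciations marked as WORD(2).  The function names are hypothetical, not
part of the Sphinx tools, and the comparison is case-sensitive, so normalize
case first if your LM and lexicon differ.

```python
import re


def lm_vocabulary(arpa_lines):
    """Collect the words listed in the \\1-grams: section of an ARPA-format LM."""
    vocab = set()
    in_unigrams = False
    for line in arpa_lines:
        line = line.strip()
        if line.startswith("\\"):
            # Section markers: \data\, \1-grams:, \2-grams:, \end\ ...
            in_unigrams = (line == "\\1-grams:")
            continue
        if in_unigrams and line:
            fields = line.split()
            # Unigram lines look like: log10_prob word [backoff_weight]
            if len(fields) >= 2:
                vocab.add(fields[1])
    return vocab


def lexicon_words(dict_lines):
    """Collect the head words of a pronunciation lexicon."""
    words = set()
    for line in dict_lines:
        fields = line.split()
        if fields:
            # Drop a trailing alternate-pronunciation marker such as "(2)".
            words.add(re.sub(r"\(\d+\)$", "", fields[0]))
    return words


def missing_from_lexicon(arpa_lines, dict_lines,
                         ignore=("<s>", "</s>", "<unk>", "<UNK>")):
    """Return LM words that have no pronunciation, skipping sentence markers."""
    missing = lm_vocabulary(arpa_lines) - lexicon_words(dict_lines)
    return sorted(missing - set(ignore))
```

Any word this reports (for example "e-commerce" in an LM built from recent
text against an older lexicon) would need either a pronunciation added or
removal from the LM training text before the LM is built.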

If you have answers for these questions, I may be able to build an LM for
you...  What's the project?

--- Jonathan

----- Original Message -----
From: "robert b" <http://dummy.us.eu.org/robert>
To: <http://www.HD.UIB.NO/~CORPORA>
Sent: Monday, December 16, 2002 2:20 PM
Subject: [Corpora-List] free language models?

> An open source project I'm working on is looking for a free language model
> (LM).  Being speech recognition neophytes, we are loath to build our own
> from an existing corpus.
>
> We'll be using the LM with Sphinx-II (Sphinx 2) and therefore need a
> trigram-based LM.
>
> Does anyone know if there are any free LMs available anywhere?  An LM
> based upon the OpenContent license would be especially welcome.
>
> Thanks!
>