Re: [Corpora-List] free language models?



Robert -

I have the Sphinx tools here at present, and I have built several LMs from
text data I have accumulated over the years.  *However*, the most important
issue for an LM is its agreement with the pronunciation lexicon.  You
also want to consider whether you need/want a domain-independent LM or one
specific to a domain, and whether tokenization is important to you.  Do you
want to allow dictation, where "comma" and "period" (or "full stop" in UK
English) attach to the word to the left, and "period"/"full stop" has
additional capitalization rules?  Also, do you want US or UK English?  How
contemporary do you want the lexicon to be?  (Should "e-commerce" be in the
vocabulary?)
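To make the lexicon and dictation points concrete, here is a minimal Python sketch (an illustration only, not part of the Sphinx tools). It assumes a CMU-style pronunciation dictionary, one entry per line with the word first and alternate pronunciations written as `WORD(2)`, and an LM vocabulary given as a plain word list. Any LM word with no pronunciation can never be recognized, so that list should be empty. The second function assumes the recognizer emits spoken punctuation as the tokens "comma", "period" and "full stop" (hypothetical token names), and shows them attaching to the word on the left, with a capital letter after a period/full stop. All function names are hypothetical.

```python
import re

def read_lexicon(lines):
    """Collect base words from CMU-style dictionary lines ("WORD  W ER D").

    Alternate pronunciations such as "WORD(2)" are folded into "WORD".
    Lines starting with ";;;" are treated as comments (an assumption).
    """
    words = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        head = line.split()[0]
        words.add(re.sub(r"\(\d+\)$", "", head).upper())
    return words

def missing_pronunciations(lm_vocab, lexicon_words):
    """Return (sorted) LM words that have no entry in the lexicon."""
    return sorted({w.upper() for w in lm_vocab} - lexicon_words)

# Hypothetical spoken-punctuation tokens and the text they produce.
PUNCT = {"comma": ",", "period": ".", "full stop": "."}

def format_dictation(words):
    """Attach spoken punctuation to the word on its left and capitalize
    the first word and every word after a period/full stop."""
    out = []
    capitalize_next = True
    i = 0
    while i < len(words):
        two = " ".join(words[i:i + 2]).lower()  # catches "full stop" (UK)
        one = words[i].lower()
        if two in PUNCT:
            mark, i = PUNCT[two], i + 2
        elif one in PUNCT:
            mark, i = PUNCT[one], i + 1
        else:
            word = words[i]
            out.append(word[:1].upper() + word[1:] if capitalize_next else word)
            capitalize_next = False
            i += 1
            continue
        if out:
            out[-1] += mark  # punctuation attaches to the word on its left
        else:
            out.append(mark)
        if mark == ".":
            capitalize_next = True
    return " ".join(out)
```

For example, `format_dictation("hello comma world period this is a test full stop".split())` gives `"Hello, world. This is a test."`, and checking `["hello", "world", "e-commerce"]` against a lexicon containing only HELLO and WORLD flags `E-COMMERCE` as missing.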

If you have answers for these questions, I may be able to build an LM for
you...  What's the project?

--- Jonathan

----- Original Message -----
From: "robert b" <http://dummy.us.eu.org/robert>
To: <http://www.HD.UIB.NO/~CORPORA>
Sent: Monday, December 16, 2002 2:20 PM
Subject: [Corpora-List] free language models?

> An open source project I'm working on is looking for a free language model
> (LM).  Being speech recognition neophytes, we are loath to build our own
> from an existing corpus.
>
> We'll be using the LM with Sphinx-II (Sphinx 2) and therefore need a
> trigram-based LM.
>
> Does anyone know if there are any free LMs available anywhere?  An LM
> based upon the OpenContent license would be especially welcome.
>
> Thanks!
>
