Robert -

I have the Sphinx tools here at present, and I have built several LMs from text data I have accumulated over the years. *However*, the most important issue for an LM is its agreement with the lexicon of pronunciations.

You also want to consider whether you need a domain-independent LM or one specific to a domain, and whether tokenization is important to you. (Do you want to allow dictation, where "comma" and "period" (or "full stop" in UK English) attach to the word to the left, and "period"/"full stop" has additional capitalization rules?) Also, do you want US or UK English? How contemporary do you want the lexicon to be? (Should "e-commerce" be in the vocabulary?)

If you have answers for these questions, I may be able to build an LM for you... What's the project?

--- Jonathan

----- Original Message -----
From: "robert b" <http://dummy.us.eu.org/robert>
To: <http://www.HD.UIB.NO/~CORPORA>
Sent: Monday, December 16, 2002 2:20 PM
Subject: [Corpora-List] free language models?

> An open source project I'm working on is looking for a free language model (LM). Being speech recognition neophytes, we are loath to build our own from an existing corpus.
>
> We'll be using the LM with Sphinx-II (Sphinx 2) and therefore need a trigram-based LM.
>
> Does anyone know if there are any free LMs available anywhere? An LM based upon the OpenContent license would be especially welcome.
>
> Thanks!
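
To illustrate the point in the reply about the LM agreeing with the pronunciation lexicon, here is a minimal sketch of a coverage check. It is not part of the Sphinx tools. It assumes a CMUdict-style lexicon (one word per line followed by its phones, alternate pronunciations marked as "WORD(2)", comment lines starting with ";;;"), and the function names and the toy lexicon entries are hypothetical.

```python
import re

# Assumed sentence-boundary and unknown-word tokens; adjust for your LM.
SPECIAL_TOKENS = {"<S>", "</S>", "<UNK>"}


def parse_lexicon(lines):
    """Return the set of upper-cased words found in a CMUdict-style lexicon."""
    words = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        head = line.split()[0]
        # Strip alternate-pronunciation markers such as "(2)".
        head = re.sub(r"\(\d+\)$", "", head)
        words.add(head.upper())
    return words


def missing_from_lexicon(lm_vocab, lexicon_words):
    """Return LM vocabulary words that have no pronunciation, sorted."""
    return sorted(
        w for w in set(lm_vocab)
        if w.upper() not in SPECIAL_TOKENS and w.upper() not in lexicon_words
    )


# Toy data for illustration only; the phone strings are placeholders.
lexicon = parse_lexicon([
    ";;; toy lexicon",
    "COMMA  K AA M AH",
    "PERIOD  P IH R IY AH D",
    "PERIOD(2)  P IY R IY AH D",
])
vocab = ["<s>", "</s>", "comma", "period", "e-commerce"]
print(missing_from_lexicon(vocab, lexicon))
```

Any word this reports would need either a pronunciation added to the lexicon or removal from the LM vocabulary before the two are used together.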