Re: [Xvoice-sphinx] Usenet corpus status

I'll be interested in seeing the code even if you abandon it. I don't know how I'd do this
in Java and would be curious.

I'm still catching up with my mail. I've been busy this week dealing with my job
search: interviews, resume improvements, networking, etc.

--- "Jessica P. Hekman" <http://www.arborius.net/~jphekman> wrote:
> Google was too hard to scrape. YahooGroups was too hard to scrape. 
> Currently scraping LiveJournal -- and actually feeling pretty good about 
> it.
> I need to
>  * clean up my code a little
>  * try to get the stuff we've scraped into decent chunks which can be 
> processed by an LM tool (right now it's hard to tell where one sentence 
> stops and another starts)
>  * get a long list of LJ users to hand to the tool
>  * distribute to y'all for testing
> and then maybe point that LM-generating tool at it. Hopefully by then 
> Robert will have it working and we will have successfully come at the 
> problem from both ends :)
> j