language detection?

Posted: Mon Oct 29, 2007 12:56 pm
by nathanr
Evening,

I'm here to pick the brains of all you gurus out there.

The Data: all UTF-8, but it covers many languages. As is often the way, the language is a mix of native and English (e.g. Russian with English nouns, German with English nouns, and so on).

Needed: a way to detect which language the content is in. We have no info to play with other than the characters (which are all UTF-8); it's all just string content, no HTML, no metadata, etc.

Any ideas? Any solution considered, in any language available on Linux, or even an API.

Many thanks in advance.

nath