
Parsing an HTML page with Russian content using regex

Posted: Tue Feb 07, 2006 7:37 am
by littlebiker
Hey guys,

I am trying to parse a Russian HTML file from a Russian webpage, which I fetch with curl. I need to extract the values of certain fixed fields, for example:

Product Id: 400212
Product Type: engine

On the actual page the field labels (shown above in English) are in Russian. I need to extract their values.

If the content were in English I could have done it without a problem, but I am not sure how to handle foreign languages. Has anyone done this before?

Thanks!

Posted: Tue Feb 07, 2006 8:50 am
by feyd
Have you tried running it through Google's translation service or Babelfish? It should be possible to script against them, or at least to get your bearings as to where in the text the information is actually stored.

If we could see several examples of the text (not just the specific fields, but many lines around them), we may be able to write a regex, or give you more direction.

Posted: Tue Feb 07, 2006 10:39 am
by Weirdan
Yeah, post the HTML source. It would be even better if you posted the URL (there could be issues with charsets, etc.).

In short, show us the code ;)
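
To illustrate the charset point, here is a minimal Python sketch. Everything in it is an assumption, since the real page was never posted: the windows-1251 (cp1251) encoding, the table-cell markup, the Russian labels "Код товара" (product code) and "Тип товара" (product type), and the helper name extract_fields are all hypothetical. The idea is to decode the downloaded bytes to Unicode first and only then apply the regex, so that Cyrillic labels match like any other text.

```python
import re

# Hypothetical sample standing in for the downloaded page. We encode it to
# cp1251 to mimic the raw bytes a Russian site might send (an assumption).
raw = (
    '<tr><td>Код товара:</td><td>400212</td></tr>\n'
    '<tr><td>Тип товара:</td><td>двигатель</td></tr>\n'
).encode('cp1251')


def extract_fields(raw_bytes, encoding='cp1251'):
    """Decode the bytes, then collect 'label: value' pairs from adjacent cells."""
    # Decode first: matching against undecoded bytes is where charset
    # problems usually show up.
    text = raw_bytes.decode(encoding)
    # Group 1 is the label (any text before the colon), group 2 is the value.
    pattern = re.compile(
        r'<td>\s*([^<:]+?)\s*:\s*</td>\s*<td>\s*([^<]*?)\s*</td>'
    )
    return dict(pattern.findall(text))


fields = extract_fields(raw)
print(fields['Код товара'])
print(fields['Тип товара'])
```

If the page turns out to be UTF-8 or KOI8-R instead, only the encoding argument should need to change; the pattern itself would have to be adapted to the real markup once it is posted.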