I have to screen-scrape a site that is in Chinese. I need to store the required content in a database and then retrieve it to display on a webpage. I fetched the site's content with file_get_contents, loaded it into a DOM object using DOMDocument, and then ran an XPath query on that object to get the desired content. The numeric fields display fine, but I am having trouble with the Chinese characters. The column in the database table is of type varchar with its collation set to utf8_general_ci, but when I display the Chinese characters on the screen they do not render correctly. The site I am fetching the data from uses GBK character encoding. I tried converting the content to UTF-8, but it still does not display correctly. I also tried declaring GBK as the character encoding in the page header, with no result. I am stuck at this point.
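For reference, here is a minimal sketch of the steps described above, with the GBK-to-UTF-8 conversion done on the raw bytes before they reach DOMDocument. It is not the original code: the function name extractUtf8Text, the sample markup, and the XPath query are all illustrative assumptions, and an in-memory string stands in for the file_get_contents result. It also assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper (not from the original post): convert GBK-encoded
// HTML to UTF-8, parse it, and return the text of nodes matching $query.
function extractUtf8Text(string $gbkHtml, string $query): array
{
    // Convert the raw bytes from GBK to UTF-8 before parsing.
    $utf8Html = mb_convert_encoding($gbkHtml, 'UTF-8', 'GBK');

    $doc = new DOMDocument();
    // Real-world HTML is often malformed; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    // The XML-style prefix hints to libxml that the input is UTF-8.
    // A <meta> charset declaration inside the page may also need rewriting.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $utf8Html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $results = [];
    foreach ($xpath->query($query) as $node) {
        // DOM text content is returned as UTF-8.
        $results[] = trim($node->textContent);
    }
    return $results;
}

// Usage with made-up sample markup; in the real case the GBK bytes
// would come from file_get_contents() on the target URL.
$sample = '<html><body><p class="title">中文</p></body></html>';
$gbkBytes = mb_convert_encoding($sample, 'GBK', 'UTF-8');

header('Content-Type: text/html; charset=UTF-8');
foreach (extractUtf8Text($gbkBytes, '//p[@class="title"]') as $text) {
    echo htmlspecialchars($text, ENT_QUOTES, 'UTF-8'), "\n";
}
```

For the text to survive the round trip, the other two ends also have to agree on UTF-8: the database connection charset (for example charset=utf8mb4 in a PDO MySQL DSN) and the charset declared by the page that displays the data.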
Thanks in advance.
Anil Kumar
Character encoding issue from GBK to UTF
anilchahar