I have been trying to solve this for hours, without success.
My scripts and everything else are encoded in UTF-8. I use this code to fetch the content of a page encoded in ISO 8859-2 (the page is in Czech, with characters carrying diacritics and other special symbols):
Code:
$curl = curl_init();
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($curl, CURLOPT_HEADER, 0);
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_COOKIEFILE, "cookiefile");
curl_setopt($curl, CURLOPT_COOKIEJAR, "cookiefile"); # SAME cookiefile
curl_setopt($curl, CURLOPT_URL, $search_url); # first connection; authorization uses the GET method in my case, with POST the code needs a small edit
$content = curl_exec($curl);
So I tried to convert the content from the original page encoding (ISO 8859-2) to mine (UTF-8). I tried several methods: iconv, libiconv, and various user functions from the internet (iso88592_2utf8(), convert_charset, ...), but nothing helps. The result is even worse.
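For reference, this is a minimal sketch of the kind of conversion I mean, using the built-in iconv() and mb_convert_encoding(). The sample string is made up (the Czech word "Příliš" written as raw ISO 8859-2 bytes) and stands in for the real $content. It assumes the iconv and mbstring extensions are loaded.

```php
<?php
// Hypothetical sample input: "Příliš" as raw ISO 8859-2 bytes
// (0xF8 = ř, 0xED = í, 0xB9 = š). It stands in for the cURL result.
$latin2 = "P\xF8\xEDli\xB9";

// Convert once, from the page's charset to UTF-8.
$viaIconv = iconv('ISO-8859-2', 'UTF-8', $latin2);

// The same conversion with mbstring (arguments: string, target, source).
$viaMb = mb_convert_encoding($latin2, 'UTF-8', 'ISO-8859-2');

// Print the resulting bytes as hex so they can be compared.
echo bin2hex($viaIconv), "\n";
echo bin2hex($viaMb), "\n";
```

If the string were converted twice, or if the page were really in another charset such as Windows-1250 (which places š and ž at different byte values than ISO 8859-2), a conversion like this would garble the text further, so that may be worth ruling out.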
I don't know what to do to solve it.
There is one strange thing that confuses me when I run this right after the request:
Code:
$content = curl_exec($curl);
$enc = mb_detect_encoding($content);
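As far as I understand, mb_detect_encoding() only guesses, and single-byte charsets such as ISO 8859-2 are hard to tell apart from each other. Below is a sketch of a more explicit check, assuming the source charset is ISO 8859-2 unless stated otherwise. to_utf8() is a hypothetical helper name, not a PHP built-in, and the sample string is made up.

```php
<?php
// Hypothetical helper (not a PHP built-in): convert to UTF-8 only when needed.
// Assumption: anything that is not valid UTF-8 is encoded in $sourceCharset.
function to_utf8(string $bytes, string $sourceCharset = 'ISO-8859-2'): string
{
    // Non-ASCII ISO 8859-2 text is very unlikely to also be valid UTF-8,
    // so a validity check is a cheap test for "already converted".
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return mb_convert_encoding($bytes, 'UTF-8', $sourceCharset);
}

// Made-up sample: "Příliš" as raw ISO 8859-2 bytes.
$sample = "P\xF8\xEDli\xB9";

// Strict detection, limited to an explicit list of candidate encodings.
$guess = mb_detect_encoding($sample, ['UTF-8', 'ISO-8859-2'], true);
var_dump($guess);

// Convert only if the input is not valid UTF-8 already.
echo bin2hex(to_utf8($sample)), "\n";
```

Calling to_utf8() on a string that is already valid UTF-8 returns it unchanged, which guards against converting the same content twice.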
Any ideas how to solve it?
Thanks
Maros