Google Search API for cyrillic

Posted: Fri Sep 08, 2006 1:31 am
by jmut
Hi all,
I am trying to run a search in Cyrillic using the Google API.
The documentation at http://www.google.com/apis/reference.html
says I should send data as UTF-8 and should expect UTF-8 in return.

I use UTF-8 in my IDE, and I hard-coded the search string in the source as a test.
When I get the results, I use

Code:

header('Content-type: text/html; charset=utf-8');
var_export($results);
It just shows ???? and similar garbage. Has anyone had this problem, or does anyone have an idea what might be wrong?

Edit: I use PHP 5, btw. I also tried setting different encodings in the browser, but no luck.

Posted: Fri Sep 08, 2006 2:32 am
by Mordred
Couple of things come to mind:

1. Are you sure your text document (source) is saved as UTF-8? Open it with a hex editor and see.
2. Maybe your local server has a default encoding set in php.ini (default_charset or something like that) that takes precedence over your header (or maybe I should say "postcedence"; IIRC, if the same non-multivalue header is sent several times, the last one SHOULD be used).
Check this with a local proxy (Proxomitron) or telnet ;) Zend's IDE can also show the output headers.
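
Both checks can also be done from PHP itself. The following is only a minimal sketch: `is_valid_utf8()` is a hypothetical helper, not part of PHP or the Google API, and the query literal stands in for whatever string is hard-coded in the real script.

```php
<?php
// Send the header before any output; skip it if output has already started.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Hypothetical helper: returns true when $text is well-formed UTF-8.
// preg_match() with the /u modifier returns false (not 1) when the subject
// contains invalid UTF-8.
function is_valid_utf8($text)
{
    return preg_match('//u', $text) === 1;
}

// Item 1: a string literal carries whatever bytes the source file was saved
// with. This five-letter word is 10 bytes when the file is UTF-8 and 5 bytes
// when it is Windows-1251.
$query = 'проба';
$queryIsUtf8 = is_valid_utf8($query);
$queryLength = strlen($query);

// Item 2: the charset php.ini adds to Content-Type by default, and the
// headers PHP has queued so far (this list may be empty under the CLI).
$defaultCharset = ini_get('default_charset');
$queuedHeaders = headers_list();

var_dump($queryIsUtf8, $queryLength, $defaultCharset, $queuedHeaders);
```

If the literal is reported as invalid UTF-8, the editor is not saving the file the way it claims to.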

Hi, by the way ;)

Posted: Fri Sep 08, 2006 8:43 am
by jmut
Mordred wrote: Couple of things come to mind:

1. Are you sure your text document (source) is saved as UTF-8? Open it with a hex editor and see.
2. Maybe your local server has a default encoding set in php.ini (default_charset or something like that) that takes precedence over your header (or maybe I should say "postcedence"; IIRC, if the same non-multivalue header is sent several times, the last one SHOULD be used).
Check this with a local proxy (Proxomitron) or telnet ;) Zend's IDE can also show the output headers.

Hi, by the way ;)

1.

Code:

// I believe this really is UTF-8, and it is not the first time I have used UTF-8
jmut@dexter:$ file test.php
test.php: UTF-8 Unicode C++ program text
2.

Code:

These are the response headers I get. Used Firefox->Web Developer Extension 1.0.2. -> Information -> View Response Headers


Date: Fri, 08 Sep 2006 13:41:20 GMT
Server: Apache/1.3.33 (Unix) mod_ssl/2.8.25 OpenSSL/0.9.8a PHP/5.1.2
X-Powered-By: PHP/5.1.2
Keep-Alive: timeout=15, max=99
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

200 OK
Stupid API... I see it is beta, but still :(

Hello :) ...your nickname sounds very familiar, but I can't place it.

Posted: Fri Sep 08, 2006 10:09 am
by Weirdan
Well, are you sure you're actually getting UTF-8 from the Google server?

Write the results to a text file and then examine it in a hex editor, as was suggested.
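
A rough sketch of that check in PHP, for anyone without a hex editor at hand. `hex_preview()` and the file name are hypothetical, and `$snippet` is a stand-in for one string taken from the real API response.

```php
<?php
// Hypothetical helper: space-separated hex dump of the first $limit bytes.
function hex_preview($bytes, $limit = 32)
{
    return implode(' ', str_split(bin2hex(substr($bytes, 0, $limit)), 2));
}

// Stand-in for one string from the API response; replace it with a real
// value taken from $results.
$snippet = "\xD0\xBF\xD1\x80\xD0\xBE\xD0\xB1\xD0\xB0"; // "проба" as UTF-8 bytes

// Save the raw bytes so the same data can also be opened in a hex editor.
$path = sys_get_temp_dir() . '/google-results.txt';
file_put_contents($path, $snippet);

echo hex_preview(file_get_contents($path)), "\n";
// prints: d0 bf d1 80 d0 be d0 b1 d0 b0
```

Pairs that start with d0 or d1 are UTF-8 Cyrillic letters. Single bytes in the c0 to ff range suggest Windows-1251, and 3f is a literal question mark, which would mean the text was already lost before it reached the browser.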

Bulgarians, are you?

Posted: Fri Sep 08, 2006 12:20 pm
by jmut
Well, I'm not really sure what I should look for in the hex view.

I saved the output of var_export($result, 1) with file_put_contents() and used mcview to look at the hex content.

http://up.drun.net/files/snapshot1%5B1%5D.png

Posted: Fri Sep 08, 2006 1:58 pm
by Mordred
Looks like you didn't get UTF-8. Do the hex magic with the source file as well. In UTF-8, every Cyrillic character should take two bytes.

It very much looks like you've told Google that you are sending UTF-8, while in fact you were sending win1251 (Windows-1251). The web interface does the same:
http://www.google.com/search?rls=en&q=% ... 8&oe=utf-8
(this is supposed to be "проба")
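
If the query does turn out to be Windows-1251, one possible fix is to convert it before it goes to the API. This is a sketch under assumptions: `to_utf8_query()` is a hypothetical helper, it relies on PHP's iconv extension being available, and it assumes that any input that is not valid UTF-8 is Windows-1251.

```php
<?php
// Hypothetical helper: returns $text as UTF-8. Input that is already valid
// UTF-8 is returned unchanged; anything else is assumed to be in
// $fallbackCharset and converted with iconv().
function to_utf8_query($text, $fallbackCharset = 'CP1251')
{
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    return iconv($fallbackCharset, 'UTF-8', $text);
}

// "проба" as Windows-1251 bytes: one byte per letter.
$win1251 = "\xEF\xF0\xEE\xE1\xE0";

$utf8 = to_utf8_query($win1251);

echo strlen($win1251), ' -> ', strlen($utf8), "\n"; // prints: 5 -> 10
echo urlencode($utf8), "\n"; // prints: %D0%BF%D1%80%D0%BE%D0%B1%D0%B0
```

The validity test is only a heuristic: plain ASCII passes it, and a few Windows-1251 byte sequences happen to be valid UTF-8 too, so it is safer to know the real source encoding than to guess it.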