Google Search API for Cyrillic

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

jmut
Forum Regular
Posts: 945
Joined: Tue Jul 05, 2005 3:54 am
Location: Sofia, Bulgaria

Post by jmut »

Hi all,
I am trying to run a search in Cyrillic using the Google API.
The documentation (http://www.google.com/apis/reference.html) says I should send data as UTF-8 and should expect UTF-8 in return.


My IDE uses UTF-8, and as a test I hardcoded the search string in the source.
When I get the results, I use:

header('Content-type: text/html; charset=utf-8');
var_export($results);
It just shows ???? and similar garbage. Has anyone had this problem, or does anyone have an idea what might be wrong?

Edit: I'm using PHP 5, by the way. I also tried setting different encodings in the browser, but no luck.
Mordred
DevNet Resident
Posts: 1579
Joined: Sun Sep 03, 2006 5:19 am
Location: Sofia, Bulgaria

Post by Mordred »

A couple of things come to mind:

1. Are you sure your text document (source) is saved as UTF-8? Open it with a hex editor and check.
2. Maybe your local server has a default encoding set in php.ini (default_charset, or something like that) that takes precedence over your header (or maybe I should say "postcedence": IIRC, if the same non-multivalue header is sent several times, the last one SHOULD be used).
Check this with a local proxy (Proxomitron) or telnet ;) Zend's IDE can also show the output headers.
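Point 2 comes down to which Content-Type header the browser finally acts on. As a minimal sketch, assuming you already have the raw header lines from a proxy or telnet capture, the declared charset can be pulled out like this. `charset_from_headers()` is a hypothetical helper, the sample lines are invented for illustration, and the "last one wins" rule follows Mordred's recollection rather than anything PHP guarantees:

```php
<?php
// Sketch: find which charset a captured set of response headers declares.
// charset_from_headers() is a hypothetical helper, not a PHP built-in, and
// the sample lines below are made up for illustration.
function charset_from_headers(array $lines)
{
    $charset = null;
    foreach ($lines as $line) {
        // Header names are case-insensitive; skip everything but Content-Type.
        if (stripos($line, 'Content-Type:') !== 0) {
            continue;
        }
        // Assumption: a later Content-Type replaces an earlier one entirely.
        $charset = null;
        if (preg_match('/charset\s*=\s*"?([A-Za-z0-9._-]+)"?/i', $line, $m)) {
            $charset = strtoupper($m[1]);
        }
    }
    return $charset;
}

$captured = array(
    'Content-Type: text/html; charset=ISO-8859-1',
    'Content-Type: text/html; charset=utf-8',
);
echo charset_from_headers($captured), "\n"; // UTF-8

// What php.ini says, for comparison (an empty string means no default is set).
var_dump(ini_get('default_charset'));
```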

Hi, by the way ;)

Post by jmut »

1. I believe this is indeed UTF-8, and it's not the first time I've used UTF-8:

jmut@dexter:$ file test.php
test.php: UTF-8 Unicode C++ program text
2. These are the response headers I get (via Firefox -> Web Developer Extension 1.0.2 -> Information -> View Response Headers):

Date: Fri, 08 Sep 2006 13:41:20 GMT
Server: Apache/1.3.33 (Unix) mod_ssl/2.8.25 OpenSSL/0.9.8a PHP/5.1.2
X-Powered-By: PHP/5.1.2
Keep-Alive: timeout=15, max=99
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

200 OK
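On point 1: `file` labels a file by guessing from its contents, so a direct validity check is a useful second opinion. A minimal sketch, using PCRE's `/u` modifier (with it, preg_match() returns false when the subject is not valid UTF-8); `is_utf8_file()` is a hypothetical helper name:

```php
<?php
// Sketch: validate a file's bytes as UTF-8 instead of trusting `file`'s guess.
// is_utf8_file() is a hypothetical helper. With the /u modifier, preg_match()
// returns false on invalid UTF-8, so the empty pattern is a pure validity check.
function is_utf8_file($path)
{
    if (!is_readable($path)) {
        return false;
    }
    $bytes = file_get_contents($path);
    return $bytes !== false && preg_match('//u', $bytes) === 1;
}

var_dump(is_utf8_file('test.php')); // test.php is the script from the thread
```

A pure-ASCII file also passes, because ASCII is a subset of UTF-8, so the result only means something if the file actually contains the Cyrillic search string.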
Stupid API... I know it's a beta, but still :(

Hi :) ... your nickname sounds very familiar, but I can't place it.
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

Well, are you sure you're actually getting UTF-8 from the Google server?

Write the results to a text file and then examine it in a hex editor, as was suggested.
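A minimal sketch of that suggestion in PHP: dump the `var_export()` output to a file and print the leading bytes in hex. `$results` is a made-up stand-in for the real API response, the file path is arbitrary, and `hex_preview()` is a hypothetical helper:

```php
<?php
// Sketch: write the results to a file, as suggested, and print the first
// bytes in hex. $results is a made-up stand-in for the real API response,
// and hex_preview() is a hypothetical helper.
function hex_preview($bytes, $max = 32)
{
    // bin2hex() gives two hex digits per byte; chunk_split() spaces them out.
    return trim(chunk_split(bin2hex(substr($bytes, 0, $max)), 2, ' '));
}

$results = array('title' => "\xD0\xBF\xD1\x80\xD0\xBE\xD0\xB1\xD0\xB0"); // "проба" as UTF-8
$dump = var_export($results, true);
file_put_contents(sys_get_temp_dir() . '/results.txt', $dump);

echo hex_preview($results['title']), "\n"; // d0 bf d1 80 d0 be d0 b1 d0 b0

// Byte 3f is a literal '?'. If it shows up where letters should be, the text
// was already replaced before output, and no browser encoding setting can
// restore it.
echo substr_count($results['title'], '?'), " '?' bytes in the title\n";
```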

So you're Bulgarians?

Post by jmut »

Well, I'm not really sure what I should look for in the hex view.

I save the output of var_export($result, 1) with file_put_contents() and use mcview to view the hex content:

http://up.drun.net/files/snapshot1%5B1%5D.png

Post by Mordred »

It looks like you didn't get UTF-8. Do the same hex check on the source file as well: in UTF-8, every Cyrillic character should take two bytes.
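To see what "two bytes per Cyrillic letter" looks like, here is a small sketch comparing the same word, "проба", as UTF-8 and as Windows-1251 bytes. The strings are written as hex escapes so the result does not depend on how the script itself is saved:

```php
<?php
// Sketch: the same five-letter word in two encodings. In UTF-8 the common
// Russian/Bulgarian letters start with lead byte D0 or D1 and take two bytes;
// in Windows-1251 each letter is a single byte.
$samples = array(
    'utf-8'   => "\xD0\xBF\xD1\x80\xD0\xBE\xD0\xB1\xD0\xB0", // "проба" as UTF-8
    'win1251' => "\xEF\xF0\xEE\xE1\xE0",                     // "проба" as Windows-1251
);

foreach ($samples as $label => $s) {
    printf("%s: %d bytes for 5 letters (%d per letter)\n",
        $label, strlen($s), strlen($s) / 5);
}
```

This prints 10 bytes (2 per letter) for UTF-8 and 5 bytes (1 per letter) for Windows-1251, which is the difference to look for in the hex view.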

It very much looks like you told Google that you were sending UTF-8 while you were actually sending Windows-1251 (win1251). The web interface does the same:
http://www.google.com/search?rls=en&q=% ... 8&oe=utf-8
(this is supposed to be "проба", i.e. "test")
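That diagnosis suggests a fix on the sending side: make sure the query bytes really are UTF-8 before claiming they are. A minimal sketch, assuming the mbstring extension is available and that anything not valid UTF-8 is Windows-1251. `to_utf8()` is a hypothetical helper, and the validity test is only a heuristic, since some Windows-1251 byte sequences also happen to be valid UTF-8:

```php
<?php
// Sketch: normalise a query string to UTF-8 before sending it.
// Assumption: input that is not valid UTF-8 is Windows-1251.
// to_utf8() is a hypothetical helper; requires the mbstring extension.
function to_utf8($s, $fallback = 'Windows-1251')
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($s, 'UTF-8', $fallback);
}

$raw = "\xEF\xF0\xEE\xE1\xE0"; // "проба" as Windows-1251 bytes
echo rawurlencode($raw), "\n";          // %EF%F0%EE%E1%E0
echo rawurlencode(to_utf8($raw)), "\n"; // %D0%BF%D1%80%D0%BE%D0%B1%D0%B0
```

The two percent-encoded forms show how differently the same word looks on the wire; only the second one matches a request labelled as UTF-8.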