[SOLVED]problem saving record in UTF-8

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

Post Reply
User avatar
Jaxolotl
Forum Contributor
Posts: 137
Joined: Mon Nov 13, 2006 4:19 am
Location: Argentina and Italy

[SOLVED]problem saving record in UTF-8

Post by Jaxolotl »

Hi everyone, I'm not shure if this is a PHP or mySQL problem.

I'm trying to make a small CMS experiment that supports non lating languages.
the front end uses UTF-8 as charset correctly
the database collation is utf8_general_ci
the table collation is utf8_general_ci
the field collation is utf8_general_ci
should I use utf8_unicode_ci instead of utf8_general_ci?

platform is
phpMyAdmin 2.6.1
MySQL 4.1.9
apache 1.3.33
php 4.3.10
Firefox 2 / IE 7.0.5730.11



but when I post the contents from my CMS it convert the characters with errors, of course when I do it from the phpMyAdmin it save the characters correctly.

a resume of the HTML front-end

Code: Select all

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
	<head>
		<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
	</head>
	<body>
		<form name="THENAME" action="" method="POST">
			<input type="hidden" name="rowid" value="MY VALUE">
			<input type="hidden" name="parent" value="MY VALUE">
			<textarea name="my_content" rows="20" style="width:98%;" title="USE HTML TAGS HERE"></textarea>
		<input name="edit_confirm" value="Confirm edition" type="submit">
		</form>
	</body>
</html>
the resume of the PHP back-end

Code: Select all

$edit_query = "UPDATE `my_table` SET

parent = '".$_POST['parent']."',

my_content = ".process_html_text($_POST['my_content'])."

WHERE rowid='".$_POST['rowid']."'";


if(sql_select($edit_query,$edit_results)){ // my abstraction sql_select() use mysql_query() to store data (it also check connection and handle errors)
	echo "SUCCESS MESSAGE";
}
else{
	echo "MY ERROR MESSAGE";
	// AND MY LOG ERROR SCRIPT
}



###############################################  PROCESS HTML text for db query
function process_html_text($string){
//process_textarea_text
// textarea tag was previously transformed when rendered on the textarea to 
//prevent nested textareas and now is stored correctly

	$patterns = array (
	"#'#",
	'#\[\s{0,}textarea#is', //OPEN TEXTAREA
	'#\[\s{0,}/\s{0,}textarea\s{0,}\]#is'//CLOSE
	);
	$substitutions = array(
	"\'",
	'<textarea',
	'</textarea>'
	);
	$output_string = preg_replace($patterns,$substitutions,trim(stripslashes($string)));
	return "$output_string";
}

of course, if I use latin charset on the frontend and the DB collation the characters are automatically transformed into /&#(\d+);/ decimal format but by doing this, the stored information will need to be rendered in HTML to make it undersundable

for example
* 网页
* 资讯
* 知识
* 音乐
* 图片
* 影视
* 酷帖
* 更多

is the rendered code of

Code: Select all

* 网页
* 资讯
* 知识
* 音乐
* 图片
* 影视
* 酷帖
* 更多
and no comments about the increasing amount of data needed to store large sequences of content in decimal format.

anyone can help me about it?
is there a query error? or what?
Last edited by Jaxolotl on Wed Dec 06, 2006 8:53 am, edited 4 times in total.
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Do your pages send a header character encoding of UTF8 as well? header()

The following may be of interest/help: viewtopic.php?t=36549
User avatar
Jaxolotl
Forum Contributor
Posts: 137
Joined: Mon Nov 13, 2006 4:19 am
Location: Argentina and Italy

headers

Post by Jaxolotl »

I'll check bout headers but if I remember right it's ok
the problem is that when I stor the data, some characters are stores correctly but some others NOT
and I don't know the reason why.

ex.
text stored correctly in decimal notation as shown in the front-end
------------------------------------------
* 网页
* 资讯
* 知识
* 音乐
* 图片
* 影视
* 酷帖
* 更多

text stored wrong in UTF-8 notation as shown in the front-end
----------------------------------------------

* 网页
* 资讯
* 知识
* 音�? //THIS IS THE MISSING CHARACTER
* 图片
* 影视
* 酷帖
* 更多


text stored correctly in decimal notation as shown in the back-end
------------------------------------------

Code: Select all

* 网页
* 资讯
* 知识
* 音乐
* 图片
* 影视
* 酷帖
* 更多
text stored wrong in UTF-8 notation as shown in the back-end
----------------------------------------------

Code: Select all

* 网页
* 资讯
* 知识
* 音�
* 图片
* 影视
* 酷帖
* 更多
what I'm doing wrong?
I try also with utf8_unicode_ci but with the same result

the funny thing :( :( :( is that when I use phpMyAdmin interface everything is ok. uffffffff how many time will it takes me to learn uffffffffff uffffffff I'm years long from being at that level
User avatar
Jaxolotl
Forum Contributor
Posts: 137
Joined: Mon Nov 13, 2006 4:19 am
Location: Argentina and Italy

file encoding

Post by Jaxolotl »

I was reading the online manual and finde some different oppinion about encoding the file.
Some use utf-8 with no byte order marks and some other keep the source files in ASCII to avoid the Byte Order Mark and then send the header

Code: Select all

header('Content-Type: text/html; charset=utf-8');
the second one is the way I work, anyway I tried the UTF-8 not bomed file encoding but still have the same rendering/storing problem, some characters are stored wright some other doesn't.

I'm going crazy!!!!
User avatar
dibyendrah
Forum Contributor
Posts: 491
Joined: Wed Oct 19, 2005 5:14 am
Location: Nepal
Contact:

Post by dibyendrah »

In your mysql connection :
Do this first after connecting to MySQL server.

Code: Select all

sql_query("SET NAMES 'utf8'", $conn);
mysql_select_db($DB);
Othe thing you have to do is create a database or table with character-set : utf8_general_ci or any other unicode in your language.
User avatar
Jaxolotl
Forum Contributor
Posts: 137
Joined: Mon Nov 13, 2006 4:19 am
Location: Argentina and Italy

Post by Jaxolotl »

Hi brother dibyendrah, nice to hear from you again

This is the detailed steps I done, can you tell me where I'm on error?

Step 1 Create the database and tables

Code: Select all

-- FIRST QUERY
CREATE DATABASE `my_db` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;


-- SECOND QUERY
CREATE TABLE `my_table` (
`rowid` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`my_text` TEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci,
PRIMARY KEY ( `rowid` )
);


-- OR USE A COMBINED QUERY
CREATE TABLE IF NOT EXISTS `my_table` (
  `rowid` int(10) unsigned NOT NULL auto_increment,
  `my_text` text collate utf8_unicode_ci,
  PRIMARY KEY  (`rowid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Then my PHP scripting

1 . Library file unde "libraries" dir (my_lib.php)

Code: Select all

<?php
//error_reporting(0);
session_start();

define('MY_IDENTIFIER',"JAXOLOTL");
define('DEFAULT_LANGUAGE',"eng");
define('DEFAULT_CHARSET',"utf-8");

session_register(MY_IDENTIFIER.'_LANGUAGE');

/* **************************************
*              DB SETTINGS              *
*************************************** */
// LOCAL TEST
$dbhost="localhost";
$dbdb="my_db";
$dbuser="root";
$dbpwd="";

/* **************************************
*          DATABASE CONNECTION          *
*************************************** */

function retrive_data($query, &$rv) {
	global $dbhost;
	global $dbdb;
	global $dbuser;
	global $dbpwd;

	if (!($db_id = mysql_pconnect($dbhost, $dbuser, $dbpwd, MYSQL_CLIENT_INTERACTIVE))) {
		error_log("Could not connect to $dbdb: " . mysql_error());
		return false;
	}
	if (!(mysql_select_db($dbdb, $db_id))) {
		error_log("Could not select database $dbdb: " . mysql_error());
		mysql_close($db_id);
		return false;
	}
	if (!($rv = mysql_query($query, $db_id))) {
		error_log("sql_select: Invalid query: $query: " . mysql_error());
		mysql_close($db_id);
		return false;
	}
	mysql_close($db_id);
	return true;
}


############################################### RETURNS AN HTML STRING FORMAT IF PLAIN ON INPUT
function translate_breaks($string){
	if(preg_match('#\<.*?\>#s',$string)){
		return $string;
	}
	else{
		return nl2br(trim($string));
	}
}


/* **************************************
*                 LANGUAGE              *
*************************************** */

$lang = $_REQUEST['language'];  // if register_globals is off

if (isset($lang)) {
	setcookie(MY_IDENTIFIER.'_LANGUAGE',$lang,time()+(60*60*24*7),"/");
}
elseif (isset($_COOKIE[MY_IDENTIFIER.'_LANGUAGE']))
{
	$lang = $_COOKIE[MY_IDENTIFIER.'_LANGUAGE'];
}
else {
	$lang = DEFAULT_LANGUAGE;	//default language
}

$_SESSION[MY_IDENTIFIER.'_LANGUAGE']=$lang;

header('Content-Type: text/html; charset='.DEFAULT_CHARSET);
?>

2 . my frontend file under the root (index.php)

Code: Select all

<?php include_once("libraries/my_lib.php");?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
	<head>
		<meta http-equiv="Content-Type" content="text/html; charset=<?php echo DEFAULT_CHARSET; ?>">
		<title>UTF-8 TEST</title>
	</head>
	<body>
		<?php
		if(strlen($_POST['my_text'])>=1){
			$create_query = "INSERT INTO `my_table` (my_text) VALUES ('".$_POST['my_text']."')";
			// on the complete version I check for backslashed and null vales ecc, this is only a fast resume.
			if(retrive_data($create_query,$create_results)){
				echo "<strong>Data stored OK</strong><br><br>";
			}
			else{
				echo "<strong>ERROR-Database error</strong><br> Check error log for more information<br><br>";
				// on the complete version I print the error message, or use the tail cmd.
			}
		}
		?>
		<form name="my_form" action="" method="POST">
			<textarea name="my_text" rows="10" style="width:200px;" ></textarea>
			<br />
			<input type="submit">
		</form>
		<h2> Show me the last record </h2>
		<?php
		$query = "SELECT `rowid`,`my_text` FROM `my_table` ORDER BY 'rowid' DESC limit 1";
		retrive_data($query,$results);
		while($my_row = mysql_fetch_array($results)){
			echo translate_breaks($my_row['my_text']);
		}
		?>
	</body>
</html>
the data stored on the db is always strange coded

use this to test

SAMPLE TEXT just copy and paste

雅虎搜索

* 网页
* 资讯
* 知识
* 音乐
* 图片
* 影视
* 酷帖
jmut
Forum Regular
Posts: 945
Joined: Tue Jul 05, 2005 3:54 am
Location: Sofia, Bulgaria
Contact:

Post by jmut »

Code: Select all

SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
can you give us output of those
User avatar
Jaxolotl
Forum Contributor
Posts: 137
Joined: Mon Nov 13, 2006 4:19 am
Location: Argentina and Italy

Post by Jaxolotl »

ARGGGGGGG
this is the output uf the two queries you ask me


Code: Select all


SHOW VARIABLES LIKE 'character_set%'

-- OUTPUT
+-------------------------+------------------------------------------+
+     Variable_name       +                    Value                 +
+-------------------------+------------------------------------------+
+ character_set_client    +                     utf8                 +
+-------------------------+------------------------------------------+
+ character_set_connection+                     utf8                 +
+-------------------------+------------------------------------------+
+ character_set_database  +                     utf8                 +
+-------------------------+------------------------------------------+
+ character_set_results   +                     utf8                 +
+-------------------------+------------------------------------------+
+ character_set_server    +                 latin1                   +
+-------------------------+------------------------------------------+
+ character_set_system    +                     utf8                 +
+-------------------------+------------------------------------------+
+ character_sets_dir      +C:\PROGRA~1\EASYPH~1\mysql\share\charsets/+
+-------------------------+------------------------------------------+




SHOW VARIABLES LIKE 'collation%'; 

+-------------------------+------------------------------------------+
+     Variable_name       +                    Value                 +
+-------------------------+------------------------------------------+
+ collation_connection    +          utf8_unicode_ci                 +
+-------------------------+------------------------------------------+
+ collation_database      +          utf8_unicode_ci                 +
+-------------------------+------------------------------------------+
+ collation_server        +        latin1_swedish_ci                 +
+-------------------------+------------------------------------------+


both times the server is setted on latin!!!!!! ouch
is it this?
don't remember how to change it
if I change it into utf-8 ...does it comprimes the latin tables?
jmut
Forum Regular
Posts: 945
Joined: Tue Jul 05, 2005 3:54 am
Location: Sofia, Bulgaria
Contact:

Post by jmut »

this doc page will be of interest

http://dev.mysql.com/doc/refman/4.1/en/ ... ction.html

Basically these settings are by default...So if you don't set it explicitly after creating the connection...the server will use config variables, which
in your case are different for client/server part....check on the link...and this is causing you problems.

As dibyendrah suggested executing this query after making the connection should fix this.

Code: Select all

SET NAMES 'utf8';
As for:
If I change it into utf-8 ...does it comprimes the latin tables?
Not sure...I believe not but not sure at all.
User avatar
Jaxolotl
Forum Contributor
Posts: 137
Joined: Mon Nov 13, 2006 4:19 am
Location: Argentina and Italy

SET NAMES

Post by Jaxolotl »

Scuse my ignorance, but

Code: Select all

SET NAMES 'utf8';
should I use it on my connection abstraction or where?
I show the two files sources on this post, where and how do I use the SET NAMES command?
I'll apreciate really much your help, this "all about" is new for me and feel a little lost
jmut
Forum Regular
Posts: 945
Joined: Tue Jul 05, 2005 3:54 am
Location: Sofia, Bulgaria
Contact:

Re: SET NAMES

Post by jmut »

Jaxolotl wrote:Scuse my ignorance, but

Code: Select all

SET NAMES 'utf8';
should I use it on my connection abstraction or where?
I show the two files sources on this post, where and how do I use the SET NAMES command?
I'll apreciate really much your help, this "all about" is new for me and feel a little lost
As soon as you create a connection

Code: Select all

if (!($db_id = mysql_pconnect($dbhost, $dbuser, $dbpwd, MYSQL_CLIENT_INTERACTIVE))) {
                error_log("Could not connect to $dbdb: " . mysql_error());
                return false;
        }
        mysql_query("SET NAMES 'utf8'", $db_id);
If you observe the mysql log during script execution...you should notice this query first ...after making connection.
User avatar
Jaxolotl
Forum Contributor
Posts: 137
Joined: Mon Nov 13, 2006 4:19 am
Location: Argentina and Italy

SET NAMES

Post by Jaxolotl »

YESSSSSSSSSSSSS!!!!!!!!!!!!!

I really TNX YOU SO MUCH GUYS!!!!!

now everything goes right!!!!
User avatar
dibyendrah
Forum Contributor
Posts: 491
Joined: Wed Oct 19, 2005 5:14 am
Location: Nepal
Contact:

Post by dibyendrah »

Wow Jaxolotl! Good to hear that

Code: Select all

SET NAMES 'utf8';
fixed your problem.

Actually, the above query does the three following things which I forgot to mention.

Code: Select all

SET character_set_client = 'utf8';
SET character_set_results = 'utf8';
SET character_set_connection = 'utf8';
OR

To make the character-set utf8 as default in your system, open the my.cnf or my.ini and edit.

Code: Select all

[mysql]
default-character-set=utf8
To test what's going on with character set, just do the following as jmut suggested :

Code: Select all

SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
Cheers,
Dibyendra
User avatar
Jaxolotl
Forum Contributor
Posts: 137
Joined: Mon Nov 13, 2006 4:19 am
Location: Argentina and Italy

Post by Jaxolotl »

Hi dibyendrah,
about ini change

Code: Select all

[mysql]
default-character-set=utf8
I prefer

Code: Select all

SET NAMES 'utf8';
because some times you're not the host owner or can't access the ini files, or maybe running different kind of db tables for different programmers so the SQL way let's you control just your own script set.

by the way I read on the mySQL manual that utf8_unicode_ci is more accurated than utf8_general_ci but is still being developed son some characters aren't contanined on it; so I decide to use the general set instead of the unicode one.
http://dev.mysql.com/doc/refman/5.0/en/ ... -sets.html
Any word on favor or against that desition?
User avatar
dibyendrah
Forum Contributor
Posts: 491
Joined: Wed Oct 19, 2005 5:14 am
Location: Nepal
Contact:

Post by dibyendrah »

Yes, you are right Jaxolotl. In the case of shared host, we are able to so only the sql query and can't change the ini files. Anyway, now your problem is now solved. So, enjoy! :wink:
Post Reply