This is something I've struggled in understanding for a while.
I'm moving data from our filemaker database, into MySQL.
When I export data from filemaker, I do so using PHP and a FileMaker XML class (FX.php) and I basically output a text page (rather than HTML) with all the SQL statements built so I just paste them into PHPMyAdmin to import them into MySQL like so...
Code: Select all
INSERT INTO notes VALUES ('',"000004","","","","Decals & Tank & Yoke Protectors","0");
INSERT INTO notes VALUES ('',"000027","","","","KTM, Malaguti, Yamaha, Malaguti & KTM; Accessories, helmets.","0");
INSERT INTO notes VALUES ('',"000028","","","","Customized scooters Lambretta & Vespa in Mod style.","0");
INSERT INTO notes VALUES ('',"000036","","","","Lubricants, oils","0");
INSERT INTO notes VALUES ('',"000044","","","","Import","0");
INSERT INTO notes VALUES ('',"000053","","","","KTM & Betamotor, Suzuki","0");
INSERT INTO notes VALUES ('',"000057","","","","Cylinders, Engine Shafts, Connecting Rods & Connecting Rods system. Kits","0");
INSERT INTO notes VALUES ('',"000061","","","","Small Bikes","0");
INSERT INTO notes VALUES ('',"000065","","","","Small Bikes","0");Is it best to store the data in MySQL with the entities encoded for HTML already?
I would have thought that's not the best policy.
Here's my PHP code which runs through the records...
Code: Select all
<?php
require ('config.php');
function sql_safe($value)
{
// Stripslashes
if (get_magic_quotes_gpc()) {
$value = stripslashes($value);
}
// Quote if not a number or a numeric string
// if (!is_numeric($value)) {
// $value = "'" . mysql_real_escape_string($value) . "'";
// }
mysql_real_escape_string($value);
return $value;
}
$skipvalue = $_GET['skipvalue'];
// DO FILEMAKER QUERY
$fmp_query = new FX($fm_ip, $fm_port, $fm_schema);
$fmp_query->SetDBData('NOTES', 'TEST', 10000);
$fmp_query->SetDBPassword($fm_pass, $fm_user);
$fmp_query->AddDBParam('COMPANY ID', $company_id);
$fmp_query->AddSortParam('CREATE STAMP', 'ascend');
$fmp_query->FMSkipRecords($skipvalue);
$fmp_result = $fmp_query->FMFindAll(true, full, false);
$found_count = $fmp_result['foundCount'];
$fmp_error_code = $fmp_result['errorCode'];
//echo $fmp_error_code;
// LOOP THROUGH FILEMAKER RECORDS
foreach ( $fmp_result['data'] as $fmp_row) {
$company_id = trim($fmp_row['COMPANY ID']);
$contact_id = trim($fmp_row['CONTACT ID']);
$create_stamp = trim($fmp_row['CREATE STAMP']);
$create_name = trim($fmp_row['CREATE NAME']);
$note = trim($fmp_row['NOTE']);
$flag = trim($fmp_row['FLAG']);
if ($flag !== 1) {
$flag = 0;
}
$company_id = str_replace('II#', '', $company_id);
// SEE SQL SAFE FUNCTION ABOVE - THE QUOTE BIT IS COMMENTED OUT DELIBERATELY
$note = sql_safe($note);
// REMOVE NEWLINES FROM OUR FILEMAKER note VARIABLE, REPLACE WITH A SPACE THEN TRIM
$note = trim(str_replace("\n", ' ', $note));
$note = addslashes($note);
// ADD SQL INSTRUCTION TO OUR MEGA VARIABLE
$sql_text .= <<<HTML
INSERT INTO notes VALUES ('',"$company_id","$contact_id","$create_stamp","$create_name","$note","$flag");\r\n
HTML;
}
// GET RID OF ANY HTML TAGS FROM OUR MEGA VARIABLE
$sql_text = strip_tags($sql_text);
// DEAL ENTITIES - CONFUSION IS ALL HERE!
$sql_text = html_entity_decode($sql_text, ENT_QUOTES);
$sql_text = preg_replace('/&#(\d+);/me', "chr(\\1)", $sql_text);
$sql_text = preg_replace('/&#x([a-f0-9]+);/mei', "chr(0x\\1)", $sql_text);
// REMOVE DOUBLE SPACES
while ( substr_count($sql_text, ' ') > 0) {
$sql_text = str_replace(' ', ' ', $sql_text);
}
header('Content-type: text/plain; Content-encoding: utf-8');
echo $sql_text;
?>Basically I'm confused!
I thought this was all working ok until I came across the trademark TM in my filemaker database. Numeric entity &.#8482; (without the period!)
Once all the above is run and I have my output, SQL chokes on the import - because where the &.#8482; was happening becomes a double quote (speech mark) character "
I think this trademark character problem is the character encoding I'm outputting. I'm not sure if I need the utf-8 on the penultimate line.
The collations in MySQL seem to all be chosen automatically as latin1_swedish_ci (not sure what I need it set to).
So first up, what am I doing with my character encoding here that's causing my TM character to become a " character.
Secondly, what's best practice in terms of data storage?
Store the raw characters or the HTML entity versions?
Thanks guys
Ben