SimpleTest: unable to match umlauts with assertWantedText()
Posted: Thu Aug 24, 2006 10:54 am
by jmut
Hi,
I have a problem matching umlauts with SimpleTest.
The original page declares its encoding as
Code: Select all
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
Do I have to configure anything in SimpleTest regarding encoding?
I tried writing the umlauts as literal characters and as HTML entities. In both cases SimpleTest does not match the text.
Can someone confirm this, or am I doing something wrong?

Posted: Thu Aug 24, 2006 1:27 pm
by sweatje
How do regular preg_match() expressions work with your UTF-8 characters? IIRC, isn't this supposed to be a major focus of the PHP6 effort?
Posted: Thu Aug 24, 2006 2:28 pm
by Ambush Commander
Pass 'UTF-8' to HTMLReporter and SimpleTest will output UTF-8.
Posted: Fri Aug 25, 2006 5:10 am
by jmut
sweatje wrote:How do regular preg_match() expressions work with your UTF-8 characters? IIRC, isn't this supposed to be a major focus of the PHP6 effort?
Regular preg_match() works...
Code: Select all
HTML code is
Ihre persönlichen Daten
Code: Select all
// This catches the umlaut
$txt = file_get_contents('http://localhost/umlautTest.html');
$pr = preg_quote('persönlichen', '#'); // also escape the '#' delimiter
preg_match("#$pr#", $txt, $matches);
var_export($matches);
// outputs:
// array (
//   0 => 'persönlichen',
// )
The thing is, I cannot get the same match in SimpleTest using assertText() or similar.
So I am going to find out how to fetch the page source from the browser object and run a regular preg_match() on it.
The key code in SimpleTest is this:
Code: Select all
// parser.php, around line 699
function decodeHtml($html) {
    static $translations;
    if (! isset($translations)) {
        // Build an entity => character map once and reuse it.
        $translations = array_flip(get_html_translation_table(HTML_ENTITIES));
    }
    return strtr($html, $translations);
}
// I guess that if this function were changed, a valid comparison using assertText() would become possible.
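A minimal sketch of what such a change might look like, assuming the cause is that the entity table yields ISO-8859-1 characters (the default on PHP versions before 5.4) while the page is UTF-8. The name decodeHtmlUtf8() is hypothetical and is not part of SimpleTest; html_entity_decode() with an explicit charset is standard PHP.

```php
<?php
// Hypothetical replacement for decodeHtml(); not SimpleTest's actual code.
// Decodes named and numeric entities into UTF-8 bytes.
function decodeHtmlUtf8($html) {
    return html_entity_decode($html, ENT_QUOTES, 'UTF-8');
}

$decoded = decodeHtmlUtf8('Ihre pers&ouml;nlichen Daten');

// "\xC3\xB6" is the UTF-8 byte sequence for "ö".
var_dump($decoded === "Ihre pers\xC3\xB6nlichen Daten"); // bool(true)
```

With the entities decoded to UTF-8, a comparison against a UTF-8 encoded expected string should have matching bytes on both sides.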
Posted: Fri Aug 25, 2006 5:10 am
by jmut
Ambush Commander wrote:Pass 'UTF-8' to HTMLReporter and SimpleTest will output UTF-8.
That is only used for output. As far as I can tell, it has no effect on the comparisons.
Posted: Fri Aug 25, 2006 6:27 am
by Ambush Commander
Encode your files in UTF-8 and put the actual umlauts in source.
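A small illustration of why the encoding of the test file matters, with the bytes written out explicitly so the example does not depend on how this snippet itself is saved. The variable names and the sample page string are made up for the example.

```php
<?php
// The same visible word in two encodings.
$utf8   = "pers\xC3\xB6nlichen"; // "ö" as UTF-8: 0xC3 0xB6
$latin1 = "pers\xF6nlichen";     // "ö" as ISO-8859-1: 0xF6

// Stand-in for the fetched page body, which is served as UTF-8.
$page = "Ihre " . $utf8 . " Daten";

var_dump(strpos($page, $latin1) !== false); // bool(false): Latin-1 needle
var_dump(strpos($page, $utf8) !== false);   // bool(true): UTF-8 needle

// Converting a Latin-1 needle to UTF-8 makes the bytes agree again.
$converted = iconv('ISO-8859-1', 'UTF-8', $latin1);
var_dump($converted === $utf8);             // bool(true)
```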
Posted: Fri Aug 25, 2006 7:06 am
by jmut
Ambush Commander wrote:Encode your files in UTF-8 and put the actual umlauts in source.
Yes, that works. It is just that I don't have control over the HTML.

Thanks anyway. Using preg_match() seems reasonable enough, so I consider this problem closed.
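A rough sketch of that preg_match() approach, assuming the page source is already available as a UTF-8 string. The helper name sourceContains() is hypothetical; how the source is obtained (file_get_contents() or the test browser) is left out.

```php
<?php
// Hypothetical helper: true if $needle occurs literally in $html.
// Both arguments are assumed to be UTF-8 encoded.
function sourceContains($html, $needle) {
    // Quote the needle, including the '#' delimiter. The 'u' modifier
    // makes PCRE treat pattern and subject as UTF-8.
    $pattern = '#' . preg_quote($needle, '#') . '#u';
    return preg_match($pattern, $html) === 1;
}

// Stand-in for the fetched page source.
$html = "<p>Ihre pers\xC3\xB6nlichen Daten</p>";

var_dump(sourceContains($html, "pers\xC3\xB6nlichen")); // bool(true)
var_dump(sourceContains($html, "pers&ouml;nlichen"));   // bool(false)
```

Because this matches against the raw source, a page that writes the umlaut as an entity would need the entity form as the needle.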