
Space Characters Getting Converted in Drupal

Posted: Fri Jun 24, 2011 8:37 am
by Supershaba
I've got a problem with my PHP code in Drupal. I have an array of names with a particular format; a name can contain other characters as well as spaces, and comes in one of two formats:

Code:

$name[0] = "--> Psychic Barrier";
$name[1] = "    Initial Presence";
In my code, I take each line in the array and use a preg_match statement to see if it matches either of these two patterns. So I am looking for a line that starts with two dashes and a '>', followed by a space, or for a line that starts with 4 spaces. This is my preg_match statement:

Code:

while (preg_match('/^(--> |--&gt; |--&amp;gt; | {4}|    |Â+)(.+)$/', $name[$i], $capturedname))
The problem is that both types of names pass the preg_match statement, but when I encounter a name that starts with just the 4 spaces, the captured data in $capturedname doesn't add up. What I expect is the whole string in $capturedname[0], just the spaces in $capturedname[1], and just the name in $capturedname[2]. This is what I get instead:

$capturedname[0] = " Evil Presence"
$capturedname[1] = "�"
$capturedname[2] = "� Evil Presence"

$capturedname[1] plus $capturedname[2] should equal $capturedname[0], but that's not the case here. The spaces are getting converted into a diamond-with-a-question-mark character (�), and $capturedname[2] contains the spaces again, so it just doesn't add up. Any help would be greatly appreciated; this has been bugging me for 3 days now. Thanks in advance.
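A minimal PHP sketch of one way this exact symptom can arise. It assumes (the thread does not confirm this) that the four leading "spaces" are really U+00A0 non-breaking spaces stored as UTF-8 (bytes C2 A0 each), and that the Â in the pattern was saved as the single ISO-8859-1 byte 0xC2, written here as \xC2 so the example does not depend on how the file is saved. Without the /u modifier, preg_match() works on bytes, so the prefix group grabs only the first half of the first non-breaking space:

```php
<?php
// Assumption: four U+00A0 (non-breaking space) characters, UTF-8 encoded.
$name = str_repeat("\xC2\xA0", 4) . "Evil Presence";

// No /u modifier: PCRE matches bytes, not characters. \xC2+ stands in
// for an "Â+" that the editor saved as the single byte 0xC2 (assumed).
preg_match('/^(\xC2+)(.+)$/', $name, $m);

echo bin2hex($m[1]), "\n";               // c2: half of the first NBSP
echo bin2hex(substr($m[2], 0, 1)), "\n"; // a0: its orphaned second byte
var_dump($m[1] . $m[2] === $m[0]);       // bool(true): the bytes still add up
var_dump(preg_match('//u', $m[1]));      // bool(false): not valid UTF-8 on its own
```

Under these assumptions the pieces do concatenate back to the full string byte for byte; it is only when $m[1] and $m[2] are printed separately that each orphaned byte is invalid UTF-8 and the browser shows the replacement character (�).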

Re: Space Characters Getting Converted in Drupal

Posted: Tue Jun 28, 2011 12:19 am
by Supershaba
Any help on this would be appreciated, thanks.

Re: Space Characters Getting Converted in Drupal

Posted: Tue Jun 28, 2011 4:10 am
by Apollo
Most likely an encoding issue.
How are the input strings (the $name array) encoded, and how do you output the result (i.e., what encoding does your HTML use)?

Furthermore, you have an Â character in your regexp, but since PHP files have no standard encoding, it's a complete guess how it will be saved and interpreted (this also depends on the editor you happen to use).
I assume the Â is an attempt to represent the non-breaking space character (U+00A0), similar to &nbsp;, but that's incorrect. You should either use the single byte 0xA0 as a non-breaking space (if the input is ANSI encoded), or the two bytes 0xC2 0xA0 (for one non-breaking space) if it's UTF-8 encoded.
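A minimal sketch of the UTF-8 case, assuming the names are UTF-8 and the indent may be either four ASCII spaces or four U+00A0 characters. The /u modifier makes PCRE treat pattern and subject as UTF-8, so the two bytes C2 A0 are always matched together, and writing the character as the escape \x{00A0} (instead of pasting Â or a raw non-breaking space into the source) removes the dependence on how the editor saves the PHP file. The pattern is simplified to the "--> " and four-space forms from the question; the entity-escaped variants are left out:

```php
<?php
$names = [
    "--> Psychic Barrier",
    "    Initial Presence",                      // four ASCII spaces
    str_repeat("\u{00A0}", 4) . "Evil Presence", // four UTF-8 NBSPs (assumed input)
];

// /u: match UTF-8 characters, so \x{00A0} covers both bytes C2 A0 at once.
$pattern = '/^(--> |[ \x{00A0}]{4})(.+)$/u';

$results = [];
foreach ($names as $name) {
    if (preg_match($pattern, $name, $m) === 1) {
        // $m[1] holds the whole prefix and $m[2] the bare name.
        $results[] = [$m[1], $m[2]];
        echo bin2hex($m[1]), " ", $m[2], "\n";
    }
}
```

Using foreach also avoids the manual $i bookkeeping of the while loop in the question.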

Are you sure you want to literally catch "--> xxx" as well as "--&gt; yyy" and "--&amp;gt; zzz"? (It seems like you're not sure how many times it has been htmlspecialchar'd.) And wouldn't you need to include &nbsp; as well, then?

Solution: find out EXACTLY how your input is encoded. Is it htmlspecialchar'd or not (or twice)? Is it UTF-8, ISO-8859-1 or Windows-1252? Then write your expression accordingly.
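One way to act on this, sketched under the assumption that the names arrive as UTF-8 and that decoding every HTML entity in them is acceptable: reject malformed UTF-8 first, then apply html_entity_decode() until the string stops changing, which undoes zero, one or two rounds of htmlspecialchars(). normalize_name() is a hypothetical helper, not a Drupal API. After this step "--&gt;" and "--&amp;gt;" both become "-->", and &nbsp; becomes the U+00A0 character, so a single pattern can handle all of them.

```php
<?php
// Hypothetical helper: validate UTF-8, then undo repeated HTML escaping.
function normalize_name(string $raw): ?string
{
    // preg_match() with an empty /u pattern returns false on malformed UTF-8.
    if (preg_match('//u', $raw) !== 1) {
        return null;
    }
    // Decode until stable: "&amp;gt;" -> "&gt;" -> ">".
    do {
        $prev = $raw;
        $raw = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    } while ($raw !== $prev);

    return $raw;
}
```

The trade-off of decoding until stable is that a name in which someone deliberately typed the text "&amp;gt;" would also collapse to ">".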

Re: Space Characters Getting Converted in Drupal

Posted: Tue Jun 28, 2011 7:06 am
by Supershaba
It's encoded in UTF-8, so it shouldn't be a problem, but I don't know why it is. As for the Â, I know that's an odd-looking thing, but the only way I got it to pass the preg_match statement was by using that character. I tried using '    ', and then '    ', but neither worked, so it's just mind-boggling to me. As for matching "--> xxx" as well as "--&gt; yyy" and "--&amp;gt; zzz", I have those in there in case the user doesn't use the editor, or there are some other differences.
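When patterns that look right still don't match, a quick way to see what the leading "spaces" really are is to print their bytes, for example by temporarily running this on $name[$i] inside the loop. A small sketch (prefix_hex() is a hypothetical helper): 20 is an ASCII space, c2a0 a UTF-8 non-breaking space, a lone a0 a Latin-1/Windows-1252 non-breaking space, and 266e6273703b the literal text &nbsp;.

```php
<?php
// Hypothetical debugging helper: hex dump of the first $bytes bytes of a name.
function prefix_hex(string $name, int $bytes = 8): string
{
    return bin2hex(substr($name, 0, $bytes));
}

echo prefix_hex("    Initial Presence"), "\n";          // 20202020496e6974
echo prefix_hex("\u{00A0}\u{00A0}Evil Presence"), "\n"; // c2a0c2a04576696c
echo prefix_hex("&nbsp;Evil Presence"), "\n";           // 266e6273703b4576
```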

Re: Space Characters Getting Converted in Drupal

Posted: Tue Jul 05, 2011 7:57 am
by Supershaba
Any help regarding this issue would be appreciated.