preg_match_all returns TRUE but bad values

gu35t
Forum Newbie
Posts: 4
Joined: Mon Nov 22, 2010 5:32 am
Location: Gliwice,Silesia, Poland

preg_match_all returns TRUE but bad values

Post by gu35t »

Hi,
I have a string; it is a JavaScript variable:
[text]var var0 = [ "11296710","na","21,010,200,000","20101121","20100415","20100209","X","rozne dane","na","koty","12:40 am","00:00","2910416169","Nov. 21, 2010","4.2","kol","4.2700","4.2200","251698248","0.0320","d","4.2500","4.2600","4.3000","4.2600","3.1100","5.0700","Feb. 9, 2010","Apr. 15, 2010","na","na","2.56","-0.02","-172.03","0.00","0","0.744 %","4.2","4.3000","20101119","6712","20101119","4.2600","4.2700","4.2200","0.0000","dane firmowe","na","na" ];[/text]

PHP code:

Code: Select all

<?php
$s = file_get_contents("./var.txt");
$s = trim($s);
$s = explode("\n",$s);

$patt = "/^var var([0-9]){1,2} = \[ "; // Begin of pattern
for($i=0; $i<=47; $i++){ $patt .= "\"(.*?)\","; } // Repeat pattern
$patt .= "\"(.*?)\" \]\;$/"; // The end of pattern 
echo $patt;

# 1. Not good
preg_match_all("/^var var([0-9]){1,2} = \[ (\"(.*?)\",){48}\"(.*?)\" \]\;$/", $s[0], $matches);
# 2. Good
preg_match_all($patt, $s[0], $matches1);

echo "<pre>";
var_dump($s);
print_r($matches);
print_r($matches1);
echo "</pre>";
show_source(__FILE__);
#
?>
The first preg_match_all call, with the pattern /^var var([0-9]){1,2} = \[ (\"(.*?)\",){48}\"(.*?)\" \]\;$/, reports a match but returns the wrong values:
[text]Array
(
[0] => Array
(
[0] => var var0 = [ ...... ];
)

[1] => Array
(
[0] => 0
)

[2] => Array
(
[0] => "na",
)

[3] => Array
(
[0] => na
)

[4] => Array
(
[0] => na
)

)[/text]

The second preg_match_all call, with the pattern $patt, works fine: it reports a match and returns every value between the double quotes.
What is wrong with the first pattern?

thanks
ridgerunner
Forum Contributor
Posts: 214
Joined: Sun Jul 05, 2009 10:39 pm
Location: SLC, UT

Re: preg_match_all returns TRUE but bad values

Post by ridgerunner »

When you place a capture group inside a repeating expression (i.e. '(\"(.*?)\",){48}'), the same capture group is re-used over and over again, and when the match is finally completed, the last value that was captured is the one that is retained. The first regex has only four capture groups, so it can return only four values.

The first capture group in your regex also suffers from this problem if the variable number exceeds nine (as written, the capture group gets only the last digit). This expression should be written like so: 'var([0-9]{1,2})'.
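
A small sketch of this behaviour, assuming a hypothetical three-element line in place of the 49-element original (the variable names are made up for illustration):

```php
<?php
// Hypothetical shortened input; the real line holds 49 quoted values.
$line = 'var var12 = [ "a","b","c" ];';

// Group 1 sits inside {1,2} and groups 2 and 3 sit inside {2},
// so each of them keeps only what its last iteration captured.
preg_match('/^var var([0-9]){1,2} = \[ ("(.*?)",){2}"(.*?)" \];$/', $line, $repeated);
print_r($repeated);

// Same input with the quantifier moved inside group 1 and every
// value written out as its own capture group.
preg_match('/^var var([0-9]{1,2}) = \[ "(.*?)","(.*?)","(.*?)" \];$/', $line, $spelled);
print_r($spelled);
```

In the first result, group 1 holds only the last digit and the value "a" is lost, because the repeated groups were overwritten by their final iteration. The second result holds the full number and all three values.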
gu35t

Re: preg_match_all returns TRUE but bad values

Post by gu35t »

[text]the same capture group is re-used over and over again and when the match is finally completed,[...] so it only captures four values.[/text]
OK, I understand.

Is there any way to write this expression in a simpler (shorter) way than building it in the PHP $patt variable?
I have no idea what a simpler expression would look like, if one is possible.
Thanks
ridgerunner

Re: preg_match_all returns TRUE but bad values

Post by ridgerunner »

If you wish to capture all the array elements in one single operation, then your current regex is pretty good.
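
If the goal is only to shorten the code that builds the pattern, one option is str_repeat() in place of the for loop. This is a sketch on a hypothetical four-element line; $count is an assumed name and would be 49 for the real data:

```php
<?php
// Hypothetical shortened input; set $count to 49 for the real line.
$line  = 'var var0 = [ "11296710","na","21,010,200,000","20101121" ];';
$count = 4;

// One capture group per element: ($count - 1) groups followed by a comma,
// then a final group followed by the closing bracket.
$patt = '/^var var([0-9]{1,2}) = \[ '
      . str_repeat('"(.*?)",', $count - 1)
      . '"(.*?)" \];$/';

preg_match($patt, $line, $m);
print_r(array_slice($m, 2)); // skip the full match and the variable number
```

The resulting regex is the same as the one the loop produces, so it still has one capture group per element; only the PHP that assembles it gets shorter.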
ridgerunner

Re: preg_match_all returns TRUE but bad values

Post by ridgerunner »

One other point is that your regex does not allow for valid strings that contain escaped double quotes. For example:
var var0 = ["He said \"WOW!\"."];

Here is your script with a better regex:

Code: Select all

<?php
$s = file_get_contents("./var.txt");
$s = trim($s);
$s = explode("\n",$s);

// Old code
$patt = "/^var var([0-9]){1,2} = \[ ";            // Begin of pattern
for($i=0; $i<=47; $i++){ $patt .= "\"(.*?)\","; } // Repeat pattern
$patt .= "\"(.*?)\" \]\;$/";                      // The end of pattern

// New code
$patt = '/^var\s+var(\d+)\s*=\s*\[\s*';                   // Begin pattern
for($i=0; $i<=47; $i++) {
  $patt .= '"([^"\\\\]*(?:\\\\.[^"\\\\]*)*)"\s*,\s*';     // Repeat pattern
}
$patt   .= '"([^"\\\\]*(?:\\\\.[^"\\\\]*)*)"\s*\]\s*;$/'; // The end of pattern
echo $patt;

preg_match_all($patt, $s[0], $matches, PREG_SET_ORDER);

echo "<pre>";
var_dump($s);
print_r($matches);
echo "</pre>";
show_source(__FILE__);
?>
Changes:
  • Added \s* to allow for variable whitespace wherever it may occur.
  • Used 'single' quotes for the regex strings instead of "double" quotes.
  • Changed "(.*?)" to "([^"\\]*(?:\\.[^"\\]*)*)" for efficiency and to allow for escaped chars.
  • Added the PREG_SET_ORDER flag to the preg_match_all() call so that all captures from one match are grouped into a single array element.
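
To see the difference the third change makes, here is a minimal comparison of the two sub-patterns on the escaped-quote example from above (the variable names are only for illustration):

```php
<?php
// Single-quoted PHP string, so each \" reaches the regex as backslash + quote.
$line = 'var var0 = ["He said \"WOW!\"."];';

$lazy    = '/"(.*?)"/';                          // original sub-pattern
$escaped = '/"([^"\\\\]*(?:\\\\.[^"\\\\]*)*)"/'; // escape-aware sub-pattern

preg_match($lazy, $line, $a);
preg_match($escaped, $line, $b);

echo $a[1], "\n"; // stops at the first escaped quote
echo $b[1], "\n"; // runs to the real closing quote
```

Note that the escape-aware pattern only finds the correct end of the string; the captured value still contains the backslashes, so it would need unescaping afterwards.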
Hope this helps! :)
gu35t

Re: preg_match_all returns TRUE but bad values

Post by gu35t »

Damn, what a pattern! :D

I was asking about a simpler way because the next string I am going to match with preg_match looks like:
[text]var type = [["xxx","Sex Sex Sex","US","/sex/p0rn.html","MY"],["blah1","blah blah","DE","/blah/A0rn.html","US"], ........ ]] ; [/text]
I will try to figure it out [-;

Thanks for the helpful advice, ridgerunner!
ridgerunner

Re: preg_match_all returns TRUE but bad values

Post by ridgerunner »

When asking for help, it is best to provide a complete, representative example of the data you are working with, including all of the variations that may be encountered. For example: Does each array member always have the same number of elements? Do some records have variations in whitespace? Are the strings always double quoted, or are some single quoted as well? The sample data needs to be truly representative if you want a really good solution, and if the real data has many records, please provide more than one record in your example. In addition to representative sample data, you need to describe the final form of data you wish to extract. Arrays? Strings? Providing detailed descriptions of your input and your output helps us give you the help you need.

That said, it looks like your problem would be best solved using several nested loops, each with a simple regex; outer loop matches a full record, mid level loop matches outer array members which are themselves arrays, and finally an inner loop which extracts the string members of the inner arrays.
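
A rough sketch of that approach for a single line, using placeholder field values and assuming no square brackets or escaped quotes occur inside the strings. With a multi-line file, a third loop over the lines would wrap around this:

```php
<?php
// Placeholder data shaped like the "var type" line above.
$js = 'var type = [["a1","first one","US","/a/one.html","MY"],["b2","second","DE","/b/two.html","US"]] ;';

$records = [];

// Outer step: one match per inner [...] array. The class [^\[\]] cannot
// cross a bracket, so the enclosing outer array is never matched as a whole.
preg_match_all('/\[([^\[\]]*)\]/', $js, $inner);

foreach ($inner[1] as $body) {
    // Inner step: pull every double-quoted string out of this one array.
    preg_match_all('/"([^"]*)"/', $body, $fields);
    $records[] = $fields[1];
}

print_r($records);
```

Each regex stays short because it handles only one level of nesting, and the number of inner arrays or of fields per array does not need to be known in advance.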
McInfo
DevNet Resident
Posts: 1532
Joined: Wed Apr 01, 2009 1:31 pm

Re: preg_match_all returns TRUE but bad values

Post by McInfo »

In reply to the first post:

Is the goal to write a regular expression or to turn a JavaScript array string into a PHP array? If it's the latter, you might be interested in json_decode(). See also, substr(), strpos(), and strrpos().
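
A minimal sketch of that route on a shortened, hypothetical version of the line from the first post. It assumes the JavaScript array literal is also valid JSON (double-quoted strings only), which holds for the sample data shown:

```php
<?php
// Hypothetical shortened input taken from the first post's data.
$line = 'var var0 = [ "11296710","na","21,010,200,000","Nov. 21, 2010" ];';

// Cut out everything from the first "[" to the last "]".
$start = strpos($line, '[');
$end   = strrpos($line, ']');
$json  = substr($line, $start, $end - $start + 1);

// Decode into a plain PHP array; json_decode() returns null on invalid JSON.
$values = json_decode($json, true);
print_r($values);
```

The same three calls work unchanged on nested arrays, since json_decode() handles the nesting itself.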
gu35t

Re: preg_match_all returns TRUE but bad values

Post by gu35t »

If it's the latter, you might be interested in json_decode(). See also, substr(), strpos(), and strrpos()
I know these functions. I also know that there are easier ways to parse these strings, but my goal is to understand regular expressions.

[text]var colorstype = [["RGB","Red/Green/Black","MY","/rgb.php","DATA"],["BLUE","BLUE COLOR","MY","/colors/xxx/blue.html","ANOTHER"],["RED","Red Color","MY","/colors/xxx/red.html","ANOTHER"]] ;[/text]
The inner arrays inside the main brackets can repeat 1,000+ times.

[text]["RGB","Red Green Black","MY","/colors/xxx/rgb.html","DATA"],[/text]
[text]
First field is always an uppercase string, sometimes with a dot - ([A-Z\.]{1,5}) -> "RGB"
Second field is a string - [a-zA-Z0-9\./] without \" -> "Red/Green Black"
Third field is an uppercase string ([A-Z]{1,3}) -> "MY"
Fourth field is a string [a-z/\.] -> "/rgb.php"
Fifth field is an uppercase string ([A-Z]{1,5}) -> "DATA"
The end of the string: ]\s*;$
[/text]
you need to describe the final form of data you wish to extract.
[text]
array ( 0 => [0]RGB,
[1]Red/Green Black,
[2]MY,
[3]/rgb.php,
[4]DATA
1=> [0]RED,
[1]Red Color,
[2]MY,
[3]/colors/xxx/red.html,
[4]Another
);
[/text]
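
One possible regex built from the field descriptions above, as a sketch only. Two classes are loosened as assumptions: the second field uses [^"]* because the samples contain spaces, and the fifth uses [A-Z]+ because "ANOTHER" is longer than five letters:

```php
<?php
$js = 'var colorstype = [["RGB","Red/Green/Black","MY","/rgb.php","DATA"],["BLUE","BLUE COLOR","MY","/colors/xxx/blue.html","ANOTHER"],["RED","Red Color","MY","/colors/xxx/red.html","ANOTHER"]] ;';

// One inner array = five quoted fields between square brackets.
$record = '/\["([A-Z.]{1,5})","([^"]*)","([A-Z]{1,3})","([a-z\/.]+)","([A-Z]+)"\]/';

// PREG_SET_ORDER yields one entry per inner array, however many there are.
preg_match_all($record, $js, $sets, PREG_SET_ORDER);

$rows = [];
foreach ($sets as $set) {
    $rows[] = array_slice($set, 1); // drop the full-match entry at index 0
}

print_r($rows);
```

Because preg_match_all() repeats the whole pattern across the string, the capture groups are never placed inside a quantifier, so no values are overwritten even with 1,000+ records.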

greetings