Parsing a complex string with preg_match_all

Posted: Thu Jul 16, 2009 12:48 am
by thegodfaza
So basically I'm developing a PHP RCon tool for use with the CoD series of games. I am limited to working with what comes out of the server. For example, $data is the response packet(s) from the query; that cannot be changed. The amount of whitespace between entries is subject to change as well, though there will always be at least one space. The problem I'm having is capturing player names that contain a space.

Code: Select all

<?php
$data = "map: mp_showdown
num score ping guid                             name            lastmsg address               qport rate
--- ----- ---- -------------------------------- --------------- ------- --------------------- ----- -----
10     7   95  b8a09b5924fa2e5b79a56b9ae61c0954 Player 1        0       192.168.1.1:28960     2550  25000
11    25   37  0a6945dacb246b04d2a63daf3c50a877 Player 2        5       192.168.0.1:28960     25417 25000
12    15   80  00770f2f8fac5810cf62b6f2e4f0233a Player 3        0       192.168.0.2:28960     2280  25000
13    15   74  5396333c32a8ff26713da7ff6c0bcb73 Player 4        0       192.168.0.3:-10470    7773  25000
14    32   63  eebb4bbbdb04d2fc91e7d1e78310607d Player 5        0       192.168.0.4:28960     20599 25000
15    45   92  0f931649e253088674c305315ed13409 Player 6        0       192.168.0.5:28960     22608 25000
0     20   85  1e9e1b01dc19152154566816a2188e60 Player 7        0       192.168.0.6:28960     20643 25000
1     50   55  6fdabb920b571946229c06564be0024c Player 8        15      192.168.0.7:28960     3307  25000
2      0   71  7743eb260f7a3752890ae24678874e5d Player 9        0       192.168.0.8:-15259    -23125 18000
3      0   70  d09930d1d4eb4f7f120fc44b66a56b15 Player 10       0       192.168.0.9:28960     -23970 25000
4     20   72  7ae8f7c708dc91bb935bc03e84ba6975 Player 11       0       192.168.0.10:28960    -1276 25000
5      5  157  fc3b5031ec6e420e8a35c3b1acf0cd53 Player 12       0       192.168.0.11:28960    -31404 25000
6      5   40  79b311d031f9bfaca213e40a97037d03 Player 13       30      192.168.0.12:28960    -31819 25000
7     20   80  36bc7791fbe203c89bd058449691e912 Player 14       0       192.168.0.13:28960    -613  25000
8     10  162  b898156fbc3ed5d7b21518a7bf4a80c0 Player 15       0       192.168.0.14:28960    -15579 25000
9     30  194  dddb1af2422537f200dd61665cd54265 Player 16       10      192.168.0.15:52       -23523 25000";
$players = explode ("\n", $data );
 
array_shift($players);
array_shift($players);
array_shift($players);
 
$table = ""; // start with an empty string so that .= never appends to an undefined variable
foreach( $players as $input ) {
//                     ID     Score   Ping      GUID          Name     Last Mess   IP
preg_match_all("/^\s*(\d+)\s*(\d+)\s*(\d+)\s*([0-9a-f]*)\s*(.*?\S*\s*)\s*(\d*)\s*(\d*)$/",$input,$output);
$table .= "<tr><td>";
$table .= $output[1][0];
$table .= "</td><td>";
$table .= $output[2][0];
$table .= "</td><td>";
$table .= $output[3][0];
$table .= "</td><td>";
$table .= $output[4][0];
$table .= "</td><td>";
$table .= $output[5][0];
$table .= "</td><td>";
$table .= $output[6][0];
$table .= "</td><td>";
$table .= $output[7][0];
$table .= "</td></tr>";
}
echo "<table border=\"1\"><thead><tr><td>Player Number</td><td>Score</td><td>Ping</td><td>GUID</td><td>Name</td><td>Last Message</td><td>IP Address</td></tr></thead><tbody>".$table."</tbody></table>"
?>

Re: Allowing some whitespace and not others

Posted: Thu Jul 16, 2009 12:52 am
by prometheuzz
What exactly are you trying to accomplish?

Re: Allowing some whitespace and not others

Posted: Thu Jul 16, 2009 12:56 am
by thegodfaza
I thought it was obvious. I'm trying to parse a complex string with preg_match_all to get the individual values in it.

Re: Allowing some whitespace and not others

Posted: Thu Jul 16, 2009 1:04 am
by prometheuzz
NVM

Re: Parsing a complex string with preg_match_all

Posted: Thu Jul 16, 2009 1:15 am
by thegodfaza
I'm trying to get a table

Code: Select all

echo "<table border=\"1\"><thead><tr><td>Player Number</td><td>Score</td><td>Ping</td><td>GUID</td><td>Name</td><td>Last Message</td><td>IP Address</td></tr></thead><tbody>".$table."</tbody></table>"
?>
With the individual values in their own cells.

Code: Select all

//                     ID     Score   Ping      GUID          Name     Last Mess   IP
preg_match_all("/^\s*(\d+)\s*(\d+)\s*(\d+)\s*([0-9a-f]*)\s*(.*?\S*\s*)\s*(\d*)\s*(\d*)$/",$input,$output);
$table .= "<tr><td>";
$table .= $output[1][0];
$table .= "</td><td>";
$table .= $output[2][0];
$table .= "</td><td>";
$table .= $output[3][0];
$table .= "</td><td>";
$table .= $output[4][0];
$table .= "</td><td>";
$table .= $output[5][0];
$table .= "</td><td>";
$table .= $output[6][0];
$table .= "</td><td>";
$table .= $output[7][0];
$table .= "</td></tr>";
But that's only for readability. It should output an array with the individual values in it:

Code: Select all

array[0]
   [0] => "10 7 95 b8a09b5924fa2e5b79a56b9ae61c0954 Player 1 0 192.168.1.1:28960"
   [1] => "10"
   [2] => "7"
   [3] => "95"
   [4] => "b8a09b5924fa2e5b79a56b9ae61c0954"
   [5] => "Player 1"
   [6] => "0"
   [7] => "192.168.1.1:28960"
What I'm getting back is:

Code: Select all

array[0]
   [0] => "10 7 95 b8a09b5924fa2e5b79a56b9ae61c0954 Player 1 0 192.168.1.1:28960 2550 25000"
   [1] => "10"
   [2] => "7"
   [3] => "95"
   [4] => "b8a09b5924fa2e5b79a56b9ae61c0954"
   [5] => "Player 1 0 192.168.1.1:28960 "
   [6] => "2550"
   [7] => "25000"

Re: Parsing a complex string with preg_match_all

Posted: Thu Jul 16, 2009 4:12 am
by prometheuzz
Okay, that makes it a bit clearer.
Now about the user name. Before coding (or crafting a regex), you should have a clear picture of what a user name "could" be. Your sole example ("Player ??") is probably not the format the name will take every time. So a couple of questions:
1 - is there always one space in the user name, or can there be no spaces or more than one space?
2 - does the user name always begin with "Player"?
3 - if question 2 is answered with a "no", what are the valid characters a user name can consist of?

Re: Parsing a complex string with preg_match_all

Posted: Thu Jul 16, 2009 3:32 pm
by thegodfaza
The player names can consist of just about any ASCII character (more are allowed than not), and they can contain no spaces, one space, or multiple spaces (even several in a row).

EDIT: NVM, I have a solution. I'm not sure if I should post it since it was on another forum.

Re: Parsing a complex string with preg_match_all

Posted: Thu Jul 16, 2009 11:59 pm
by ridgerunner
Your regex has a lot of greedy stars, which could result in catastrophic backtracking if any line is not well-formed. You want to use a + rather than a * anyway, because you know something must be in each field. Also, if the data has only spaces between the fields, then use only spaces in the regex and avoid \s, because it also matches line breaks. To get the job done efficiently, use possessive quantifiers liberally on everything but the name, which appears to be the only "variable" element that may contain multiple spaces. A negative lookbehind after the "name" capture prevents it from grabbing any trailing spaces. And finally, adding an optional minus sign in front of the numeric fields makes sense. Here is a regex that seems to do the trick quite nicely...

Code: Select all

'/^(-?\d++)[ ]++(-?\d++)[ ]++(-?\d++)[ ]++([0-9a-f]++)[ ]++(.+)(?<![ ])[ ]++(-?\d++)[ ]++([-\d.:]++)[ ]++(-?\d++)[ ]++(-?\d++)$/im'
Or better yet, for readability, why not go ahead and use named capture groups and make it free-spacing...

Code: Select all

'/^
(?P<num>-?\d++)
[ ]++(?P<score>-?\d++)
[ ]++(?P<ping>-?\d++)
[ ]++(?P<guid>[0-9a-f]++)
[ ]++(?P<name>.+)(?<![ ])
[ ]++(?P<lastmsg>-?\d++)
[ ]++(?P<address>[-\d.:]++)
[ ]++(?P<qport>-?\d++)
[ ]++(?P<rate>-?\d++)$
/imx'
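
A minimal sketch of how the free-spacing pattern above might be applied with preg_match_all and PREG_SET_ORDER. It assumes the response uses \n line endings and that rows carry no leading spaces, as in the sample data from the first post. The variable names ($rows, $players) and the truncated last line are made up for illustration.

```php
<?php
// Two rows copied from the first post, plus one deliberately malformed
// line (hypothetical) to show that it is skipped rather than mis-parsed.
$data = "map: mp_showdown\n"
    . "num score ping guid                             name            lastmsg address               qport rate\n"
    . "--- ----- ---- -------------------------------- --------------- ------- --------------------- ----- -----\n"
    . "10     7   95  b8a09b5924fa2e5b79a56b9ae61c0954 Player 1        0       192.168.1.1:28960     2550  25000\n"
    . "2      0   71  7743eb260f7a3752890ae24678874e5d Player 9        0       192.168.0.8:-15259    -23125 18000\n"
    . "3      0   70  d09930d1d4eb4f7f truncated packet\n";

// Same pattern as above; the x flag ignores the indentation whitespace,
// while a space inside [ ] is still matched literally.
$pattern = '/^
    (?P<num>-?\d++)
    [ ]++(?P<score>-?\d++)
    [ ]++(?P<ping>-?\d++)
    [ ]++(?P<guid>[0-9a-f]++)
    [ ]++(?P<name>.+)(?<![ ])
    [ ]++(?P<lastmsg>-?\d++)
    [ ]++(?P<address>[-\d.:]++)
    [ ]++(?P<qport>-?\d++)
    [ ]++(?P<rate>-?\d++)$
    /imx';

// PREG_SET_ORDER yields one array per matched line, keyed by group name.
$count = preg_match_all($pattern, $data, $rows, PREG_SET_ORDER);

$players = array();
foreach ($rows as $row) {
    $players[] = array(
        'num'     => (int) $row['num'],
        'score'   => (int) $row['score'],
        'ping'    => (int) $row['ping'],
        'guid'    => $row['guid'],
        'name'    => $row['name'],
        'lastmsg' => (int) $row['lastmsg'],
        'address' => $row['address'],
        'qport'   => (int) $row['qport'],
        'rate'    => (int) $row['rate'],
    );
}

echo $count, " well-formed rows\n";
foreach ($players as $p) {
    echo $p['num'], ' [', $p['name'], '] ', $p['address'], "\n";
}
```

Since names can contain almost any ASCII character, it is probably wise to pass each name through htmlspecialchars() before writing it into a table cell.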

Re: Parsing a complex string with preg_match_all

Posted: Fri Jul 17, 2009 12:12 am
by thegodfaza
I ended up using this.

Code: Select all

#^\s*(\d+)\s+(\d+)\s+(\d+)\s+([0-9a-f]{32})\s+((?!\s{2,}).{1,15})\s{2,}(\d+)\s+([^\s]+)#m
It only shows correct entries since I am receiving the packets over UDP which puts them out of order more often than not. It needed to be able to ONLY show entries which aren't malformed and exclude everything else.

Re: Parsing a complex string with preg_match_all

Posted: Fri Jul 17, 2009 9:29 am
by ridgerunner
thegodfaza wrote:I ended up using this.

Code: Select all

#^\s*(\d+)\s+(\d+)\s+(\d+)\s+([0-9a-f]{32})\s+((?!\s{2,}).{1,15})\s{2,}(\d+)\s+([^\s]+)#m
It only shows correct entries since I am receiving the packets over UDP which puts them out of order more often than not. It needed to be able to ONLY show entries which aren't malformed and exclude everything else.
This one is better than your first, but it still grabs trailing spaces into the captured name field. And the negative lookahead within the name capture is effectively doing nothing at all. (It appears to be trying, but failing, to avoid occurrences of two or more spaces within a name.)

If you don't want trailing spaces in your name field, you could remove the useless negative lookahead within the name capture and add a negative lookbehind right after like so:

Code: Select all

'#^\s*(\d+)\s+(\d+)\s+(\d+)\s+([0-9a-f]{32})\s+(.{1,15})(?<!\s)\s+(\d+)\s+([^\s]+)#m'
The regex I presented in my previous post matches only well-formed entries, and it fails on malformed lines very quickly, i.e. it is not susceptible to catastrophic backtracking. It correctly captures a name without leading or trailing spaces that can contain any number of embedded spaces.
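
To see the trailing-space difference concretely, here is a small sketch that runs both variants of the shorter regex against one row from the first post. The row is rebuilt with str_pad() on the assumption that the name column is 15 characters wide, as the header suggests; $line, $before and $after are illustrative names only.

```php
<?php
// One row from the first post's sample. str_pad() recreates the
// 15-character name column so the padding is explicit.
$line = '10     7   95  b8a09b5924fa2e5b79a56b9ae61c0954 '
      . str_pad('Player 1', 15)
      . ' 0       192.168.1.1:28960     2550  25000';

// The regex as posted, with the lookahead inside the name capture.
$original = '#^\s*(\d+)\s+(\d+)\s+(\d+)\s+([0-9a-f]{32})\s+((?!\s{2,}).{1,15})\s{2,}(\d+)\s+([^\s]+)#m';

// The variant with the lookahead removed and a lookbehind after the name.
$fixed = '#^\s*(\d+)\s+(\d+)\s+(\d+)\s+([0-9a-f]{32})\s+(.{1,15})(?<!\s)\s+(\d+)\s+([^\s]+)#m';

preg_match($original, $line, $before);
preg_match($fixed, $line, $after);

// Group 5 is the name in both patterns.
var_dump($before[5]); // name still carries padding spaces
var_dump($after[5]);  // name without trailing spaces
```

With the lookbehind in place, the name capture has to end on a non-whitespace character, so the padding is left for the following \s+ to consume.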