Stuck on regex. Could use some help.
-
northstar7
- Forum Newbie
- Posts: 6
- Joined: Tue Mar 23, 2010 7:15 pm
Stuck on regex. Could use some help.
I'm trying to develop a regex that will parse the following strings.
$text ="[view:news_events_location_listing=page_9=2-miles=items:5]";
$text ="[view:news_events_location_listing=page_9=2-miles]";
The regex has to parse the two strings and load them into an array as follows:
Array [0][0] is [view:news_events_location_listing=page_9=2-miles=items:5]
Array [1][0] is news_events_location_listing
Array [2][0] is page_9
Array [3][0] is 2-miles
Array [4][0] is 5
This works fine for the first string:
preg_match_all("/\[view:([^=\]]+)=?([^=\]]+)?=?([^\]]*)?=items:=?([0-9])\]/i",$text, $match);
But I can't figure out how to write it so that it will produce the same output from the shorter $text string. I want to skip the Array [4][0] entirely in the case of the shorter string so I get this:
Array [0][0] is [view:news_events_location_listing=page_9=2-miles]
Array [1][0] is news_events_location_listing
Array [2][0] is page_9
Array [3][0] is 2-miles
I'd be very grateful for any help on this.
Thanx
- ridgerunner
- Forum Contributor
- Posts: 214
- Joined: Sun Jul 05, 2009 10:39 pm
- Location: SLC, UT
Re: Stuck on regex. Could use some help.
Try this one:
Code: Select all
preg_match_all('/\[view:([^=\]]+)=([^=\]]+)=([^=\]]+)(?:=items:([0-9]+))?\]/i', $contents, $matches);
It matches the very limited test data you have provided. My hunch is that your actual data has more variability which will need to be accounted for.
-
northstar7
- Forum Newbie
- Posts: 6
- Joined: Tue Mar 23, 2010 7:15 pm
Re: Stuck on regex. Could use some help.
Hi. That's really fantastic. Thank you very much.
Actually, I think the incoming strings will be as uniform as I outlined.
I see you changed from this:
preg_match_all("/\[view:([^=\]]+)=?([^=\]]+)?=?([^\]]*)?=items:=?([0-9])\]/i",$text, $match);
to this:
preg_match_all('/\[view:([^=\]]+)=([^=\]]+)=([^=\]]+)(?:=items:([0-9]+))?\]/i', $text, $match);
I'd be grateful if you could explain the effective difference between what I had:
=items:=?([0-9])\]
and what you wrote:
:=items:([0-9]+))?\]
Thanks a lot.
- ridgerunner
- Forum Contributor
- Posts: 214
- Joined: Sun Jul 05, 2009 10:39 pm
- Location: SLC, UT
Re: Stuck on regex. Could use some help.
northstar7 wrote: ... I'd be grateful if you could explain the effective difference between what I had:
=items:=?([0-9])\]
and what you wrote:
:=items:([0-9]+))?\]
Thanks a lot.
Actually, you cut off the beginning of the non-capturing parentheses which I added to the last section (to make it optional). To explain, here is the whole thing in free-spacing long form with comments...
Code: Select all
$re = '/
\[view: # match opening literal text
( [^=\]]+ ) # capture "news_events_location_listing" into group 1
= # match literal =
( [^=\]]+ ) # capture "page_9" into group 2
= # match literal =
( [^=\]]+ ) # capture "2-miles" into group 3
(?: # begin non-capture group (to apply ? quantifier)
=items: # match literal beginning of items part
( [0-9]+ ) # capture items digit(s) into group 4
)? # end non-capture group and make it optional
\] # match closing literal text
/ix';
preg_match_all($re, $contents, $matches);
Your regex, '=items:=?([0-9])\]', has the '?' quantifier applied only to the second equals sign, which makes that one character optional. Everything else in that sub-expression is still required to match. Do you see the difference now?
Hope this helps
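As a rough check of the difference described above, the following sketch runs both patterns from this thread against the two sample strings. The variable names are illustrative assumptions, and it uses preg_match rather than preg_match_all to keep the result arrays flat.

```php
<?php
// Illustrative sketch: variable names are assumptions, the patterns are quoted from the thread.
$original = '/\[view:([^=\]]+)=?([^=\]]+)?=?([^\]]*)?=items:=?([0-9])\]/i';
$revised  = '/\[view:([^=\]]+)=([^=\]]+)=([^=\]]+)(?:=items:([0-9]+))?\]/i';

$long  = '[view:news_events_location_listing=page_9=2-miles=items:5]';
$short = '[view:news_events_location_listing=page_9=2-miles]';

// The original pattern requires the literal "=items:" plus a digit,
// so the short string (which has no items part) cannot match.
$originalMatchesShort = preg_match($original, $short);

// The revised pattern wraps "=items:N" in an optional non-capturing group,
// so both strings match.
preg_match($revised, $long, $longMatch);
preg_match($revised, $short, $shortMatch);

echo "original on short: ", $originalMatchesShort, "\n";
echo "revised on long:   ", implode(' | ', array_slice($longMatch, 1)), "\n";
echo "revised on short:  ", implode(' | ', array_slice($shortMatch, 1)), "\n";
```

With preg_match, the unmatched trailing group is simply left out of the result for the short string, so only groups 1 to 3 are present there.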
-
northstar7
- Forum Newbie
- Posts: 6
- Joined: Tue Mar 23, 2010 7:15 pm
Re: Stuck on regex. Could use some help.
Yes, that helps a lot. I've looked at a lot of online tutorials and guides, but you provided the clearest explanation of what's going on in a regex that I've seen.
You should write a book.
Thanks again
-
northstar7
- Forum Newbie
- Posts: 6
- Joined: Tue Mar 23, 2010 7:15 pm
Re: Stuck on regex. Could use some help.
It's surprising to me, but not to you, I'm sure, that the people I've been working with on this regex just emailed me about it.
To quote your earlier post:
"My hunch is that your actual data has more variability which will need to be accounted for."
And that's exactly what they said! I tried not to flame in my reply but I did ask how they expected a regex to work if they didn't fully define the range of input. It turns out that they want all of the backreferences to be optional -- not too bad -- but all I said about the incoming strings being uniform was completely wrong.
When I get the new information about the new strings I'll use your free-spacing long form and see how close I can get to a comprehensive regex. I have a suspicion I'll be back on here looking for your advice!
Thanks again
-
northstar7
- Forum Newbie
- Posts: 6
- Joined: Tue Mar 23, 2010 7:15 pm
Re: Stuck on regex. Could use some help.
Hi. I don't know if anyone is still reading this thread, but I've got one last item to fit in.
Specifically, I have the use cases for this Regex:
1. [view:view_name]
2. [view:view_name=view_display]
3. [view:view_name=arguments]
4. [view:view_name=items:2]
5. [view:view_name=arguments=items:2]
6. [view:view_name=view_display=items:2]
7. [view:view_name=view_display=arguments]
8. [view:view_name=view_display=arguments=items:2]
For each of them I need to capture each phrase after the first "[view:" into a backreference. The different phrases are separated by = signs. As before, for the items:2 phrase I need to backcapture (is that a word?) only the numeral.
I have been playing around with it and I came up with
\[view:([^=\]]+)?=?([^items=\]]+)??=?([^items=\]]+)?(?:=items:([0-9]+))?\]
The ^items needs to be added so that it's not captured when there are only two or three phrases to be captured. Unfortunately, using it the way I have it knocks out several of the phrases I want to keep. For example, this won't work with cases 3, 5, 6 and 7.
I thought that ?!items:^=\] might work, but it doesn't.
Do you have any thoughts about how I can do this?
Thanks again.
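One possible reading of the problem above, offered as a sketch rather than a definitive answer: [^items=\]] is a character class that excludes individual letters, while a negative lookahead written as (?!items:) excludes the literal text. The pattern, the $cases list and the $pick helper below are illustrative assumptions, not something posted in the thread. Note also that a lone middle phrase always lands in group 2, because the syntax alone cannot tell view_display from arguments.

```php
<?php
// Sketch only: $re, $cases and $pick are illustrative names, not from the thread.

// [^items=\]] is a character class: it rejects the single characters i, t, e, m, s, = and ].
// "arguments" contains some of those letters, so it cannot match as a whole word.
$classMatchesArguments = preg_match('/^[^items=\]]+$/', 'arguments');

// (?!items:) is a negative lookahead: it fails only where the literal text "items:" starts.
$re = '/^\[view:
    ([^=\]]+)                      # group 1: view name
    (?: = (?!items:) ([^=\]]+) )?  # group 2: optional phrase that is not "items:N"
    (?: = (?!items:) ([^=\]]+) )?  # group 3: second optional phrase, same restriction
    (?: =items: ([0-9]+) )?        # group 4: optional item count, digits only
    \]$/x';

$cases = array(
    '[view:view_name]',
    '[view:view_name=view_display]',
    '[view:view_name=arguments]',
    '[view:view_name=items:2]',
    '[view:view_name=arguments=items:2]',
    '[view:view_name=view_display=items:2]',
    '[view:view_name=view_display=arguments]',
    '[view:view_name=view_display=arguments=items:2]');

// Treat a missing or empty group as "not present".
$pick = function ($m, $i) {
    return (isset($m[$i]) && $m[$i] !== '') ? $m[$i] : null;
};

$parsed = array();
foreach ($cases as $case) {
    if (preg_match($re, $case, $m)) {
        $parsed[$case] = array($pick($m, 1), $pick($m, 2), $pick($m, 3), $pick($m, 4));
    }
}
print_r($parsed);
```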
- ridgerunner
- Forum Contributor
- Posts: 214
- Joined: Sun Jul 05, 2009 10:39 pm
- Location: SLC, UT
Re: Stuck on regex. Could use some help.
It's getting a bit too complicated to handle with a single regex. Here's how I would handle it...
Code: Select all
<?php
// regex to split data string
// note: "view_name" is matched literally, so this fits the test data below
$re = '/
    ^\[view:view_name  # split on either starting literal text
    |                  # or...
    =                  # an equals sign param separator
    |                  # or...
    \]$                # the ending literal text
    /x';
// test data (array of strings)
$data = array(
    '[view:view_name]',
    '[view:view_name=view_display]',
    '[view:view_name=arguments]',
    '[view:view_name=items:2]',
    '[view:view_name=arguments=items:2]',
    '[view:view_name=view_display=items:2]',
    '[view:view_name=view_display=arguments]',
    '[view:view_name=view_display=arguments=items:2]');
$ndata = count($data);
$results = array();
for ($i = 0; $i < $ndata; $i++) {
    // handle each input data string by splitting up
    $results = preg_split($re, $data[$i], -1, PREG_SPLIT_NO_EMPTY);
    $nresults = count($results);
    echo(sprintf("\nData set number %d has %d parameters:\n", $i + 1, $nresults));
    for ($j = 0; $j < $nresults; $j++) {
        // handle each parameter within this data string
        echo(sprintf(" param[%d] = \"%s\"", $j + 1, $results[$j]));
        if (preg_match('/^items:(\d+)$/', $results[$j], $matches)) {
            echo(sprintf(" (Note: this param has count = %d)\n", $matches[1]));
        } else {
            echo("\n");
        }
    }
}
?>
Hope this helps!
Last edited by ridgerunner on Sat Apr 17, 2010 5:43 pm, edited 1 time in total.
-
northstar7
- Forum Newbie
- Posts: 6
- Joined: Tue Mar 23, 2010 7:15 pm
Re: Stuck on regex. Could use some help.
Hi, Ridgerunner. As usual, thanks for all your help.
I ran your script and got the following result (after throwing in a <br/>).
Data set number 1 has 0 parameters:
Data set number 2 has 1 parameters:
param[1] = "view_display" Data set number 3 has 1 parameters:
param[1] = "arguments" Data set number 4 has 1 parameters:
param[1] = "items:2" (Note: this param has count = 2) Data set number 5 has 2 parameters:
param[1] = "arguments" param[2] = "items:2" (Note: this param has count = 2) Data set number 6 has 2 parameters:
param[1] = "view_display" param[2] = "items:2" (Note: this param has count = 2) Data set number 7 has 2 parameters:
param[1] = "view_display" param[2] = "arguments" Data set number 8 has 3 parameters:
param[1] = "view_display" param[2] = "arguments" param[3] = "items:2" (Note: this param has count = 2)
It looks like just what I need except for one sticking point: in lines 5, 6, 7 and 9 the result includes "items:2" when the desired result is "2". I looked at your code and I still have no idea how to keep the script from picking up "items:".
Sorry to keep coming back to you on this.
Thanks
- ridgerunner
- Forum Contributor
- Posts: 214
- Joined: Sun Jul 05, 2009 10:39 pm
- Location: SLC, UT
Re: Stuck on regex. Could use some help.
The previous script was designed to be run from the command line, but I guess you are running this through a web server. Here is another version that strips out the "items:" part (if it's there). The output is now formatted in HTML, so it should look OK in a browser:
Code: Select all
<?php
// regex to split data string
$re = '/
    ^\[view:view_name  # split on either starting literal text
    |                  # or...
    =(?:items:)?       # an equals sign (with "items:" if there)
    |                  # or...
    \]$                # the ending literal text
    /x';
// test data (array of strings)
$data = array(
    '[view:view_name]',
    '[view:view_name=view_display]',
    '[view:view_name=arguments]',
    '[view:view_name=items:2]',
    '[view:view_name=arguments=items:2]',
    '[view:view_name=view_display=items:2]',
    '[view:view_name=view_display=arguments]',
    '[view:view_name=view_display=arguments=items:2]');
$ndata = count($data);
$results = array();
echo("<html><head><title>test.php</title>\n" .
    "<style type=\"text/css\" media=\"all\">\n" .
    "\tbody {margin: 2em; color:#333; background:#DDB; font-family: monospace;}\n" .
    "\tdd {white-space: pre;}\n" .
    "</style></head><body>\n");
echo(sprintf("<h1>test.php - %d data sets</h1>\n", $ndata));
for ($i = 0; $i < $ndata; $i++) {
    // handle each input data string by splitting up
    $results = preg_split($re, $data[$i], -1, PREG_SPLIT_NO_EMPTY);
    $nresults = count($results);
    echo(sprintf("<dl>\t<dt>Data set number %d has %d parameters:</dt>\n", $i + 1, $nresults));
    echo(sprintf( "\t<dd>string = \"%s\"</dd>\n", $data[$i]));
    for ($j = 0; $j < $nresults; $j++) {
        // handle each parameter within this data string
        echo(sprintf("\t<dd>param[%d] = \"%s\"</dd>\n", $j + 1, $results[$j]));
    }
    echo("</dl>\n");
}
echo("</body></html>\n");
?>
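Since the split pattern above matches the literal text "view_name", here is a minimal sketch of how the same idea might be wrapped in a function that accepts any view name. It swaps preg_split for a plain explode on "=", and parse_view_tag() along with its return shape is a hypothetical helper, not part of the scripts posted in this thread.

```php
<?php
// Sketch: parse_view_tag() and its return shape are hypothetical, not from the thread.
function parse_view_tag($tag) {
    // Strip the "[view:" prefix and "]" suffix; return null if the wrapper is missing.
    if (!preg_match('/^\[view:([^\]]+)\]$/', $tag, $m)) {
        return null;
    }
    // Split the remainder on "=" separators; the first piece is the view name.
    $parts = explode('=', $m[1]);
    $result = array(
        'name'   => array_shift($parts),
        'params' => array(),
        'items'  => null,
    );
    foreach ($parts as $part) {
        if (preg_match('/^items:(\d+)$/', $part, $im)) {
            // Keep only the number from an "items:N" phrase.
            $result['items'] = (int) $im[1];
        } else {
            $result['params'][] = $part;
        }
    }
    return $result;
}

print_r(parse_view_tag('[view:news_events_location_listing=page_9=2-miles=items:5]'));
```

Keeping the item count in its own key means the caller never has to strip "items:" from a parameter afterwards.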