regexp problem

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

kaisellgren
DevNet Resident
Posts: 1675
Joined: Sat Jan 07, 2006 5:52 am
Location: Lahti, Finland.

Post by kaisellgren »

Hi,

It's midnight and my brain isn't working :S

Code:

<?php

$a = "Hosting is most certainly a business that companies/people can profit in. I'm not questioning that -- my point is that you're expecting an awful lot for the amount you pay. You wouldn't offer 1 on 1 livechat, phone support or e-mail responses in 15 minutes for what you pay your host per hour. You expect them to. That doesn't mean you're not somehow 100% pure profit to them (in fact you could be if they do things right) BUT you're paying less than nothing. Be nice to the technicians who are helping you, thank them -- and let them know you'll google next time or post on their forums instead.";
preg_match_all("/(.{0,200}\bnot\b.{0,200})/ism",$a,$matches);
foreach ($matches[1] as $key)
 {
  echo $key.'<br />';
 }

?>
This does not give me two lines of text that are each 403 chars long, even though my regex clearly says 200 + 200 + 3 = 403 chars!
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Your regex asks for 0-200 characters. It'll typically choose zero because that always matches fastest.

Post by kaisellgren »

feyd wrote: Your regex asks for 0-200 characters. It'll typically choose zero because that always matches fastest.
How could I make it output two texts that are as big as possible (in this case 403 chars each)?

Right now it splits the text up like this:

text: abcdefghijklmnopq

result1: abcdefg
result2: hijklmnopq

so neither of them is 403 chars =/

EDIT:

If I change {0,200} to {200}, then I get only 1 result although I have two 'not' words in my text! :(

Post by feyd »

Your pattern isn't going to fetch 403 characters unless the "not" is in the middle of a large body of text with no other "not" in range.

Your pattern appears to be working just fine, by the way.

Code:

<?php
$a = 'Hosting is most certainly a business that companies/people can profit in. I\'m not questioning that -- my point is that you\'re expecting an awful lot for the amount you pay. You wouldn\'t offer 1 on 1 livechat, phone support or e-mail responses in 15 minutes for what you pay your host per hour. You expect them to. That doesn\'t mean you\'re not somehow 100% pure profit to them (in fact you could be if they do things right) BUT you\'re paying less than nothing. Be nice to the technicians who are helping you, thank them -- and let them know you\'ll google next time or post on their forums instead.';
preg_match_all('/.{0,200}\bnot\b.{0,200}/ism',$a,$matches);
var_dump($matches);
?>
output

Code:

array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(281) "Hosting is most certainly a business that companies/people can profit in. I'm not questioning that -- my point is that you're expecting an awful lot for the amount you pay. You wouldn't offer 1 on 1 livechat, phone support or e-mail responses in 15 minutes for what you pay your ho"
    [1]=>
    string(261) "st per hour. You expect them to. That doesn't mean you're not somehow 100% pure profit to them (in fact you could be if they do things right) BUT you're paying less than nothing. Be nice to the technicians who are helping you, thank them -- and let them know yo"
  }
}
You don't need the "m" modifier, just so you know: it only changes how ^ and $ match around newlines, and this pattern uses neither.
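To see both effects on something small enough to count by hand, here is a scaled-down check. The sentence and the radius of 10 (instead of 200) are made up for illustration; the two patterns have the same shape as the ones discussed above. With an exact {10}, the first "not" is skipped because only 4 characters precede it; with {0,10}, both are found, but the second match starts where the first one ended instead of reaching back over it.

```php
<?php
// Made-up sentence; a radius of 10 stands in for the thread's 200.
$text = "I'm not sure, but it's not a problem here.";

// Exactly 10 characters are required on both sides of "not".
preg_match_all('/.{10}\bnot\b.{10}/', $text, $exact);
// $exact[0] is [" but it's not a problem"]: the first "not" has only
// 4 characters before it, so it can never satisfy .{10}.

// Up to 10 characters on both sides (greedy, so as many as possible).
preg_match_all('/.{0,10}\bnot\b.{0,10}/', $text, $ranged);
// $ranged[0] is ["I'm not sure, but", " it's not a problem"]: the second
// search starts at offset 17, right where the first match stopped,
// so the two matches never share characters.

print_r($exact[0]);
print_r($ranged[0]);
```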

Post by kaisellgren »

feyd wrote: Your pattern isn't going to fetch 403 characters unless the "not" is in the middle of a large body of text with no other "not" in range.
It's just that this is not what I need.

The first string ends with "pay your ho" and the second one starts with "st per hour.", so it basically cuts my text in two :S

How can I make it ignore other nots within range?

Example:
A better example that not bug at all or at least I hope it not bugs.
If I want 32 chars around each not, it should return:

STRING 1: A better example that not bug at all or at least I hope i
STRING 2: ug at all or at least I hope it not bugs.

But right now it doesn't do that, because the two strings would overlap each other or something.

Post by feyd »

It doesn't actually care about other nots being in range. If you count, there are 200 characters captured after the first not. There aren't 200 characters before the first not (only 78), so it captures as many as are available. The same happens with the second not: the search resumes where the previous match ended, 200 characters after the first not, so the second match can't reach back into text the first match already used.

If you don't want previous matches to be taken into account, you'll have to build a loop using preg_match() and substr().
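A minimal sketch of that kind of loop, assuming PHP 7.1+ (for the [$a, $b] destructuring). The function name contextWindows() and its parameters are made up for this example. It uses preg_match()'s offset argument together with the PREG_OFFSET_CAPTURE flag to step from one "not" to the next, and substr() to cut each window out of the original string, so adjacent windows are free to overlap. It runs on the short example sentence from this thread with 32 characters per side.

```php
<?php
// Hypothetical helper: up to $radius bytes of context on each side of
// every whole-word, case-insensitive occurrence of $word.
function contextWindows(string $text, string $word, int $radius): array
{
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    $windows = [];
    $pos = 0;

    // PREG_OFFSET_CAPTURE makes $m[0] a [matchedText, byteOffset] pair;
    // the 5th argument starts the search at $pos without cutting the
    // string, so \b can still look at the character before $pos.
    while (preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE, $pos) === 1) {
        [$found, $offset] = $m[0];

        // Cut from the ORIGINAL string, so windows may share characters.
        $start = max(0, $offset - $radius);
        $end   = min(strlen($text), $offset + strlen($found) + $radius);
        $windows[] = substr($text, $start, $end - $start);

        // Continue right after this occurrence (always move forward).
        $pos = $offset + max(1, strlen($found));
    }
    return $windows;
}

$text = 'A better example that not bug at all or at least I hope it not bugs.';
foreach (contextWindows($text, 'not', 32) as $i => $window) {
    echo 'STRING ', $i + 1, ': ', $window, "\n";
}
```

Keep in mind that strlen(), substr() and the captured offsets all count bytes, so with multibyte UTF-8 text a window edge can land in the middle of a character.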

Post by kaisellgren »

feyd wrote: It doesn't actually care about other nots being in range. If you count, there are 200 characters after the first not captured. There aren't 200 characters before the first not, so only the maximum can be captured. The same happens with the second not because the previous match ended 200 characters after the first not.

If you don't want to consider previous matches then you'll have to build a loop using preg_match() and substr()
Oh, do you mean I should write a loop that finds the positions of all the nots, and then use substr() with those positions to get the chars around each not?

When I run "/\bnot\b/", how do I find out where the match is located? I only get 1 or 0 (true or false) depending on whether it is found...

Post by feyd »

Use the matches argument and set one of the flags that identifies positions.
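For example, passing PREG_OFFSET_CAPTURE as the flags argument turns every entry of the matches array into a [text, offset] pair instead of a plain string. A short sketch on the example sentence from earlier in the thread (the variable names are arbitrary; foreach destructuring needs PHP 7.1+):

```php
<?php
$text = 'A better example that not bug at all or at least I hope it not bugs.';

// Each entry of $matches[0] becomes [matchedText, offset].
// Offsets are 0-based and counted in bytes.
preg_match_all('/\bnot\b/i', $text, $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as [$word, $offset]) {
    echo "'$word' at offset $offset\n";
}
// 'not' at offset 22
// 'not' at offset 59
```

Because the offsets start at 0, an editor that numbers columns starting at 1 (many do) will show each position one column higher.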

Post by kaisellgren »

feyd wrote: Use the matches argument and set one of the flags that identifies positions.
Oh, PREG_OFFSET_CAPTURE works, my mistake. I was using Notepad2, which shows the column position, and I read the number wrong, so I thought I was getting the wrong position value :S

Okey dokey, thank you feyd, you are always helpful, more or less :)
GeertDD
Forum Contributor
Posts: 274
Joined: Sun Oct 22, 2006 1:47 am
Location: Belgium

Post by GeertDD »

feyd wrote: Your regex asks for 0-200 characters. It'll typically choose zero because that always matches fastest.
I don't think so. Regex quantifiers are greedy by default, so they try to match as much as possible.

Code:

Regex:    .{0,3}
Subject:  abcdef
Matches:  abc
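The same thing can be checked in PHP. A bounded quantifier such as {0,3} is greedy unless it is followed by ?; the lazy form {0,3}? is the one that prefers zero characters, and it only grows when the rest of the pattern forces it to.

```php
<?php
$subject = 'abcdef';

// Greedy (default): takes as many characters as the range allows.
preg_match('/.{0,3}/', $subject, $greedy);

// Lazy (trailing ?): takes as few as possible, here none at all.
preg_match('/.{0,3}?/', $subject, $lazy);

// Lazy still expands when the rest of the pattern needs it to.
preg_match('/.{0,3}?c/', $subject, $lazyThenC);

var_dump($greedy[0], $lazy[0], $lazyThenC[0]);
// string(3) "abc"
// string(0) ""
// string(3) "abc"
```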