regexp problem

Posted: Wed May 30, 2007 11:58 am
by kaisellgren
Hi,

It's midnight and my brain ain't working :S

Code: Select all

<?php

$a = "Hosting is most certainly a business that companies/people can profit in. I'm not questioning that -- my point is that you're expecting an awful lot for the amount you pay. You wouldn't offer 1 on 1 livechat, phone support or e-mail responses in 15 minutes for what you pay your host per hour. You expect them to. That doesn't mean you're not somehow 100% pure profit to them (in fact you could be if they do things right) BUT you're paying less than nothing. Be nice to the technicians who are helping you, thank them -- and let them know you'll google next time or post on their forums instead.";
// Capture up to 200 characters on either side of each whole word "not".
preg_match_all("/(.{0,200}\bnot\b.{0,200})/ism", $a, $matches);
foreach ($matches[1] as $match) {
    echo $match . '<br />';
}

?>
This does not give me two lines of text of 403 chars each, even though my regex clearly says 200 + 200 + 3 = 403 chars!

Posted: Wed May 30, 2007 12:07 pm
by feyd
Your regex asks for 0-200 characters. It'll typically choose zero because that always matches fastest.

Posted: Wed May 30, 2007 12:12 pm
by kaisellgren
feyd wrote:Your regex asks for 0-200 characters. It'll typically choose zero because that always matches fastest.
How could I make it output two texts that are each as big as possible (in this case 403 chars each)?

Now it splits it up like

text: abcdefghijklmnopq

result1: abcdefg
result2: hijklmnopq

so neither of them is 403 chars =/

EDIT:

If I change {0,200} to {200}, then I get only 1 result although I have two 'not' words in my text! :(

Posted: Wed May 30, 2007 12:30 pm
by feyd
Your pattern isn't going to fetch 403 characters unless the not is in the middle of a large body of text without another not in range.

Your pattern appears to be working just fine, by the way.

Code: Select all

<?php
$a = 'Hosting is most certainly a business that companies/people can profit in. I\'m not questioning that -- my point is that you\'re expecting an awful lot for the amount you pay. You wouldn\'t offer 1 on 1 livechat, phone support or e-mail responses in 15 minutes for what you pay your host per hour. You expect them to. That doesn\'t mean you\'re not somehow 100% pure profit to them (in fact you could be if they do things right) BUT you\'re paying less than nothing. Be nice to the technicians who are helping you, thank them -- and let them know you\'ll google next time or post on their forums instead.';
preg_match_all('/.{0,200}\bnot\b.{0,200}/ism', $a, $matches);
var_dump($matches);
?>
output

Code: Select all

array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(281) "Hosting is most certainly a business that companies/people can profit in. I'm not questioning that -- my point is that you're expecting an awful lot for the amount you pay. You wouldn't offer 1 on 1 livechat, phone support or e-mail responses in 15 minutes for what you pay your ho"
    [1]=>
    string(261) "st per hour. You expect them to. That doesn't mean you're not somehow 100% pure profit to them (in fact you could be if they do things right) BUT you're paying less than nothing. Be nice to the technicians who are helping you, thank them -- and let them know yo"
  }
}
You don't need the "m" modifier, just so you know.
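
As a quick check of that remark: "m" only changes where ^ and $ are allowed to match, and this pattern uses neither. The sketch below runs the same kind of pattern with and without it. The sample text and the 10-character window are shortened stand-ins, not the text from the thread.

```php
<?php
// Shortened stand-in for the text in the thread (hypothetical sample).
$a = "I'm not questioning that.\nYou're not paying much.";

// Same pattern, once with and once without the "m" modifier.
preg_match_all('/.{0,10}\bnot\b.{0,10}/ism', $a, $withM);
preg_match_all('/.{0,10}\bnot\b.{0,10}/is', $a, $withoutM);

// "m" only affects the ^ and $ anchors, and the pattern has neither,
// so both calls return identical matches.
var_dump($withM === $withoutM); // bool(true)
```

The "s" modifier, on the other hand, does matter here if the text can contain line breaks, because it lets the dot match a newline.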

Posted: Wed May 30, 2007 12:36 pm
by kaisellgren
feyd wrote:Your pattern isn't going to fetch 403 characters unless the not is in the middle of a large body of text without another not in range.

It's just that this is not what I need.

The first string ends with "pay your ho" and the second string starts with "st per hour.". So it basically splits my text in two :S

How could I make it not care about other nots in range?

Example:
A better example that not bug at all or at least I hope it not bugs.
If I want to get 32 chars around both nots then it should return:

STRING 1: A better example that not bug at all or at least I hope i
STRING 2: ug at all or at least I hope it not bugs.

But right now it doesn't work that way, because the two results overlap each other or something.

Posted: Wed May 30, 2007 12:42 pm
by feyd
It doesn't actually care about other nots being in range. If you count, there are 200 characters captured after the first not. There aren't 200 characters before the first not, so only as many as exist can be captured. The same happens with the second not, because the previous match ended 200 characters after the first not and a new match can only start where the previous one ended.

If you don't want to consider previous matches then you'll have to build a loop using preg_match() and substr()
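
To make the point above concrete, here is a small sketch that runs the same kind of pattern on the shorter example sentence from earlier in the thread, with a 32-character window instead of 200. The window size and variable names are only for illustration.

```php
<?php
// Subject taken from the shorter example earlier in the thread.
$a = 'A better example that not bug at all or at least I hope it not bugs.';

// Up to 32 characters on either side of each whole word "not".
preg_match_all('/.{0,32}\bnot\b.{0,32}/is', $a, $matches);

foreach ($matches[0] as $match) {
    echo $match, "\n";
}
// Prints:
//   A better example that not bug at all or at least I hope i
//   t not bugs.
// The first match already consumed everything up to "...I hope i", so the
// second match can only begin at the following "t".
```

preg_match_all() never returns overlapping matches, which is why the second result is so short.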

Posted: Wed May 30, 2007 1:00 pm
by kaisellgren
feyd wrote:It doesn't actually care about other nots being in range. If you count, there are 200 characters captured after the first not. There aren't 200 characters before the first not, so only as many as exist can be captured. The same happens with the second not, because the previous match ended 200 characters after the first not and a new match can only start where the previous one ended.

If you don't want to consider previous matches then you'll have to build a loop using preg_match() and substr()
Oh, do you mean that I make a loop that finds the positions of all nots, and then use substr with those positions to get the chars around each not?

When I do "/\bnot\b/", how do I know at what position it is located? I only get 1 or 0 (true or false) depending on whether it is found...

Posted: Wed May 30, 2007 1:11 pm
by feyd
Use the matches argument and set one of the flags that identifies positions.

Posted: Wed May 30, 2007 1:16 pm
by kaisellgren
feyd wrote:Use the matches argument and set one of the flags that identifies positions.
Oh, PREG_OFFSET_CAPTURE worked, my mistake. I had Notepad2, which shows the column position, and I was reading the number wrong, so I thought I got the wrong position value :S

Okies dokies, thank you feyd, you are always helpful, more or less :)
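
For anyone finding this later, a minimal sketch of that approach, assuming preg_match_all() with PREG_OFFSET_CAPTURE to locate each not and substr() to cut the window around it. contextsAround() and its parameters are hypothetical names made up for this example, and the offsets are byte offsets, so multibyte text would need extra care.

```php
<?php
// Hypothetical helper: returns up to $radius characters on each side of
// every whole-word occurrence of $word. Windows are allowed to overlap.
function contextsAround($text, $word, $radius)
{
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';

    // With PREG_OFFSET_CAPTURE each match is array(matched text, byte offset).
    preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

    $contexts = array();
    foreach ($matches[0] as $match) {
        $found  = $match[0];
        $offset = $match[1];

        // Clamp the window to the bounds of the subject string.
        $start = max(0, $offset - $radius);
        $end   = min(strlen($text), $offset + strlen($found) + $radius);

        $contexts[] = substr($text, $start, $end - $start);
    }
    return $contexts;
}

$a = 'A better example that not bug at all or at least I hope it not bugs.';
foreach (contextsAround($a, 'not', 32) as $context) {
    echo $context, "\n";
}
// Prints:
//   A better example that not bug at all or at least I hope i
//   ug at all or at least I hope it not bugs.
```

Because each window is cut independently from the original string, an earlier result no longer eats into the text available to a later one.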

Posted: Tue Jun 12, 2007 12:58 pm
by GeertDD
feyd wrote:Your regex asks for 0-200 characters. It'll typically choose zero because that always matches fastest.
I don't think so. Regular expressions are greedy by default and they will try to match as much as possible.

Code: Select all

Regex:    .{0,3}
Subject:  abcdef
Matches:  abc
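
The same example as a runnable sketch, with the lazy form of the quantifier next to it for comparison. The lazy variant is an addition for illustration and is not something used earlier in the thread.

```php
<?php
// Greedy (the default): takes as many characters as the quantifier allows.
preg_match('/.{0,3}/', 'abcdef', $greedy);

// Lazy (trailing "?"): takes as few characters as possible, here zero.
preg_match('/.{0,3}?/', 'abcdef', $lazy);

var_dump($greedy[0]); // string(3) "abc"
var_dump($lazy[0]);   // string(0) ""
```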