preg_replace seg faulting on too large of string

future_man
Forum Newbie
Posts: 8
Joined: Wed Jun 25, 2003 6:08 pm

Post by future_man »

I've run into an issue on both Linux and OS X using PHP 5.0.4 where I get a segmentation fault when preg_replace tries to process too long a string. The breaking point is around 14,000 characters on my Linux system and around 20,000 on my OS X box.

Here's the preg_replace:

Code: Select all

$str = preg_replace("/@@\[((.|\n|\r\n)*?)\]@@/e","strlen('$1')-(substr_count('$1','\"')+1)",$str);
The basic idea here is that I want to get a char count for everything between @@[ and ]@@ and replace that portion of the string with the count. It works fine on smaller strings, but breaks on sizes larger than those mentioned above.
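
For readers following along, the same idea can be sketched with preg_replace_callback in place of the /e modifier. This is only an illustration, not the code from the post: the function name count_between_markers is hypothetical, it assumes PHP 5.3 or later (for the closure), and it reports the plain strlen of the captured text rather than reproducing the quote adjustment in the expression above. It shows the intent of the replacement and is not offered as a fix for the crash.

```php
<?php
// Hypothetical helper (not from the original post): replaces every
// @@[ ... ]@@ span with the length of the text between the markers.
function count_between_markers($str)
{
    return preg_replace_callback(
        '/@@\[(.*?)\]@@/s',           // "s" lets the dot match newlines too
        function ($m) {
            // $m[1] is the text captured between @@[ and ]@@
            return (string) strlen($m[1]);
        },
        $str
    );
}

echo count_between_markers("before @@[abc\ndef]@@ after"), "\n";
```

The callback receives the raw captured text, so no quote escaping is involved and no code is built from the input string.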

Has anyone else run into anything like this before? Can anyone suggest a relatively simple workaround?

Here's an example of a shorter string that works:

Code: Select all

@@ї ]|ї@dtї<p>Bank of Scotland (Ireland) brings commercial banking services from the Scottish Highlands to the Emerald Isle. A member of HBOS, the bank provides a range of business banking services, including commercial loans, commercial and asset finance, and investment services. The bank operates from its headquarters in Dublin and regional offices in Limerick, Galway, Waterford, Cork, and Belfast. It has also been expanding operations as it aims to become the country's No. 1 bank. It agreed in early 2005 to purchase ESB's retail business, which includes 54 branches.</p>]dt@]|ї]@@
The longer string that breaks the code is too large to post here, but is basically the same in format.

Any help would be appreciated.

Thanks,
John
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

I too have had segfaults when processing very long strings with preg... (which I use lots).

It seems to differ depending upon the pattern in question and I haven't found a workaround :(

I was going to suggest re-thinking that pattern, but other than adding (?: ) to your inner parens I don't see anything you can/should change. Unless, of course, you've added the \n and \r\n purely to allow whitespace (i.e. newlines). If so, you might wanna just switch to a dot "." and add the "s" modifier along with your "e".

You know, come to think of it, my segfaults happened when including \r\n in the pattern :?
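
To make the two suggestions concrete, here is a small sketch comparing the original alternation, the same pattern with a non-capturing inner group, and the dot with the "s" modifier. The sample string and variable names are made up for illustration, and the "e" modifier is left out so only the matching is compared. On a short input all three should capture the same text; this says nothing about how each behaves on very long strings.

```php
<?php
// Made-up sample with a CRLF line break between the markers.
$sample = "a @@[line one\r\nline two]@@ b";

// The three pattern variants discussed in the thread (without the "e" modifier).
$patterns = array(
    'original alternation' => '/@@\[((.|\n|\r\n)*?)\]@@/',
    'non-capturing group'  => '/@@\[((?:.|\n|\r\n)*?)\]@@/',
    'dot with s modifier'  => '/@@\[(.*?)\]@@/s',
);

foreach ($patterns as $label => $pattern) {
    preg_match($pattern, $sample, $m);
    // $m[1] holds the text between @@[ and ]@@ for every variant.
    echo $label, ': ', strlen($m[1]), "\n";
}
```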
future_man
Forum Newbie
Posts: 8
Joined: Wed Jun 25, 2003 6:08 pm

Post by future_man »

Thanks for the feedback. It's somewhat comforting to know that I'm not the only one running into this issue.

I tried just matching with a "." and the "s" modifier, but I still get the seg fault.

When I just use the "." without the "s" modifier, there's no seg fault, but then it does not return the desired output either.

John
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

Clutching at straws here, but does this make it put its pants back on? ...

Code: Select all

$str = preg_replace("/@@\[((?:.|\s)+?)\]@@/e","strlen('$1')-(substr_count('$1','\"')+1)",$str);
I guess it won't, but like I say: clutching at straws.
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

Moved to Regex (An interesting one for this forum).

EDIT | Sorry, I have a feeling you were responding to a post when I moved this, in which case you'll have had an error trying to submit :twisted:
future_man
Forum Newbie
Posts: 8
Joined: Wed Jun 25, 2003 6:08 pm

Post by future_man »

Thanks for the suggestion, but I still get the segfault with the modified code.

Hmmm....

John
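
One possible workaround, sketched here under assumptions rather than confirmed in the thread, is to skip the regex engine for this job and scan for the markers with strpos(). The function name count_between_markers_scan is hypothetical, the count is the plain length of the enclosed text (the quote adjustment from the original expression is not reproduced), and it has not been tested on PHP 5.0.4. It assumes the crash originates in the pattern matching, which a plain string scan does not use.

```php
<?php
// Hypothetical helper (not from the thread): replaces each @@[ ... ]@@ span
// with the length of the enclosed text, using strpos() instead of PCRE.
function count_between_markers_scan($str)
{
    $open  = '@@[';
    $close = ']@@';
    $out   = '';
    $pos   = 0;

    while (($start = strpos($str, $open, $pos)) !== false) {
        $bodyStart = $start + strlen($open);
        $end = strpos($str, $close, $bodyStart);
        if ($end === false) {
            break; // unterminated marker: leave the rest of the string untouched
        }
        $out .= substr($str, $pos, $start - $pos); // text before the marker
        $out .= (string) ($end - $bodyStart);      // length of the enclosed text
        $pos  = $end + strlen($close);             // continue after the closing marker
    }

    // Append whatever follows the last complete marker pair.
    return $out . substr($str, $pos);
}

echo count_between_markers_scan("before @@[abc\ndef]@@ after"), "\n";
```

Like the lazy pattern, the scan pairs each @@[ with the nearest following ]@@, and its work grows linearly with the input length.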