preg_split on first occurrence of whitespace

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

snappca
Forum Newbie
Posts: 5
Joined: Sat Jan 27, 2007 7:32 am

preg_split on first occurrence of whitespace

Post by snappca »

I am desperately trying to split a string into two pieces at the first occurrence of a whitespace character (either a space or a tab, \t). Here's an example:

$fruit = preg_split('/\s/U', 'apple orange banana grape');

I'd like preg_split to give me an array populated like this:

$fruit[0] == 'apple';
$fruit[1] == 'orange banana grape';

Instead, I keep getting an array split on every whitespace character. As you can see, I've tried to make the regex "ungreedy" using the "U" modifier. What am I missing? Is there some other way that makes more sense?
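A quick way to see what is going on (my own sketch, not code from the thread): the U modifier only flips the greediness of quantifiers such as *, + and ?. The pattern \s contains no quantifier, so U has nothing to act on, and preg_split() keeps splitting at every match because no limit was passed.

```php
<?php
$subject = 'apple orange banana grape';

// U ("ungreedy") only changes how quantifiers behave; '\s' has none,
// so both patterns match exactly the same whitespace characters.
$withU    = preg_split('/\s/U', $subject);
$withoutU = preg_split('/\s/', $subject);

var_dump($withU === $withoutU); // bool(true)
echo count($withU), "\n";       // 4: split at every space, since no limit was given
```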

Thanks in advance
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Re: preg_split on first occurrence of whitespace

Post by John Cartwright »

Code:

preg_match('/([^\s]+)(.*?)/i', 'apple orange banana grape', $fruit);
Untested.

Explanation: Match everything up to the first whitespace, then match everything else.
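Why the lazy (.*?) misbehaves can be checked directly (a sketch I added, run on the same sample string): a lazy quantifier matches as few characters as possible, and because nothing follows it in the pattern, zero characters already counts as a successful match. Note too that preg_match() stores the whole match at index 0, so the captured pieces land at indexes 1 and 2, not 0 and 1 as in the original question.

```php
<?php
preg_match('/([^\s]+)(.*?)/i', 'apple orange banana grape', $fruit);

// $fruit[0] is the whole match; $fruit[1] and $fruit[2] are the two groups.
// The lazy (.*?) is satisfied by the empty string, so $fruit[2] is ''.
var_dump($fruit);
```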
snappca
Forum Newbie
Posts: 5
Joined: Sat Jan 27, 2007 7:32 am

Re: preg_split on first occurrence of whitespace

Post by snappca »

Had to change the regex slightly; really it was just the ? that was throwing it off. In any case, I appreciate the help. Thanks!

P.S. Here's what I changed it to, if anyone else cares:

preg_match('/([^\s]+)(.*)/', 'apple orange banana grape', $fruit);
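One detail worth knowing about this version (my note, checked against the sample string): the whitespace is not excluded from the second group, so $fruit[2] starts with a space. Consuming the separator outside the groups avoids that. The second pattern below is my own variation, not something posted in the thread.

```php
<?php
$subject = 'apple orange banana grape';

// Pattern from the post: group 2 starts at the first whitespace, so it keeps it.
preg_match('/([^\s]+)(.*)/', $subject, $fruit);
var_dump($fruit[2]); // string(20) " orange banana grape"

// Variation: match the whitespace run between the groups so it is dropped.
preg_match('/^(\S+)\s+(.*)/', $subject, $trimmed);
var_dump($trimmed[2]); // string(19) "orange banana grape"
```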
GeertDD
Forum Contributor
Posts: 274
Joined: Sun Oct 22, 2006 1:47 am
Location: Belgium

Re: preg_split on first occurrence of whitespace

Post by GeertDD »

Here's another update:

Code:

 
preg_match('/^(\S++)(.*)/', 'apple orange banana grape', $fruit);
 
  • \S is shorter and means the same as [^\s].
  • Added ^ to anchor the regex to the beginning of the string to prevent needless backtracking.
  • Made \S match possessively (using ++). This kills any needless backtracking.
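The anchor does change behaviour on one kind of input, which is easy to miss (a check I added; the leading-space string is a hypothetical example): if the subject begins with whitespace, ^(\S++) cannot match at all, while the unanchored pattern simply starts its match at the first non-space character.

```php
<?php
$subject = '  apple orange'; // hypothetical input with leading spaces

$anchored   = preg_match('/^(\S++)(.*)/', $subject, $a);
$unanchored = preg_match('/([^\s]+)(.*)/', $subject, $u);

var_dump($anchored);   // int(0): ^ requires a non-space at position 0
var_dump($unanchored); // int(1): the match starts at "apple"
var_dump($u[1]);       // string(5) "apple"
```

If input like that is possible, calling ltrim() on the subject first is one way to keep the anchored, possessive version.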
Finally, you can also use preg_split(). Just supply a limit (3rd parameter).

Code:

 
preg_split('/\s+/', 'apple orange banana grape', 2);
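For reference, here is what the limit does (my own check on the thread's sample string): once preg_split() has produced limit - 1 pieces it stops splitting, and the remainder of the string, internal whitespace included, becomes the last element. If the separator is only ever a single plain space (never a tab), explode() with the same kind of limit is an alternative.

```php
<?php
$subject = 'apple orange banana grape';

// Limit 2: split once at the first run of whitespace, keep the rest intact.
$fruit = preg_split('/\s+/', $subject, 2);
var_dump($fruit[0]); // string(5) "apple"
var_dump($fruit[1]); // string(19) "orange banana grape"

// explode() only handles a fixed separator, so a tab would not be split on.
$viaExplode = explode(' ', $subject, 2);
var_dump($viaExplode === $fruit); // bool(true)
```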
 
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Re: preg_split on first occurrence of whitespace

Post by John Cartwright »

GeertDD wrote:Here's another update:
...
Truly the king of regex. :drunk:
joeaston
Forum Newbie
Posts: 3
Joined: Wed Mar 28, 2007 2:45 pm

Re: preg_split on first occurrence of whitespace AND hyphen

Post by joeaston »

I am trying to achieve the same thing but with matching whitespace+hyphen+whitespace.

I've tried many variations on the following code, but I can't get it to work:

Code:

 
$song = 'Explosions in the Sky – Day Four';
 
$matches = preg_split('/(\s\-\s+)/', $song, 2);
 
echo $matches[1] . ' by ' . $matches[0];
// Should print 'Day Four by Explosions in the Sky'
// Instead $matches[1] outputs nothing, but $matches[0] outputs $song un-split
 
Please could someone explain what I'm doing wrong?

Thank you!
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: preg_split on first occurrence of whitespace AND hyphen

Post by prometheuzz »

joeaston wrote:I am trying to achieve the same thing but with matching whitespace+hyphen+whitespace.
...

Look closely: your two hyphens are not the same. The one in $song is an en dash (–), which is slightly longer than the plain hyphen (-) in your pattern.
Also, you don't need to group your regex (put it inside parentheses), and you don't need to escape the hyphen inside the regex.
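The mismatch is easy to confirm in code (a sketch I added; "\u{2013}" is just the en dash written as a PHP 7+ escape so it can't be confused with the hyphen): the two characters differ even at the byte level, so a pattern containing - never matches the en dash, and preg_split() returns the subject unsplit as a one-element array. That is also why $matches[1] was empty: it does not exist.

```php
<?php
$hyphen = '-';         // U+002D HYPHEN-MINUS, 1 byte in UTF-8
$enDash = "\u{2013}";  // U+2013 EN DASH, 3 bytes in UTF-8 (escape needs PHP 7.0+)

echo strlen($hyphen), ' ', strlen($enDash), "\n"; // 1 3

$song  = "Explosions in the Sky {$enDash} Day Four";
$parts = preg_split('/\s-\s+/', $song, 2);

// No match, so the whole string comes back as the only element.
echo count($parts), "\n"; // 1
```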

So, this should work:

Code:

$song = 'Explosions in the Sky - Day Four';
$matches = preg_split('/\s-\s/', $song, 2);
Or, if you want to match either of those dashes, with one or more whitespace characters before and after it, this will do:

Code:

$matches = preg_split('/\s+(-|–)\s+/', $song, 2);
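A related pitfall if you try to write that alternation as a character class instead (my addition, not from the thread): without the u modifier PCRE works on bytes, so [-–] is read as a class of four single bytes (the hyphen plus the three bytes of the en dash), and it never matches the whole en dash. With u the class works as intended, and the en dash can be written as the escape \x{2013} so the source file has no look-alike characters.

```php
<?php
// With /u the pattern and subject are treated as UTF-8 characters,
// so [-\x{2013}] is a class of two characters: hyphen-minus or en dash.
$pattern = '/\s+[-\x{2013}]\s+/u';

$titles = [
    'Explosions in the Sky - Day Four',
    "Explosions in the Sky \u{2013} Day Four",
];

foreach ($titles as $title) {
    $matches = preg_split($pattern, $title, 2);
    echo $matches[1] . ' by ' . $matches[0], "\n"; // Day Four by Explosions in the Sky
}
```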
joeaston
Forum Newbie
Posts: 3
Joined: Wed Mar 28, 2007 2:45 pm

Re: preg_split on first occurrence of whitespace

Post by joeaston »

Thanks for trying prometheuzz, but that ain't working! The string still isn't being split.

I don't think it's the hyphen that's the problem. Here's my real code where the hyphen has been copied and pasted:

Code:

$matches = preg_split('/\s–\s/', $item->get_title(), 2);
 
// $item->get_title() returns something like 'Monta – Long Live the Quiet' (no quotes)
Any other suggestions?
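One way to find out which character a title really contains (a debugging sketch of mine; the literal string below is a hypothetical stand-in for whatever $item->get_title() returns) is to print the UTF-8 bytes of every non-ASCII character in it. An en dash shows up as e28093 and an em dash as e28094.

```php
<?php
// Hypothetical stand-in for $item->get_title(); substitute the real value.
$title = "Monta \u{2013} Long Live the Quiet";

// With /u the negated class matches whole UTF-8 characters (code points
// above 0x7F) rather than single bytes; bin2hex() then shows their bytes.
preg_match_all('/[^\x00-\x7F]/u', $title, $found);
foreach ($found[0] as $char) {
    echo bin2hex($char), "\n"; // e28093 for an en dash
}
```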
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: preg_split on first occurrence of whitespace

Post by prometheuzz »

joeaston wrote:Thanks for trying prometheuzz, but that ain't working!
...
Then there's probably something else going wrong as well, but I'm sure the hyphens are different, and that can cause problems.
This works perfectly for me:

Code:

#!/usr/bin/php
<?php 
print_r(preg_split('/\s+(-|–)\s+/', 'Explosions in the Sky - Day Four', 2)); // short hyphen
print_r(preg_split('/\s+(-|–)\s+/', 'Explosions in the Sky – Day Four', 2)); // longer hyphen
/* output:
 
Array
(
    [0] => Explosions in the Sky
    [1] => Day Four
)
Array
(
    [0] => Explosions in the Sky
    [1] => Day Four
)
 
*/
?>
joeaston
Forum Newbie
Posts: 3
Joined: Wed Mar 28, 2007 2:45 pm

Re: preg_split on first occurrence of whitespace

Post by joeaston »

You were right!

I went on to Wikipedia and compiled a huge list of different hyphen types. That eventually got it working.

I've no idea which one is which though.

Code:

$matches = preg_split('/\s+(-|?|?|–|—|?)\s+/', $item->get_title(), 2);
:lol:

Thanks for your help.
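Instead of listing every dash by hand, PCRE can match the whole Unicode "Dash_Punctuation" category with \p{Pd}, provided the u modifier is on. That category covers the ordinary hyphen-minus as well as the hyphen, en dash, em dash and similar characters (though not the minus sign U+2212, which is a math symbol). This is a sketch of mine, not the code joeaston ended up using.

```php
<?php
// \p{Pd} = any Unicode dash punctuation (hyphen-minus, hyphen, en dash, em dash, ...).
// The /u modifier is required so PCRE reads both pattern and subject as UTF-8.
$pattern = '/\s+\p{Pd}\s+/u';

$titles = [
    'Monta - Long Live the Quiet',         // U+002D hyphen-minus
    "Monta \u{2010} Long Live the Quiet",  // U+2010 hyphen
    "Monta \u{2013} Long Live the Quiet",  // U+2013 en dash
    "Monta \u{2014} Long Live the Quiet",  // U+2014 em dash
];

foreach ($titles as $title) {
    list($artist, $song) = preg_split($pattern, $title, 2);
    echo $song . ' by ' . $artist, "\n"; // Long Live the Quiet by Monta
}
```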
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: preg_split on first occurrence of whitespace

Post by prometheuzz »

joeaston wrote:You were right!
...
I won't say "I told you so!"... Oh, dammit, now I did.
; )

You're welcome, of course.