Need regex for globbing function

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

So I've written a recursive globbing function...

Code: Select all

'/.+/'
The above is my regex to match EVERY file or folder on the system, recursively searching directories, etc...

Here's the problem...

If I wanted to narrow that search down, to say PHP and GIF files, how would I do that?

Here is what I have right now:

Code: Select all

'/.+\.(php|gif)?/'
If you're a regex hack you'll see the problem...

It finds files like:
- test.php.dat

Even though .dat is NOT what I'm looking for :)
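
For anyone following along, a quick check suggests why the second pattern above still pulls in too much: nothing anchors it to the end of the name, and the trailing ? makes the extension group optional. This is only a minimal sketch, and the sample names are made up for illustration.

```php
<?php
// Pattern from the post above: unanchored, and the extension group is optional.
$pattern = '/.+\.(php|gif)?/';

// Hypothetical sample names, not taken from any real directory.
$names = array('index.php', 'logo.gif', 'test.php.dat', 'notes.txt', 'images');

$matched = array();
foreach ($names as $name) {
    // preg_match() returns 1 when the pattern matches anywhere in $name.
    if (preg_match($pattern, $name) === 1) {
        $matched[] = $name;
    }
}

print_r($matched);
```

With these sample names, everything that contains a dot after its first character matches, whatever follows the dot, so test.php.dat and notes.txt get through. Only the dot-less name is rejected.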

Also, in order for the recursion to work, it needs to match folders as well, which obviously don't likely have PHP extensions...

So I need a regex which will match any directory name or filename but also limit the results to only certain file types...aka extensions!!!

Any ideas???

Cheers :)
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Code: Select all

/^.+?(?:\.(?:php|gif))?$/
:?:
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

I reckon 20% of our posts in this forum relate to pattern greediness....

Post by alex.barylski »

I tried that regex Feyd and it didn't work :(

The globbing function I use only needs to pattern match against the name of a directory or file, not a full path...

???

I have very little experience of regex, so I can't even begin to think what might be wrong with it...

The problem is, no matter what extensions I add or remove...it seems EVERY file/folder is getting pulled???

Cheers :)

Post by feyd »

Where are you using it? More code maybe?

Try removing the last question mark in my regex.
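
A small side-by-side check may help show what that last question mark does. It is a sketch under assumptions: the names are invented, and $optional and $required are just labels for the two variants of the pattern.

```php
<?php
// feyd's original pattern: the whole extension group is optional.
$optional = '/^.+?(?:\.(?:php|gif))?$/';
// Same pattern with the last question mark removed: the extension is required.
$required = '/^.+?(?:\.(?:php|gif))$/';

// Hypothetical names for illustration.
$names = array('index.php', 'logo.gif', 'test.php.dat', 'images');

foreach ($names as $name) {
    // preg_match() returns 1 on a match and 0 otherwise.
    printf(
        "%-14s optional=%d required=%d\n",
        $name,
        preg_match($optional, $name),
        preg_match($required, $name)
    );
}
```

With the group optional, any non-empty name satisfies the pattern, which would explain every file and folder being pulled. With the group required, test.php.dat is rejected, but so is a plain folder name like images, which matters if the same pattern also decides which directories get searched.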

Post by alex.barylski »

feyd wrote: Where are you using it? More code maybe?

Try removing the last question mark in my regex.

Code: Select all

function _file_glob($path, $re_pattern)
{
  $arr_names = array(); // Array of file/folder paths which meet glob criteria

  if(is_dir($path)){
    if($dh = opendir($path)){

      clearstatcache();
      while(($name = readdir($dh)) !== false){
        if($name != '.' && $name != '..'){

          // Check file/folder name against a regex pattern
          //echo $name.'<br>';
          if(preg_match($re_pattern, $name)){
            // ...
          }
        }
      }
      // The post was cut off above; the closing braces, closedir() and
      // return are added only so that the snippet parses
      closedir($dh);
    }
  }

  return $arr_names;
}
I am aware of glob() but my function goes way above and beyond the capabilities of glob() thus the custom function.

I'm positive the problem lies with regex...

What I have concluded is that I need a function which:
1) Matches a directory name or file name (minus extension)
2) And optionally matches an extension(s) list in brackets (php|gif) by starting at the END of the string and counting backwards (if possible?)

This way ANY file or folder is matched ALWAYS and optionally matching extensions...
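
One way to get the "start at the END of the string" behaviour from point 2 is to anchor the extension list with $ rather than trying to count backwards. The helper below is a hypothetical sketch, not part of the function in this thread; the name build_extension_pattern and the choice of # as delimiter are assumptions made for illustration.

```php
<?php
/**
 * Hypothetical helper (not from the thread): builds an end-anchored,
 * case-insensitive pattern from a list of extensions.
 */
function build_extension_pattern(array $extensions)
{
    $quoted = array();
    foreach ($extensions as $ext) {
        // preg_quote() escapes regex metacharacters; '#' is the delimiter used here.
        $quoted[] = preg_quote($ext, '#');
    }
    // \. is a literal dot, and $ pins the match to the end of the name.
    return '#\.(?:' . implode('|', $quoted) . ')$#i';
}

$pattern = build_extension_pattern(array('php', 'gif'));
echo $pattern, "\n";
var_dump(preg_match($pattern, 'test.php.dat')); // name ends in .dat
var_dump(preg_match($pattern, 'LOGO.GIF'));     // name ends in .GIF
```

Because the pattern only looks at the tail of the name, it does not need to describe the part before the extension at all. It does not handle an empty extension list, which a real version would want to check for.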

S*ite...I just realized maybe the problem lies within my function....because I don't distinguish between file/folders in my code before matching...a path is a path is a path... :P you know what I'm saying?

So using the regex that I've described above....it would return folders named 'myimages.gif' as well as files with GIF extensions... :(

Ok, so a re-write is in order :)

Having been exposed to my snippet of code above, can you think of an effective way of solving this problem?

I could check $path.'/'.$name for its type I suppose and use a different preg_match pattern (for file or folder) but then wouldn't that require me passing in two different patterns?

I want to avoid calling the function twice or passing separate patterns if possible... :?
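
One possible way around passing two patterns, sketched under assumptions: check the entry type with is_dir() first, always descend into folders, and apply the single pattern to file names only. The function name file_glob_sketch is hypothetical, and unlike the original function this version returns files only, never folder paths.

```php
<?php
/**
 * Hypothetical sketch, not the original _file_glob(): directories are always
 * descended into, and the single pattern is applied to file names only.
 * Note: is_dir() follows symlinks, so a symlink loop is not guarded against.
 */
function file_glob_sketch($path, $re_pattern)
{
    $arr_names = array();

    if (!is_dir($path) || !($dh = opendir($path))) {
        return $arr_names;
    }

    while (($name = readdir($dh)) !== false) {
        if ($name == '.' || $name == '..') {
            continue;
        }

        $full = $path . '/' . $name;

        if (is_dir($full)) {
            // Folders are never tested against the pattern, only recursed into.
            $arr_names = array_merge($arr_names, file_glob_sketch($full, $re_pattern));
        } elseif (preg_match($re_pattern, $name)) {
            // Only file names are checked against the pattern.
            $arr_names[] = $full;
        }
    }

    closedir($dh);
    return $arr_names;
}
```

Called as file_glob_sketch($dir, '#\.(?:php|gif)$#i'), a folder named myimages.gif would be searched but not returned, since the pattern never sees folder names.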

Cheers :)

Post by alex.barylski »

Ok, so I've narrowed down my problem even more...

What would work, if this is possible in regex...

Is a regex, which wasn't greedy BUT matched ANY valid name for a file or folder

However, optionally matched against a list of extensions, but only if the greedy modifier was included inside the regex...

The optional matching should work by starting at END of string and looking for extension by working backwards until extension and period located...

Code: Select all

/.+ (.(bmp|png|jpg|jpeg))/g
Note the g modifier

I would add that modifier to the regex dynamically before execution...thus the regex would then match only FILE names...and not folders, but if I remove the g modifier, it will match both file and folder names because NO file extension check is done...

Any ideas??? :)

Post by feyd »

There's no "g" modifier in PHP. You'll get a warning if you use it. To match end of string this works:

Code: Select all

#\.(?:png|bmp|gif|jpe?g)$#is
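
As a rough illustration of how that pattern behaves, here it is run over a handful of invented names with preg_grep(). The names are assumptions for the example only.

```php
<?php
// feyd's pattern: '#' delimiters, i = case-insensitive, s = dot matches newline
// (the s flag has no effect here, since the pattern contains no unescaped dot).
$pattern = '#\.(?:png|bmp|gif|jpe?g)$#is';

// Hypothetical names for illustration.
$names = array('photo.JPG', 'scan.jpeg', 'myimages.gif', 'test.png.dat', 'images');

// preg_grep() keeps the entries that match; array_values() reindexes them.
$hits = array_values(preg_grep($pattern, $names));
print_r($hits);
```

The $ anchor is what rejects test.png.dat, and the i flag is what lets photo.JPG through. The pattern works on the name alone, so it cannot tell a file called myimages.gif from a folder with the same name.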