Form/field processing, logical operators

Calimero
Forum Contributor
Posts: 310
Joined: Thu Jan 22, 2004 6:54 pm
Location: Milky Way


Post by Calimero »

Ok,
There is a lot of this.

START HERE // at 00:46

I need PHP to detect whether any of the operators listed below are present in the field.

There are three types:
1) stand-alone (as words) like AND, OR, NOT
2) let's call the other group supplemental, like + (+word) and - (-word) at the beginning of the word
3) and the third group is quotes (" ")

Additionally, I need the code to find a certain character that is not at the beginning of the string but somewhere inside it (it can be anywhere, and PHP must find it).
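
For the "find a character anywhere in the string" part, a minimal sketch using strpos() might look like the following. The helper name containsChar is made up for illustration and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: true if $needle occurs anywhere in $haystack.
function containsChar(string $haystack, string $needle): bool
{
    // strpos() returns the offset (which can be 0) or false,
    // so the comparison has to be strict.
    return strpos($haystack, $needle) !== false;
}

var_dump(containsChar('some*thing', '*')); // bool(true)
var_dump(containsChar('something', '*'));  // bool(false)
```

If the character must not be the first one, comparing the result of strpos() against 0 as well would cover that case.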


1) For the first group I know the MySQL code myself; I just need PHP to recognize that they are there.

2) For the second one I think a FOREACH >> IF - ELSE IF loop would work; again, PHP just needs to find and recognize the first character of the word.

3) Well, this one I'm not sure about, but it would need to take first priority when checking for any of these three groups: find the start quote, find the end quote, and store the content in between as a variable (like any other word).
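
The FOREACH >> IF - ELSE IF idea from point 2 could be sketched roughly like this. The function name classifyWords and the bucket names (required, excluded, plain) are assumptions made for the example, and it does not handle quoted phrases.

```php
<?php
// Hypothetical helper: sort whitespace-separated words into buckets
// according to their first character.
function classifyWords(string $input): array
{
    $result = ['required' => [], 'excluded' => [], 'plain' => []];
    // Split on runs of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', trim($input), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($words as $word) {
        $first = $word[0];
        if ($first === '+' && strlen($word) > 1) {
            $result['required'][] = substr($word, 1); // +word
        } elseif ($first === '-' && strlen($word) > 1) {
            $result['excluded'][] = substr($word, 1); // -word
        } else {
            $result['plain'][] = $word;               // everything else
        }
    }
    return $result;
}

print_r(classifyWords('+php -perl regex'));
```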


For any other points I missed, suggestions and nose rubbing are welcome :lol:



Thanks in advance!
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

first and second are easily captured by the regexps ( /\b(AND|OR|NOT)\b/i , /(\+|-)\w/ ).
third needs a little extra work:

Code: Select all

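$text = ' text with some "quoted string" inside. "More quoted words here".';
$strings = array();
// $text[$i] replaces the $text{$i} offset syntax, which was removed in PHP 8.0
for($i = 0, $length = strlen($text), $quoted = false, $current = ''; $i < $length; $i++) {
  if( $text[$i] == '"' ) {
     $quoted = !$quoted; // toggle on every quote character
     if(!$quoted) {
         // closing quote: store the collected string and reset
         array_push($strings, $current);
         $current = '';
     } else continue; // opening quote: don't add it to $current
  }
  if($quoted) $current .= $text[$i];
}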
quoted strings are accumulated in the $strings array
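
For the first two groups, the regexps mentioned at the top of this post could be tried out roughly as follows. Note one assumption that is not in the original pattern: a (?<!\S) lookbehind is added so that + or - only counts at the start of a word, since /(\+|-)\w/ on its own would also match the hyphen in "well-known". The function name findOperators is made up for the example.

```php
<?php
// Hypothetical helper: report stand-alone operators and +/- prefixed words.
function findOperators(string $query): array
{
    // Stand-alone operators as whole words, case-insensitive.
    preg_match_all('/\b(AND|OR|NOT)\b/i', $query, $ops);

    // + or - directly before a word, but only when not preceded by a
    // non-space character (i.e. at the start of a word).
    preg_match_all('/(?<!\S)([+-])(\w+)/', $query, $matches, PREG_SET_ORDER);

    $prefixed = [];
    foreach ($matches as $match) {
        $prefixed[] = [$match[1], $match[2]]; // [sign, word]
    }

    return ['words' => $ops[1], 'prefixed' => $prefixed];
}

print_r(findOperators('cats AND dogs NOT +birds -fish well-known'));
```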
redmonkey
Forum Regular
Posts: 836
Joined: Thu Dec 18, 2003 3:58 pm

Post by redmonkey »

Weirdan wrote: third needs a little extra work:
??

Code: Select all

$text = ' text with some "quoted string" inside. "More quoted words here".';
if (preg_match_all('/"(.*?)"/', $text, $matches, PREG_PATTERN_ORDER))
{
  foreach ($matches[1] as $quoted)
  {
    echo $quoted . "\x0a";
  }
}
Outputs...

Code: Select all

quoted string
More quoted words here
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

what if you need to add the ability to escape the quotes from being interpreted? and ability to escape the escape? ;)
redmonkey
Forum Regular
Posts: 836
Joined: Thu Dec 18, 2003 3:58 pm

Post by redmonkey »

Depends on how much you want to filter but....

Code: Select all

$text = ' text with some "quoted string" inside.  \"these quoted words will be ignored\" but here is "More quoted words here".';
if (preg_match_all('/(?<!\\\\)"(.*?)(?<!\\\\)"/', $text, $matches, PREG_PATTERN_ORDER))
{
  foreach ($matches[1] as $quoted)
  {
    echo $quoted . "\x0a";
  }
}
...outputs..

Code: Select all

quoted string
More quoted words here
So that should give you a start. How would you deal with the problem? For the string given above, your code would populate $strings with...

Code: Select all

Array
(
    [0] => quoted string
    [1] => these quoted words will be ignored\
    [2] => More quoted words here
)
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

redmonkey wrote: So that should give you a start. How would you deal with the problem?
with a few strokes:

Code: Select all

// \" is an escaped quote; \\\\ in a single-quoted literal is an escaped escape (two backslashes)
$text = ' text with some "quoted string" inside. this \"quotes will pass undetected\". "More quoted words here". Moreover, here we have the \\\\"escaped escape\\\\"';
$strings = array();

for($i = 0, $length = strlen($text), $quoted = false, $escaped = false, $current = ''; $i < $length; $i++) {
  if( $text[$i] == '\\' ) $escaped = !$escaped; // <=== added this line
  if( $text[$i] == '"' && !$escaped ) { // <=== modified this line
     $quoted = !$quoted;
     if(!$quoted) {
         array_push($strings, $current);
         $current = '';
     } else continue;
  }
  if($quoted) $current .= $text[$i];
  if( $text[$i] != '\\' ) $escaped = false; // <=== added this line
}
}
var_dump($strings);
;)
I doubt the 'escaped escape' could be properly parsed with a regexp...
redmonkey
Forum Regular
Posts: 836
Joined: Thu Dec 18, 2003 3:58 pm

Post by redmonkey »

Weirdan wrote: I doubt the 'escaped escape' could be properly parsed with a regexp...
I think a single pure regex solution to achieve that would be just too much of a mind bend for what we are trying to achieve here. However, with some simple pre and post processing it can still be done quite quickly. :)

Code: Select all

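// \" is an escaped quote; \\\\ in a single-quoted literal is an escaped escape (two backslashes)
$text = ' text with some "quoted string" inside. this \"quotes will pass undetected\". "More quoted words here". Moreover, here we have the \\\\"escaped escape\\\\"';
$strings = array();
// swap each escaped escape for a placeholder so the lookbehind only sees real escapes
if (preg_match_all('/(?<!\\\\)"(.*?)(?<!\\\\)"/', str_replace('\\\\', '_SOME UNIQUE STRING_', $text), $matches, PREG_PATTERN_ORDER))
{
  foreach ($matches[1] as $quoted)
  {
    // put the escaped escapes back into each captured string
    $strings[] = str_replace('_SOME UNIQUE STRING_', '\\\\', $quoted);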
  }
}
var_dump($strings);
McGruff
DevNet Master
Posts: 2893
Joined: Thu Jan 30, 2003 8:26 pm
Location: Glasgow, Scotland

Post by McGruff »

An interesting alternative to regex is to use a string iterator (I'm assuming this is a search script and therefore the string won't be very long).

The iteration would be observed by various rules such as "if current char = quote, and $compiling == false, start compiling new word and set $compiling = true;". You would stop compiling when you hit another quote char, ignoring any word boundaries within a quoted string. The compiling flag keeps track of where you are. That's just a rough outline of the kind of flags and rules you might need.

This way can also get a bit tortuous.
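
To make the outline above a little more concrete, here is a rough sketch of such an iterator. The function name tokenize is made up, the $compiling flag follows the rule described above, and escaped quotes are deliberately not handled.

```php
<?php
// Hypothetical tokenizer: splits on whitespace outside quotes and keeps
// quoted phrases together as one token.
function tokenize(string $input): array
{
    $tokens = [];
    $current = '';
    $compiling = false; // true while inside a quoted phrase
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];
        if ($char === '"') {
            // A quote opens a phrase or closes the current one.
            if ($current !== '') {
                $tokens[] = $current;
                $current = '';
            }
            $compiling = !$compiling;
        } elseif (!$compiling && ctype_space($char)) {
            // Outside quotes, whitespace is a word boundary.
            if ($current !== '') {
                $tokens[] = $current;
                $current = '';
            }
        } else {
            $current .= $char;
        }
    }
    if ($current !== '') {
        $tokens[] = $current; // trailing word or unterminated phrase
    }
    return $tokens;
}

print_r(tokenize('php "form processing" AND regex'));
```

Each extra rule (escapes, +/- prefixes, operators) would add another flag or branch, which is where it can start to get tortuous.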
redmonkey
Forum Regular
Posts: 836
Joined: Thu Dec 18, 2003 3:58 pm

Post by redmonkey »

I think for short strings (typically the case with search criteria) there would be minimal difference (both in performance and logical difficulty) between either approach. I think it would just come down to which method you prefer.

Interestingly enough, due to my morbid curiosity, I did benchmark the two differing code samples offered thus far. And even on this relatively short string (although longer than your average search criteria) the regex method performed notably faster. The time taken to execute the regex method was (on average) 0.002 seconds, while the other method was averaging 0.05 seconds.
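
For anyone wanting to repeat this kind of comparison, a timing harness could look something like the sketch below. It is not the code used for the figures above; the function name averageSeconds, the run count, and the simplified loop are all assumptions for illustration, and the timings will vary by machine and PHP version.

```php
<?php
// Hypothetical harness: average wall-clock seconds per call of $fn.
function averageSeconds(callable $fn, int $runs = 1000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $runs;
}

$text = ' text with some "quoted string" inside. "More quoted words here".';

// Regex approach.
$regex = function () use ($text) {
    preg_match_all('/"(.*?)"/', $text, $matches);
    return $matches[1];
};

// Character-by-character approach (no escape handling).
$loop = function () use ($text) {
    $strings = [];
    $current = '';
    $quoted = false;
    for ($i = 0, $length = strlen($text); $i < $length; $i++) {
        if ($text[$i] === '"') {
            $quoted = !$quoted;
            if (!$quoted) {
                $strings[] = $current;
                $current = '';
            }
            continue;
        }
        if ($quoted) {
            $current .= $text[$i];
        }
    }
    return $strings;
};

printf("regex: %.8f s\nloop:  %.8f s\n", averageSeconds($regex), averageSeconds($loop));
```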
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

redmonkey wrote: Interestingly enough, due to my morbid curiosity, I did benchmark the two differing code samples offered thus far. And even on this relatively short string (although longer than your average search criteria) the regex method performed notably faster. The time taken to execute the regex method was (on average) 0.002 seconds, while the other method was averaging 0.05 seconds.
ok, you convinced me ;) Nevertheless it was an interesting discussion, thanks =)
redmonkey
Forum Regular
Posts: 836
Joined: Thu Dec 18, 2003 3:58 pm

Post by redmonkey »

It's always interesting to see two or more different approaches to the same problem, even better when it doesn't descend into a mad rant about which is better.

As I said previously, for short strings there probably isn't too much of a difference in performance to adopt either approach to fully parse the search criteria.
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

'25 times faster' isn't 'too much of a difference' to you? 8O
redmonkey
Forum Regular
Posts: 836
Joined: Thu Dec 18, 2003 3:58 pm

Post by redmonkey »

Weirdan wrote:'25 times faster' isn't 'too much of a difference' to you? 8O
I wasn't specifically talking about these two code examples, I was suggesting that the complete solution would probably perform roughly the same. i.e.

The regex method would probably be best using several regex statements to obtain the quoted strings, 'and' strings and 'or' strings. I would think that on a _short string_ there would be minimal difference between iterating along the string and applying 3-4 regex to it.

In the samples given thus far it suggests that regex would be the way to go, but for shorter strings there may be little difference.

Of course, I haven't actually benchmarked complete working solutions, so I could be completely wrong. I suppose it would depend on how complex your iterator becomes versus how complex the regexes are (and how many would be required).
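
As a rough idea of what "several regex statements" might look like when put together, here is a sketch that pulls out quoted phrases first, then the stand-alone operators, and leaves the remaining words. The function name parseCriteria and the result keys are assumptions for the example, and escaped quotes are not handled.

```php
<?php
// Hypothetical parser built from a few regex passes over the input.
function parseCriteria(string $input): array
{
    // 1. Quoted phrases first, so operators inside quotes are not misread.
    preg_match_all('/"(.*?)"/', $input, $m);
    $phrases = $m[1];
    $rest = preg_replace('/"(.*?)"/', ' ', $input);

    // 2. Stand-alone operators in what is left, normalised to upper case.
    preg_match_all('/\b(AND|OR|NOT)\b/i', $rest, $m);
    $operators = array_map('strtoupper', $m[1]);

    // 3. Whatever remains is split into plain (possibly +/- prefixed) words.
    $rest = preg_replace('/\b(AND|OR|NOT)\b/i', ' ', $rest);
    $words = preg_split('/\s+/', $rest, -1, PREG_SPLIT_NO_EMPTY);

    return ['phrases' => $phrases, 'operators' => $operators, 'words' => $words];
}

print_r(parseCriteria('"black AND white" or +cats -dogs'));
```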