Form/field processing, logical operators
Posted: Sat Jun 26, 2004 5:58 pm
by Calimero
Ok,
There is a lot of this.
START HERE // at 00:46
I need PHP to distinguish whether there are any of the operators listed below present in the field.
There are three types:
1) stand-alone words like AND, OR, NOT
2) a second group, let's call it supplemental, like + (+word) and - (-word) at the beginning of a word
3) and the third group is quotes (" ")
Additionally, I need the code to find a certain character that is not at the beginning of the string but somewhere inside it (it can be anywhere and PHP must find it).
1) For the first group I know the MySQL code myself; I just need PHP to recognise that the operators are present.
2) For the second one I think a FOREACH >> IF - ELSE IF loop would work; again, PHP just needs to find and recognise the first character of the word.
3) This one I'm not sure about, but it would need to have first priority when checking for any of these three groups: find the start quote, find the end quote, and declare the content in between as a variable (like any other word).
For any other points I missed, suggestions and nose rubbing are welcome
Thanks Ahead !
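A minimal sketch of the per-word check from point 2 and of the "character somewhere inside the string" lookup. The helper names `classify_word` and `has_inner_char` are made up for illustration, and splitting the field on whitespace is an assumption about how it is broken into words.

```php
<?php
// Hypothetical helper: classify one word by its first character.
function classify_word(string $word): string
{
    if ($word === '') {
        return 'empty';
    }
    if ($word[0] === '+') {
        return 'required';
    } elseif ($word[0] === '-') {
        return 'excluded';
    }
    return 'plain';
}

// Hypothetical helper: true if $char occurs anywhere after the first position.
function has_inner_char(string $text, string $char): bool
{
    // Start searching at offset 1 so a hit at position 0 is not counted.
    return strlen($text) > 1 && strpos($text, $char, 1) !== false;
}

// Assumption: words are separated by whitespace.
$words = preg_split('/\s+/', '+apple -pear plum', -1, PREG_SPLIT_NO_EMPTY);
foreach ($words as $word) {
    echo $word, ' => ', classify_word($word), "\n";
}
```

`strpos()` returns `false` when nothing is found and can return `0` for a hit at the very start, so the comparison has to be `!== false` rather than a plain truth test.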
Posted: Sat Jun 26, 2004 8:07 pm
by Weirdan
first and second are easily captured by the regexps ( /\b(AND|OR|NOT)\b/i , /(\+|-)\w/ ).
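One possible way to try those two patterns, as a rough sketch. The function name `detect_operators` and the third, stricter pattern are assumptions added for illustration: `/(\+|-)\w/` also fires on a hyphen inside a word, so the variant only accepts a sign at the start of a word. Note that the `i` flag makes a lowercase "and" count as an operator too.

```php
<?php
// Hypothetical helper: report which operator groups occur in a search field.
function detect_operators(string $field): array
{
    return [
        // Stand-alone words AND, OR, NOT (pattern from the post).
        'boolean' => preg_match('/\b(AND|OR|NOT)\b/i', $field) === 1,
        // + or - directly before a word character (pattern from the post).
        'prefix' => preg_match('/(\+|-)\w/', $field) === 1,
        // Assumed stricter variant: the sign must start a word,
        // so the hyphen in "well-known" is not counted.
        'prefix_at_word_start' => preg_match('/(?<!\S)[+-]\w/', $field) === 1,
    ];
}

print_r(detect_operators('cats AND +dogs'));
print_r(detect_operators('well-known'));
```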
third needs a little extra work:
Code: Select all
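$text = ' text with some "quoted string" inside. "More quoted words here".';
$strings = array();
for ($i = 0, $length = strlen($text), $quoted = false, $current = ''; $i < $length; $i++) {
    // $text[$i] replaces the old $text{$i} offset syntax, which PHP 8 no longer accepts
    if ($text[$i] == '"') {
        $quoted = !$quoted;                  // toggle on every quote character
        if (!$quoted) {                      // closing quote: store the phrase
            array_push($strings, $current);
            $current = '';
        } else continue;                     // opening quote: skip the quote itself
    }
    if ($quoted) $current .= $text[$i];
}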
quoted strings are accumulated in the $strings array
Posted: Sat Jun 26, 2004 11:03 pm
by redmonkey
Weirdan wrote:
third needs a little extra work:
??
Code: Select all
$text = ' text with some "quoted string" inside. "More quoted words here".';
if (preg_match_all('/"(.*?)"/', $text, $matches, PREG_PATTERN_ORDER))
{
    foreach ($matches[1] as $quoted)
    {
        echo $quoted . "\x0a";
    }
}
Outputs...
Code: Select all
quoted string
More quoted words here
Posted: Sat Jun 26, 2004 11:08 pm
by Weirdan
what if you need to add the ability to escape the quotes from being interpreted? and ability to escape the escape?

Posted: Sat Jun 26, 2004 11:34 pm
by redmonkey
Depends on how much you want to filter but....
Code: Select all
$text = ' text with some "quoted string" inside. \"these quoted words will be ignored\" but here is "More quoted words here".';
// (?<!\\\\) is a lookbehind: a quote only counts if it is not preceded by a backslash
if (preg_match_all('/(?<!\\\\)"(.*?)(?<!\\\\)"/', $text, $matches, PREG_PATTERN_ORDER))
{
    foreach ($matches[1] as $quoted)
    {
        echo $quoted . "\x0a";
    }
}
...outputs..
Code: Select all
quoted string
More quoted words here
So that should give you a start. How would you deal with the problem? For the string given above, your code would populate $strings with...
Code: Select all
Array
(
    [0] => quoted string
    [1] => these quoted words will be ignored\
    [2] => More quoted words here
)
Posted: Sun Jun 27, 2004 12:32 am
by Weirdan
redmonkey wrote: So that should give you a start. How would you deal with the problem?
with a few strokes:
Code: Select all
// In single-quoted PHP source, \" stays a backslash plus a quote and \\\\ is two backslashes.
$text = ' text with some "quoted string" inside. this \"quotes will pass undetected\". "More quoted words here". Moreover, here we have the \\\\"escaped escape\\\\"';
$strings = array();
for ($i = 0, $length = strlen($text), $quoted = false, $escaped = false, $current = ''; $i < $length; $i++) {
    if ($text[$i] == '\\') $escaped = !$escaped; // <=== added this line
    if ($text[$i] == '"' && !$escaped) { // <=== modified this line
        $quoted = !$quoted;
        if (!$quoted) {
            array_push($strings, $current);
            $current = '';
        } else continue;
    }
    if ($quoted) $current .= $text[$i];
    if ($text[$i] != '\\') $escaped = false; // <=== added this line
}
var_dump($strings);

I doubt the 'escaped escape' could be properly parsed with a regexp...
Posted: Sun Jun 27, 2004 10:04 am
by redmonkey
Weirdan wrote:
I doubt the 'escaped escape' could be properly parsed with a regexp...
I think a single pure regex solution to achieve that would be just too much of a mind bend for what we are trying to achieve here. However, with some simple pre and post processing it can still be done quite quickly.
Code: Select all
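// In single-quoted PHP source, \" stays a backslash plus a quote and \\\\ is two backslashes.
$text = ' text with some "quoted string" inside. this \"quotes will pass undetected\". "More quoted words here". Moreover, here we have the \\\\"escaped escape\\\\"';
$strings = array();
// Pre-process: hide every escaped escape (two backslashes) behind a placeholder.
if (preg_match_all('/(?<!\\\\)"(.*?)(?<!\\\\)"/', str_replace('\\\\', '_SOME UNIQUE STRING_', $text), $matches, PREG_PATTERN_ORDER))
{
    foreach ($matches[1] as $quoted)
    {
        // Post-process: put the escaped escapes back.
        $strings[] = str_replace('_SOME UNIQUE STRING_', '\\\\', $quoted);
    }
}
var_dump($strings);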
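For comparison, a single pattern can probably cope with the escaped escape as well, if it consumes backslash pairs explicitly instead of relying on a lookbehind. This is only a sketch: the function name `quoted_phrases` is hypothetical, and the `PREG_UNMATCHED_AS_NULL` flag assumes PHP 7.2 or later. The first alternative swallows any backslash plus the character after it outside quotes; the second captures a quoted phrase whose body may itself contain escapes.

```php
<?php
// Hypothetical helper: extract quoted phrases, honouring backslash escapes.
function quoted_phrases(string $text): array
{
    // Alternative 1: a backslash and the character it escapes (outside quotes).
    // Alternative 2: a quoted phrase; group 1 is its body.
    $pattern = '/\\\\.|"((?:[^"\\\\]|\\\\.)*)"/s';
    $phrases = [];
    $flags = PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL;
    if (preg_match_all($pattern, $text, $matches, $flags)) {
        foreach ($matches as $match) {
            // Group 1 is set only when the quoted-phrase alternative matched.
            if (isset($match[1])) {
                $phrases[] = $match[1];
            }
        }
    }
    return $phrases;
}

$text = 'some "quoted string", an \"escaped quote\" and an \\\\"escaped escape\\\\"';
var_dump(quoted_phrases($text));
```

The captured phrases keep their backslashes as typed; stripping them would be a separate step.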
Posted: Sun Jun 27, 2004 2:13 pm
by McGruff
An interesting alternative to regex is to use a string iterator (I'm assuming this is a search script and therefore the string won't be very long).
The iteration would be observed by various rules such as "if current char = quote, and $compiling == false, start compiling new word and set $compiling = true;". You would stop compiling when you hit another quote char, ignoring any word boundaries within a quoted string. The compiling flag keeps track of where you are. That's just a rough outline of the kind of flags and rules you might need.
This way can also get a bit tortuous.
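A rough sketch of that kind of iterator, with one flag for "inside quotes" and one for "currently compiling a token". The function name `tokenize_search` and the exact rules (whitespace ends a word, an unterminated phrase is flushed at the end) are assumptions, not something specified in the outline.

```php
<?php
// Hypothetical tokenizer following the flag-and-rules outline.
function tokenize_search(string $input): array
{
    $tokens = [];
    $current = '';
    $inQuotes = false;   // true while between an opening and a closing quote
    $compiling = false;  // true while a token is being built

    for ($i = 0, $length = strlen($input); $i < $length; $i++) {
        $char = $input[$i];
        if ($char === '"') {
            if ($inQuotes) {
                // Closing quote: the phrase is complete, even if empty.
                $tokens[] = $current;
                $current = '';
                $compiling = false;
            } else {
                $compiling = true;
            }
            $inQuotes = !$inQuotes;
        } elseif (!$inQuotes && ctype_space($char)) {
            // Whitespace outside quotes ends the current word.
            if ($compiling) {
                $tokens[] = $current;
                $current = '';
                $compiling = false;
            }
        } else {
            // Any other character, or whitespace inside quotes, is kept.
            $current .= $char;
            $compiling = true;
        }
    }
    if ($compiling) {
        $tokens[] = $current; // flush a trailing word or unterminated phrase
    }
    return $tokens;
}

print_r(tokenize_search('cats AND "big dogs" -birds'));
```

The resulting tokens can then be checked one by one for AND, OR, NOT or a leading + or -.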
Posted: Sun Jun 27, 2004 7:23 pm
by redmonkey
I think for short strings (typically the case with search criteria) there would be minimal difference (both in performance and logical difficulty) between the two approaches. I think it would just come down to which method you prefer.
Interestingly enough, due to my morbid curiosity, I did benchmark the two differing code samples offered thus far. And even on this relatively short string (although longer than your average search criteria) the regex method performed notably faster. The time taken to execute the regex method was (on average) 0.002 seconds, while the other method was averaging 0.05 seconds.
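The benchmark script itself was not posted, so the following is only a guess at how such a comparison could be set up. The helper `average_seconds`, the run count, and the condensed version of the character loop are all assumptions; the figures it prints will vary with machine and PHP version.

```php
<?php
// Hypothetical timing helper: average seconds per call over $runs calls.
function average_seconds(callable $fn, int $runs = 1000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $runs;
}

$text = ' text with some "quoted string" inside. "More quoted words here".';

// Regex approach from the thread.
$viaRegex = function () use ($text): array {
    preg_match_all('/"(.*?)"/', $text, $matches);
    return $matches[1];
};

// Character loop from the thread, condensed.
$viaLoop = function () use ($text): array {
    $strings = [];
    $current = '';
    $quoted = false;
    for ($i = 0, $length = strlen($text); $i < $length; $i++) {
        if ($text[$i] === '"') {
            $quoted = !$quoted;
            if (!$quoted) {
                $strings[] = $current; // closing quote: store the phrase
                $current = '';
            }
            continue;                  // never copy the quote itself
        }
        if ($quoted) {
            $current .= $text[$i];
        }
    }
    return $strings;
};

printf("regex: %.8f s per call\n", average_seconds($viaRegex));
printf("loop:  %.8f s per call\n", average_seconds($viaLoop));
```

Checking that both closures return the same array before timing them keeps the comparison honest.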
Posted: Mon Jun 28, 2004 1:00 am
by Weirdan
redmonkey wrote:
Interestingly enough, due to my morbid curiosity, I did benchmark the two differing code samples offered thus far. And even on this relatively short string (although longer than your average search criteria) the regex method performed notably faster. The time taken to execute the regex method was (on average) 0.002 seconds, while the other method was averaging 0.05 seconds.
ok, you convinced me

Nevertheless it was interesting discussion, thanks =)
Posted: Mon Jun 28, 2004 5:54 am
by redmonkey
It's always interesting to see two more different approaches to the same problem, even better when it doesn't descend into a mad rant about which is better.
As I said previously, for short strings there probably isn't too much of a difference in performance to adopt either approach to fully parse the search criteria.
Posted: Mon Jun 28, 2004 7:25 am
by Weirdan
'25 times faster' isn't 'too much of a difference' to you?

Posted: Mon Jun 28, 2004 7:36 am
by redmonkey
Weirdan wrote:'25 times faster' isn't 'too much of a difference' to you?

I wasn't specifically talking about these two code examples, I was suggesting that the complete solution would probably perform roughly the same. i.e.
The regex method would probably be best using several regex statements to obtain the quoted strings, 'and' strings and 'or' strings. I would think that on a _short string_ there would be minimal difference between iterating along the string and applying 3-4 regex to it.
In the samples given thus far it suggests that regex would be the way to go, but for shorter strings there may be little difference.
Of course, I haven't actually benchmarked complete working solutions, so I could be completely wrong. I suppose it would depend on how complex your iterator becomes compared with how complex the regexes are (and how many of them are required).
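To make the "several regex statements" idea concrete, here is one way a complete regex-based pass might look. It is a sketch under assumptions: the function name `parse_search`, the result keys, and the rule that a + or - must start a word are all invented for illustration, and escaped quotes are not handled.

```php
<?php
// Hypothetical parser: a few regex passes over the search field.
function parse_search(string $field): array
{
    $result = ['phrases' => [], 'required' => [], 'excluded' => [],
               'operators' => [], 'words' => []];

    // Pass 1: pull out quoted phrases first so their contents are left alone.
    if (preg_match_all('/"(.*?)"/', $field, $m)) {
        $result['phrases'] = $m[1];
    }
    $rest = preg_replace('/"(.*?)"/', ' ', $field);

    // Pass 2: +word and -word, where the sign starts a word.
    if (preg_match_all('/(?<!\S)([+-])(\w+)/', $rest, $m, PREG_SET_ORDER)) {
        foreach ($m as $match) {
            $key = $match[1] === '+' ? 'required' : 'excluded';
            $result[$key][] = $match[2];
        }
    }
    $rest = preg_replace('/(?<!\S)[+-]\w+/', ' ', $rest);

    // Pass 3: stand-alone AND, OR, NOT; everything left over is a plain word.
    foreach (preg_split('/\s+/', $rest, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        if (preg_match('/^(AND|OR|NOT)$/i', $word)) {
            $result['operators'][] = strtoupper($word);
        } else {
            $result['words'][] = $word;
        }
    }
    return $result;
}

print_r(parse_search('cats AND "big dogs" -birds +fish'));
```

This version throws away the position of each operator relative to the words around it, which a real query builder would most likely need to keep.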