
Simple regex - assistance required

Posted: Tue Nov 11, 2003 9:11 am
by Jean-Yves
Hi,

I've parked my brain today and can't figure out a simple regular expression to validate email subjects (1 to 50 chars) and messages (1 to 5000 chars).
The rules are that it should allow any combination of characters, as long as the value does not start or end with whitespace and is within the given length.

I know it's easy, but I'm getting bleary eyed!

Here's my current attempt which is both incorrect and incomplete:

Code:

^\w+[\w\W\ \.,:@;-]{1,50}[\w\.]+$
Is it worth excluding the < and > characters to stop backdoor scripting?

Can anyone help out? Thanks.
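
For reference, one way to express the stated rules (1 to 50 characters for a subject, 1 to 5000 for a message, no whitespace at either end) as a single pattern is sketched below. It assumes PCRE via preg_match() on a reasonably recent PHP (7 or later, for the type declarations) rather than the POSIX ereg() family, it counts length in bytes, and the helper names are hypothetical. Treat it as a starting point, not a definitive answer.

```php
<?php
// Hypothetical helper: builds a pattern for "1 to $max characters,
// no whitespace at either end". Assumes $max >= 2.
function edgeTrimmedPattern(int $max): string
{
    // \A and \z anchor to the true start and end of the string.
    // The first character must be non-whitespace; anything longer must
    // also end in non-whitespace, with up to $max - 2 arbitrary
    // characters in between. The s modifier lets "." match newlines.
    return '/\A\S(?:.{0,' . ($max - 2) . '}\S)?\z/s';
}

function isValidSubject(string $subject): bool
{
    return preg_match(edgeTrimmedPattern(50), $subject) === 1;
}

function isValidMessage(string $message): bool
{
    return preg_match(edgeTrimmedPattern(5000), $message) === 1;
}

var_dump(isValidSubject('Hello there'));    // bool(true)
var_dump(isValidSubject(' leading space')); // bool(false)
```

Adding the u modifier should make the pattern count UTF-8 characters instead of bytes, if that is what the limits are meant to measure.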

Posted: Tue Nov 11, 2003 9:26 am
by vigge89
I don't know how to validate email subjects, but for the message and subject length:

Code:

<?php

$msgmaxchars = 5000; //Max allowed chars in message

// <= rather than < so that a message of exactly $msgmaxchars is accepted
if (strlen($message) <= $msgmaxchars) {
    //then mail
} else {
    echo "Message is too long!";
}
?>

Posted: Tue Nov 11, 2003 9:38 am
by Jean-Yves
Thanks for the reply.

Your method is what I'm currently using, but I want to replace a multitude of if() statements with single regular expression checks for each POST variable.

Posted: Tue Nov 11, 2003 9:50 am
by Nay
You want to take out the html and harmful tags?

http://sg.php.net/manual/en/function.strip-tags.php

-Nay

Posted: Tue Nov 11, 2003 9:54 am
by Jean-Yves
Nay wrote:
You want to take out the html and harmful tags?

More of a theoretical question really, i.e. I know that malicious scripts can be submitted via GET and POST. However, what I don't want is someone emailing my admin account with a nasty little piece of JavaScript embedded in the message or subject that will run when I view the message. I'm wondering whether this is a possibility or whether I'm being unduly paranoid.

Posted: Tue Nov 11, 2003 9:56 am
by Nay
lol, it is possible to make an external form with the action="" pointing to your script, but if the validation is not client-side then I doubt that the message will pass the validation by PHP.

-Nay

Posted: Tue Nov 11, 2003 10:02 am
by Jean-Yves
The validation is both client and server side, the latter to cater for the scenario that you mention. Actually, I use htmlspecialchars(), so that should be ok anyway! Forgot I'd done that ;)
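
A minimal sketch of what that escaping does, assuming a hypothetical submitted message and a UTF-8 charset (neither is stated in the thread):

```php
<?php
// Hypothetical submitted message containing a script tag.
$message = '<script>alert("hi")</script>';

// ENT_QUOTES escapes single quotes as well as double quotes.
// The 'UTF-8' charset is an assumption about the site's encoding.
$safe = htmlspecialchars($message, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
// &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
```

Once escaped like this, the tag is shown as text instead of being run when the message is viewed as HTML.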

Anyways, can anyone help with the regex problem?

Posted: Tue Nov 11, 2003 10:04 am
by twigletmac
If the message is being sent as plain text then it shouldn't be a problem - if you are sending the message as HTML then using htmlspecialchars() on the message should help. AFAIK, there's nothing someone can do via the subject, but if you're using Outlook Express that could be a different story.

Of course you could always replace the < and > characters, e.g.

Code:

$message = str_replace(array('<', '>'), array('[', ']'), $message);
or as Nay suggested use strip_tags().

To get rid of excess whitespace on either side of the string, use trim().
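
As a rough sketch of how trim() and strlen() could cover the original rules (1 to N characters, no whitespace at either end) without a regex: the helper name is hypothetical, lengths are counted in bytes, and PHP 7 or later is assumed for the type declarations.

```php
<?php
// Hypothetical helper: true when $value is 1..$max bytes long and has
// no whitespace at either end, using only string functions.
function isCleanLength(string $value, int $max): bool
{
    $length = strlen($value);
    if ($length < 1 || $length > $max) {
        return false;
    }
    // trim() strips its default whitespace set from both ends;
    // if nothing changed, the value was already clean at the edges.
    return trim($value) === $value;
}

var_dump(isCleanLength('Hello', 50));  // bool(true)
var_dump(isCleanLength('Hello ', 50)); // bool(false)
```

The same helper would serve for the message by passing 5000 as the limit.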

Don't use regular expressions unless you really, really have to...

Mac

Posted: Tue Nov 11, 2003 10:05 am
by twigletmac
Urgh, too slow :roll:.

I'm kinda unclear as to what exactly you are trying to do with the regex.

Mac

Posted: Tue Nov 11, 2003 10:12 am
by Jean-Yves
twigletmac wrote:
but if you're using Outlook Express that could be a different story

Please Mac, credit me with some taste! ;) No, I use PMMail2000 so that I can share account folders between my Win and OS/2 systems. Also, it doesn't act as a public gateway to my PC, unlike OE.

twigletmac wrote:
Don't use regular expressions unless you really, really have to...

Is that a personal preference or due to a PHP limitation?

Thanks for the reply, BTW.

Posted: Tue Nov 11, 2003 10:22 am
by JAM
There is no limitation per se in regexps/POSIX. However, benchmarking consistently shows that string functions are preferred over them.

I'm not entirely sure either what you really intend to do here with regexps. But php.net's manual pages for ereg(), ereg_replace() and so on have a lot of good user comments that deal with various forms of <tag> removal/detection.

Posted: Tue Nov 11, 2003 10:26 am
by Jean-Yves
JAM wrote:
I'm not entirely sure either what you really intend to do here with regexps. But php.net's manual pages for ereg(), ereg_replace() and so on have a lot of good user comments that deal with various forms of <tag> removal/detection.

I think that my two questions have become merged into one! The tag question was actually supposed to be separate from the regex one.

Since the luminaries have spoken in unison, I shall go back to using string functions rather than regex. I wasn't aware that they are faster, so thanks for that info.
:)

Posted: Tue Nov 11, 2003 10:35 am
by twigletmac
[Edit: I am having a very slow day today :lol: ]

Regular expressions take time, are complicated and are too often used where existing functions would suffice. In many cases you can do things quicker (and more easily) without them. But of course it's a case by case basis.

Didn't think you'd be using OE, just covering myself in case there was a subject line exploit :).

Mac

Posted: Tue Nov 11, 2003 10:45 am
by JAM
Jean-Yves,
I just want to note that although regex is slower and takes more resources on the server, I didn't mean that it couldn't fit your needs.

For example, there is a difference between tag detection (without actually removing or manipulating the tags) and tag removal, which leaves a clean string to work with. The regular expression functions also have some neat ways to put the results into arrays, which might be of help, just to mention some pros of ereg/preg.
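
To illustrate the difference between detecting and removing, here is a small sketch. It assumes the PCRE function preg_match_all() for detection and strip_tags() for removal; the input string is made up and the pattern is only a rough tag matcher, not an HTML parser.

```php
<?php
// Hypothetical input with a few tags in it.
$input = 'Hi <b>there</b>, <i>friend</i>';

// Detection only: collect every tag-like substring into $matches[0].
// The return value is the number of matches found.
$count = preg_match_all('/<[^>]+>/', $input, $matches);

// Removal: strip_tags() returns the string without the tags.
$clean = strip_tags($input);

print_r($matches[0]); // the four tags, in order of appearance
echo $clean, "\n";    // Hi there, friend
```

Detection leaves $input untouched, so it suits rejecting a submission outright, while removal suits quietly cleaning it up.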

So, it's very 'from case to case'...
Why not try them both? That way, you at least might learn something more from the differences... =)