[Tricky] Regex for adding soft-breaks to QP encoded strings
Posted: Sun May 06, 2007 6:53 am
Overview
QP (quoted-printable) encoded strings represent some bytes in the string as =XX, where XX is the ordinal value of the byte in hex. For example, \n (line feed) would be =0A. The maximum length of a line is 76 characters. Encoded lines MUST end with \r\n (CRLF). A line ending with =\r\n is a wrapped line, where the =\r\n is known as a "soft-break". When it is decoded, the line break and the "=" get stripped, therefore:

Code:
Hello =
world!

Is the same as

Code:
Hello world!

The problem
OK, now you know as much as you need to know about QP encoding to tackle this problem.
I receive my encoded string as one long line, without the soft-breaks required to keep each line within 76 characters. I need to work my way along this string, deciding where to add the soft-breaks.
Only rules for this bit
1. The soft break MUST happen before 76 characters are present on the line.
2. The soft break cannot be placed between two encoded bytes directly.
3. Ideally it should be greedy and get as many of those 76 chars on the line as it can, without breaking rules 1 & 2.
So from this string:

Code:
Varov=C3=A1n=C3=AD_p=C5=99ed_expirac=C3=AD_dom=C3=A9ny_logomix

Which decodes to (UTF-8):

Code:
Varování_před_expirací_domény_logomix

These would be valid:

Code:
Varov=C3=A1n=C3=AD=
_p=C5=99ed_expirac=C3=AD_dom=C3=A9ny_logomix

Varov=C3=A1n=C3=AD_p=C5=99=
ed_expirac=C3=AD_dom=C3=A9ny_logomix

Varov=C3=A1n=C3=AD_p=C5=99ed_expir=
ac=C3=AD_dom=C3=A9ny_logomix

Varov=C3=A1n=C3=AD_p=C5=99ed_expirac=C3=AD_dom=C3=A9=
ny_logomix

Varov=C3=A1n=C3=AD_p=C5=99ed_expirac=C3=AD_d=
om=C3=A9ny_logomix

But these would not:

Code:
Varov=C3=
=A1n=C3=AD_p=C5=99ed_expirac=C3=AD_dom=C3=A9ny_logomix

Varov=C3=A1n=C3=AD_p=C5=
=99ed_expirac=C3=AD_dom=C3=A9ny_logomix

Varov=C3=A1n=C3=AD_p=C5=99ed_expirac=C3=AD_dom=C3=
=A9ny_logomix

Varov=C3=A1n=C3=AD_p=C5=99ed_expirac=C3=AD_dom=C3=A=
9ny_logomix

Varov=C3=A1n=C3=AD_p=C5=99ed_expirac=C=
3=AD_dom=C3=A9ny_logomix

I only need a pattern which gives me the first 1-76 characters in the string which satisfy those rules, because I keep trimming the string myself and re-running the pattern until no string is left.
Here's a pattern which works in PHP5.2:

Code:
preg_match('/^.{1,' . $length . '}(?<=[^=])[^=](?!=[A-F0-9]{2})/', $string, $matches);

But in PHP4 I get this error:

Unexpected PHP error [Compilation failed: lookbehind assertion is not fixed length at offset 16] severity [E_WARNING] in [/Users/d11wtq/public_html/swiftmailer/trunk/php4/lib/Swift/Message/Encoder.php line 195]

The error is stupid because it is fixed-width, it's just not fixed value, but anyway, I need some regex gurus to throw in some more patterns because I've spent too long on this now.
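One possible way around the lookbehind complaint, offered as a sketch rather than a tested PHP4 fix: the lookbehind only checks that the character before the final one is not "=", so the same check can be made by consuming the last two characters with [^=]{2} and dropping the lookbehind altogether. The function name qp_first_chunk and the $maxLen parameter are made up for illustration, and 75 is assumed as the default so that the chunk plus the "=" of the soft break stays within 76 characters. Like the posted pattern, it needs at least two characters to match and never stops directly in front of an =XX sequence.

```php
<?php
// Hypothetical helper, not from the original post: returns the longest
// prefix of $string (at most $maxLen characters) accepted by a
// lookbehind-free variant of the posted pattern, or false when the
// pattern does not match at all.
function qp_first_chunk($string, $maxLen = 75)
{
    // [^=]{2} consumes the last two characters of the chunk instead of
    // asserting on them with a lookbehind, so the chunk cannot end in
    // "=" or "=X". The lookahead still refuses to stop right before "=XX".
    $pattern = '/^.{0,' . ($maxLen - 2) . '}[^=]{2}(?!=[A-F0-9]{2})/';
    if (preg_match($pattern, $string, $matches)) {
        return $matches[0];
    }
    return false;
}

$subject = 'Varov=C3=A1n=C3=AD_p=C5=99ed_expirac=C3=AD_dom=C3=A9ny_logomix';
var_dump(qp_first_chunk($subject, 20)); // string(19) "Varov=C3=A1n=C3=AD_"
```

With a limit of 20 the longest candidate is "Varov=C3=A1n=C3=AD_p", but that is followed by =C5, so the pattern backs off by one character. A string made only of =XX sequences that does not fit on one line gives false, which is the case left aside in the post.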
Don't worry about strings where you only have =XX=YY=AA=BB=CC hundreds of times in sequence, because rule 2 can never be satisfied there. I'll worry about that myself; I just need to pick your brains on this pattern first.
Cheers.
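For the trim-and-rerun loop described above, a rough sketch could look like the following. It assumes a lookbehind-free variant of the posted pattern ([^=]{2} in place of the lookbehind), a hypothetical function name qp_soft_wrap, and a default of 75 characters per chunk so there is room for the "=" of the soft break. It simply stops when no legal break point is found, leaving the all-encoded case untouched as the post says. quoted_printable_decode() is PHP's built-in decoder, used here only to check that the soft breaks disappear again on decoding.

```php
<?php
// Hypothetical helper, not from the original post: cuts $string into
// chunks of at most $maxLen characters and joins them with "=\r\n".
function qp_soft_wrap($string, $maxLen = 75)
{
    // Assumed variant of the posted pattern without the lookbehind.
    $pattern = '/^.{0,' . ($maxLen - 2) . '}[^=]{2}(?!=[A-F0-9]{2})/';
    $lines = array();
    while (strlen($string) > $maxLen) {
        if (!preg_match($pattern, $string, $matches)) {
            // No legal break point (e.g. a long run of =XX=YY...);
            // that case is deliberately left unhandled here.
            break;
        }
        $lines[] = $matches[0];
        $string = substr($string, strlen($matches[0]));
    }
    $lines[] = $string; // remainder: short enough, or unbreakable
    return implode("=\r\n", $lines);
}

$subject = 'Varov=C3=A1n=C3=AD_p=C5=99ed_expirac=C3=AD_dom=C3=A9ny_logomix';
$wrapped = qp_soft_wrap($subject, 20);
echo $wrapped, "\n";
// Varov=C3=A1n=C3=AD_=
// p=C5=99ed_expira=
// c=C3=AD_dom=C3=A9ny_=
// logomix
var_dump(quoted_printable_decode($wrapped) === quoted_printable_decode($subject)); // bool(true)
```

The limit of 20 is only there to force several breaks on a short sample; with the default of 75 this 62-character string comes back unchanged.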