Crazy pregin
Posted: Mon Mar 07, 2005 6:48 pm
by Todd_Z
Does anyone know a good preg_split pattern for splitting the source of a page by tags? Once I have an array of all the tags/content, I'd also like to be able to figure out whether each element is a tag or content. Example:
function isTag( $line ); // returns true or false
$source = <Source code for page>;
$lines = preg_split ( PATTERN GOES HERE, $source );
$lines[] = "<table>"; -> isTag() = true
$lines[] = "<tr>"; -> true
$lines[] = "<td>"; -> true
$lines[] = "Test td cell"; -> false
$lines[] = "</td>"; -> true
$lines[] = "</tr>"; -> true
$lines[] = "</table>"; -> true
Thanks!
Posted: Mon Mar 07, 2005 7:29 pm
by feyd
Note: you must run the pattern a second time on each element of the array to determine whether the next n elements are trash (sub-pattern captures).
Code:
<?php
$test =<<<STOP
<html>
<head>
<title>hi</title>
</head>
<body>
Complex test: <a href="#" onclick="if( 10 > 30 ) document.write('<> you smell'); else document.write('<>I smell');">complex test!</a>
</body>
</html>
STOP;
$bits = preg_split('#(<\s*/?\s*[a-z-]+(\s+[a-z-]+\s*=\s*(["\'])?.*?\\3)*\s*/?>)#is', $test, -1, PREG_SPLIT_DELIM_CAPTURE);
var_export($bits);
?>
outputs
Code:
array (
0 => '',
1 => '<html>',
2 => '
',
3 => '<head>',
4 => '
',
5 => '<title>',
6 => 'hi',
7 => '</title>',
8 => '
',
9 => '</head>',
10 => '
',
11 => '<body>',
12 => '
Complex test: ',
13 => '<a href="#" onclick="if( 10 > 30 ) document.write(\'<> you smell\'); else document.write(\'<>I smell\');">',
14 => ' onclick="if( 10 > 30 ) document.write(\'<> you smell\'); else document.write(\'<>I smell\');"',
15 => '"',
16 => 'complex test!',
17 => '</a>',
18 => '
',
19 => '</body>',
20 => '
',
21 => '</html>',
22 => '',
)
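The "second pass" mentioned in the note above could look roughly like the sketch below. It assumes the same tag pattern as in the post; splitTags() and $tagPattern are hypothetical names made up for illustration. Each element is re-matched against the anchored pattern, and when it is a whole tag, the number of extra sub-pattern captures tells how many of the following elements to skip.

```php
<?php
// Same tag pattern as in the post above, without delimiters so it can be reused.
$tagPattern = '(<\s*/?\s*[a-z-]+(\s+[a-z-]+\s*=\s*(["\'])?.*?\\3)*\s*/?>)';

// Hypothetical helper: split $source into tags and text, dropping the
// sub-pattern captures that PREG_SPLIT_DELIM_CAPTURE puts after each tag.
function splitTags($source, $tagPattern)
{
    $bits  = preg_split('#' . $tagPattern . '#is', $source, -1, PREG_SPLIT_DELIM_CAPTURE);
    $clean = array();
    $count = count($bits);

    for ($i = 0; $i < $count; $i++) {
        $clean[] = $bits[$i];

        // Second pass: is this element a whole tag?
        if (preg_match('#^' . $tagPattern . '$#is', $bits[$i], $m)) {
            // $m[0] and $m[1] are the tag itself; every further capture
            // corresponds to one trash element that follows it in $bits.
            $i += count($m) - 2;
        }
    }

    return $clean;
}

$html  = '<p class="x">Complex <b>test</b></p>';
$clean = splitTags($html, $tagPattern);
var_export($clean);
```

Since only the captured duplicates are dropped, gluing the cleaned array back together with implode('', $clean) should give the original source again, which makes for a quick sanity check.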
Posted: Tue Mar 08, 2005 3:27 pm
by Todd_Z
Anyone have a preg pattern for figuring out if a string is a valid tag?
<a href="www.google.com"> -> would work
<img src="img.jpg" /> -> would work
</table> -> would work
Posted: Tue Mar 08, 2005 3:47 pm
by feyd
You already have the pattern: it's the one passed to preg_split in my post above.
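For what it's worth, a minimal sketch of an isTag() built on that pattern might look like this. It assumes the pattern is anchored with ^ and $ so the whole string has to be a single tag; the function name comes from the first post, the rest is only an illustration. It is a loose check and not an HTML validator: tag names with digits (like h1) are not matched, and because of the lazy .*? a string holding several tags with quoted attributes can still slip through.

```php
<?php
// Hypothetical isTag(): true when $string looks like one tag
// according to the anchored preg_split pattern from the earlier post.
function isTag($string)
{
    $pattern = '#^(<\s*/?\s*[a-z-]+(\s+[a-z-]+\s*=\s*(["\'])?.*?\\3)*\s*/?>)$#is';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, trim($string)) === 1;
}

var_dump(isTag('<a href="www.google.com">')); // bool(true)
var_dump(isTag('<img src="img.jpg" />'));     // bool(true)
var_dump(isTag('</table>'));                  // bool(true)
var_dump(isTag('Test td cell'));              // bool(false)
```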
