Crazy pregin

Posted: Mon Mar 07, 2005 6:48 pm
by Todd_Z
Anyone know a good preg_split pattern for splitting the source of a page by tags? Once I have an array of all the tags and content, I also need a way to figure out whether each element is a tag or content. Example:

function isTag($line); // returns true or false

$source = <source code for page>;

$lines = preg_split( PLUG IN PATTERN HERE, $source ); // pattern first, then subject

$lines[] = "<table>"; -> isTag() = true
$lines[] = "<tr>"; -> true
$lines[] = "<td>"; -> true
$lines[] = "Test td cell"; -> false
$lines[] = "</td>"; -> true
$lines[] = "</tr>"; -> true
$lines[] = "</table>"; -> true

Thanks!

Posted: Mon Mar 07, 2005 7:29 pm
by feyd
Note that the split also returns the sub-pattern captures, so you must run the pattern a second time on each element of the array to determine whether the next n elements are trash (sub-pattern captures).

Code:

<?php

$test =<<<STOP
<html>
	<head>
		<title>hi</title>
	</head>
	<body>
		Complex test: <a href="#" onclick="if( 10 > 30 ) document.write('<> you smell'); else document.write('<>I smell');">complex test!</a>
	</body>
</html>
STOP;

	$bits = preg_split('#(<\s*/?\s*[a-z-]+(\s+[a-z-]+\s*=\s*(["\'])?.*?\\3)*\s*/?>)#is', $test, -1, PREG_SPLIT_DELIM_CAPTURE);
	var_export($bits);
?>
outputs

Code:

array (
  0 => '',
  1 => '<html>',
  2 => '
        ',
  3 => '<head>',
  4 => '
                ',
  5 => '<title>',
  6 => 'hi',
  7 => '</title>',
  8 => '
        ',
  9 => '</head>',
  10 => '
        ',
  11 => '<body>',
  12 => '
                Complex test: ',
  13 => '<a href="#" onclick="if( 10 > 30 ) document.write(\'<> you smell\'); else document.write(\'<>I smell\');">',
  14 => ' onclick="if( 10 > 30 ) document.write(\'<> you smell\'); else document.write(\'<>I smell\');"',
  15 => '"',
  16 => 'complex test!',
  17 => '</a>',
  18 => '
        ',
  19 => '</body>',
  20 => '
',
  21 => '</html>',
  22 => '',
)
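
The extra elements 14 and 15 above come from the inner capturing groups. One possible way to avoid the second pass is sketched below. It is not feyd's pattern but a variant, with assumptions: the inner groups are made non-capturing, quoted values are matched with explicit alternatives instead of a backreference, and PREG_SPLIT_NO_EMPTY drops the empty strings. The names SPLIT_TAG_PATTERN and splitTags() are made up for illustration, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical variant of the pattern above. Inner groups use (?:...),
// so only the whole tag is captured and returned as a delimiter.
const SPLIT_TAG_PATTERN = '#(<\s*/?\s*[a-z][a-z0-9-]*(?:\s+[a-z-]+(?:\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+))?)*\s*/?>)#i';

// Split markup into an alternating list of tags and content.
function splitTags(string $source): array
{
    // DELIM_CAPTURE keeps the tags; NO_EMPTY removes empty pieces.
    return preg_split(
        SPLIT_TAG_PATTERN,
        $source,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
}

$bits = splitTags('<td class="x">Test td cell</td>');
var_export($bits);
// array (
//   0 => '<td class="x">',
//   1 => 'Test td cell',
//   2 => '</td>',
// )
```

Whitespace-only pieces between tags are still returned, so trim or filter them if they are not wanted.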

Posted: Tue Mar 08, 2005 3:27 pm
by Todd_Z
Anyone have a preg pattern for figuring out if a string is a valid tag?

<a href="www.google.com"> -> would work
<img src="img.jpg" /> -> would work
</table> -> would work

Posted: Tue Mar 08, 2005 3:47 pm
by feyd
you already have the pattern :P
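
To check a single string, the same idea can be anchored to the start and end of the string and passed to preg_match. The sketch below is one possible isTag(), under these assumptions: the pattern is a simplified variant of the one above rather than the exact original, a "valid tag" only means "looks like one opening, closing, or self-closing tag", and the type declarations need PHP 7 or later.

```php
<?php
// Returns true when $piece looks like exactly one tag.
// Assumption: this is a rough shape check, not HTML validation.
function isTag(string $piece): bool
{
    // ^ and \z anchor the match to the whole string.
    $pattern = '#^<\s*/?\s*[a-z][a-z0-9-]*(?:\s+[a-z-]+(?:\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+))?)*\s*/?>\z#i';

    return preg_match($pattern, $piece) === 1;
}

var_dump(isTag('<a href="www.google.com">')); // bool(true)
var_dump(isTag('<img src="img.jpg" />'));     // bool(true)
var_dump(isTag('</table>'));                  // bool(true)
var_dump(isTag('Test td cell'));              // bool(false)
```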