Crazy pregin

Post by Todd_Z »

Does anyone know a good preg_split pattern for splitting a page's source on its tags? Once I have an array of all the tags and content, I'd also like to tell whether each element is a tag or content. Example:

function isTag($str); // returns true or false

$source = <source code for page>;

$lines = preg_split( PLUG IN PATTERN HERE, $source );

$lines[] = "<table>"; -> isTag() = true
$lines[] = "<tr>"; -> true
$lines[] = "<td>"; -> true
$lines[] = "Test td cell"; -> false
$lines[] = "</td>"; -> true
$lines[] = "</tr>"; -> true
$lines[] = "</table>"; -> true

Thanks!

Post by feyd »

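note: with PREG_SPLIT_DELIM_CAPTURE every sub-pattern capture also lands in the array, so some elements are trash (e.g. entries 14 and 15 in the output below). To know whether the next n elements are trash, you must run the pattern a second time on each element.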

Code:

<?php

$test =<<<STOP
<html>
	<head>
		<title>hi</title>
	</head>
	<body>
		Complex test: <a href="#" onclick="if( 10 > 30 ) document.write('<> you smell'); else document.write('<>I smell');">complex test!</a>
	</body>
</html>
STOP;

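	// Split on every tag. PREG_SPLIT_DELIM_CAPTURE keeps the matched tags (group 1)
	// in the result, and as a side effect also the attribute/quote sub-captures.
	$bits = preg_split('#(<\s*/?\s*[a-z-]+(\s+[a-z-]+\s*=\s*(["\'])?.*?\\3)*\s*/?>)#is', $test, -1, PREG_SPLIT_DELIM_CAPTURE);
	var_export($bits);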
?>
This outputs:

Code:

array (
  0 => '',
  1 => '<html>',
  2 => '
        ',
  3 => '<head>',
  4 => '
                ',
  5 => '<title>',
  6 => 'hi',
  7 => '</title>',
  8 => '
        ',
  9 => '</head>',
  10 => '
        ',
  11 => '<body>',
  12 => '
                Complex test: ',
  13 => '<a href="#" onclick="if( 10 > 30 ) document.write(\'<> you smell\'); else document.write(\'<>I smell\');">',
  14 => ' onclick="if( 10 > 30 ) document.write(\'<> you smell\'); else document.write(\'<>I smell\');"',
  15 => '"',
  16 => 'complex test!',
  17 => '</a>',
  18 => '
        ',
  19 => '</body>',
  20 => '
',
  21 => '</html>',
  22 => '',
)
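A minimal sketch of the second pass described in the note above, assuming the same pattern: element 0 is always content, the element after each content piece is a whole tag (group 1), and re-running the pattern on that tag with preg_match tells you how many sub-capture elements preg_split put right after it, so you can skip them. The names splitTags(), TAG_RE and the 'type'/'text' keys are hypothetical, made up for this illustration.

```php
<?php

// The tag pattern from the post above: group 1 = whole tag, groups 2 and 3 =
// attribute and quote sub-captures that preg_split also emits.
const TAG_RE = '#(<\s*/?\s*[a-z-]+(\s+[a-z-]+\s*=\s*(["\'])?.*?\\3)*\s*/?>)#is';

// Hypothetical helper: returns a list of ['type' => 'tag'|'content', 'text' => ...].
function splitTags(string $html): array
{
    $parts = preg_split(TAG_RE, $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = [];
    $n = count($parts);
    $i = 0;

    while ($i < $n) {
        // Element $i is always content (possibly an empty string, which we drop).
        if ($parts[$i] !== '') {
            $out[] = ['type' => 'content', 'text' => $parts[$i]];
        }
        $i++;
        if ($i >= $n) {
            break;
        }

        // Element $i is group 1, i.e. a whole tag.
        $tag = $parts[$i];
        $out[] = ['type' => 'tag', 'text' => $tag];

        // Second pass: re-run the pattern on the tag alone. $m[0] and $m[1] are
        // the tag itself, so the other count($m) - 2 entries are the trash
        // sub-captures. Skip the tag plus those: count($m) - 1 elements.
        preg_match(TAG_RE, $tag, $m);
        $i += count($m) - 1;
    }

    return $out;
}
```

On the sample page above, this should drop entries 14 and 15 and leave only tags and content. Whitespace-only content (the newlines and tabs between tags) is kept; filter it with trim() if you don't want it.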

Post by Todd_Z »

Does anyone have a preg pattern for checking whether a string is a valid tag?

<a href="www.google.com"> -> would work
<img src="img.jpg" /> -> would work
</table> -> would work

Post by feyd »

you already have the pattern :P
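One way to reuse it, as a sketch: drop the outer capture group and anchor the pattern with \A and \z so preg_match only succeeds when the entire string is a single tag. The isTag() name comes from the first post; the anchoring is an assumption, not something stated in the thread. Removing the outer group renumbers the quote capture, so the backreference becomes \2 instead of \3.

```php
<?php

// Same tag pattern as earlier in the thread, minus the outer group and anchored
// with \A ... \z (assumption) so the whole string must be exactly one tag.
function isTag(string $str): bool
{
    return preg_match('#\A<\s*/?\s*[a-z-]+(\s+[a-z-]+\s*=\s*(["\'])?.*?\\2)*\s*/?>\z#is', $str) === 1;
}

// Usage with the examples from the question:
var_dump(isTag('<a href="www.google.com">'));
var_dump(isTag('<img src="img.jpg" />'));
var_dump(isTag('</table>'));
var_dump(isTag('Test td cell'));
```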