Parsing a template, need help on design

Not for 'how-to' coding questions but PHP theory instead, this forum is here for those of us who wish to learn about design aspects of programming with PHP.

fastfingertips
Forum Contributor
Posts: 242
Joined: Sun Dec 28, 2003 1:40 am

Post by fastfingertips »

Hello, I'm trying to create a template engine with a few basic commands to use in my current application, but I'm not sure how to start parsing it. My template file may look like this:

Code: Select all

<html>
	<head>
		<title><<:title:>></title>
	</head>
	<body>
		<table>
			<<:foreach (:user: as $key=>$value):>>
			<tr>
				<td><<:user:username:>></td>
				<td><<:user:password:>></td>
				<td><<:user:email:>></td>
			</tr>
			<<:endfor:>>
		</table>
	</body>
</html>
As you may notice, I have simple values but also loop commands (like foreach). I don't know where to start parsing: should I start from the values and substitute each one into the template, or start from the template tags and, whenever I encounter a special tag, look in the values for a match?

I'm also thinking of creating a command factory (to be able to implement more commands later, such as for, if, while, etc.).
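
A minimal sketch of what such a command factory could look like. The names TemplateCommand, ForeachCommand and CommandFactory are hypothetical, and the generated code assumes the values are available in an array called $data; none of this comes from an existing library.

```php
<?php
// Hypothetical interface: each command turns the arguments of a tag into PHP source.
interface TemplateCommand
{
    public function compile($arguments);
}

// Handles tags such as <<:foreach (:user: as $key=>$value):>>
class ForeachCommand implements TemplateCommand
{
    public function compile($arguments)
    {
        $pattern = '/^\(?\s*:(\w+):\s+as\s+\$(\w+)\s*=>\s*\$(\w+)\s*\)?$/';
        if (!preg_match($pattern, trim($arguments), $m)) {
            throw new InvalidArgumentException('Malformed foreach arguments: ' . $arguments);
        }
        // Assumption: template values live in an array named $data.
        return sprintf(
            '<?php foreach ($data[%s] as $%s => $%s): ?>',
            var_export($m[1], true),
            $m[2],
            $m[3]
        );
    }
}

// The factory is just a registry, so new commands (if, while, ...) can be added later.
class CommandFactory
{
    private $commands = array();

    public function register($name, TemplateCommand $command)
    {
        $this->commands[$name] = $command;
    }

    public function create($name)
    {
        if (!isset($this->commands[$name])) {
            throw new OutOfBoundsException('Unknown template command: ' . $name);
        }
        return $this->commands[$name];
    }
}

$factory = new CommandFactory();
$factory->register('foreach', new ForeachCommand());
echo $factory->create('foreach')->compile('(:user: as $key=>$value)');
// prints: <?php foreach ($data['user'] as $key => $value): ?>
```

The parser then only needs to split a tag into a command name and its arguments; supporting a new command means registering one more class.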
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Although regex could be used, I'd recommend a string parser that transforms the template into code that can be run through eval() (be very careful here).
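
A minimal sketch of that idea, handling only simple value tags such as <<:title:>>. The function names compileTemplate() and renderTemplate() and the $data array are assumptions made for illustration. Tag contents are whitelisted before they reach the generated code, which is the kind of care eval() needs.

```php
<?php
// Hypothetical helper: scans for <<: ... :>> with strpos() and emits PHP source.
function compileTemplate($template)
{
    $code = '';
    $pos = 0;
    while (($open = strpos($template, '<<:', $pos)) !== false) {
        $close = strpos($template, ':>>', $open + 3);
        if ($close === false) {
            throw new RuntimeException('Unclosed tag at offset ' . $open);
        }
        // Literal text before the tag is emitted as a safely quoted string.
        $code .= 'echo ' . var_export(substr($template, $pos, $open - $pos), true) . ';';
        $name = substr($template, $open + 3, $close - $open - 3);
        // Only plain identifiers are accepted, so nothing else can end up in eval().
        if (!preg_match('/^\w+\z/', $name)) {
            throw new RuntimeException('Unsupported tag: ' . $name);
        }
        $code .= 'echo htmlspecialchars($data[' . var_export($name, true) . ']);';
        $pos = $close + 3;
    }
    // Remaining literal text after the last tag.
    return $code . 'echo ' . var_export(substr($template, $pos), true) . ';';
}

// Runs the generated code with output buffering and returns the result.
function renderTemplate($template, array $data)
{
    ob_start();
    eval(compileTemplate($template)); // $data is visible to the evaluated code
    return ob_get_clean();
}

echo renderTemplate('<title><<:title:>></title>', array('title' => 'Welcome & hello'));
// prints: <title>Welcome &amp; hello</title>
```

Loop commands are left out here; they would need the scanner to recognise a command name inside the tag instead of rejecting it.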

Post by fastfingertips »

Since my commands will be limited, I will use regex (you already helped me with that parsing problem) and run eval() on the built string at the end.

At the moment I'm thinking, for example, of providing the parser with an array that looks like this:

Code: Select all

$arrDetails = array(
	'title' => 'Welcome ',
	'user'  => array(
		0 => array('username' => 'Puiu', 'password' => '0007', 'email' => 'puiu@localhost.com'),
		1 => array('username' => 'Mind', 'password' => '0002', 'email' => 'mind@localhost.com'),
		2 => array('username' => 'Lenuta', 'password' => '0001', 'email' => 'lenuta@localhost.com'),
	),
);
How would you advise me to translate a command (the foreach command from the example)?
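
One possible way to do that translation with regex, sketched under a few assumptions: the function names translateTags() and renderTranslated() are hypothetical, the values are passed in as $data, and a tag like <<:user:username:>> inside a loop over :user: is read as a field of the current loop value.

```php
<?php
// Hypothetical translator: rewrites template tags into PHP alternative syntax.
function translateTags($template)
{
    $loopVars = array(); // collection name (e.g. "user") => name of its loop variable
    return preg_replace_callback('/<<:(.*?):>>/s', function ($m) use (&$loopVars) {
        $tag = trim($m[1]);
        $loop = '/^foreach\s*\(\s*:(\w+):\s+as\s+\$(\w+)\s*=>\s*\$(\w+)\s*\)$/';
        if (preg_match($loop, $tag, $p)) {
            $loopVars[$p[1]] = $p[3];
            return sprintf('<?php foreach ($data[\'%s\'] as $%s => $%s): ?>', $p[1], $p[2], $p[3]);
        }
        if ($tag === 'endfor' || $tag === 'endforeach') {
            array_pop($loopVars); // leave the innermost loop
            return '<?php endforeach; ?>';
        }
        // Field of the current loop item, e.g. user:username
        if (preg_match('/^(\w+):(\w+)$/', $tag, $p) && isset($loopVars[$p[1]])) {
            return sprintf('<?php echo htmlspecialchars($%s[\'%s\']); ?>', $loopVars[$p[1]], $p[2]);
        }
        // Simple top-level value, e.g. title
        if (preg_match('/^\w+$/', $tag)) {
            return sprintf('<?php echo htmlspecialchars($data[\'%s\']); ?>', $tag);
        }
        throw new RuntimeException('Unknown tag: ' . $tag);
    }, $template);
}

// Evaluates the translated template; the leading close tag switches eval() to HTML mode.
function renderTranslated($template, array $data)
{
    ob_start();
    eval('?>' . translateTags($template));
    return ob_get_clean();
}

$tpl = '<<:foreach (:user: as $key=>$value):>><td><<:user:username:>></td><<:endfor:>>';
$data = array('user' => array(array('username' => 'Puiu'), array('username' => 'Mind')));
echo renderTranslated($tpl, $data);
// prints: <td>Puiu</td><td>Mind</td>
```

Because every tag is matched against a fixed set of patterns and anything else throws, the string handed to eval() only ever contains code generated here.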

PS. I'm asking a lot of questions because I'm building a large application and don't have the time for an in-depth analysis or for testing. I can't afford to spend too much time trying out new approaches, and I think your experience will help me a lot.
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

I've never understood why people create templates that have such a "programming" feel to them. You've got logic in your template, which sort of defeats the point IMO. On top of that, you could have simply used PHP (a template language itself) to do that loop, so this way there's just more to learn and remember.

When I create templates I do define the blocks that you've indicated need to be looped over, but I deal with all the logic in a controller ;)
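
A rough sketch of that style, where the template only marks a repeatable block and the looping happens in PHP. The marker syntax (<!-- BEGIN row -->, {username}) and the function name renderBlock() are assumptions for illustration, not a description of any particular engine.

```php
<?php
// Hypothetical helper: repeats a marked block once per row; the template holds no logic.
function renderBlock($template, $block, array $rows)
{
    $name = preg_quote($block, '/');
    $pattern = '/<!-- BEGIN ' . $name . ' -->(.*?)<!-- END ' . $name . ' -->/s';
    return preg_replace_callback($pattern, function ($m) use ($rows) {
        $out = '';
        foreach ($rows as $row) {
            $pairs = array();
            foreach ($row as $field => $value) {
                // {field} placeholders are replaced by the escaped value
                $pairs['{' . $field . '}'] = htmlspecialchars($value);
            }
            $out .= strtr($m[1], $pairs);
        }
        return $out;
    }, $template);
}

$template = '<table><!-- BEGIN row --><tr><td>{username}</td></tr><!-- END row --></table>';
$users = array(array('username' => 'Puiu'), array('username' => 'Mind'));
echo renderBlock($template, 'row', $users);
// prints: <table><tr><td>Puiu</td></tr><tr><td>Mind</td></tr></table>
```

The designer can still restyle the row inside the same file, since the block markup stays in the page.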

Post by fastfingertips »

As you may notice, I'm modelling this on PHP's ability to write loop instructions like:

Code: Select all

foreach ($arrData as $key => $value):
	// do something
endforeach;

Post by feyd »

If you don't have time to create a parser, why not use Smarty?

Post by fastfingertips »

Simply because if, for example, the designer needs to change the row style, he must be able to do it in the current page and not by opening another template file. That's why these basic commands are needed.

Post by fastfingertips »

Smarty comes with too many features and I don't need all of them. Also, because of what I have now, I cannot add Smarty to my project (sometimes we are limited to what we have :( )

Post by Chris Corbyn »

Well, if it's any help, here's the start of a generic parser I'm working on. It really needs some speed increases, but it's a base. It would be a lot faster if it wasn't so generic. You use configuration files (that array) to define your token definitions. I'll be extending it to make a generic lexical analyzer too.

Code: Select all

class lexer
{
	private $source;
	private $tokenDefinitions = array();
	private $tokenTypes = array();
	private $tokenLength = 0;
	private $tokens = array();
	private $inertTokens = array();

	function __construct($source)
	{
		$this->source = $source;
	}

	public function addDefinitions($arr)
	{
		$this->tokenDefinitions = array_values($arr);
		$this->tokenTypes = array_keys($arr);
		foreach ($this->tokenTypes as $k => $t) $this->defineOnce($t, $k);
		$this->tokenLength = count($this->tokenDefinitions);
	}

	public function getTokenName($val)
	{
		if (isset($this->tokenTypes[$val])) return $this->tokenTypes[$val];
	}

	private function defineOnce($d, $val)
	{
		if (!defined($d)) define($d, $val);
	}

	public function tokenize($str=false, $pos=0)
	{
		if ($str === false) $str = $this->source; //Start
		if ((string) $str === '') return false; //End of string (a remaining "0" is still valid input, so empty() is not used)
		
		$i = 0;
		
		foreach($this->tokenDefinitions as $type => $def)
		{
			$i++;
			
			if ($def[1] > 0)
			{
				//Regex definition: use an empty array when the pattern does not match at all
				$tok = preg_match($def[0], $str, $matches, PREG_OFFSET_CAPTURE) ? $matches[0] : array();
			}
			else
			{
				$strpos = strpos($str, $def[0]);
				if ($strpos !== false) $tok = array(
					substr($str, $strpos, strlen($def[0])),
					$strpos
				);
				else $tok = array();
			}
			
			//No tokens found in string (or at least not at the start)
			if ($i == $this->tokenLength && (!isset($tok[1]) || $tok[1] != 0))
			{
				$last_tok = $this->getLastToken();
				
				die('<strong>Fatal:</strong> Undefined token at offset '.$pos.' in source (Near <em>\' '.$last_tok.' \'</em>)<br />');
			}
			elseif(isset($tok[1]) && $tok[1] == 0) //Token found at start
			{
				$len = strlen($tok[0]);
				$substr = substr($str, $len);
				$this->tokens[] = array('token' => $tok[0], 'type' => $type, 'offset' => $pos);
				if (strlen($substr) > 0)
				{
					//Move along the string and go all over again
					$this->tokenize($substr, $pos+$len);
				}
				break;
			}
		}
	}

	public function setInertTokens($arr)
	{
		$this->inertTokens = $arr;
	}

	private function getLastToken()
	{
		$tmp = array();
		foreach ($this->tokens as $arr)
		{
			if (!in_array($arr['type'], $this->inertTokens))
			{
				$tmp[] = $arr;
			}
		}
		$tmp2 = array_pop($tmp);
		return $tmp2['token'];
	}

	public function getTokens()
	{
		return $this->tokens;
	}
	
	public function dump()
	{
		echo '<pre>'.print_r($this, 1).'</pre>';
	}
}

Code: Select all

//Types should be listed in order of precedence
// For example, look for strings before variables since a variable inside a string is not valid
$tokenTypes = array(
	
	'TK_ESCAPE_CHARACTER'		=> array('\\\\', 0),	
	'TK_DOUBLE_STRING'		=> array('@(?<!\\\\)".*?(?<!\\\\)"@s', 1),
	'TK_LITERAL_STRING'		=> array("@(?<!\\\\)'.*?(?<!\\\\)'@s", 1),
	'TK_COMMENT'			=> array('@(?<!\\\\)/\\*(.*?)\\*/|//.*?$|#.*?$@sm', 1),
	'TK_VARIABLE'			=> array('@\\$[a-z_]\w*@i', 1),
	'TK_CLASS'			=> array('@\bclass\b@i', 1),
	'TK_FUNCTION'			=> array('@\b(?:c)?function\b@i', 1),
	'TK_INTERFACE'			=> array('@\binterface\b@i', 1),
	'TK_ECHO'			=> array('@\becho\b@i', 1),
	'TK_PRINT'			=> array('@\bprint\b@i', 1),
	'TK_EXIT'			=> array('@\bexit\b@i', 1),
	'TK_DIE'				=> array('@\bdie\b@i', 1),
	'TK_OPEN_TAG_WITH_ECHO'		=> array('@<\?=@i', 1),
	'TK_OPEN_TAG'			=> array('@<\?(?:php)?@i', 1),
	'TK_CLOSE_TAG'			=> array('?>', 0),
	'TK_ARRAY_CAST'			=> array('@\([ \t]*array[ \t]*\)@i', 1),
	'TK_DOUBLE_CAST'		=> array('@\(\s*(?:double|float|real)\s*\)@i', 1),
	'TK_AND_EQUAL'			=> array('&=', 0),
	'TK_OBJECT_OPERATOR'		=> array('->', 0),
	'TK_DOUBLE_ARROW'		=> array('=>', 0),
	'TK_APPEND_OPERATOR'		=> array('.=', 0),
	'TK_NOT_IDENTICAL'		=> array('!==', 0), //Must precede TK_NOT_EQUAL, otherwise '!=' always matches first
	'TK_NOT_EQUAL'			=> array('!=', 0),
	'TK_BOOLEAN_AND'			=> array('&&', 0),
	'TK_BOOLEAN_OR'			=> array('||', 0),
	'TK_INC'				=> array('++', 0),
	'TK_DEC'				=> array('--', 0),
	'TK_IS_IDENTICAL'		=> array('===', 0),
	'TK_IS_EQUAL'			=> array('==', 0),
	'TK_LESS_THAN_OR_EQUAL'		=> array('<=', 0),
	'TK_GREATER_THAN_OR_EQUAL'	=> array('>=', 0),
	'TK_BITWISE_LEFT_SHIFT'		=> array('<<', 0),
	'TK_BITWISE_RIGHT_SHIFT'		=> array('>>', 0),
	'TK_EQUALS'			=> array('=', 0),
	'TK_RIGHT_PAREN'			=> array(')', 0),
	'TK_LEFT_PAREN'			=> array('(', 0),
	'TK_COMMA'			=> array(',', 0),
	'TK_CONCAT_OPERATOR'		=> array('.', 0),
	'TK_GREATER_THAN'		=> array('>', 0),
	'TK_LESS_THAN'			=> array('<', 0),
	'TK_REFERENCE_OPERATOR'		=> array('&', 0),
	'TK_LEFT_BRACKET'		=> array('[', 0),
	'TK_RIGHT_BRACKET'		=> array(']', 0),
	'TK_COLON'			=> array(':', 0),
	'TK_SEMICOLON'			=> array(';', 0),
	'TK_NEGATION_OPERATOR'		=> array('!', 0),
	'TK_RIGHT_BRACE'			=> array('}', 0),
	'TK_LEFT_BRACE'			=> array('{', 0),
	'TK_PLUS'			=> array('+', 0),
	'TK_MINUS'			=> array('-', 0),
	'TK_HEX_NUMERAL'			=> array('@0x[a-f0-9]+@i', 1),
	'TK_DECIMAL_OR_FLOAT'		=> array('@\d+\.\d+@', 1),
	'TK_OCT_NUMERAL'			=> array('@0\d+@', 1),
	'TK_INTEGER_NUMERAL'		=> array('@\d+@', 1),
	'TK_IF'				=> array('@\bif\b@i', 1),
	'TK_ELSE'			=> array('@\belse\b@i', 1),
	'TK_ELSEIF'			=> array('@\belseif\b@i', 1),
	'TK_ARRAY'			=> array('@\barray\b@', 1),
	'TK_AS'				=> array('@\bas\b@i', 1),
	'TK_PUBLIC'			=> array('@\bpublic\b@i', 1),
	'TK_PRIVATE'			=> array('@\bprivate\b@i', 1),
	'TK_PROTECTED'			=> array('@\bprotected\b@i', 1),
	'TK_VAR'				=> array('@\bvar\b@i', 1),
	'TK_STATIC'			=> array('@\bstatic\b@', 1),
	'TK_EXTENDS'			=> array('@\bextends\b@i', 1),
	'TK_IMPLEMENTS'			=> array('@\bimplements\b@i', 1),
	'TK_CASE'			=> array('@\bcase\b@i', 1),
	'TK_WHITESPACE'			=> array('@\s+@', 1),
	'TK_UNQUOTED_STRING'		=> array('@\w+@', 1), //Class names, function names, constants (The lexer will deal with this)
	'TK_UNKNOWN'			=> array('@\W@', 1)
	
);

$lex = new lexer(file_get_contents('index.php'));
$lex->addDefinitions($tokenTypes);
$lex->setInertTokens(array(TK_WHITESPACE));
$lex->tokenize();

$tok = $lex->getTokens();
foreach ($tok as $k => $arr)
{
    if ($arr['type'] == TK_COMMENT || $arr['type'] == TK_WHITESPACE)
    {
        unset($tok[$k]);
    }
}

echo '<table cellpadding=3 border=1>
<tr>
    <td><b>Token Type</b></td>
    <td><b>Token</b></td>
    <td><b>Offset</b></td>
</tr>
';

foreach ($tok as $arr)
{
    echo '<tr>
    <td>'.$lex->getTokenName($arr['type']).'</td>
    <td><div style="overflow: hidden; width: 500px;">'.nl2br(htmlentities($arr['token'])).'</div></td>
    <td>'.$arr['offset'].'</td>
</tr>
';
}

echo '</table>';
In the array, a "1" means that the definition is a regex pattern, and a "0" means it's a static string.

EDIT | Updated the code to my latest version. It's now approx. 15% faster at 6900 bytes with 70 token definitions. (Main change: using constants/numbers rather than strings.)
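
A further speed-up that might be worth trying, sketched here as an assumption and not as part of the class above: loop over the source with an offset instead of recursing on substr() copies, and let PCRE anchor each pattern at that offset with the A modifier. The function name tokenizeAnchored() is hypothetical, and every definition is assumed to be a regex with optional trailing modifiers.

```php
<?php
// Hypothetical iterative tokenizer; $definitions maps a type name to a PCRE pattern.
function tokenizeAnchored($source, array $definitions)
{
    $tokens = array();
    $offset = 0;
    $length = strlen($source);
    while ($offset < $length) {
        $matched = false;
        foreach ($definitions as $type => $pattern) {
            // The appended A modifier anchors the match at $offset,
            // so no substring copy is made and only the current position is tried.
            if (preg_match($pattern . 'A', $source, $m, 0, $offset) && $m[0] !== '') {
                $tokens[] = array('token' => $m[0], 'type' => $type, 'offset' => $offset);
                $offset += strlen($m[0]);
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            throw new RuntimeException('Undefined token at offset ' . $offset);
        }
    }
    return $tokens;
}

// Definitions are still listed in order of precedence.
$definitions = array(
    'TK_VARIABLE'        => '@\$[a-z_]\w*@i',
    'TK_INTEGER_NUMERAL' => '@\d+@',
    'TK_WHITESPACE'      => '@\s+@',
    'TK_EQUALS'          => '@=@',
);

foreach (tokenizeAnchored('$a = 42', $definitions) as $token) {
    echo $token['type'] . ' at ' . $token['offset'] . "\n";
}
```

Note that lookbehinds and \b now see the characters before the offset, which the substr() version hides from them, so a few definitions could behave differently.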