
Returning the text within tags

Posted: Tue Jul 01, 2003 11:44 pm
by LittleZephyr
Hi, I'm writing a blog application in PHP, and I'm having some problems with getting info from the text files that the entries are saved in.

I have the text formatted so that each field of info is surrounded by descriptive tags (kinda like XML).

So a typical text file would be like:

Code: Select all

<title>This is the Entry's Title</title>
<date>Day Year Month</date>
<text>blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah</text>
Is there a way to search for a specific set of tags, and return only the stuff in between them as a string? It seems you could use something like the strstr() function to do this, but I'm not sure. Any help would be appreciated.

Posted: Tue Jul 01, 2003 11:55 pm
by phice

Code: Select all

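<?php
function search_display($tag, $search)
{
// $search = the whole contents (including all <> tags)
// ereg() puts the text between the tags into $print[1]
// (ereg() was removed in PHP 7; preg_match() is the PCRE equivalent)
if (ereg("<{$tag}>(.*)</{$tag}>", $search, $print)) {
echo $print[1];
}
}


// Use ($contents holds the text read from the entry file):
search_display("title", $contents); // Will print "This is the Entry's Title"
search_display("date", $contents); // Will print "Day Year Month"
search_display("text", $contents); // Will print "blah blah blah ..."
?>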
Tell me if you have any problems. :)
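
A variant that returns the text instead of echoing it could be sketched with preg_match(). This is only a sketch: the function name get_tag_text() and the sample $entry string are made up for illustration, and it assumes each tag appears at most once in the text.

```php
<?php
// Hypothetical helper: returns the text between <$tag> and </$tag>,
// or null when the tag pair is not found.
function get_tag_text($tag, $text)
{
    // preg_quote() escapes regex metacharacters in the tag name.
    // The s modifier lets . match newlines; .*? stops at the first closing tag.
    $quoted  = preg_quote($tag, '#');
    $pattern = '#<' . $quoted . '>(.*?)</' . $quoted . '>#s';

    if (preg_match($pattern, $text, $matches)) {
        return $matches[1];
    }
    return null;
}

// Sample data in the format from the first post (contents made up).
$entry = "<title>This is the Entry's Title</title>\n"
       . "<date>Day Year Month</date>\n"
       . "<text>blah blah\nblah</text>";

echo get_tag_text('title', $entry), "\n";
echo get_tag_text('date', $entry), "\n";
```

Returning null for a missing tag lets the caller tell "tag not there" apart from "tag is empty".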

Posted: Wed Jul 02, 2003 12:34 am
by m3rajk
http://us3.php.net/manual/en/ref.pcre.php
http://us3.php.net/manual/en/function.preg-match.php
http://us3.php.net/manual/en/function.p ... ch-all.php
http://us3.php.net/manual/en/function.preg-grep.php

those should help

all things considered... http://us3.php.net/manual/en/function.preg-split.php might help too

assuming it's a file with many entries...
<start entry><date><user><entry text></end entry>

and that repeats then....

Code: Select all

$entries=preg_split('|<entry delimiter tag>(?U)(.*)</entry delimiter tag>|', $file, -1, PREG_SPLIT_DELIM_CAPTURE);
will make an array of the individual entries. the first one is a blank

you can then parse each entry:

Code: Select all

$dates=array();
$numentries=count($entries);
for($i=0;$i<$numentries;$i++){
  // preg_match() only returns 0 or 1; the captured text ends up in $m[1]
  if(preg_match('|<date delimiter>(.*)</date delimiter>|', $entries[$i], $m)){
    $dates[] = $m[1];
  }
}
and so on until you have the parts in a set of arrays, and you know the size of the arrays ($numentries) so you can display them in the setting using a for loop and echo "<table><tr><td>$date</td><td>$author</td><td>$title</td></tr><tr><td colspan=\"3\">$entry</td></tr></table>";
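
Put together, the split-then-parse idea might look roughly like the sketch below. Note that it uses preg_match_all() rather than preg_split() to pull out the entries, which is a swapped-in technique, not the approach from the post. The <entry> wrapper tag and the sample data are assumptions; the field names come from the first post.

```php
<?php
// Sample file contents (made up); <entry> as the wrapper tag is an assumption.
$file = "<entry><title>First</title><date>1 2003 July</date><text>Hello</text></entry>\n"
      . "<entry><title>Second</title><date>2 2003 July</date><text>World</text></entry>\n";

// Grab the inside of every <entry>...</entry> block; $m[1] holds the captures.
preg_match_all('|<entry>(.*?)</entry>|s', $file, $m);
$entries = $m[1];

$titles = array();
$dates  = array();
foreach ($entries as $entry) {
    // preg_match() returns 1 on a match and stores the captured text in $f[1].
    $titles[] = preg_match('|<title>(.*?)</title>|s', $entry, $f) ? $f[1] : '';
    $dates[]  = preg_match('|<date>(.*?)</date>|s', $entry, $f) ? $f[1] : '';
}

// Display one table row per entry.
for ($i = 0; $i < count($entries); $i++) {
    echo "<tr><td>{$dates[$i]}</td><td>{$titles[$i]}</td></tr>\n";
}
```

Storing an empty string when a field is missing keeps the arrays the same length, so the same index always refers to the same entry.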

Posted: Sun Jul 06, 2003 2:18 am
by LittleZephyr
Thanks Everyone, This was a great help ^__________________________^

Posted: Mon Jul 07, 2003 5:50 pm
by LittleZephyr
Sorry to resurrect a dead topic, but I figured this was a better place to do it than making a new topic. I've tried both methods and neither worked. Here's my current code:

Code: Select all

<?php

//GET ENTRIES TO ARRAY

$arrayed_datafile = file($datafile); // Puts text from file into array

print("<hr><pre>ARRAYED_DATAFILE");
print_r($arrayed_datafile); // Copius Debug
print("</pre><hr>");

$filetext = implode("\n", $arrayed_datafile);  // Assembles area into a string

print("<hr><pre>FILETEXT POST IMPLODE");
print("$filetext"); // Copius Debug
print("</pre><hr>");

$filetext = ereg("<{entry}>(.*)</{entry}>", $filetext, $print); // Searches  filetext and sperates it's into entries. THIS IS WHAT DOESN'T WORK SO FAR

print("<hr><pre>PRINT");
print_r($print); // Copius Debug
print("</pre><hr>");

print("<hr><pre>FILETEXT POST EREG");
print_r($filetext); // Copius Debug
print("</pre><hr>");

//END GET ENTRIES TO ARRAY
//GET FIELDS

$numentries = count($entries);
print("$numentries"); // Copius Debug

$dates = array();

for ( $i =0 ; $i < $numentires; $i++) {
  $titles[] = preg_match('|<name>(.*)</name>|', $entries[$i]);
}
for ( $i = 0; $i < $numentires; $i++) {
  $dates[] = preg_match('|<datim>(.*)</datim>|', $entries[$i]);
}
for ( $i = 0; $i < $numentires; $i++) {
  $mainbodies[] = preg_match('|<mainbody>(.*)</mainbody>|', $entries[$i]);
}

// END GET FIELDS
// DISPLAY POSTS

for ( $i = 0; $i < $numentries; $i++) {
  print ("<hr><center><table width="600" border="1"><tr><td>$titles[$i]</td></tr><tr><td>$dates[$i]</td></tr><tr><td>$mainbodies[$i]</td></tr></table></center>\n");
}

// END DISPLAY POSTS

?>
I've also tried using

Code: Select all

$entries=preg_split('|<entry>(?U)(.*)</entry>|', $file, -1, PREG_DELIM_CAPTURE);
But it doesn't work either. Can anyone please help me figure out why the script is breaking?

Posted: Mon Jul 07, 2003 6:02 pm
by bionicdonkey
this is a very old script i wrote. i think it might help

Code: Select all

<?
//======================================================================
// Bionic Donkey Research Facility Online :: require/images.php
//
// Copyright (C) 2002-2003 Josh Benoît. All rights reserved. 
// 	This program is free software licensed under the 
// 	GNU General Public License (GPL).
//
// This code may be redistributed as long as this text is present.
// This text may be edited but only to state addition made the script.
//
// Bionic Donkey Research
// http://bionicdonkey.host.sk
//
//=====================================================================

//---Change HTML tags to text---\\
function striptags($input) {
	$badcode = array("IMG ","SCRIPT","/SCRIPT", "FONT", "/FONT"); // List of HTML tags to find
	$inputsave = $input;
	$output = "";
	$end = 0;
	while(1) {
		// find the opening tag 
		$s = strpos($input,"<");
		if($s === false) {
			// no more input to process 
			$output .= $input;
			break;
		} else {
			// copy up to $pos 
			$output .= substr($input,0,$s);
			// is this followed by a tag we want? 
			$found = 0;
			for($i=0;$i<count($badcode);$i++) {
				$tmp = substr($input,$s + 1,strlen($badcode[$i]));
				if(strcasecmp($tmp,$badcode[$i]) == 0) {
					// matched 
					$found = 1;
					// escaped start 
					$output .= "&lt;";
					$e = strpos($input,">",$s);
					if($e === false) {
						// no closing tag, copy the rest 
						$output .= substr($input,$s + 1);
						$end = 1;
						break;
					} else {
						$output .= substr($input,$s + 1,$e - $s - 1)."&gt;";
						$input = substr($input,$e + 1);
					}
				} 
			}
			if(!$found) {
				// we didn't find anything, walk past the tag 
				$e = strpos($input,">",$s);
				if($e === false) {
					// no closing tag, copy the rest 
					$output .= substr($input,$s);
					$end = 1;
					break;
				} else {
					$output .= substr($input,$s,$e - $s);
					$input = substr($input,$e);
				}
			}
			if($end)
				break;
		}
	}
	return $output;
}
	?>

Posted: Mon Jul 07, 2003 6:11 pm
by m3rajk
in the first, i'm pretty sure that { is a special character, so { and } should be \{ and \} respectively
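
It may also be worth checking whether the braces belong in the pattern at all. In the earlier "<{$tag}>" form they are PHP's curly-brace variable syntax and disappear once $tag is substituted, whereas in "<{entry}>" there is no variable, so they stay in the string as literal characters. A small sketch of the difference follows; it uses preg_match() in place of ereg() (which was removed in PHP 7), and the sample string is made up.

```php
<?php
$tag = 'entry';

// "{$tag}" is PHP's curly-brace variable syntax: the braces are consumed
// and only the value of $tag ends up in the string.
$interpolated = "<{$tag}>(.*)</{$tag}>";

// With no $ after the {, the braces are ordinary characters and stay in the string.
$literal = "<{entry}>(.*)</{entry}>";

echo $interpolated, "\n";
echo $literal, "\n";

// Sample text, made up for this example.
$sample = "<entry>one</entry>";

// preg_match() stands in for ereg() here, hence the added # delimiters.
$plain   = preg_match('#' . $interpolated . '#', $sample);
$escaped = preg_match('#<\{entry\}>(.*)</\{entry\}>#', $sample);

var_dump($plain);   // pattern without braces
var_dump($escaped); // pattern with escaped, literal braces
```

Escaping the braces makes the pattern look for the text "<{entry}>", which only helps if the file really contains braces around the tag name.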

in both of them, what i found was that the first entry was an empty string if there was nothing before the first delimiter.

here's the function i made to use for myself.. maybe we should send it to php.net for inclusion in version 5 to return an array of substrings...

Code: Select all

<?php
function get_sub_pattern($pattern, $inputstring){
  $stage1=preg_split($pattern, $inputstring, -1, PREG_SPLIT_DELIM_CAPTURE);
  $stage2=array();
  $blocks=count($stage1);
  for($i=0;$i<$blocks;$i++){
    // odd indexes hold the captured text between the delimiters
    if($i%2==1){
      $stage2[]=$stage1[$i];
    }
  }
  return $stage2;
}
?>
i should note that i was looking at block of text with the tags [nocode][/nocode] in there and i wanted what was between them, to make it exempt from parsing, so i needed every other line. i think you need stage1... but a mod that should help would be....

Code: Select all

<?php
function get_sub_pattern($pattern, $inputstring){
  $stage1=preg_split($pattern, $inputstring, -1, PREG_SPLIT_DELIM_CAPTURE);
  $stage2=array();
  $blocks=count($stage1);
  for($i=0;$i<$blocks;$i++){
    if($i>0){
      $stage2[]=$stage1[$i];
    }
  }
  return $stage2;
}
?>
i called it like this:

$codeexempt=get_sub_pattern('|\[nocode](?U)(.*)\[/nocode]|i', $precode);
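
To see what preg_split() hands back with PREG_SPLIT_DELIM_CAPTURE, a small sketch like the following may help. The sample string is made up, and the pattern uses (.*?) instead of (?U)(.*), which should behave the same way since both make the quantifier non-greedy.

```php
<?php
// Sample input (made up): it starts with a delimiter, so the first piece is empty.
$precode = "[nocode]X[/nocode]b[nocode]Y[/nocode]c";

// The text outside the tags and the captured text inside them alternate.
$parts = preg_split('|\[nocode](.*?)\[/nocode]|i', $precode, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($parts);

// Keep only the odd indexes, i.e. the text that was between the tags.
$inside = array();
foreach ($parts as $i => $part) {
    if ($i % 2 == 1) {
        $inside[] = $part;
    }
}
print_r($inside);
```

The empty first element is the "blank" entry mentioned earlier: it is whatever came before the first delimiter, which here is nothing.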

i didn't check, but did you include the i at the end?
without the i, you're looking for :
<entry>(?U)(.*)</entry>

with the i, you'll match <Entry>, <eNtry>, <enTry>, <entRy>, <ENTry> and any other variation you can think of. the i does case insensitivity... might actually help

Posted: Mon Jul 07, 2003 6:50 pm
by McGruff
I think this is it (my regex is pretty shaky):

preg_match_all("#<tag>(.*?)</tag>#si", $string, $matches);

Strings should be in $matches[1].

Do a separate pass for each tag.
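
A sketch of the one-pass-per-tag idea, using the field names from the script posted above. The sample $string is made up, and the $fields array is just one possible way to collect the results.

```php
<?php
// Sample text (made up); note the upper-case <NAME> in the second entry.
$string = "<entry>\n<name>First post</name>\n<datim>July 1</datim>\n"
        . "<mainbody>line one\nline two</mainbody>\n</entry>\n"
        . "<entry>\n<NAME>Second post</NAME>\n<datim>July 2</datim>\n"
        . "<mainbody>more text</mainbody>\n</entry>\n";

$fields = array();
foreach (array('name', 'datim', 'mainbody') as $tag) {
    // s: the dot also matches newlines; i: tag names match in any case;
    // .*? is non-greedy, so each match stops at the nearest closing tag.
    preg_match_all("#<$tag>(.*?)</$tag>#si", $string, $matches);
    $fields[$tag] = $matches[1];
}

print_r($fields);
```

Each $fields[...] array ends up with one element per entry, so index 0 of every array belongs to the first entry, index 1 to the second, and so on.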

Posted: Tue Aug 05, 2003 5:13 pm
by discostu
I can't figure out how to get this working over multiple lines.

Code: Select all

preg_match_all ("/(<([\w]+)[^>]*>)(.*)(<\/\\2>)/",$contents,$matches);
Because the '.' restricts to all characters but newline, whereas I want all characters including newline. I tried changing it to (.*|\n*) (for OR NEWLINE), but that didn't work. I guess I'm just too inexperienced with regex.

Thanks! :D

Posted: Tue Aug 05, 2003 6:08 pm
by McGruff
This is a useful tool for testing expressions:

http://www.weitz.de/regex-coach/#install

Posted: Tue Aug 05, 2003 6:30 pm
by patrikG
discostu wrote:I can't figure out how to get this working over multiple lines.

Code: Select all

preg_match_all ("/(<([\w]+)[^>]*>)(.*)(<\/\\2>)/",$contents,$matches);
Because the '.' restricts to all characters but newline, whereas I want all characters including newline. I tried changing it to (.*|\n*) (for OR NEWLINE), but that didn't work. I guess I'm just too inexperienced with regex.

Thanks! :D
use /s - it makes the dot match linebreaks as well. I've also added [\r\n]* to the pattern as a safeguard, although with /s the dot should already cover both \r and \n.

Code: Select all

preg_match_all ("/(<([\w]+)[^>]*>)(.*[\r\n]*)(<\/\\2>)/s",$contents,$matches);
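
The effect of the s modifier, and of a non-greedy quantifier, can be checked with a small sketch like this one. The sample $contents is made up, and the pattern is a trimmed-down version of the one above with fewer capture groups.

```php
<?php
// Sample input (made up): the first <b> element spans two lines.
$contents = "<b>one\ntwo</b> and <b>three</b>";

// No s modifier: the dot cannot cross the newline, so the first element is missed.
preg_match_all('/<(\w+)[^>]*>(.*?)<\/\1>/', $contents, $without_s);

// With s: the dot matches newlines too, and .*? stops at the nearest closing tag.
preg_match_all('/<(\w+)[^>]*>(.*?)<\/\1>/s', $contents, $with_s);

// With s but a greedy .*: the match runs on to the last closing tag.
preg_match_all('/<(\w+)[^>]*>(.*)<\/\1>/s', $contents, $greedy);

print_r($without_s[2]);
print_r($with_s[2]);
print_r($greedy[2]);
```

Index 2 of each result holds the text between the tags; index 1 holds the tag names.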