
Splitting parts of a String in Variables

Posted: Wed Oct 13, 2010 8:33 am
by HoboJoey
Hello there,

I have a file full of list elements like this:


<li><a href="http://website.com/?somestuff">The Link Name</a> </li>
Now I want to open the file, read the first line and store "http://website.com/?somestuff" in a variable. "The Link Name" should go into another variable. Then the script should read the next line and update both variables with the link and the link name from that line, and so on.

This is what I tried:


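<?php
error_reporting(E_ALL);
ini_set('display_errors', 1);
$fd = fopen("list", "r");
// Read line by line until fgets() returns false at the end of the file
while (($z = fgets($fd, 1000)) !== false) {
	// Strip every tag except <a> (what fgetss($fd, 1000, "<a>") did; fgetss was removed in PHP 8)
	$z = strip_tags($z, "<a>");
	echo $z;
}
fclose($fd);
?>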
Now I somehow have to get both parts into variables. I think I have to use regular expressions, but I can't figure out exactly how.

Thanks in advance,

HoboJoey
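
Since the question mentions regular expressions, here is a minimal sketch of doing it with preg_match() on each line, so that the URL and the link name end up in two separate variables. The helper name parseLinkLine and the sample lines are made up for illustration, and the pattern assumes double-quoted href values with each link on a single line.

```php
<?php
// Hypothetical helper: pull the href and the link text out of one line like
// <li><a href="http://website.com/?somestuff">The Link Name</a> </li>
// Returns null when the line contains no <a href="...">...</a>.
function parseLinkLine(string $line): ?array
{
    // Group 1: the URL between the double quotes; group 2: the link text.
    // Assumes double-quoted href values and the whole link on one line.
    if (preg_match('%<a\s[^>]*href="([^"]*)"[^>]*>(.*?)</a>%i', $line, $m)) {
        return [$m[1], $m[2]];
    }
    return null;
}

// Sample input standing in for the contents of the "list" file.
$lines = [
    '<li><a href="http://website.com/?somestuff">The Link Name</a> </li>',
    '<li><a href="http://example.org/page">Another Link</a> </li>',
];

foreach ($lines as $line) {
    $parsed = parseLinkLine($line);
    if ($parsed === null) {
        continue; // skip lines without a link
    }
    [$url, $name] = $parsed;
    echo "$url => $name\n";
}
```

To read the real file instead of the sample array, the loop could iterate over file("list", FILE_IGNORE_NEW_LINES).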

Re: Splitting parts of a String in Variables

Posted: Wed Oct 13, 2010 9:43 am
by HoboJoey
Solved it like this:


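<?php
error_reporting(E_ALL);
ini_set('display_errors', 1);
$fd = fopen("list", "r");
while (($zeile = fgets($fd, 1000)) !== false) {
	// Keep only the <a> tag (replaces fgetss($fd, 1000, "<a>"), which was removed in PHP 8)
	$zeile = strip_tags($zeile, "<a>");
	// Start of the URL and end of the opening <a ...> tag
	$pos1 = strpos($zeile, "http://");
	$pos2 = strpos($zeile, "\">");
	if ($pos1 === false || $pos2 === false) {
		continue; // no link on this line (e.g. an empty last line)
	}
	// Length of the URL = distance from "http://" to the closing quote
	$subString = substr($zeile, $pos1, $pos2 - $pos1);

	echo "subString = $subString <br>";
}
fclose($fd);
?>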
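
The strpos()/substr() approach above only extracts the URL. A sketch of how the same idea might be extended to also get the link name, assuming every line has the exact shape <a href="URL">NAME</a> with double quotes. The function name splitLinkLine is hypothetical.

```php
<?php
// Hypothetical helper extending the strpos()/substr() idea:
// returns [url, name] for one line, or null if a marker is missing.
// Assumes the exact shape <a href="URL">NAME</a> with double quotes.
function splitLinkLine(string $zeile): ?array
{
    $urlStart = strpos($zeile, 'href="');
    if ($urlStart === false) {
        return null;
    }
    $urlStart += strlen('href="');              // skip past href="
    $urlEnd = strpos($zeile, '">', $urlStart);  // closing quote of the URL
    if ($urlEnd === false) {
        return null;
    }
    $nameStart = $urlEnd + 2;                   // skip past ">
    $nameEnd = strpos($zeile, '</a>', $nameStart);
    if ($nameEnd === false) {
        return null;
    }
    $url = substr($zeile, $urlStart, $urlEnd - $urlStart);
    $name = substr($zeile, $nameStart, $nameEnd - $nameStart);
    return [$url, $name];
}

[$url, $name] = splitLinkLine('<li><a href="http://website.com/?somestuff">The Link Name</a> </li>');
echo "url = $url, name = $name\n";
```

Searching for href=" rather than "http://" means the same code also picks up https:// and relative links.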

Re: Splitting parts of a String in Variables

Posted: Wed Oct 13, 2010 9:51 am
by AbraCadaver
Try DOMDocument:


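<?php
$doc = new DOMDocument();
// The list file is only an HTML fragment, so collect any parser warnings instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTMLFile("list");

$links = array(); // initialise before appending
foreach ($doc->getElementsByTagName('a') as $link) {
	// One entry per <a>: its href attribute and its text
	$links[] = array('url' => $link->getAttribute('href'), 'text' => $link->nodeValue);
}
?>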
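
A variant of the DOMDocument approach that parses an HTML string with loadHTML() instead of a file, so it can be tried without a "list" file. The function name extractLinks and the sample markup are illustrative, and PHP's DOM extension has to be enabled.

```php
<?php
// Sketch: the same DOMDocument loop, but parsing a string with loadHTML().
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        $links[] = ['url' => $link->getAttribute('href'), 'text' => $link->nodeValue];
    }
    return $links;
}

$html = '<li><a href="http://website.com/?somestuff">The Link Name</a> </li>'
      . '<li><a href="http://example.org/page">Another Link</a> </li>';

foreach (extractLinks($html) as $entry) {
    echo $entry['url'], ' => ', $entry['text'], "\n";
}
```

For the real file, extractLinks(file_get_contents("list")) would give the same array as the loop above.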

Re: Splitting parts of a String in Variables

Posted: Fri Oct 22, 2010 9:37 am
by twinedev
I like the DOMDocument method; I'm going to go read up on it some more. Here is a solution using the regex method:


<?php

	// Get the file into this variable, I just set it to grab CNN for testing...
	$strFile = file_get_contents("http://www.cnn.com/");

	//                           _1__  _2_        _3_
	preg_match_all('%<a .*?href=("|\')(.*?)\1.*?>(.*?)</a>%i', $strFile, $aryMatch, PREG_PATTERN_ORDER);

	// in $aryMatch:
	//   [0] is array of all complete matches
	//   [1] is array of the opening quote, either single or double, so it can match the closing
	//   [2] is array of the actual URL of the link
	//   [3] is array of the text for the link

	if (isset($aryMatch[2]) && count($aryMatch[2]) > 0) {
		foreach ($aryMatch[2] as $key=>$strURL) {
			$strLinkText = $aryMatch[3][$key]; // Added this for easier readability
			echo ($key+1),': ';
			if (preg_match('/^javascript:/i',$strURL)) {
				echo "<strong><em>Javascript Call</em></strong><br>\n";
			}
			else {
				echo htmlspecialchars($strLinkText), '<strong> LINKS TO </strong>', htmlspecialchars($strURL), "<br>\n";
			}
		}
	}
	else {
		echo "Sorry, no links found...";
	}

?>
A note before you copy and paste that: the forum editor kept changing the code on me. In the line with the preg_match for javascript, the pattern is supposed to be /^javascript:/i.
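
With PREG_PATTERN_ORDER, the URL and the text of one link sit in two different arrays ($aryMatch[2][$key] and $aryMatch[3][$key]). A sketch using the same pattern with PREG_SET_ORDER keeps each link's captures together instead; the function name findLinks and the sample HTML are hypothetical.

```php
<?php
// Sketch: same pattern as above, but PREG_SET_ORDER groups each match's
// captures (quote, URL, text) into one array per link.
function findLinks(string $html): array
{
    $pairs = [];
    if (preg_match_all('%<a .*?href=("|\')(.*?)\1.*?>(.*?)</a>%i', $html, $sets, PREG_SET_ORDER)) {
        foreach ($sets as $set) {
            // $set[1] is the quote character, $set[2] the URL, $set[3] the link text
            $pairs[] = ['url' => $set[2], 'text' => $set[3]];
        }
    }
    return $pairs;
}

$html = '<a href="http://website.com/?somestuff">The Link Name</a> '
      . "<a class='x' href='/about'>About</a>";

foreach (findLinks($html) as $i => $link) {
    echo ($i + 1), ': ', htmlspecialchars($link['text']), ' LINKS TO ', htmlspecialchars($link['url']), "\n";
}
```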

Another thing you may want to consider, depending on how you use the data, is to check whether a link starts with #, which just points to an anchor on the same page. If it does, change it from #whatever to /path/to/file#whatever.
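
A minimal sketch of that anchor check. The function name resolveAnchor is hypothetical, and $pagePath is an assumption: it should be the path of the page the links were scraped from.

```php
<?php
// Hypothetical helper: turn a same-page anchor such as "#whatever" into
// "/path/to/file#whatever". $pagePath is assumed to be the path of the
// page the links came from.
function resolveAnchor(string $url, string $pagePath): string
{
    if ($url !== '' && $url[0] === '#') {
        return $pagePath . $url;
    }
    return $url; // not an anchor-only link, leave it alone
}

echo resolveAnchor('#whatever', '/path/to/file'), "\n"; // /path/to/file#whatever
echo resolveAnchor('http://website.com/?somestuff', '/path/to/file'), "\n";
```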

-Greg