extract image and page number from html

ashida123

Post by ashida123 »

I need to extract the images and the corresponding page numbers from an HTML file using PHP. The page numbers and images do not come out in the correct order. Please help me fix this.

HTML code

<HTML>
<HEAD>
<TITLE></TITLE>
</HEAD>
<BODY>
<A name=1></a><IMG src="Dome-Tome1-StephenKing-1_1.jpg"><br>
<IMG src="Dome-Tome1-StephenKing-3_1.jpg">
<hr>
<A name=2></a>© Éditions Albin Michel, 2011<br>
pour la traduction française<br>
ISBN : 978-2-226-22437-8<br>
<hr>
<A name=3></a>Aux Éditions Albin Michel<br>
DÔME, tome 1, 2011<br>
<hr>
<A name=4></a><hr>
<A name=5></a>SEL<br>
<hr>
<A name=6></a>1<br>
Les deux femmes ..... etc
<A name=343></a><IMG src="Dome-Tome1-StephenKing-343_1.jpg"><br>
ressemblait tellement à etc
</BODY>
</HTML>
PHP code

<?php
$myFile = 'Dome-Tome1-StephenKings.html';

// Read the file into an array, one element per line
$content = file($myFile);

// How many lines are in this file
$numLines = count($content);
//echo $numLines;

// Process each line
for ($i = 0; $i < $numLines; $i++) {
    // Use trim to remove the carriage return and/or line feed character
    // at the end of the line
    $line = trim($content[$i]);

    // Print the value of each <a name=...> anchor found on this line
    $re = "<a\s[^>]*name\s*=\s*(['\"]??)([^'\">]*?)\\1[^>]*>(.*)<\/a>";
    preg_match_all("/$re/siU", $line, $matches);
    foreach ($matches[2] as $key => $value1) {
        echo $value1 . "<br>";
    }

    // Print the image src found on this line
    preg_match_all('/<img .*src=["|\']([^"|\']+)/i', $line, $matches);
    foreach ($matches[1] as $key => $value2) {
        echo $value2 . "<br>";
    }
}
?>
Output

Dome-Tome1-StephenKing-1_1.jpg
1
Dome-Tome1-StephenKing-3_1
Dome-Tome1-StephenKing-343_1.jpg
343
I need only the page numbers and the image names.

Needed output

Dome-Tome1-StephenKing-1_1.jpg
Dome-Tome1-StephenKing-3_1
1
Dome-Tome1-StephenKing-343_1.jpg
343
beetree

Re: extract image and page number from html

Post by beetree »

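preg_match_all("/<A name=([^>]+)>[\s\S]*?src=\"([\s\S]*?)\"/", $data, $match);

This should give you the "page numbers" in $match[1] and the image URLs in $match[2], assuming $data holds the whole file as one string (for example from file_get_contents()) rather than a single line:

$match[1] = 1, ..., 343
$match[2] = Dome-Tome1... .jpg, ..., Dome-Tome1-StephenKing-343_1.jpg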
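
One caveat with a single lazy pattern like this: an anchor for a page without an image gets paired with the next src it finds, even if that image belongs to a later page, and a second image on the same page is skipped. A rough sketch of an alternative is to split the HTML on the page anchors and then collect every image inside each piece. The function name extractPages() and the shortened sample input are made up for illustration, and the sketch assumes the anchors always look like <A name=N></a> with a numeric N:

```php
<?php
// Hypothetical helper (not from the thread): groups image file names under
// the page anchor that precedes them.
function extractPages(string $html): array
{
    // Split on page anchors such as <A name=12></a>, keeping the captured
    // page number. Quotes around the number are optional.
    $parts = preg_split(
        '/<a\s+name\s*=\s*["\']?(\d+)["\']?\s*>\s*<\/a>/i',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $pages = [];
    // $parts[0] is the text before the first anchor. After that the array
    // alternates: page number, content of that page, page number, ...
    for ($i = 1; $i + 1 < count($parts); $i += 2) {
        preg_match_all(
            '/<img\b[^>]*\bsrc\s*=\s*["\']([^"\']+)["\']/i',
            $parts[$i + 1],
            $m
        );
        $pages[] = ['page' => (int) $parts[$i], 'images' => $m[1]];
    }
    return $pages;
}

// Sample input, shortened from the HTML posted in the question.
$html = <<<'HTML'
<A name=1></a><IMG src="Dome-Tome1-StephenKing-1_1.jpg"><br>
<IMG src="Dome-Tome1-StephenKing-3_1.jpg">
<hr>
<A name=2></a>text<br>
<hr>
<A name=343></a><IMG src="Dome-Tome1-StephenKing-343_1.jpg"><br>
HTML;

foreach (extractPages($html) as $entry) {
    if ($entry['images'] === []) {
        continue; // skip pages that contain no image
    }
    // Print the images of the page first, then the page number
    foreach ($entry['images'] as $src) {
        echo $src, "\n";
    }
    echo $entry['page'], "\n";
}
```

On the sample input this prints both images of page 1, then 1, then the image of page 343, then 343, which is the order asked for in the question. Replace "\n" with "<br>" if the output goes to a browser.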

Is this how you wanted it?

Best,
beetree