read from html file using php

ashida123
Forum Newbie
Posts: 7
Joined: Thu Jul 07, 2011 9:26 am


Post by ashida123 »

I need to extract each image and its corresponding page number using PHP. I used a regular expression and got the image name, but I also need the number that comes before it, e.g. 1 and 343.

My code:
<?php
$myFile = 'Dome-Tome1-StephenKings.html';

// Read the file into an array, one element per line.
$content = file($myFile);

// How many lines are in this file?
$numLines = count($content);
echo $numLines;

// Process each line.
for ($i = 0; $i < $numLines; $i++) {
    // Use trim() to remove the carriage return and/or line feed
    // character at the end of the line.
    $line = trim($content[$i]);
    // Capture the src value of every <img> tag on the line.
    // (No "|" inside the character classes: there it would match a literal pipe.)
    preg_match_all('/<img\s[^>]*src=["\']([^"\']+)/i', $line, $matches);
    foreach ($matches[1] as $value) {
        echo $value . "<br>";
    }
}

output
-----------
Dome-Tome1-StephenKing-1_1.jpg
Dome-Tome1-StephenKing-343_1.jpg

I need this output
--------------------
1
Dome-Tome1-StephenKing-1_1.jpg
343
Dome-Tome1-StephenKing-343_1.jpg


Can somebody please help?

HTML code
----------------------------
<HEAD>
<TITLE></TITLE>
</HEAD>
<BODY>
<A name=1></a><IMG src="Dome-Tome1-StephenKing-1_1.jpg"><br>
<hr>
<A name=2></a>© Éditions Albin Michel, 2011<br>
pour la traduction française<br>
ISBN : 978-2-226-22437-8<br>
<hr>
<A name=3></a>Aux Éditions Albin Michel<br>
DÔME, tome 1, 2011<br>
<hr>
<A name=4></a><hr>
<A name=5></a>SEL<br>
<hr>
<A name=6></a>1<br>
Les deux femmes ..... etc
<A name=343></a><IMG src="Dome-Tome1-StephenKing-343_1.jpg"><br>
ressemblait tellement à etc
</BODY>
</HTML>
Last edited by ashida123 on Thu Jul 14, 2011 9:58 am, edited 1 time in total.
social_experiment
DevNet Master
Posts: 2793
Joined: Sun Feb 15, 2009 11:08 am
Location: .za

Re: read from html file using php

Post by social_experiment »

Code: Select all

<?php
// Looks for one or more digits that come after a
// hyphen and are followed by an underscore, e.g. "-343_".
$pattern1 = '/-\d+_/';
// Looks for one or more digits on their own.
$pattern2 = '/\d+/';
?>
Pattern1 will return what you are looking for, including the surrounding hyphen and underscore, so you will have to run its results against a second pattern (pattern2) to retrieve only the number.
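
The two steps can also be collapsed into one by wrapping the digits in a capture group. This is only a minimal sketch, assuming the file names always follow the -number_ form shown in the question; $text and $pageNumbers are illustrative names, not part of the original code.

```php
<?php
// Sample input built from the file names in the question.
$text = 'Dome-Tome1-StephenKing-1_1.jpg Dome-Tome1-StephenKing-343_1.jpg';

// The parentheses capture only the digits between "-" and "_",
// so no second pattern is needed.
preg_match_all('/-(\d+)_/', $text, $matches);

// $matches[0] holds the full matches (hyphen and underscore included);
// $matches[1] holds just the captured numbers.
$pageNumbers = $matches[1];
print_r($pageNumbers);
```

Whether one pattern or two reads better is a matter of taste; the capture group simply avoids the second pass.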
“Don’t worry if it doesn’t work right. If everything did, you’d be out of a job.” - Mosher’s Law of Software Engineering

Re: read from html file using php

Post by ashida123 »

I tried this code:

<?php
$myFile = 'Dome-Tome1-StephenKings.html';

$content = file($myFile);

// How many lines are in this file?
$numLines = count($content);
//echo $numLines;

// Matches <a ... name=...>...</a>; group 2 is the value of name.
$re = "<a\s[^>]*name\s*=\s*(['\"]??)([^'\">]*?)\\1[^>]*>(.*)<\/a>";

// Process each line.
for ($i = 0; $i < $numLines; $i++) {
    // Use trim() to remove the carriage return and/or line feed
    // character at the end of the line.
    $line = trim($content[$i]);

    // Print every anchor name (page number) on the line.
    preg_match_all("/$re/siU", $line, $matches);
    foreach ($matches[2] as $value) {
        echo $value . "<br>";
    }

    // Print the src of every <img> tag on the line.
    preg_match_all('/<img\s[^>]*src=["\']([^"\']+)/i', $line, $matches);
    foreach ($matches[1] as $value) {
        echo $value . "<br>";
    }
}
?>

output
------------
1
Dome-Tome1-StephenKing-1_1.jpg
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
342
343
Dome-Tome1-StephenKing-343_1.jpg
344
345

I need only the page numbers that have an image, together with the image name. The other page numbers should be omitted.
Needed output, in an array
-------------------------
1
Dome-Tome1-StephenKing-1_1.jpg

343
Dome-Tome1-StephenKing-343_1.jpg
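
One way to keep only the pages that have an image is to match the anchor and the image tag together, so that anchors without an image never match. The following is a rough sketch, assuming each <IMG> immediately follows its <A name=...></a> as in the sample HTML; the function name extractPageImages is hypothetical and not from the thread.

```php
<?php
// Hypothetical helper: returns [pageNumber => imageFile] for every
// anchor that is directly followed by an image.
function extractPageImages(string $html): array
{
    // Group 1: the anchor's name (the page number).
    // Group 2: the src of the <img> right after the closing </a>.
    $pattern = '/<a\s+name=["\']?(\d+)["\']?\s*>\s*<\/a>\s*<img\s[^>]*src=["\']([^"\']+)/i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        $result[(int) $m[1]] = $m[2];
    }
    return $result;
}

// Sample lines copied from the HTML in the question.
$html = <<<HTML
<A name=1></a><IMG src="Dome-Tome1-StephenKing-1_1.jpg"><br>
<hr>
<A name=6></a>1<br>
<A name=343></a><IMG src="Dome-Tome1-StephenKing-343_1.jpg"><br>
HTML;

print_r(extractPageImages($html));
```

For the real file, the whole document could be read as one string with file_get_contents() and passed to the same function. If the markup between the anchor and the image varies, the pattern would need adjusting.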

Re: read from html file using php

Post by social_experiment »

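Code: Select all

<?php
// preg_match_all() expects a string subject, not the array returned
// by file(), so read the whole file into one string first.
$content = file_get_contents('Dome-Tome1-StephenKings.html');

// Match "-<digits>_" as it appears in the image file names.
$pattern = '/-\d+_/';
preg_match_all($pattern, $content, $array);

echo '<pre>';
print_r($array);
echo '</pre>';

foreach ($array[0] as $value) {
    // Pull just the digits out of each "-<digits>_" match.
    preg_match_all('/\d+/', $value, $arg);
    echo '<pre>';
    echo $arg[0][0];
    echo '</pre>';
}
?>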
When the string is tested against the first pattern, only numbers preceded by a hyphen and followed by an underscore are matched. This is necessary because the data already contains other numbers, including numbers next to hyphens (the ISBN, for example). -number_ is the pattern to look for. Once those matches are found, simply extract the digits from each result.
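
To illustrate why the hyphen and underscore matter, the sketch below runs both patterns over two lines copied from the sample HTML. The variable names ($sample, $loose, $strict) are illustrative only.

```php
<?php
// Two lines from the sample HTML: one with an ISBN, one with an image.
$sample = 'ISBN : 978-2-226-22437-8<br>' . "\n"
        . '<A name=343></a><IMG src="Dome-Tome1-StephenKing-343_1.jpg"><br>';

// A bare \d+ picks up every run of digits, including the ISBN parts
// and the anchor name.
preg_match_all('/\d+/', $sample, $loose);

// -\d+_ only matches digits that sit between a hyphen and an underscore.
preg_match_all('/-\d+_/', $sample, $strict);

echo count($loose[0]), "\n"; // how many digit runs the loose pattern found
print_r($strict[0]);         // the matches from the stricter pattern
```

The loose pattern returns many unrelated numbers, while the stricter one is limited to the part of the image file name that carries the page number.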

Re: read from html file using php

Post by ashida123 »

thank you...