Minor changes and tailoring of a regex that already works fine!
Posted: Sat Dec 18, 2010 12:01 pm
Good day, dear community.
First of all, this is a great place to be. I have learned a lot from this forum! Many, many thanks for running such a great place. It is a real source of new insights into a great technique for me.
What's up today:
I need to build a function which parses the domain from a URL. I have used various ways to parse HTML sources, but this one is a bit tricky! See the target I want to parse; it has some invalid markup:
http://www.schulministerium.nrw.de/BP/S ... pDO=194190
Well, what do you think: can I apply this code here?
I have to rework the parser part of this script. I need to parse somewhat differently, since I have a different site here.
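For the "domain from a URL" part on its own, here is a minimal sketch of what I have in mind, assuming PHP's built-in parse_url() is good enough for the job. The helper name get_domain() and the example URL are made up for illustration; they are not part of my script.

```php
<?php
// Hypothetical helper (not from the original script): returns the
// lower-cased host part of a URL, or null when parse_url() finds none.
function get_domain(string $url): ?string
{
    $host = parse_url(trim($url), PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host component, or a badly malformed URL
    }
    return strtolower($host);
}

// Placeholder URL (an assumption, not the real target page).
echo get_domain('http://www.example.com/some/page.php?id=1'), "\n";
```

This only reads the host out of the URL string; it does not fetch or parse the page itself.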
Code:
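<?php
require_once('config.php'); // DB connection; assumed here to provide a mysqli handle named $conn
$filename = "url.txt"; // text file that holds the URLs, one per line
$each_line = file($filename);
// Prepare the insert once; a bound parameter keeps quotes in the cell text from breaking the query
$stmt = mysqli_prepare($conn, "insert into tablename(contents) values (?)");
foreach ($each_line as $line_num => $line)
{
    $line = trim($line);
    if ($line === '') {
        continue; // skip blank lines in url.txt
    }
    $content = file_get_contents($line);
    if ($content === false) {
        continue; // skip URLs that could not be fetched
    }
    //echo ($content)."<br>";
    $pattern = '/<td>(.*?)<\/td>/si'; // text between <td> and </td>, across line breaks
    preg_match_all($pattern, $content, $matches);
    foreach ($matches[1] as $match) {
        $match = strip_tags($match);
        $match = trim($match);
        //var_dump($match);
        mysqli_stmt_bind_param($stmt, 's', $match);
        mysqli_stmt_execute($stmt);
        //echo $match;
    }
}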
?>
Can anybody help me get a better regex, or a better way to parse this site?
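One alternative to the regex that I am considering is PHP's bundled DOM extension, which is meant to cope with invalid markup. The following is only a rough sketch under that assumption: the helper name extract_cells() and the sample HTML are made up, and I have not run it against the real page.

```php
<?php
// Hypothetical helper: collect the trimmed text of every non-empty <td>
// cell in an HTML string, using DOMDocument instead of a regex.
function extract_cells(string $html): array
{
    // Collect libxml's complaints about invalid markup instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $cells = [];
    foreach ($doc->getElementsByTagName('td') as $td) {
        $text = trim($td->textContent); // text only, nested tags dropped
        if ($text !== '') {
            $cells[] = $text;
        }
    }
    return $cells;
}

// Made-up sample standing in for the real page.
$sample = '<table><tr><td> Name <b>A</b> </td><td class="x">Town</td></tr></table>';
print_r(extract_cells($sample));
```

Unlike the '/<td>(.*?)<\/td>/si' pattern, this also picks up cells whose <td> tag carries attributes. As far as I know, loadHTML() assumes ISO-8859-1 unless the page declares a charset, so umlauts on a UTF-8 page may need extra handling.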
Any and all help will be greatly appreciated.
Regards,
lin