This is my first post. I hope you guys can help me out, because I've been stuck on this small script for months. I like to figure stuff out on my own, but I just can't figure this one out, so it's time to ask for some help. Anyway, here's the problem.
I have a PHP scraping script that scrapes a page. It has to grab a section of URLs, links and anchors. Everything works, except that the anchors often contain invalid characters. When I put them in the database and grab them later, they show up as invalid characters and mess up my RSS feed.
So to nip this problem in the bud, I need to replace the characters before they go into the database. I've researched this for days, found just about everything that's out there and tried all of it, but I just can't get it to work.
Here's the list of characters I need to replace:
'‘’``—”€“éó – –á
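Sequences such as ‘ and é are typically what curly quotes, dashes and accented letters look like when their UTF-8 bytes are read as Windows-1252. As a rough sketch of the replacement step, assuming the scraped anchor text really is UTF-8 and that plain ASCII stand-ins are acceptable in the feed, a lookup table passed to strtr() might look like this. The function name clean_anchor and the chosen replacements are hypothetical, not part of the script:

```php
<?php
// Hypothetical helper: swap typographic characters for ASCII stand-ins.
// Assumption: $text is UTF-8, so each key is written as its UTF-8 byte sequence.
function clean_anchor($text)
{
    $map = array(
        "\xE2\x80\x98" => "'",  // left single quote
        "\xE2\x80\x99" => "'",  // right single quote
        "\xE2\x80\x9C" => '"',  // left double quote
        "\xE2\x80\x9D" => '"',  // right double quote
        "\xE2\x80\x93" => '-',  // en dash
        "\xE2\x80\x94" => '-',  // em dash
        "\xC3\xA9"     => 'e',  // e with acute accent
        "\xC3\xB3"     => 'o',  // o with acute accent
        "\xC3\xA1"     => 'a',  // a with acute accent
        '`'            => "'",  // backtick
    );
    // strtr() replaces every key in one pass and never re-scans its own output.
    return strtr($text, $map);
}

echo clean_anchor("It\xE2\x80\x99s a caf\xC3\xA9 \xE2\x80\x93 ok"), "\n";
```

In the script this would be called on $Description before mysql_real_escape_string(). If the accented letters should be kept rather than flattened, those three entries can simply be dropped from the map.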
Here is my entire scraping script:
// Connect to MySQL and select the database (credentials are placeholders).
$DB = mysql_connect('blah', 'blah','blah') or die (mysql_error());
mysql_select_db('blah', $DB);

// Fetch the raw HTML of the page to scrape.
$Base = 'http://www.website.com';
$data = file_get_contents($Base);

// Patterns for the anchor text, the link URL, and one whole story block.
$regexDesc = '/(">).*(<\/a)/';
$regexURL = '/https?:\/\/.*target/';
$regex = '/(<a [^>]+>)(.*?)<\/a>.+br><span.+class.+small.+<\/span>/';

// Collect every story block found on the page.
preg_match_all($regex,$data,$match);
$match = $match[0];
$NewsList = array();
foreach($match as $Page) {
    $NewsList[] = $Page;
}

foreach($NewsList as $story) {
    // Anchor text: strip the leading '">' (2 chars) and the trailing '</a' (3 chars).
    preg_match_all($regexDesc,$story,$matchDesc);
    $matchDesc = $matchDesc[0];
    foreach($matchDesc as $Description) {
        $Description = substr($Description,2,-3);
        $anchor = mysql_real_escape_string($Description);
        echo $anchor." <br>";
    }

    // URL: strip the trailing 8 characters ("target" plus the 2 characters before it).
    preg_match_all($regexURL,$story,$matchURL);
    $matchURL = $matchURL[0];
    foreach($matchURL as $URL) {
        $URL = substr($URL,0,-8);
        $url = mysql_real_escape_string($URL);
        echo $url." <br>";
    }

    // Store the last anchor and URL found in this story, with the current date and time.
    $date = date('l jS \of F Y');
    $time = time();
    $sql="INSERT INTO table (date, time, url, anchor) VALUES ('$date','$time','$url','$anchor')";
    $result = mysql_query($sql, $DB) or die (mysql_error());
}

echo "--> Inserted Values Successfully";
mysql_close($DB);
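Since the goal is to fix the text before it reaches the database, another possible step is making sure every anchor is valid UTF-8 before it is escaped and inserted. The sketch below is only an illustration: the helper name to_utf8 is hypothetical, it needs the mbstring extension, and it assumes that any text that is not already valid UTF-8 is Windows-1252, which may not be true for the scraped site.

```php
<?php
// Hypothetical helper: return $text as UTF-8.
// Assumption: anything that is not already valid UTF-8 is Windows-1252.
function to_utf8($text)
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}

// A lone 0xE9 byte is "e with acute accent" in Windows-1252.
$anchor = to_utf8("caf\xE9");
echo bin2hex($anchor), "\n"; // prints 636166c3a9
```

In the script this would wrap $Description before mysql_real_escape_string(). The connection and table character sets would also have to agree with it (for example via mysql_set_charset('utf8', $DB) right after connecting), otherwise the same garbling can come back when the rows are read.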