screen scrape special characters from url

Rahul Dev
Forum Newbie
Posts: 18
Joined: Thu Dec 09, 2010 4:54 am

screen scrape special characters from url

Post by Rahul Dev »

Hello guys, I have a problem when I screen scrape a piece of text from a URL and save it to my DB. The text is in French and contains special characters like é, so when I screen scrape it I receive it in this form: &eacute;. E.g. the website has the word région, but when I screen scrape it, it becomes r&eacute;gion. I want to store it as it is displayed because I need to perform some operations on the text after saving it in the DB.
Is there any way to store the scraped text in the form in which it is displayed, or to convert it to the form I want (like this: région)?
My code is as follows:

$html = file_get_dom('http://www.defimedia.info/news/8425/Gro ... %99appels-');

foreach($html->find('div[class=PostContent]') as $element)
{
    $tags = array(
        '<div class="PostContent">',
        '<!-- The Adsense will automatically be inserted half way through the content. Applies for both Side and Middle options. -->',
        '<font face="Georgia">',
        '<font size="2">',
        ''
    );
    $new_element = str_replace($tags, "", $element);
    $sql1 = "UPDATE articles SET original_text = '" . mysql_real_escape_string($new_element) . "' WHERE article_id = '$item_id'";
    $result1 = mysql_query($sql1) or die('Query failed: ' . mysql_error());
}
AbraCadaver
DevNet Master
Posts: 2572
Joined: Mon Feb 24, 2003 10:12 am
Location: The Republic of Texas

Re: screen scrape special characters from url

Post by AbraCadaver »

It is &eacute; in the HTML source of the page you are scraping (check it out). To display correctly in a browser it needs to be &eacute;, so why do you want to translate it? If you must, then try html_entity_decode().
mysql_function(): WARNING: This extension is deprecated as of PHP 5.5.0, and will be removed in the future. Instead, the MySQLi or PDO_MySQL extension should be used. See also MySQL: choosing an API guide and related FAQ for more information.
Rahul Dev
Forum Newbie
Posts: 18
Joined: Thu Dec 09, 2010 4:54 am

Re: screen scrape special characters from url

Post by Rahul Dev »

AbraCadaver wrote: It is &eacute; in the HTML source of the page you are scraping (check it out). To display correctly in a browser it needs to be &eacute;, so why do you want to translate it? If you must, then try html_entity_decode().
Yes, it is &eacute; in the HTML source itself. As I said, I need to perform other operations on the text after scraping and storing it in the database. I tried html_entity_decode(), but then the characters become �. Any solution to this?
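A possible explanation for the �, offered as an assumption since the thread does not state the PHP version: before PHP 5.4, html_entity_decode() decoded into ISO-8859-1 by default, so é came out as a single byte that is not valid UTF-8 on its own. The sketch below compares the bytes produced for the two target charsets; the sample string is made up for illustration and is not taken from the scraped page.

```php
<?php
// Illustration only: a made-up sample, not the actual scraped content.
$scraped = 'r&eacute;gion';

// Decoding into ISO-8859-1 yields one byte (0xE9) for "é".
$latin1 = html_entity_decode($scraped, ENT_QUOTES, 'ISO-8859-1');

// Decoding into UTF-8 yields the two-byte sequence 0xC3 0xA9 for "é".
$utf8 = html_entity_decode($scraped, ENT_QUOTES, 'UTF-8');

echo bin2hex($latin1), "\n"; // 72e967696f6e
echo bin2hex($utf8), "\n";   // 72c3a967696f6e
```

If the page or the database treats the text as UTF-8, the lone 0xE9 byte from the first call is an invalid sequence, which is typically shown as �.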
Rahul Dev
Forum Newbie
Posts: 18
Joined: Thu Dec 09, 2010 4:54 am

Re: screen scrape special characters from url

Post by Rahul Dev »

It's OK now; I missed something in html_entity_decode(). It should be html_entity_decode($text, ENT_QUOTES, "utf-8");
Thanks for the help.
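For later readers, here is a minimal sketch of the fix described in this post. The helper name decode_scraped_text() and the sample string are hypothetical and not part of the original code; only the html_entity_decode() call with ENT_QUOTES and an explicit UTF-8 charset comes from the thread.

```php
<?php
// Hypothetical helper (not from the original post): decode HTML entities
// in scraped text into UTF-8 before storing it.
function decode_scraped_text($text)
{
    // ENT_QUOTES also decodes &#039; and &quot;.
    // 'UTF-8' sets the encoding of the returned string.
    return html_entity_decode($text, ENT_QUOTES, 'UTF-8');
}

$sample = 'La r&eacute;gion d&#039;origine'; // made-up sample text
echo decode_scraped_text($sample), "\n";     // La région d'origine
```

In the loop from the first post, one way to use this would be to pass $new_element through the helper before mysql_real_escape_string(). That only helps if the table and the database connection also use UTF-8, which is an assumption about the setup that the thread does not confirm.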