Get external page contents with what ?

jankidudel
Forum Commoner
Posts: 91
Joined: Sat Oct 16, 2010 4:30 pm
Location: Lithuania, Vilnius

Post by jankidudel »

Hi, this problem has been on my mind for a couple of days now.

What I need to do: get the contents of an external page by URL and then somehow parse its divs with JavaScript.

Is Ajax the only way to do this? (I can't fetch the page with file_get_contents and then extract the divs with preg_match_all; that's not the way I was told to do it.)

Can you help me?
Darhazer
DevNet Resident
Posts: 1011
Joined: Thu May 14, 2009 3:00 pm
Location: HellCity, Bulgaria

Post by Darhazer »

If you need to parse with JavaScript, do it with JavaScript.
You can still fetch the page with PHP and output it along with the JS that parses the page content.
Or, if parsing with JS is not a must, you can use PHP's XML libraries. There is even phpQuery: http://code.google.com/p/phpquery/

Post by jankidudel »

Thank you for your suggestion, but the tag structure of the page I want to parse has many errors, so working with DOMDocument produces many errors.
Jonah Bron
DevNet Master
Posts: 2764
Joined: Thu Mar 15, 2007 6:28 pm
Location: Redding, California

Post by Jonah Bron »

Have you tried using the DOM with loadHTML()? It doesn't require perfectly formatted HTML.

http://php.net/domdocument.loadhtml
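
A minimal sketch of that suggestion, assuming the many errors are libxml parse warnings. The sample markup is hypothetical (it is not the page from this thread), and libxml_use_internal_errors() is my addition to keep the warnings out of the output:

```php
<?php
// Hypothetical sample of badly formed markup (unclosed <p>, stray </span>).
$html = '<html><body><div class="logo"><p>Logo</div><h3>Title</span></h3></body></html>';

// Collect libxml parse warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$document = new DOMDocument();
$loaded = $document->loadHTML($html);

// Look at what the parser complained about, then clear the buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

$headings = $document->getElementsByTagName('h3');

echo 'Loaded: ', var_export($loaded, true), "\n";
echo 'Parse problems recorded: ', count($errors), "\n";
echo 'First h3 text: ', $headings->item(0)->textContent, "\n";
```

The parser repairs the markup as well as it can, so the resulting tree may not match the broken source exactly, but it is usually good enough to query.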

Post by jankidudel »

Yes, but either I'm too tired to do this properly or this is a bad way to implement my problem:

Code:

<?php
$file = file_get_contents('http://www.google.com/search?hl=en&source=hp&biw=1680&bih=858&q=Michael+Jordan');

$document = new DOMDocument();
$document->loadHTML($file);
$new_document = $document->getElementsByTagName('h3');

// Of course this fails with "Fatal error: Call to undefined method DOMNodeList::saveHTMLFile()"
$new_document->saveHTMLFile('file.html');

include 'file.html';

?>

Post by Jonah Bron »

I'm not sure what that code is supposed to do... perhaps you could explain just how you want to parse the data?
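
If the goal of the code above was to keep only the h3 elements, here is a rough sketch of one way to do it. This is a guess at the intent, and the inline markup is a hypothetical stand-in for the fetched page. getElementsByTagName() returns a DOMNodeList, which cannot save itself, so each node is serialized through the document instead:

```php
<?php
// Hypothetical stand-in for the fetched page; not from the thread.
$file = '<html><body><h3>First</h3><p>skip me</p><h3>Second</h3></body></html>';

libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($file);
libxml_clear_errors();

$fragment = '';
foreach ($document->getElementsByTagName('h3') as $heading) {
    // saveHTML() with a node argument serializes just that node (PHP 5.3.6+).
    $fragment .= $document->saveHTML($heading) . "\n";
}

echo $fragment;
```

The $fragment string could then be written out with file_put_contents() if a file is really needed.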

Post by jankidudel »

Here it is:

I must somehow get all the contents of a web page and then serve that page from another server, but modified rather than identical. The new page must contain everything except the elements with class='logo', which must be removed. The visitor must never see the original page contents: the original page must be parsed to remove everything with class='logo' before the page is shown.

I've tried reading the file into a string and then parsing it with regex, but it gets too complicated with nested divs and so on, and I've read that this is not how it should be done.

I've also tried DOMDocument, but I ran into other problems there, with warnings and my inexperience with it, so I abandoned that approach.

So I've decided to go with jQuery, as I think it's the easiest and most straightforward way to solve this problem.

Now I assume you understand what I want to do. Thank you very much.

Post by Jonah Bron »

As I stated in this post: viewtopic.php?t=125684#p637169

Scraping is against Google's Terms of Service. They do have an API though:

http://code.google.com/apis/customsearc ... rview.html

Post by jankidudel »

This is another project, and it does not violate Google's Terms of Service.
This is my website: http://www.programistas.lt/golsta/site1

From that page I want to be able to remove selected elements by class before showing it to people, and I don't want to change the main site's content.

For example, here is one such class:
<table class="contentpaneopen">

I don't want to show this.

Post by Jonah Bron »

All you need to do is go through all the nodes in the DOMDocument, find the DOMElements whose class attribute contains the classes you don't want to show, and remove them. Here's a good place to start:

http://php.net/dom
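
As a rough sketch of those steps: the class names logo and contentpaneopen come from this thread, but the sample markup and the $unwanted variable are hypothetical, and in practice $html would hold the fetched page.

```php
<?php
// Hypothetical sample page; the class names are from the thread, the markup is not.
$html = '<html><body>'
      . '<div class="logo">Logo</div>'
      . '<table class="contentpaneopen"><tr><td>Hidden</td></tr></table>'
      . '<div class="content main"><p>Visible text</p><span class="small logo">x</span></div>'
      . '</body></html>';

// Classes whose elements should be removed before output (assumed list).
$unwanted = array('logo', 'contentpaneopen');

libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();

// getElementsByTagName() returns a live list, so copy it before removing nodes.
$elements = iterator_to_array($document->getElementsByTagName('*'), false);

foreach ($elements as $element) {
    // Split the class attribute on whitespace so whole class names are compared.
    $classes = preg_split('/\s+/', trim($element->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
    if (array_intersect($classes, $unwanted) && $element->parentNode !== null) {
        $element->parentNode->removeChild($element);
    }
}

$output = $document->saveHTML();
echo $output;
```

Comparing whole class names means an element with class="small logo" is removed as well, while a class such as "logotype" would be left alone.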