Get external page contents with what ?

Posted: Sun Dec 12, 2010 10:41 am
by jankidudel
Hi, this problem has been on my mind for a couple of days now.

What I need to do: get the contents of an external URL and then somehow parse its divs with JavaScript.

Is Ajax the only way to do this? (I can't fetch the page with file_get_contents and then extract the divs with preg_match_all; that's not the way I was told to do it.)

Can you help me?

Re: Get external page contents with what ?

Posted: Sun Dec 12, 2010 12:55 pm
by Darhazer
If you need to parse with JavaScript, do it with JavaScript.
You can still fetch the page with PHP and output it along with the JS that parses the page content.
Or, if parsing with JS is not a must, you can use PHP's XML libraries. There is even phpQuery: http://code.google.com/p/phpquery/
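
A minimal sketch of the "fetch with PHP, parse with JS" idea, in case it helps. The helper name injectScript(), the sample markup and the inline script are assumptions made for illustration; in practice $html would come from something like file_get_contents($url) instead of a literal string.

```php
<?php
/**
 * Insert an inline <script> just before the last </body> tag,
 * or append it when the page has no closing body tag.
 * injectScript() is a hypothetical helper, not a PHP built-in.
 */
function injectScript(string $html, string $js): string
{
    $tag = '<script>' . $js . '</script>';
    $pos = strripos($html, '</body>'); // case-insensitive, last occurrence

    if ($pos === false) {
        return $html . $tag;
    }

    return substr($html, 0, $pos) . $tag . substr($html, $pos);
}

// Stand-in for the fetched page (assumption: real code would fetch it remotely).
$html = '<html><body><div class="logo">Logo</div><p>Content</p></body></html>';

// Client-side script that removes every element with class "logo".
$js = 'var nodes = document.querySelectorAll(".logo");'
    . 'for (var i = 0; i < nodes.length; i++) {'
    . ' nodes[i].parentNode.removeChild(nodes[i]);'
    . '}';

echo injectScript($html, $js);
```

Note that with this approach the unwanted markup still reaches the browser and is only removed after the page loads.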

Re: Get external page contents with what ?

Posted: Sun Dec 12, 2010 3:27 pm
by jankidudel
Thank you for your suggestion, but the tag structure of the page I want to parse contains many errors, so working with DOMDocument produces many errors.

Re: Get external page contents with what ?

Posted: Sun Dec 12, 2010 6:36 pm
by Jonah Bron
Have you tried using the DOM with loadHTML()? It doesn't require perfectly formatted HTML.

http://php.net/domdocument.loadhtml
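
A small sketch of what that could look like. The markup below is a made-up, deliberately broken sample, and collecting the parser warnings with libxml_use_internal_errors() is one possible way to keep them off the page, not the only one.

```php
<?php
// Deliberately malformed sample markup (assumption): unclosed <p>, <h3> and <div>.
$html = '<div><h3>First</h3><p>Unclosed paragraph<h3>Second</div>';

// Collect libxml warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$document = new DOMDocument();
$document->loadHTML($html);

// The collected parse errors can be inspected or simply discarded.
$errors = libxml_get_errors();
libxml_clear_errors();

// The tree is still usable even though the input was not well formed.
$headings = $document->getElementsByTagName('h3');
foreach ($headings as $heading) {
    echo trim($heading->textContent), "\n";
}

echo count($errors), " parser message(s) collected\n";
```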

Re: Get external page contents with what ?

Posted: Sun Dec 12, 2010 8:07 pm
by jankidudel
Yes, but either I'm too tired to do this, or this is a bad implementation for my problem:

Code:

<?php
$file = file_get_contents('http://www.google.com/search?hl=en&source=hp&biw=1680&bih=858&q=Michael+Jordan');

$document = new DOMDocument();
$document->loadHTML($file);
$new_document = $document->getElementsByTagName('h3'); // returns a DOMNodeList, not a DOMDocument

// Of course this fails: Fatal error: Call to undefined method DOMNodeList::saveHTMLFile()
$new_document->saveHTMLFile('file.html');

include 'file.html';

?>

Re: Get external page contents with what ?

Posted: Mon Dec 13, 2010 11:31 am
by Jonah Bron
I'm not sure what that code is supposed to do... perhaps you could explain just how you want to parse the data?

Re: Get external page contents with what ?

Posted: Mon Dec 13, 2010 12:13 pm
by jankidudel
Here it is:

I must somehow get all the contents of a web page and then serve that page from another server, but modified rather than identical. The new page must contain everything except the elements with class='logo'. The visitor must never see the old page contents: the old page must be parsed to remove everything with class='logo' before the page is shown.

I've tried to get the file into a string and then parse it with regex, but that gets too complicated when it comes to nested divs and so on, and I've read that it's not the way it should be done.

I've also tried to use DOMDocument, but there are other problems there, with warnings and my inexperience in working with it, so I abandoned that approach.

So I've decided to go with jQuery, as I think it's the easiest and most straightforward way to solve this problem.

Now I assume you understand what I want to do. Thank you very much.

Re: Get external page contents with what ?

Posted: Mon Dec 13, 2010 12:20 pm
by Jonah Bron
As I stated in this post: viewtopic.php?t=125684#p637169

Scraping is against Google's Terms of Service. They do have an API though:

http://code.google.com/apis/customsearc ... rview.html

Re: Get external page contents with what ?

Posted: Mon Dec 13, 2010 12:49 pm
by jankidudel
This is a different project, and it doesn't violate Google's Terms of Service.
This is my website: http://www.programistas.lt/golsta/site1

From this site I want to be able to remove elements with certain classes before showing the page to visitors, without changing the main site's content.

For example, here is one such class:
<table class="contentpaneopen">

I don't want to show this.

Re: Get external page contents with what ?

Posted: Mon Dec 13, 2010 1:52 pm
by Jonah Bron
All you need to do is go through all the nodes in the DOMDocument, find the DOMElements whose class attribute contains the classes you don't want to show, and remove them. Here's a good place to start:

http://php.net/dom
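
A rough sketch of that approach, using DOMXPath to find the elements rather than walking every node by hand. The function name removeElementsByClass() and the sample markup are made up for illustration (only the contentpaneopen class comes from this thread), and the class name is assumed to be a plain token without quote characters.

```php
<?php
/**
 * Remove every element that carries the given class and return the new HTML.
 * removeElementsByClass() is a hypothetical helper, not part of PHP.
 */
function removeElementsByClass(string $html, string $class): string
{
    // Keep warnings about sloppy markup out of the output.
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Match the class as a whole word inside the class attribute,
    // so "logo" does not also match "logo-small".
    $xpath = new DOMXPath($document);
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    // Copy the matches into an array before modifying the tree.
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    return $document->saveHTML();
}

// Stand-in for the fetched page (assumption).
$page = '<html><body>'
      . '<table class="contentpaneopen"><tr><td>Hidden part</td></tr></table>'
      . '<p class="intro">Visible part</p>'
      . '</body></html>';

echo removeElementsByClass($page, 'contentpaneopen');
```

Because the removal happens on the server, the visitor's browser never receives the removed elements.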