get just a part from one web page

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

Post Reply
webreake
Forum Newbie
Posts: 4
Joined: Wed Sep 07, 2005 11:52 pm
Location: mx

get just a part from one web page

Post by webreake »

Hello

So I visited a site with a list of available jobs. There was too much info on this page (4 MB of plain text), and the webmaster shows everything on a single page, which is a problem for users because the page takes so long to finish loading.

That's why I want to download only a portion of that big page, or try to get filtered content with PHP.

This is the url:

http://clasificados.mexplaza.com.mx/cgi ... mpleos.cgi

Any ideas ?
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Code: Select all

[feyd@home]> php -r "preg_match_all('#<form.*?</form>.*?</blockquote>#s', file_get_contents('http://clasificados.mexplaza.com.mx/cgi-bin/clasificados/listarempleos.cgi'), \$matches); echo count(\$matches[0]);"
3607
there are 3607 job entries, as of 5 minutes ago. :)
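For anyone running that from a shell: note the `\$matches` escaping, or the shell swallows the variable inside double quotes. In a script file the quoting headache disappears. A sketch of the same idea with the extraction pulled into a function so it can be run against any HTML string; the regex is the one from the one-liner and assumes each entry on the page is still a `<form>…</form>…</blockquote>` block:

```php
<?php
// Extract job entries from the page's HTML using the pattern above.
// Assumes each entry is a <form> ... </form> ... </blockquote> block.
function extract_jobs($html)
{
    preg_match_all('#<form.*?</form>.*?</blockquote>#s', $html, $matches);
    return $matches[0]; // one raw HTML string per job entry
}

// Usage (hits the live site, so it can be slow):
// $jobs = extract_jobs(file_get_contents('http://clasificados.mexplaza.com.mx/cgi-bin/clasificados/listarempleos.cgi'));
// echo count($jobs), " entries\n";
```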
webreake
Forum Newbie
Posts: 4
Joined: Wed Sep 07, 2005 11:52 pm
Location: mx

Post by webreake »

thanks feyd
Locally it works great, but when I tried to run the script on my web host I got an error, because downloading the file takes too much time (more than 30 seconds). Also, I think this is known as stealing bandwidth;
I guess that's why I got that error.

I checked my bandwidth usage and the script was still downloading the entire site.
:(
So my question is:
Would it be possible to get filtered content from a page without downloading the entire HTML (for example, download only the red text, or only the links, without everything else)?
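The server will send whatever it sends, but nothing forces the client to read all of it: reading the stream manually and closing it after a byte cap avoids pulling the whole 4 MB, unlike `file_get_contents()`, which always reads to the end. A sketch under that assumption (the 8 KB chunk size and whatever cap you pass are arbitrary example values, and `allow_url_fopen` must be enabled for HTTP URLs):

```php
<?php
// Read at most $max_bytes from a URL or file, then close the connection.
// Unlike file_get_contents(), this does not download the whole page.
function fetch_partial($url, $max_bytes)
{
    $fp = fopen($url, 'r');
    if (!$fp) {
        return false;
    }
    $data = '';
    while (!feof($fp) && strlen($data) < $max_bytes) {
        $data .= fread($fp, 8192); // read in 8 KB chunks
    }
    fclose($fp);
    return substr($data, 0, $max_bytes); // trim any overshoot from the last chunk
}

// Usage:
// $head = fetch_partial('http://clasificados.mexplaza.com.mx/cgi-bin/clasificados/listarempleos.cgi', 200 * 1024);
```

The catch is that the entries you want may not all be in the first N bytes, so this only helps if the interesting part of the page comes first.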
User avatar
raghavan20
DevNet Resident
Posts: 1451
Joined: Sat Jun 11, 2005 6:57 am
Location: London, UK
Contact:

Post by raghavan20 »

I don't know whether this function would be of any use here:

ignore_user_abort()

http://uk2.php.net/manual/en/function.i ... -abort.php
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

You could use set_time_limit() as well, or alternatively...
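Used together at the top of the script, the two suggestions keep it alive past the default 30-second limit and past the visitor closing the browser; a minimal sketch (0 means no time limit):

```php
<?php
// Keep running even if the visitor closes the browser tab.
ignore_user_abort(true);
// Remove the 30-second execution limit (0 = no limit).
set_time_limit(0);

// ... long-running fetch goes here ...
```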
webreake
Forum Newbie
Posts: 4
Joined: Wed Sep 07, 2005 11:52 pm
Location: mx

Post by webreake »

hi again

I tried both functions:
ignore_user_abort()
set_time_limit()

but now my problem is how to get this call to return the array once the script finishes:

Code: Select all

preg_match_all('#<form.*?</form>.*?</blockquote>#s',file_get_contents('http://clasificados.mexplaza.com.mx/cgi-bin/clasificados/listarempleos.cgi'),$matches);
Or maybe I will look for a function to capture a number of bytes and print them, which brings me to another problem.
This is cool.
programming is like a really big puzzle :o
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

webreake wrote:programming is like a really big puzzle :o
yes, it is like a really big puzzle. :)

$matches in that code will contain all the found parts.
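A sketch of what working with `$matches` looks like; the sample HTML and the `strip_tags()` step are illustrations only, standing in for the real downloaded page:

```php
<?php
// Example input standing in for the downloaded page (the real page
// would come from file_get_contents() as in the earlier posts).
$html = '<form>Job A</form> details A</blockquote>'
      . '<form>Job B</form> details B</blockquote>';

// $matches[0] ends up holding one raw HTML fragment per job entry.
preg_match_all('#<form.*?</form>.*?</blockquote>#s', $html, $matches);

foreach ($matches[0] as $i => $entry) {
    // strip_tags() reduces each fragment to its visible text.
    echo ($i + 1), ': ', trim(strip_tags($entry)), "\n";
}
```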
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

I would cache the actual page. Every 5 minutes or so, retrieve a fresh copy (you might want to use a conditional GET for that), and then make your own site work from that cache. This will improve performance, at the cost of changes showing up on average 2.5 minutes late.
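A minimal sketch of that caching scheme, assuming a local cache file and a five-minute lifetime; the function name, filename, and interval are example choices, and this version does a plain re-fetch rather than a conditional GET:

```php
<?php
// Return the page HTML, re-fetching it at most once per $ttl seconds.
function cached_fetch($url, $cache_file, $ttl = 300)
{
    if (!file_exists($cache_file) || time() - filemtime($cache_file) > $ttl) {
        // Cache is missing or stale: fetch a fresh copy.
        $html = file_get_contents($url);
        if ($html !== false) {
            file_put_contents($cache_file, $html);
        }
    }
    return file_get_contents($cache_file);
}

// Usage:
// $html = cached_fetch('http://clasificados.mexplaza.com.mx/cgi-bin/clasificados/listarempleos.cgi', '/tmp/empleos.html');
```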
webreake
Forum Newbie
Posts: 4
Joined: Wed Sep 07, 2005 11:52 pm
Location: mx

Post by webreake »

hi
This is my happy end :)
I finally finished my script.
What it does is what timvw suggested:
I would cache the actual page.. So, every 5 minutes or so you retrieve a new version
combined with preg_match_all() to filter the cached page.
I also added a function to remove repeated job entries.
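One simple way to do that de-duplication step, assuming the entries are compared as exact strings (entries that differ only in whitespace or markup would need normalizing first); `unique_jobs` is a hypothetical helper name, not the function from the thread:

```php
<?php
// Drop exact duplicate entries, keeping the first occurrence of each.
function unique_jobs($jobs)
{
    return array_values(array_unique($jobs));
}

// Usage:
// $jobs = unique_jobs($matches[0]);
```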

thanks guys
Post Reply