Getting a website source
Posted: Thu Sep 01, 2011 3:29 pm
by Nofatigue
Hi
I've been trying to get the source code of websites using PHP code, but every method I used was slow (it took about 15-20).
I used fopen (with stream_get_contents instead of fread), file_get_contents and curl but it was the same result every time.
Is there any faster way to get a website's source code?
Thanks
Re: Getting a website source
Posted: Thu Sep 01, 2011 7:04 pm
by twinedev
15-20 what? seconds? For each page, for whole site? How long does it take a browser to fully load the page you are trying to get?
Re: Getting a website source
Posted: Thu Sep 01, 2011 7:22 pm
by xtiano77
I could be mistaken, but are you asking for someone to tell you how to get the source code from a ".php" document hosted on a server other than your own or one that you have access to? I don't believe that is possible without having access to the web server. As far as I know, if you try to get the contents of a ".php" file using the methods described above, all you would get is the HTML produced by the page. Again, I could be mistaken on this, so I'll yield to the more senior and experienced coders in this forum.
Re: Getting a website source
Posted: Fri Sep 02, 2011 3:17 am
by phphelpme
It all depends on where you are running the code that fetches the page, and on what server the page you want is generated.
Plus it depends on the connection speeds, how many users are online/visitors online etc.
That does seem very slow, and I hope you are talking seconds, because if not that's really, really slow.
I use cURL all the time and it runs very fast, so if you can load the page in your browser with no problem, I would suspect the server your code is running on.
@xtiano77 You are right about not being able to grab the PHP code: it is server-side code, and only the HTML/CSS output is available to scrape/grab etc. The only way to get a file's PHP code is to have access to the file itself, or for the server to be misconfigured so that it displays the PHP instead of executing it.
Best wishes
Re: Getting a website source
Posted: Fri Sep 02, 2011 5:33 am
by Nofatigue
Yes, I mean getting the source as HTML code. And it takes 15-20 seconds for one page. I don't think it's my connection, because getting the source using Chrome, for example, takes about 2-3 seconds. I read somewhere on the internet that the file_get_contents function doesn't tell the server to close the connection after you get the source, so the server closes it after a specific amount of time (he said 15 seconds). He also said that this code:
$context = stream_context_create(array('http' => array('header'=>'Connection: close')));
file_get_contents("http://www.something.com/somepage.html",false,$context);
should solve it, but when I use it there's an error opening the stream. Does anyone know more about it?
Btw if you want to see what that guy says:
http://stackoverflow.com/questions/3629 ... g-full-url (you have to scroll down)
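For reference, here is a minimal sketch of how that suggestion is usually written out in full. It is only an illustration under assumptions: the helper names `closeConnectionOptions` and `fetchSource` and the 5-second default timeout are made up for this example, and it assumes `allow_url_fopen` is enabled. It does not explain the stream error mentioned above.

```php
<?php
// Hypothetical helper: builds HTTP stream-context options that ask the
// server to close the connection once the response has been sent.
function closeConnectionOptions(float $timeoutSeconds = 5.0): array
{
    return [
        'http' => [
            'method'  => 'GET',
            // Each header line is terminated with CRLF.
            'header'  => "Connection: close\r\n",
            // Read timeout in seconds (assumed value, adjust as needed).
            'timeout' => $timeoutSeconds,
        ],
    ];
}

// Hypothetical helper: returns the page source as a string, or false on failure.
function fetchSource(string $url, float $timeoutSeconds = 5.0)
{
    $context = stream_context_create(closeConnectionOptions($timeoutSeconds));
    return file_get_contents($url, false, $context);
}
```

It would be called as `$html = fetchSource('http://www.something.com/somepage.html');` and the result checked against `false` before use.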
Re: Getting a website source
Posted: Fri Sep 02, 2011 9:07 am
by phphelpme
I suppose it depends on what content you are trying to get your hands on.
cURL is great and fast, but it has the drawback of not being able to grab only certain content: it grabs everything at once, and you have to sift through it all to scrape the content you want.
So you say it takes 15-20 seconds even using cURL?
Now, you say that when using a different browser it takes less time? Maybe you should check your add-ons and make sure they are updated, functioning correctly etc.
Using cURL you can end the connection easily, and then you are just dealing with a string value which can be passed around anywhere you want. I would use cURL, as it is far more flexible than its competitors.
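A rough sketch of what that could look like, assuming the cURL extension is installed. The function names `curlFetchOptions` and `fetchWithCurl` and the timeout values are just placeholders for illustration. The timing fields may help show whether the delay is in the DNS lookup, the connect, or the transfer itself.

```php
<?php
// Hypothetical helper: cURL options for fetching a page as a string and
// not keeping the connection open afterwards.
function curlFetchOptions(string $url, int $timeoutSeconds = 10): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 5,      // seconds allowed for connecting (assumed value)
        CURLOPT_TIMEOUT        => $timeoutSeconds, // seconds allowed for the whole request
        CURLOPT_FORBID_REUSE   => true,   // close the connection when the transfer ends
        CURLOPT_HTTPHEADER     => ['Connection: close'],
    ];
}

// Hypothetical helper: fetches the page and reports where the time went.
function fetchWithCurl(string $url, int $timeoutSeconds = 10): array
{
    $ch = curl_init();
    curl_setopt_array($ch, curlFetchOptions($url, $timeoutSeconds));
    $body = curl_exec($ch); // string on success, false on failure

    // The handle is released automatically when this function returns.
    return [
        'body'    => $body,
        'error'   => curl_error($ch), // empty string if nothing went wrong
        'timings' => [
            'dns'        => curl_getinfo($ch, CURLINFO_NAMELOOKUP_TIME),
            'connect'    => curl_getinfo($ch, CURLINFO_CONNECT_TIME),
            'first_byte' => curl_getinfo($ch, CURLINFO_STARTTRANSFER_TIME),
            'total'      => curl_getinfo($ch, CURLINFO_TOTAL_TIME),
        ],
    ];
}
```

Calling `fetchWithCurl('http://www.something.com/somepage.html')` and printing the `timings` entry is one way to see which phase takes the 15-20 seconds.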
Best wishes