universal website crawler using PHP

adnanahamd
Forum Newbie
Posts: 1
Joined: Wed Jan 01, 2014 11:38 pm

universal website crawler using PHP

Post by adnanahamd »

Hi folks

I want to build a universal website crawler using PHP, so that my crawler will work on any given site.
In my web application, the user will enter any site, specify what he needs to get from that site, and click the Start button. My web application will then begin fetching data from the source website.
I am using an iframe for this purpose: I load the page in the iframe and use jQuery to get the class and tag names of the specific area the user selects. But when I load an external website like eBay or Amazon, it does not work, as these sites are restricted.
Is there any way to resolve this issue so that I can load any site in an iframe? Or is there an alternative way to achieve what I want?
I am actually inspired by Mozenda, software developed in .NET: http://www.mozenda.com/video01-overview/. They load a site in a browser control and do almost the same thing.
Please help me with this!

Thank you
Celauran
Moderator
Posts: 6427
Joined: Tue Nov 09, 2010 2:39 pm
Location: Montreal, Canada

Re: universal website crawler using PHP

Post by Celauran »

adnanahamd wrote: jquery ... external website
And there's your problem: the browser's same-origin policy stops your jQuery from reading a page loaded from another site.
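
The same-origin policy only restricts scripts running in the browser, so one way around it is to do the extraction on the server in PHP instead of with jQuery. The sketch below is only an illustration under that assumption: extractByClass() and its parameters are hypothetical names, and it expects the page HTML to have been downloaded already. It uses PHP's bundled DOMDocument and DOMXPath classes to collect the text of every element with a given tag and class.

```php
<?php
// Sketch only: extractByClass() is a hypothetical helper, not code from this thread.

/**
 * Return the trimmed text of every element that has the given tag name
 * and carries the given class among its class names.
 */
function extractByClass(string $html, string $tag, string $class): array
{
    // Only allow simple names so user input cannot alter the XPath query.
    if (!preg_match('/^[A-Za-z][A-Za-z0-9]*$/', $tag)
        || !preg_match('/^[A-Za-z0-9_-]+$/', $class)) {
        throw new InvalidArgumentException('Unsupported tag or class name');
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word inside the class attribute,
    // so "price" does not also match "pricey".
    $query = sprintf(
        '//%s[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        strtolower($tag),
        $class
    );

    $values = [];
    foreach ($xpath->query($query) as $node) {
        $values[] = trim($node->textContent);
    }
    return $values;
}
```

For example, extractByClass($html, 'span', 'price') would return the text of each span whose class list contains price. Content that the target site generates with JavaScript after the page loads will not appear in the downloaded HTML, so this approach only covers markup that is present in the server response.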
NoraChoi
Forum Newbie
Posts: 1
Joined: Thu Jun 16, 2016 4:25 am

Re: universal website crawler using PHP

Post by NoraChoi »

I think other web scrapers like import.io (http://www.import.io, a web-based extractor) and Octoparse (http://www.octoparse.com) may interest you and give you some insights as well.
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: universal website crawler using PHP

Post by Christopher »

Perhaps you could fetch the site with cURL, read the site's HTML, and then put the HTML in an iframe you have access to.
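
A minimal sketch of that idea, assuming a small proxy script served from your own domain. The names fetchPage() and addBaseHref(), the "url" query parameter, and the user-agent string are all hypothetical. The script downloads the page with cURL and inserts a base tag so that relative links, images, and stylesheets still resolve against the original site.

```php
<?php
// Sketch only: function names, the "url" parameter, and the user agent
// are assumptions made for illustration.

/** Download a page server-side with cURL; returns null on failure. */
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 20,
        CURLOPT_USERAGENT      => 'ExampleCrawler/0.1',
    ]);
    $html = curl_exec($ch);
    return is_string($html) ? $html : null;
}

/** Insert a <base href> so relative URLs point at the original site. */
function addBaseHref(string $html, string $baseUrl): string
{
    $tag = '<base href="' . htmlspecialchars($baseUrl, ENT_QUOTES) . '">';
    $count = 0;
    // Put the tag right after the opening <head ...> tag when there is one.
    $result = preg_replace_callback(
        '/<head\b[^>]*>/i',
        static fn(array $m): string => $m[0] . $tag,
        $html,
        1,
        $count
    );
    // No <head> found (or the regex failed): prepend the tag instead.
    return $count > 0 ? $result : $tag . $html;
}

// Runs only when the script is requested as e.g. proxy.php?url=https://example.com/
if (isset($_GET['url'])) {
    $url = filter_var($_GET['url'], FILTER_VALIDATE_URL);
    $scheme = $url ? strtolower((string) parse_url($url, PHP_URL_SCHEME)) : '';
    if (!in_array($scheme, ['http', 'https'], true)) {
        http_response_code(400);
        exit('Invalid URL');
    }
    $html = fetchPage($url);
    if ($html === null) {
        http_response_code(502);
        exit('Could not fetch the page');
    }
    echo addBaseHref($html, $url);
}
```

Pointing the iframe at this script rather than at the external site means the framed document comes from your own origin, so jQuery on the parent page can read it. Pages that build their content with JavaScript, or that detect framing in their own scripts, may still not behave as they do when visited directly.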