PHP Web Crawler
Posted: Mon Jun 05, 2006 1:53 am
by Joeiscoolone
So where would you start in making a PHP web crawler what would be some of the kind of code you would need to write to access a page and start crawling what kind of code would you use to specify how you want pages to be ranked?
Re: PHP Web Crawler
Posted: Mon Jun 05, 2006 2:23 am
by RobertGonzalez
Joeiscoolone wrote: So where would you start in making a PHP web crawler what would be some of the kind of code you would need to write to access a page and start crawling what kind of code would you use to specify how you want pages to be ranked?
That is the longest single string of three questions I have ever seen. Let's try it like this...
Joeiscoolone wrote:
1. So where would you start in making a PHP web crawler?
2. What would be some of the kind of code you would need to write to access a page and start crawling?
3. What kind of code would you use to specify how you want pages to be ranked?
1. I would guess you would want to at least take a look at cURL.
2. Search these forums for 'cURL'.
3. You really need to consider developing your standard of ranking. Then ask that question. It just seems to make sense that you would have a process then code around the process. Asking the community might lead to varying ideas on process, standards, etc. Not that this is bad, but it makes it look like you want the community to develop that standard for you.
What have you gotten done with your Crawler?
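A minimal sketch of the fetch-and-follow loop behind answers 1 and 2, written in Python with only the standard library (urllib stands in for cURL here; this is neither PHP nor the cURL API). The names `LinkExtractor`, `fetch`, and `crawl` and the `max_pages` limit are hypothetical choices for illustration. It does a breadth-first crawl, resolves relative links, and records which pages link where, which is the raw material any ranking standard would need. It ignores robots.txt, politeness delays, and character-set detection, all of which a real crawler would have to handle.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and decode it as UTF-8 (a simplifying assumption)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=50):
    """Breadth-first crawl from start_url; returns {url: [outgoing http(s) links]}."""
    graph = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that fail to download
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        links = []
        for href in parser.links:
            # Resolve relative links and drop #fragments so duplicates collapse.
            absolute = urldefrag(urljoin(url, href))[0]
            if absolute.startswith(("http://", "https://")):
                links.append(absolute)
        graph[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return graph
```

Passing a stub as `fetch_page` (for example, a function that returns canned HTML from a dictionary) lets you exercise the loop without touching the network.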
First off
Posted: Mon Jun 05, 2006 2:32 am
by Joeiscoolone
First off I don't want the community to develop the standard for me secondly you are very rude if you're going to be rude do not answer my question. What kind of code you would use to rank not an entirely built search engine, read posts carefully before you post and be polite when suggesting to someone that their string is too long. The proper way of doing this would have been, hey just a suggestion but when you post try to not make your string so long break it up a bit. Not the rude way you did it.
Posted: Mon Jun 05, 2006 2:50 am
by onion2k
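One thing to bear in mind with cURL is that there are two approaches to using it since PHP 5 .. the single-handle curl_init() functions and the multi-handle interface. You'll want to use the new curl_multi_init() function, which runs several transfers in parallel, as you'll be getting a lot of pages rather than just a few.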
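PHP's curl_multi_init() drives several transfers at once instead of one after another. As a rough analogue only (this is not the cURL multi interface), the Python sketch below fetches a batch of URLs concurrently with a standard-library thread pool. The names `fetch` and `fetch_many`, the `max_workers=8` default, and returning `None` for failed downloads are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url):
    """Download a page and decode it as UTF-8 (a simplifying assumption)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_many(urls, fetch_page=fetch, max_workers=8):
    """Fetch several URLs concurrently; returns {url: html, or None on error}."""

    def safe_fetch(url):
        try:
            return fetch_page(url)
        except OSError:
            return None  # record the failure instead of aborting the batch

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with urls.
        return dict(zip(urls, pool.map(safe_fetch, urls)))
```

The design point is the same one onion2k makes: when a crawl needs many pages, waiting on each download in turn wastes most of the time, so batches are fetched in parallel.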
Everah's third point regarding ranking pages is spot on. There simply isn't any way to suggest a method of coding a ranking system until we know what you intend to use to rank pages. You need to develop the algorithm before you even start to think of the code. I could probably think of dozens of ways to index web content .. and each would require a different approach to writing the code to do them .. so asking for suggestions without telling us how your system will work makes suggesting anything very tricky indeed.
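To illustrate why the ranking standard has to come before the code: once a standard is fixed, the code often follows directly from it. The sketch below assumes one hypothetical standard, "a page ranks higher the more distinct crawled pages link to it", and takes a link graph mapping each crawled URL to the URLs it links to. The function name `rank_by_inbound_links` is illustrative; a different standard (keyword frequency, anchor text, link-weighted schemes such as PageRank) would need entirely different code.

```python
from collections import Counter


def rank_by_inbound_links(graph):
    """Rank pages by how many distinct crawled pages link to them.

    graph maps each crawled URL to the list of URLs it links to.
    This is one hypothetical ranking standard, chosen for illustration.
    """
    inbound = Counter()
    for source, targets in graph.items():
        for target in set(targets):  # count each linking page once
            if target != source:     # ignore self-links
                inbound[target] += 1
    # Highest count first; ties broken alphabetically for stable output.
    return sorted(inbound.items(), key=lambda item: (-item[1], item[0]))
```

Pages that nothing links to do not appear in the result; whether that is acceptable is itself a decision about the standard, not about the code.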
Don't take this the wrong way, but you do seem to be attempting to write a crawler without knowing all that much about how to begin. You're trying to run before you can walk .. writing an application to crawl and index the web is exceedingly complex .. probably one of the most complicated jobs in web development. If you're asking basic questions about the fundamental aspects then I would suggest you scale down your idea a bit and start with something less ambitious .. get some more experience of writing apps, and then come back to this when you're better able to design the system properly. Seriously, I've been writing web software for 10 years now, and I don't think I'd write a very good crawler.
Reply
Posted: Mon Jun 05, 2006 3:04 am
by Joeiscoolone
Here's the thing what I did not like about that guy's post is that it was rude. You can explain what he explained politely. I am learning where to start in making a web crawler that is why I asked, I think you guys may have made my post bigger than it really was. I was asking basic fundamentals for any ranking system and syntax. Syntax was the main thing I posted for I should have put that in my post. But I appreciate your attempt and I am grateful when people answer my posts but I don't like people who are rude (the other guy).
Posted: Mon Jun 05, 2006 6:35 am
by MrPotatoes
No. You are using no punctuation and the wording is absolutely terrible. Online translators are easier to read. As a matter of fact, Everah's post cleaning up your post was a much easier and better read.
Posted: Mon Jun 05, 2006 6:43 am
by Weirdan
Just a friendly notice: no flame, please.
If you feel offended, contact a moderator (privately).
If you think someone is behaving badly, contact a moderator (privately).
On these forums moderators are your best friends.
Reply
Posted: Mon Jun 05, 2006 7:05 am
by Joeiscoolone
Ok this was not a flame it was asking someone not to be impolite and we have another rude person you guys really don't get it, he was rude it does not matter if I didn't punctuate right it was still rude and you people are not very smart. If you want to kick me off fine, you guys are rude and stupid anyways if you are so impiote that you can't see why his post was wrong then you're just plain stupid.
Posted: Mon Jun 05, 2006 7:41 am
by twigletmac
Friendliness and politeness cut both ways and taking time to present your posts clearly and in a somewhat structured manner makes a difference to how people respond to you.
Mac
Re: First off
Posted: Mon Jun 05, 2006 8:40 am
by RobertGonzalez
Joeiscoolone wrote: First off I don't want the community to develop the standard for me secondly you are very rude if you're going to be rude do not answer my question. What kind of code you would use to rank not an entirely built search engine, read posts carefully before you post and be polite when suggesting to someone that their string is too long. The proper way of doing this would have been, hey just a suggestion but when you post try to not make your string so long break it up a bit. Not the rude way you did it.
If I offended you, I apologize. It was not my intent to upset you or appear to be rude to you. I did intend to clean up your question because you combined three questions into one sentence with no punctuation. For that, I will not apologize. Everyone who posts here trying to help others appreciates it when the people seeking help present their need in a way that does not make us work twice as hard just to figure out what they want. It seems to be a professional courtesy... we take the time to try to help you by answering your questions. The least you (or anyone who posts here) could do is take a little time to format your question in a way that allows us to understand it. But again, if I upset you, embarrassed you or offended you, I apologize. That was not my intent.
As for not answering your questions, I disagree with you. I answered each of your three questions to the best of my ability. Just reread the second post in this thread. If my answers are not suitable to you, that's fine. I am sure that with a properly formed question you will get many other responses from a lot of other people in this community. But before you say I didn't answer your question(s), read my answers. As a suggestion, you could say 'Thanks for the help Everah, but that is really not what I was looking for'.