
Unsure if PHP has the ability to do this.

Posted: Mon Oct 11, 2010 9:59 pm
by lavaeagle
I want to add websites to a queue and then scan them for meta data.
If(PHP can do this){ Point me to a tutorial or function I can read up on } else { Tell me what language can and I will learn it. }

I'm not sure if this is more of a Java thing or a Basic thing, so please let me know!

Re: Unsure if PHP has the ability to do this.

Posted: Mon Oct 11, 2010 10:24 pm
by Jonah Bron
Yes, you'll need to do what's called "scraping". The best way in your case would be to search the file using Regular Expressions.

Code: Select all

$file = file_get_contents('url from database');
preg_match_all(
    '/<meta ((name="?(.*?)"?)|(http-equiv="?(.*?)"?)) content="?(.*?)"?/is',
    $file,
    $matches
);
print_r($matches);
The print_r output will show you how the results are structured and how to read them.

I really have no idea if that regex works or not. If it doesn't, try posting for help in the Regex forum.

Re: Unsure if PHP has the ability to do this.

Posted: Tue Oct 12, 2010 12:53 pm
by lavaeagle
Thank you, it works great, but how can I get the data to print?

Code: Select all

$file = file_get_contents('http://www.worldmotivation.com');
preg_match_all(
    '/<meta ((name="?(.*?)"?)|(http-equiv="?(.*?)"?)) content="?(.*?)"?/is',
    $file,
    $matches
);
print_r($matches);
foreach($matches as $key=>$value){
	echo "Key: $key; Value: $value";
}
I am somewhat new to the idea of foreach statements.
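
A note on the loop above: preg_match_all() fills $matches with one inner array per capture group, so each $value in that foreach is itself an array, and echoing it prints the word "Array" rather than the matched text. A minimal sketch of a nested loop follows; the sample $matches data is made up for illustration and is not the real output for that site.

```php
<?php
// Hypothetical sample shaped like preg_match_all() output in its default
// PREG_PATTERN_ORDER mode: one inner array per capture group.
$matches = [
    0 => ['<meta name="author"', '<meta name="description"'],
    1 => ['author', 'description'],
];

// Each $group is itself an array, so a second loop is needed to reach
// the individual matched strings.
$lines = [];
foreach ($matches as $groupIndex => $group) {
    foreach ($group as $matchIndex => $text) {
        $lines[] = "Group $groupIndex, match $matchIndex: $text";
    }
}

echo implode("\n", $lines), "\n";
```

If the output is viewed in a browser, wrapping each string in htmlspecialchars() keeps matched tags such as <meta ...> visible instead of being interpreted as HTML.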

Re: Unsure if PHP has the ability to do this.

Posted: Tue Oct 12, 2010 2:12 pm
by Jonah Bron
Can you post the output of what I gave you here?

Re: Unsure if PHP has the ability to do this.

Posted: Tue Oct 12, 2010 2:22 pm
by lavaeagle
Array
(
    [0] => Array
        (
            [0] => name="author"
            [2] => name="description"
            [3] => name="keywords"
            [4] => name="robots"
            [5] => name="revisit-after"
        )
    [2] => Array
        (
            [0] =>
            [1] => name="author"
            [2] => name="description"
            [3] => name="keywords"
            [4] => name="robots"
            [5] => name="revisit-after"
        )
    [3] => Array
        (
            [0] =>
            [1] => author
            [2] => description
            [3] => keywords
            [4] => robots
            [5] => revisit-after
        )
    [4] => Array
        (
            [0] => http-equiv="Content-Type"
            [1] =>
            [2] =>
            [3] =>
            [4] =>
            [5] =>
        )
    [5] => Array
        (
            [0] => Content-Type
            [1] =>
            [2] =>
            [3] =>
            [4] =>
            [5] =>
        )
    [6] => Array
        (
            [0] =>
            [1] =>
            [2] =>
            [3] =>
            [4] =>
            [5] =>
        )
)

Re: Unsure if PHP has the ability to do this.

Posted: Tue Oct 12, 2010 2:35 pm
by Jonah Bron
Well, that output is a little wonky. I'm no regex pro; try posting in the Regex forum and say you want to extract <meta> tags from a page.
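
One likely reason group [6] (the content value) comes back empty for every tag: the pattern ends with a lazy (.*?) followed only by an optional quote, so that group is free to match nothing at all. Below is a rough sketch of a tighter pattern. It assumes double-quoted attributes with name or http-equiv placed before content, and the sample HTML is invented for the example. An HTML parser will cope with real pages better than any regex.

```php
<?php
// Invented sample markup; real pages vary in attribute order and quoting.
$html = '<html><head>'
      . '<meta name="author" content="Jane Doe">'
      . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
      . '</head><body></body></html>';

// [^"]* stops at the closing quote, so the content group cannot end up empty
// just because nothing follows it in the pattern.
$pattern = '/<meta\s+(?:name|http-equiv)="([^"]*)"\s+content="([^"]*)"/i';

// PREG_SET_ORDER returns one array per tag: [full match, key, content].
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$meta = [];
foreach ($matches as $m) {
    $meta[$m[1]] = $m[2];
}

print_r($meta);
```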

Re: Unsure if PHP has the ability to do this.

Posted: Tue Oct 12, 2010 5:36 pm
by requinix
I'd like to adjust the regex but I think giving a link would be more appropriate.
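
The link itself is not preserved in this copy of the thread. One built-in that covers this job is get_meta_tags(); that this is the function being pointed to is an assumption. A small sketch, using a temporary file with invented contents so it runs without network access:

```php
<?php
// Write a small sample page to a temporary file so the example runs offline.
// get_meta_tags() can also be given a URL when allow_url_fopen is enabled.
$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents(
    $path,
    "<html><head>\n"
    . "<meta name=\"author\" content=\"Jane Doe\">\n"
    . "<meta name=\"description\" content=\"Sample page\">\n"
    . "</head><body></body></html>\n"
);

// Returns an associative array: lowercased name attribute => content value.
$tags = get_meta_tags($path);
unlink($path);

foreach ($tags as $name => $content) {
    echo "$name: $content\n";
}
```

Note that get_meta_tags() only picks up tags that carry a name attribute, so http-equiv entries such as Content-Type are not included.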

Re: Unsure if PHP has the ability to do this.

Posted: Tue Oct 12, 2010 5:39 pm
by Jonah Bron
*smacks face*

I have got to spend more time browsing the built-in functions. :roll:

Re: Unsure if PHP has the ability to do this.

Posted: Wed Oct 13, 2010 1:37 pm
by lavaeagle
Well, after you told me what this method was called, I could google it correctly.
I think I'm going to have to take another approach to this though. Maybe Java or Basic.

Re: Unsure if PHP has the ability to do this.

Posted: Wed Oct 13, 2010 1:41 pm
by John Cartwright
lavaeagle wrote: Well, after you told me what this method was called, I could google it correctly.
I think I'm going to have to take another approach to this though. Maybe Java or Basic.
PHP would certainly be my first choice, unless you want to process the queue on multiple threads to speed things up. In that case Java or C# would probably be a better choice.
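
For the plain, single-threaded case, the queue idea from the first post could be sketched roughly like this. scanQueue() and the stub scanner are hypothetical names made up for the example, and the URLs are placeholders. For live pages, something like 'get_meta_tags' could be passed in as the scanner.

```php
<?php
// Hypothetical helper: $scan is any callable that takes a URL and returns
// an array of meta data for that page.
function scanQueue(SplQueue $queue, callable $scan): array
{
    $results = [];
    // Work through the queue in first-in, first-out order, one URL at a time.
    while (!$queue->isEmpty()) {
        $url = $queue->dequeue();
        $results[$url] = $scan($url);
    }
    return $results;
}

$queue = new SplQueue();
$queue->enqueue('http://example.com/a');
$queue->enqueue('http://example.com/b');

// Stub scanner so the sketch runs without network access.
$fakeScan = function (string $url): array {
    return ['source' => $url];
};

$results = scanQueue($queue, $fakeScan);
print_r($results);
```

Each page is fetched only after the previous one finishes, which is the performance limit the post above refers to.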