Unsure if PHP has the ability to do this.

lavaeagle
Forum Newbie
Posts: 21
Joined: Thu Sep 09, 2010 1:41 pm

Unsure if PHP has the ability to do this.

Post by lavaeagle »

I want to add websites to a queue and then scan them for meta data.
If(PHP can do this){ Point me to a tutorial or function I can read up on } else { Tell me what language can and I will learn it. }

I'm not sure if this is more of a Java thing or a BASIC thing; please let me know!
Jonah Bron
DevNet Master
Posts: 2764
Joined: Thu Mar 15, 2007 6:28 pm
Location: Redding, California

Re: Unsure if PHP has the ability to do this.

Post by Jonah Bron »

Yes, you'll need to do what's called "scraping". The best way in your case would be to search the file using Regular Expressions.

Code:

$file = file_get_contents('url from database');
preg_match_all(
    '/<meta ((name="?(.*?)"?)|(http-equiv="?(.*?)"?)) content="?(.*?)"?/is',
    $file,
    $matches
);
print_r($matches);
The print_r output shows how the results are structured and how to read them.

I really have no idea if that regex works or not. If it doesn't, try posting for help in the Regex forum.
lavaeagle
Forum Newbie
Posts: 21
Joined: Thu Sep 09, 2010 1:41 pm

Re: Unsure if PHP has the ability to do this.

Post by lavaeagle »

Thank you, it works great, but how can I get the data to print?

Code:

$file = file_get_contents('http://www.worldmotivation.com');
preg_match_all(
    '/<meta ((name="?(.*?)"?)|(http-equiv="?(.*?)"?)) content="?(.*?)"?/is',
    $file,
    $matches
);
print_r($matches);
foreach($matches as $key=>$value){
	echo "Key: $key; Value: $value";
}
I am somewhat new to the idea of foreach statements.
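
A note on the loop above: preg_match_all() fills $matches with one inner array per capture group, so each $value is itself an array and echoing it prints only the word "Array". A minimal sketch of a nested loop follows. The sample data is hypothetical and only mimics the shape of the real result.

```php
<?php
// Hypothetical sample shaped like preg_match_all() output:
// one inner array per capture group.
$matches = [
    0 => ['name="author"', 'name="description"'],
    1 => ['author', 'description'],
];

// Each $group is itself an array, so it needs its own inner loop.
$lines = [];
foreach ($matches as $groupIndex => $group) {
    foreach ($group as $matchIndex => $text) {
        $lines[] = "Group $groupIndex, match $matchIndex: $text";
    }
}
echo implode("\n", $lines), "\n";
```

The outer loop walks the capture groups and the inner loop walks the individual matches within each group.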
Jonah Bron
DevNet Master
Posts: 2764
Joined: Thu Mar 15, 2007 6:28 pm
Location: Redding, California

Re: Unsure if PHP has the ability to do this.

Post by Jonah Bron »

Can you post the output of what I gave you here?
lavaeagle
Forum Newbie
Posts: 21
Joined: Thu Sep 09, 2010 1:41 pm

Re: Unsure if PHP has the ability to do this.

Post by lavaeagle »

Array ( [0] => Array ( [0] => name="author" [2] => name="description" [3] => name="keywords" [4] => name="robots" [5] => name="revisit-after" ) [2] => Array ( [0] => [1] => name="author" [2] => name="description" [3] => name="keywords" [4] => name="robots" [5] => name="revisit-after" ) [3] => Array ( [0] => [1] => author [2] => description [3] => keywords [4] => robots [5] => revisit-after ) [4] => Array ( [0] => http-equiv="Content-Type" [1] => [2] => [3] => [4] => [5] => ) [5] => Array ( [0] => Content-Type [1] => [2] => [3] => [4] => [5] => ) [6] => Array ( [0] => [1] => [2] => [3] => [4] => [5] => ) )
Jonah Bron
DevNet Master
Posts: 2764
Joined: Thu Mar 15, 2007 6:28 pm
Location: Redding, California

Re: Unsure if PHP has the ability to do this.

Post by Jonah Bron »

Well, that output is a little wonky. I'm no regex pro; try posting in the Regex forum and say you want to extract <meta> tags from a page.
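
One likely reason the last group in that output is always empty: the pattern ends in a lazy (.*?) followed only by an optional quote, so the shortest possible match, an empty string, always succeeds. Below is a minimal sketch of a tighter pattern. It assumes double-quoted attributes with name or http-equiv placed before content, and the sample HTML is hypothetical. Real pages vary more than this, so treat it as a starting point rather than a robust parser.

```php
<?php
// Hypothetical sample page; a real run would fetch the HTML first.
$html = '<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="author" content="Jane Doe">
<meta name="description" content="A sample page">
</head>';

// [^"]* runs up to the closing quote, so the content group
// captures the whole attribute value instead of an empty string.
preg_match_all(
    '/<meta\s+(?:name|http-equiv)="([^"]*)"\s+content="([^"]*)"/i',
    $html,
    $matches,
    PREG_SET_ORDER
);

// With PREG_SET_ORDER each $match holds one tag: [1] is the key, [2] the content.
$meta = [];
foreach ($matches as $match) {
    $meta[$match[1]] = $match[2];
}
print_r($meta);
```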
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: Unsure if PHP has the ability to do this.

Post by requinix »

I'd like to adjust the regex but I think giving a link would be more appropriate.
Jonah Bron
DevNet Master
Posts: 2764
Joined: Thu Mar 15, 2007 6:28 pm
Location: Redding, California

Re: Unsure if PHP has the ability to do this.

Post by Jonah Bron »

*smacks face*

I have got to spend more time browsing the built-in functions. :roll:
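
The link from the previous reply is not preserved in this thread. As an assumption, one built-in PHP function that fits the task is get_meta_tags(), which reads the meta tags that carry a name attribute and returns them as an array. A minimal sketch, using a hypothetical sample page written to a temporary file so that it runs offline:

```php
<?php
// Hypothetical sample page written to a temp file so the sketch runs offline.
// get_meta_tags() also accepts a URL when allow_url_fopen is enabled.
$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($path, '<html><head>
<meta name="author" content="Jane Doe">
<meta name="description" content="A sample page">
</head><body></body></html>');

// Returns an array keyed by each tag's name attribute (lowercased).
$tags = get_meta_tags($path);
unlink($path);

foreach ($tags as $name => $content) {
    echo "$name: $content\n";
}
```

Tags that use http-equiv instead of name are not included in the result.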
lavaeagle
Forum Newbie
Posts: 21
Joined: Thu Sep 09, 2010 1:41 pm

Re: Unsure if PHP has the ability to do this.

Post by lavaeagle »

Well, after you told me what this method was called, I could google it correctly.
I think I'm going to have to take another approach to this though. Maybe Java or BASIC.
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Re: Unsure if PHP has the ability to do this.

Post by John Cartwright »

lavaeagle wrote: Well, after you told me what this method was called, I could google it correctly.
I think I'm going to have to take another approach to this though. Maybe Java or BASIC.
PHP would certainly be my first choice, unless of course you wanted to thread the queues to speed up performance. Then of course Java/C# would probably be better choices.
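
For the single-threaded case, a queue of sites can be sketched in PHP with the built-in SplQueue class. This is only an illustration: the $fetch closure is a hypothetical stand-in for a real download such as file_get_contents($url), the URLs are placeholders, and the pattern assumes double-quoted name and content attributes in that order.

```php
<?php
// Hypothetical stand-in for a real download such as file_get_contents($url),
// so the sketch runs without network access.
$fetch = function (string $url): string {
    return '<head><meta name="description" content="Page at ' . $url . '"></head>';
};

// SplQueue is first in, first out: sites are scanned in the order they were added.
$queue = new SplQueue();
$queue->enqueue('http://example.com/a');
$queue->enqueue('http://example.com/b');

$results = [];
while (!$queue->isEmpty()) {
    $url = $queue->dequeue();
    preg_match_all(
        '/<meta\s+name="([^"]*)"\s+content="([^"]*)"/i',
        $fetch($url),
        $found,
        PREG_SET_ORDER
    );
    // Store each tag's content under its site and tag name.
    foreach ($found as $tag) {
        $results[$url][$tag[1]] = $tag[2];
    }
}
print_r($results);
```

Each site is handled one after another here, which is the sequential behaviour that threading would speed up.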