Unsure if PHP has the ability to do this.
I want to add websites to a queue and then scan them for metadata.
If(PHP can do this){ Point me to a tutorial or function I can read up on } else { Tell me what language can and I will learn it. }
I'm not sure if this is more of a Java thing or a Basic thing; please let me know!
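The "queue, then scan" part works the same in any language. Here is a minimal PHP sketch, assuming SPL's SplQueue as the queue and any callable as the scanner. scanQueue() is a hypothetical helper name, not something from this thread or a PHP built-in:

```php
<?php
// Minimal sketch: a FIFO queue of sites, each one handed to a scanner callback.
// scanQueue() is a hypothetical helper, not a PHP built-in.
function scanQueue(SplQueue $queue, callable $scan): array
{
    $results = [];
    while (!$queue->isEmpty()) {
        $site = $queue->dequeue();       // oldest entry first
        $results[$site] = $scan($site);  // keep whatever the scanner returns
    }
    return $results;
}

// Usage (needs network access; get_meta_tags() is a PHP built-in that
// reads a page's <meta name="..." content="..."> tags):
// $queue = new SplQueue();
// $queue->enqueue('http://www.example.com/');
// $queue->enqueue('http://www.example.org/');
// print_r(scanQueue($queue, 'get_meta_tags'));
```

Because the scanner is just a callback, you can swap in a regex or a DOM parser without touching the queue logic.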
- Jonah Bron
- DevNet Master
- Posts: 2764
- Joined: Thu Mar 15, 2007 6:28 pm
- Location: Redding, California
Re: Unsure if PHP has the ability to do this.
Yes, you'll need to do what's called "scraping". The best way in your case would be to search the file using regular expressions.
The print_r() output will show how the results are structured and how to read them.
Code:
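// Fetch the raw HTML ('url from database' is a placeholder for a URL from your queue)
$file = file_get_contents('url from database');
// Try to match <meta name=... content=...> and <meta http-equiv=... content=...> tags
preg_match_all(
'/<meta ((name="?(.*?)"?)|(http-equiv="?(.*?)"?)) content="?(.*?)"?/is',
$file,
$matches
);
// Dump the nested result array to see how the capture groups are laid out
print_r($matches);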
I really have no idea if that regex works or not. If it doesn't, try posting for help in the Regex forum.
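One likely problem with the pattern above: `content="?(.*?)"?` sits at the very end, and since `(.*?)` is lazy and nothing required follows it, it settles for an empty string, so the content group (group 6) stays empty. A sketch of a tighter pattern, assuming attributes are double-quoted and that name/http-equiv comes before content (tags written any other way are simply missed). extractMetaWithRegex() is a hypothetical helper name; PREG_SET_ORDER and the named groups are standard preg_match_all() features:

```php
<?php
// Sketch only: assumes double-quoted attributes and name/http-equiv before content.
function extractMetaWithRegex(string $html): array
{
    // [^"]* cannot run past the closing quote, so no lazy-matching surprises.
    $pattern = '/<meta\s+(?:name|http-equiv)="(?<key>[^"]*)"\s+content="(?<value>[^"]*)"/i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $meta = [];
    foreach ($matches as $m) {
        // With PREG_SET_ORDER each $m is one tag; named groups appear as $m['key'] and $m['value'].
        $meta[strtolower($m['key'])] = $m['value'];
    }
    return $meta;
}

$html = '<head><meta name="Author" content="Jane Doe">'
      . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>';
print_r(extractMetaWithRegex($html));
```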
Re: Unsure if PHP has the ability to do this.
Thank you, it works great, but how can I get the data to print?
I am somewhat new to foreach loops.
Code:
$file = file_get_contents('http://www.worldmotivation.com');
preg_match_all(
'/<meta ((name="?(.*?)"?)|(http-equiv="?(.*?)"?)) content="?(.*?)"?/is',
$file,
$matches
);
print_r($matches);
foreach($matches as $key=>$value){
echo "Key: $key; Value: $value";
}
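With the default flags, preg_match_all() fills $matches as $matches[group][match], so each $value in the loop above is itself an array, and echo just prints "Array". A sketch of two ways to print it; printMatches() and printPairs() are hypothetical helper names. With the regex earlier in this thread, printPairs($matches, 3, 6) would pair each meta name (group 3) with its content (group 6), once the pattern actually captures the content.

```php
<?php
// preg_match_all() defaults to PREG_PATTERN_ORDER: $matches[$group][$i] is
// capture group $group of the $i-th match, so two loops are needed.
function printMatches(array $matches): void
{
    foreach ($matches as $group => $values) {
        foreach ($values as $i => $text) {
            echo "Group $group, match $i: $text\n";
        }
    }
}

// Pair two capture groups that belong to the same match, e.g. a name and its content.
function printPairs(array $matches, int $keyGroup, int $valueGroup): void
{
    foreach ($matches[$keyGroup] as $i => $key) {
        echo "$key: {$matches[$valueGroup][$i]}\n";
    }
}

preg_match_all('/(\w+)=(\d+)/', 'a=1 b=2', $matches);
printMatches($matches);
printPairs($matches, 1, 2);   // prints "a: 1" and "b: 2"
```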
- Jonah Bron
Re: Unsure if PHP has the ability to do this.
Can you post the output of what I gave you here?
Re: Unsure if PHP has the ability to do this.
Array
(
    [0] => Array
        (
            [0] => name="author"
            [2] => name="description"
            [3] => name="keywords"
            [4] => name="robots"
            [5] => name="revisit-after"
        )

    [2] => Array
        (
            [0] =>
            [1] => name="author"
            [2] => name="description"
            [3] => name="keywords"
            [4] => name="robots"
            [5] => name="revisit-after"
        )

    [3] => Array
        (
            [0] =>
            [1] => author
            [2] => description
            [3] => keywords
            [4] => robots
            [5] => revisit-after
        )

    [4] => Array
        (
            [0] => http-equiv="Content-Type"
            [1] =>
            [2] =>
            [3] =>
            [4] =>
            [5] =>
        )

    [5] => Array
        (
            [0] => Content-Type
            [1] =>
            [2] =>
            [3] =>
            [4] =>
            [5] =>
        )

    [6] => Array
        (
            [0] =>
            [1] =>
            [2] =>
            [3] =>
            [4] =>
            [5] =>
        )

)
- Jonah Bron
Re: Unsure if PHP has the ability to do this.
Well, that output is a little wonky. I'm no regex pro; try posting in the Regex forum and say you want to extract <meta> tags from a page.
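Another route that avoids regex entirely is to parse the page with PHP's DOM extension. A sketch, assuming the dom extension is available and that libxml's lenient HTML parser copes with the page; attribute order and quoting no longer matter. extractMetaWithDom() is a hypothetical helper name:

```php
<?php
// Sketch: read <meta> tags with DOMDocument instead of a regex.
function extractMetaWithDom(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML triggers many parser warnings; collect them instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $meta = [];
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        // getAttribute() returns '' when the attribute is missing.
        $key = $tag->getAttribute('name') ?: $tag->getAttribute('http-equiv');
        if ($key !== '') {
            $meta[strtolower($key)] = $tag->getAttribute('content');
        }
    }
    return $meta;
}

// Usage (needs network access):
// print_r(extractMetaWithDom(file_get_contents('http://www.example.com/')));
```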
Re: Unsure if PHP has the ability to do this.
I'd like to adjust the regex but I think giving a link would be more appropriate.
- Jonah Bron
Re: Unsure if PHP has the ability to do this.
*smacks face*
I have got to spend more time browsing the built-in functions.
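The link itself didn't survive in this thread, so which function it pointed to isn't recorded here. One PHP built-in that does exactly this job is get_meta_tags(): it reads a file or URL and returns the tags that have a name attribute as an array keyed by the lower-cased name (per the PHP manual, parsing stops at </head>). A small sketch using a temporary local file so it runs offline:

```php
<?php
// Write a tiny HTML page to a temp file so the example doesn't need the network.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, '<html><head><meta name="Keywords" content="php, scraping"></head></html>');

// A URL such as 'http://www.example.com/' also works when allow_url_fopen is enabled.
$tags = get_meta_tags($file);
print_r($tags);   // keys are lower-cased, so 'Keywords' becomes 'keywords'

unlink($file);
```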
Re: Unsure if PHP has the ability to do this.
Well, after you told me what this method was called, I could Google it properly.
I think I'm going to have to take another approach to this, though. Maybe Java or Basic.
- John Cartwright
- Site Admin
- Posts: 11470
- Joined: Tue Dec 23, 2003 2:10 am
- Location: Toronto
Re: Unsure if PHP has the ability to do this.
lavaeagle wrote:
Well after you told me what this method was called I could google it correctly.
I think I'm going to have to take another approach to this though. Maybe java or basic.

PHP would certainly be my first choice, unless of course you wanted to thread the queues to speed up performance. Then of course Java/C# would probably be better choices.