Get meta description from local html file...

tomfra
Forum Contributor
Posts: 126
Joined: Wed Jun 23, 2004 12:56 pm
Location: Prague, Czech Republic

Post by tomfra »

Let's say there is a page called something.html and in that page is a meta description tag in either of these two formats:

Code: Select all

<meta content="Something blah blah blah..." name="description">
<meta name="description" content="Something blah blah blah...">
How can I get the value of the content attribute into a string? I guess I should use a regex, but I have almost no experience with that yet.

Thanks!

Tomas
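
For reference, PHP also ships a built-in `get_meta_tags()` that reads the meta tags of a file into an array keyed by the `name` attribute, so a regex may not be needed at all. A minimal sketch, assuming the page looks like the sample below (the temporary file only stands in for something.html and is not from the original post):

```php
<?php
// Assumption: this sample page stands in for something.html.
$pagename = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents(
    $pagename,
    '<html><head>' .
    '<meta name="keywords" content="some keywords">' .
    '<meta content="Something blah blah blah..." name="description">' .
    '</head><body></body></html>'
);

// get_meta_tags() returns an array keyed by the lowercased name attribute.
$tags = get_meta_tags($pagename);
$description = isset($tags['description']) ? $tags['description'] : '';

unlink($pagename); // clean up the sample file
echo $description, "\n";
```

Note that `get_meta_tags()` stops parsing at the closing head tag, so it only helps when the meta tags sit in the head section of the page.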
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

untested

Code: Select all

<?php

preg_match_all('#<\s*meta\s+(name\s*=\s*([\'"]?)description\\2[^>]*?content\s*=\s*([\'"]?)([^\\3]*?)\\3|content\s*=\s*([\'"]?)([^\\4]*?)\\4[^>]*?name\s*=\s*([\'"]?)description\\5)[^>]*?>#is',$html,$matches);

?>
tomfra

Post by tomfra »

It returns only 8 empty arrays. If I am using it correctly that is...

Here is the complete code:

Code: Select all

$html = file_get_contents($pagename);

preg_match_all('#<\s*meta\s+(name\s*=\s*([\'"]?)description\\2[^>]*?content\s*=\s*([\'"]?)([^\\3]*?)\\3|content\s*=\s*([\'"]?)([^\\4]*?)\\4[^>]*?name\s*=\s*([\'"]?)description\\5)[^>]*?>#is',$html,$matches);

print_r ($matches);
$pagename is defined of course.

Tomas
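
A note on the "8 empty arrays": that shape is what `preg_match_all()` produces in its default ordering when the pattern runs but finds nothing, one array for the whole match plus one per capture group (seven in the pattern above). A minimal sketch with a made-up two-group pattern, checking the return value to tell "no match" apart from "pattern error":

```php
<?php
// A tiny pattern with two capture groups, run against text that cannot match.
$count = preg_match_all('#(a)(b)#', 'xyz', $matches);

if ($count === false) {
    echo "The pattern itself failed\n";
} elseif ($count === 0) {
    echo "The pattern ran, but nothing matched\n";
}

// One empty array for the full match plus one per capture group.
print_r($matches);
```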
feyd

Post by feyd »

It may be that your HTML differs slightly from what the regex allows for...
tomfra

Post by tomfra »

When I create a string like this:

Code: Select all

$meta = '<meta content="some words" name="description">';
...and use the preg_match_all on it, I get the same results.

Tomas
feyd

Post by feyd »

Oops, my backreference numbers were off:

Code: Select all

<?php

$html = '<meta content="some words" name="description">'; 

preg_match_all('#<\s*meta\s+(name\s*=\s*([\'"]?)description\\2[^>]*?content\s*=\s*([\'"]?)([^\\3]*?)\\3|content\s*=\s*([\'"]?)([^\\5]*?)\\5[^>]*?name\s*=\s*([\'"]?)description\\7)[^>]*?>#is',$html,$matches);

echo '<pre>'.print_r($matches,true).'</pre>';

?>
tomfra

Post by tomfra »

Almost there :)

The most important problem right now is that when a meta keywords tag is present in the file together with the meta description tag, the regex gets confused and does not return the correct result.

The other complication is that although both formats, i.e. <meta content="some words" name="description"> and <meta name="description" content="some words">, work, each of them returns the correct result in a different array, which makes working with it somewhat complicated, unless there is a better way to work with these results.

Tomas
feyd

Post by feyd »

Code: Select all

$matches = array_merge($matches[4],$matches[6]);
// or whatever the correct array sets are in...
tomfra

Post by tomfra »

This created 2 arrays for some reason, such as:

Code: Select all

Array
(
    [0] => Some words, blah blah blah...
    [1] => 
)
And reversed for the other case. I still know too little about arrays, and the help at php.net is currently limited due to some technical difficulties.

Tomas
tomfra

Post by tomfra »

I've simply imploded the array. Probably not the most elegant way but seems to be working. Now just the meta keywords & meta description part and everything will be perfect :)

Tomas
Last edited by tomfra on Sun Aug 01, 2004 4:48 pm, edited 1 time in total.
feyd

Post by feyd »

That's 1 array with 2 elements, the second empty.

You'd process them with a foreach, while, or for loop.

or you could just check to see which one is empty..
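
The "check which one is empty" idea could look roughly like this. It is only a sketch: `first_non_empty()` is a hypothetical helper name, and the sample array is copied from the print_r() output earlier in the thread.

```php
<?php
// Hypothetical helper: return the first element that is not an empty string.
function first_non_empty(array $values)
{
    foreach ($values as $value) {
        if ($value !== '') {
            return $value;
        }
    }
    return ''; // every element was empty
}

// Sample data copied from the print_r() output shown earlier.
$merged = array('Some words, blah blah blah...', '');
$description = first_non_empty($merged);

echo $description, "\n";
```

Because the loop skips empty strings, it does not matter which of the two merged capture groups actually held the text.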
tomfra

Post by tomfra »

Does anyone know how to solve the problem when the <meta name="keywords"> and <meta name="description"> tags are both present in the HTML file? The second meta tag, which is usually the description one, does not work well with the preg_match_all regex above, because the regex also tries to read the meta keywords tag, which it should not. It results in output like:

"some keywords" name="keywords">

...instead of the meta description content.

Any ideas are welcome. I promise I will not ask any more dumb questions for a while then :)

Thanks!

Tomas
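
One possible way around the keywords problem is to stop matching everything in a single pattern. In the regex above, `[^\\5]` inside a character class is not a backreference, so the lazy content group can run past the closing `>` of the keywords tag and into the next tag. The sketch below is not from the thread: `extract_meta_description()` is a hypothetical helper that first isolates each meta tag, then inspects the attributes of one tag at a time. It assumes the content value is quoted and contains no `>` character.

```php
<?php
// Hypothetical helper; the name and the two-step approach are illustrative.
function extract_meta_description($html)
{
    // Step 1: grab each <meta ...> tag on its own. [^>]* stops at the
    // first ">", so one tag can never swallow the next.
    if (!preg_match_all('#<\s*meta\b[^>]*>#i', $html, $tags)) {
        return null;
    }

    foreach ($tags[0] as $tag) {
        // Step 2: skip tags whose name attribute is not "description".
        if (!preg_match('#\bname\s*=\s*(["\']?)description\\1(?=[\s/>])#i', $tag)) {
            continue;
        }
        // Step 3: read the quoted content value, in either attribute order.
        if (preg_match('#\bcontent\s*=\s*(["\'])(.*?)\\1#is', $tag, $m)) {
            return $m[2];
        }
    }

    return null; // no usable description tag found
}

$html = '<meta content="some keywords" name="keywords">'
      . '<meta content="some words" name="description">';

echo extract_meta_description($html), "\n";
```

Since the result is a single string (or null), there is no need to merge or implode capture groups afterwards.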