Get meta description from local html file...

Posted: Sun Aug 01, 2004 11:47 am
by tomfra
Let's say there is a page called something.html and in that page is a meta description tag in either of these two formats:

Code:

<meta content="Something blah blah blah..." name="description">
<meta name="description" content="Something blah blah blah...">
How can I get the value of the content attribute into a string? I guess I should use a regex, but I have almost no experience with those yet.

Thanks!

Tomas
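
For a local file, PHP's built-in get_meta_tags() may be enough without any regex. The sketch below is only an illustration: the temporary file and its contents are made up for the demo, and the `??` operator assumes PHP 7 or later. get_meta_tags() lowercases the name attribute and uses it as the array key.

```php
<?php
// Demo setup (hypothetical): write a small page to a temp file so the example is self-contained.
$pagename = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents(
    $pagename,
    "<html><head>\n"
    . "<meta content=\"some keywords\" name=\"keywords\">\n"
    . "<meta name=\"description\" content=\"Something blah blah blah...\">\n"
    . "</head><body></body></html>\n"
);

// get_meta_tags() reads the file and returns name => content pairs from the <head>.
$tags = get_meta_tags($pagename);

// Fall back to an empty string if the page has no description tag.
$description = $tags['description'] ?? '';

echo $description, "\n";

unlink($pagename); // clean up the demo file
```

The function stops parsing at </head>, so a description tag placed in the body would not be found this way.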

Posted: Sun Aug 01, 2004 11:59 am
by feyd
untested

Code:

<?php

preg_match_all('#<\s*meta\s+(name\s*=\s*([\'"]?)description\\2[^>]*?content\s*=\s*([\'"]?)([^\\3]*?)\\3|content\s*=\s*([\'"]?)([^\\4]*?)\\4[^>]*?name\s*=\s*([\'"]?)description\\5)[^>]*?>#is',$html,$matches);

?>

Posted: Sun Aug 01, 2004 12:40 pm
by tomfra
It returns only 8 empty arrays. If I am using it correctly that is...

Here is the complete code:

Code:

$html = file_get_contents($pagename);

preg_match_all('#<\s*meta\s+(name\s*=\s*([\'"]?)description\\2[^>]*?content\s*=\s*([\'"]?)([^\\3]*?)\\3|content\s*=\s*([\'"]?)([^\\4]*?)\\4[^>]*?name\s*=\s*([\'"]?)description\\5)[^>]*?>#is',$html,$matches);

print_r ($matches);
$pagename is defined of course.

Tomas

Posted: Sun Aug 01, 2004 12:54 pm
by feyd
it may be that the html in your file differs slightly from what the regex expects...

Posted: Sun Aug 01, 2004 1:35 pm
by tomfra
When I create a string like this:

Code:

$meta = '<meta content="some words" name="description">';
...and use the preg_match_all on it, I get the same results.

Tomas

Posted: Sun Aug 01, 2004 2:09 pm
by feyd
oops, my backreference numbers were off

Code:

<?php

$html = '<meta content="some words" name="description">'; 

preg_match_all('#<\s*meta\s+(name\s*=\s*([\'"]?)description\\2[^>]*?content\s*=\s*([\'"]?)([^\\3]*?)\\3|content\s*=\s*([\'"]?)([^\\5]*?)\\5[^>]*?name\s*=\s*([\'"]?)description\\7)[^>]*?>#is',$html,$matches);

echo '<pre>'.print_r($matches,true).'</pre>';

?>
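
The numbering fix above comes down to how PCRE counts capture groups: by opening parenthesis, left to right, straight across the `|`. The groups in the second alternative therefore continue the count instead of starting over. The pattern below is a deliberately simplified stand-in (not the regex from this thread) to show the effect.

```php
<?php
// Simplified pattern: each alternative has a quote group followed by a value group.
// First alternative owns groups 1 and 2, second alternative owns groups 3 and 4.
$pattern = '#a=([\'"])(.*?)\1|b=([\'"])(.*?)\3#';

preg_match($pattern, 'b="x"', $m);

// Groups 1 and 2 are reported as empty strings because the first alternative did not take part.
print_r($m);

// Same pattern, but the second alternative wrongly refers back to group 1.
// Group 1 is never set on this input, so the backreference fails and nothing matches.
$wrong = preg_match('#a=([\'"])(.*?)\1|b=([\'"])(.*?)\1#', 'b="x"');
var_dump($wrong);
```

This is the same reason the earlier version returned only empty arrays for a tag with content first: the second alternative pointed at groups that belonged to the first one.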

Posted: Sun Aug 01, 2004 3:26 pm
by tomfra
Almost there :)

The most important problem right now is that when a meta keywords tag is present in the file together with the meta description tag, the regex gets confused and does not return the correct result.

The other complication is that although both formats, i.e. <meta content="some words" name="description"> and <meta name="description" content="some words">, work, each of them returns the result in a different array, which makes working with it somewhat complicated. Unless there is a better way to work with these results.

Tomas
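
One possible way to get both attribute orders into the same array is PCRE's branch-reset group, `(?|...)`, which restarts the group numbering in each alternative. The sketch below is a cut-down pattern, not the one from this thread: it assumes double-quoted attributes with nothing else between them, purely to keep the idea visible.

```php
<?php
// Inside (?| ... ) every alternative starts counting at the same number,
// so the content value lands in group 1 whichever attribute comes first.
$pattern = '#<meta\s+(?|name="description"\s+content="([^"]*)"|content="([^"]*)"\s+name="description")\s*/?\s*>#i';

$nameFirst    = '<meta name="description" content="some words">';
$contentFirst = '<meta content="some words" name="description">';

preg_match($pattern, $nameFirst, $a);
preg_match($pattern, $contentFirst, $b);

echo $a[1], "\n";
echo $b[1], "\n";
```

Because the value is captured with `[^"]*`, it also cannot run past the closing quote into a neighbouring tag.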

Posted: Sun Aug 01, 2004 3:37 pm
by feyd

Code:

$matches = array_merge($matches[4],$matches[6]);
// or whatever the correct array sets are in...

Posted: Sun Aug 01, 2004 4:43 pm
by tomfra
This created 2 arrays for some reason, such as:

Code:

Array
(
    [0] => Some words, blah blah blah...
    [1] => 
)
And reversed for the other case. I still know too little about arrays, and the help at php.net is currently limited due to some technical difficulties.

Tomas

Posted: Sun Aug 01, 2004 4:48 pm
by tomfra
I've simply imploded the array. Probably not the most elegant way but seems to be working. Now just the meta keywords & meta description part and everything will be perfect :)

Tomas

Posted: Sun Aug 01, 2004 4:48 pm
by feyd
that's 1 array. with 2 elements, the second empty..

you'd process them with a foreach, while, or for loop.

or you could just check to see which one is empty..
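
A minimal sketch of the "check which one is empty" idea, using a foreach loop. The sample array is copied from the output posted earlier in the thread; the variable names are made up for the example.

```php
<?php
// Merged result as shown earlier: one real value and one empty string.
$merged = array('Some words, blah blah blah...', '');

$description = '';
foreach ($merged as $candidate) {
    // Keep the first entry that actually contains text and stop looking.
    if ($candidate !== '') {
        $description = $candidate;
        break;
    }
}

echo $description, "\n";
```

Unlike implode(), this keeps only one value, so it would not glue two descriptions together if the page happened to contain more than one matching tag.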

Posted: Mon Aug 02, 2004 2:27 pm
by tomfra
Does anyone know how to solve the problem when the <meta name="keywords"> and <meta name="description"> tags are both present in the HTML file? The second meta tag, which is usually the description one, does not work well with the preg_match_all regex above because it also tries to read the meta keywords tag, which it should not, and it results in output like:

"some keywords" name="keywords">

...instead of the meta description content.

Any ideas are welcome. I promise I will not ask any more dumb questions for a while then :)

Thanks!

Tomas
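
A likely cause is the `([^\\3]*?)` part: inside a character class `\3` is not a backreference, so the lazy capture is free to run past the closing quote and the `>` into the next tag until it finds name="description". One way around it, sketched here under assumptions, is to isolate each meta tag first and only then read its attributes. The function name meta_description() is hypothetical, the nullable return type assumes PHP 7.1 or later, and a `>` inside an attribute value would still confuse step 1.

```php
<?php
// Hypothetical helper, not part of the thread's code.
function meta_description(string $html): ?string
{
    // Step 1: collect each <meta ...> tag separately; [^>]* cannot cross the closing ">".
    if (!preg_match_all('#<\s*meta\b[^>]*>#i', $html, $tags)) {
        return null;
    }

    foreach ($tags[0] as $tag) {
        // Step 2: skip every tag whose name attribute is not "description".
        if (!preg_match('#\bname\s*=\s*([\'"]?)description\1(?=[\s/>])#i', $tag)) {
            continue;
        }
        // Step 3: read content from that same tag; attribute order no longer matters.
        // The branch reset (?| ) puts the value in group 1 for every quoting style.
        if (preg_match('#\bcontent\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>]+))#i', $tag, $m)) {
            return $m[1];
        }
    }

    return null;
}

$html = '<meta content="some keywords" name="keywords">' . "\n"
      . '<meta content="some words" name="description">';

echo meta_description($html), "\n";
```

Since each tag is examined on its own, the keywords tag can no longer leak into the description value, and the result is a single string rather than several arrays.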