
Trim text (post summary) without cutting hyperlink in half?

Posted: Wed Nov 18, 2009 7:21 pm
by gearu
I have set up a website for an author who posts news in the blog part of the site. The website is built on WordPress.
I have created a custom piece of code that grabs the most recent post and displays it on the home page in a fixed-size box.
If the most recent post is really small, it will grab the second post as well.

Only a summary of the post is displayed.
The text is truncated after a specified number of characters.
Images are stripped out, as I only want the text to display. I want to keep hyperlinks so that they still work.
I am having the following two problems:

-- Because I am counting characters, hidden text (primarily HTML markup, e.g. the tag of a hyperlink) is counted even though it is not visible when displayed. This results in a shorter summary than anticipated.
-- Sometimes the substring cuts off in the middle of a hyperlink, breaking the link and causing other problems (in one case the "Read More" link that is appended afterwards started pointing to the wrong place).

Any help or suggestions would be greatly appreciated. I have been investigating the evermore plugin. Whilst it does not fix my problem, I thought I might be able to borrow some of its ideas.

Ideally I need to cut the text and then check that I am not in the middle of a hyperlink or other HTML tag. I would also like to count only the visible text when choosing the truncation point, so that I get a more consistent length.

I apologise that this is not the most elegant piece of code!

Code: Select all

 
<?php
        $numberofposts = 1; //this value gets changed by the code below..
        $summary_length = 350; //this value can get changed by the code below...
        $post_is_too_small_length = 200;
        $short_summary_length = 210;        
 
        //check the length of the most recent post, and use it to decide how many posts to display on the home page.
        $posts = get_posts("numberposts=".$numberofposts);
        foreach ($posts as $post)
        {
            $full_content = apply_filters('the_content', $post->post_content);
            $content_less_img = strip_tags($full_content, '<p><a><h2><blockquote><code><ul><li><i><em><strong>');  //content with images stripped. only tags listed will not be removed.                                                
 
            //if the post is really short, then we want to go and grab the next post but we also want to make the length a bit shorter, in case the next post is really long
            $post_length = strlen($content_less_img);
 
            if ($post_length > $post_is_too_small_length)
            {
              $numberofposts = 1;
              break; //first post is long enough, so no need to check any further
            }
            else
            {
              //post is short: show two posts, each with a shorter summary
              $numberofposts = 2;
              $summary_length = $short_summary_length;
            }
        }
 
        //now get the content for display
 
        $full_content='';
 
        echo '<h3>Latest News</h3><br />';
 
        $posts = get_posts("numberposts=".$numberofposts);
        foreach ($posts as $post)
        {
        echo '<strong><a href="'.get_permalink($post->ID).'">'.$post->post_title.'</a></strong><br />';
        echo the_time('<\i>F jS, Y</\i>').'<br />';   
 
        $full_content = apply_filters('the_content', $post->post_content);
        $content_less_img = strip_tags($full_content, '<p><a><h2><blockquote><code><ul><li><i><em><strong><br><span>');    //content with images stripped. only tags listed will not be removed. strip_tags ignores self-closing forms in this list, so <br> is listed rather than <br />.
 
        $short_content = substr($content_less_img,0,$summary_length);
        echo '<span>'.$short_content.'</span>';     
 
        echo '<a href="'.get_permalink($post->ID).'">Read More...</a><br /><br />';
      }
      ?>
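
The two wishes above (count only visible text, and never cut inside a tag) could be approached roughly as follows. This is only a sketch under some assumptions: `truncate_html_visible` is a made-up helper name, not a WordPress function; the HTML is assumed to be reasonably well nested; lengths are counted in bytes with `strlen`, so multibyte text and entities such as `&amp;` would need extra care.

```php
<?php
// Hypothetical helper (not part of WordPress): keep at most $limit visible
// characters, copy tags whole, and close whatever is still open at the cut.
function truncate_html_visible($html, $limit, $suffix = '...')
{
    // Split into tags and text runs; the capture group keeps the tags in the list.
    $tokens = preg_split('/(<[^>]*>)/', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $void = array('br', 'hr', 'img'); // tags that never need a closing tag
    $open = array();                  // stack of tag names currently open
    $out = '';
    $visible = 0;

    foreach ($tokens as $token) {
        if ($token[0] === '<' && substr($token, -1) === '>') {
            // A complete tag: it costs no visible characters.
            if (preg_match('/^<(\/?)([a-zA-Z][a-zA-Z0-9]*)/', $token, $m)) {
                $name = strtolower($m[2]);
                if ($m[1] === '/') {
                    if (end($open) === $name) {
                        array_pop($open);
                    }
                } elseif (substr($token, -2) !== '/>' && !in_array($name, $void)) {
                    $open[] = $name;
                }
            }
            $out .= $token;
            continue;
        }

        // A text run: only this part counts towards the limit.
        $remaining = $limit - $visible;
        if (strlen($token) > $remaining) {
            $out .= substr($token, 0, $remaining) . $suffix;
            break;
        }
        $out .= $token;
        $visible += strlen($token);
    }

    // Close any tags left open, innermost first.
    while ($name = array_pop($open)) {
        $out .= '</' . $name . '>';
    }
    return $out;
}
```

On the home page this might stand in for the `substr()` call, e.g. `$short_content = truncate_html_visible($content_less_img, $summary_length);`. Because an open `<a>` is always closed before the summary ends, the "Read More" link appended afterwards should no longer be swallowed by a broken hyperlink.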
 

Re: Trim text (post summary) without cutting hyperlink in half?

Posted: Thu Nov 19, 2009 9:14 am
by Jonah Bron
Hm, that's a tough one.

Okay, how about this:

Use preg_match to get the number of opening anchor and closing anchor tags there are in the substring you want.

Use while, and keep doing that, until the numbers match, adding one to the substr length each iteration.

Also, to keep from cutting in the middle of a tag (e.g. <a href="http://exam"), do the same thing with the substring you cut off (AND them in the same while block).
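
A rough sketch of that loop, with a few assumptions stated up front: `cut_outside_anchor` is a hypothetical name; counting uses `preg_match_all`, since `preg_match` itself only reports 0 or 1; and the "middle of a tag" test here compares the positions of the last `<` and `>` in the kept part, which is a simpler stand-in for inspecting the cut-off remainder.

```php
<?php
// Hypothetical helper: grow the cut point until it is not inside a tag
// and every <a ...> in the kept part has a matching </a>.
function cut_outside_anchor($html, $length)
{
    $max = strlen($html);
    while ($length < $max) {
        $cut = substr($html, 0, $length);

        // Count opening and closing anchor tags in the kept part.
        $opened = preg_match_all('/<a\b/i', $cut, $unused);
        $closed = preg_match_all('/<\/a\s*>/i', $cut, $unused);

        // If the last "<" comes after the last ">", the cut is inside a tag.
        $lt = strrpos($cut, '<');
        $gt = strrpos($cut, '>');
        $inside_tag = $lt !== false && ($gt === false || $lt > $gt);

        if ($opened === $closed && !$inside_tag) {
            return $cut;
        }
        $length++; // try one character further
    }
    return $html; // no safe cut point before the end
}
```

This only ever lengthens the summary, and it still counts markup towards the length, so it addresses the broken-link problem but not the inconsistent visible length.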

Re: Trim text (post summary) without cutting hyperlink in half?

Posted: Thu Nov 19, 2009 3:01 pm
by gearu
Right, I managed to come up with a solution. It is not exactly what I wanted, but it works. I created it using some of the code in the evermore WordPress plugin (which works great for truncating posts automatically, but needed some customisation to work on my home page).

This solution grabs the first 2 paragraphs, with a minimum of 200 characters (it will grab a 3rd paragraph if the first 2 are too short):

This is the code on my home page:

Code: Select all

 
      <?php      
      if (function_exists('create_short_content')) 
      {           
           $numberofposts = 1;
           $paras_to_skip = 2;
           $min_chars_to_skip = 200;
           
           echo '<h3>Latest News</h3><br/>';
                      
           $posts = get_posts("numberposts=".$numberofposts);
           foreach ($posts as $post)
           {          
              echo '<strong><a href="'.get_permalink($post->ID).'">'.$post->post_title.'</a></strong><br/>';                      
              echo the_time('<\i>F jS, Y</\i>').'<br/>'; 
           
              $full_content = apply_filters('the_content', $post->post_content);                      
              $content_less_img = strip_tags($full_content, '<p><a><h2><blockquote><code><ul><li><i><em><strong>');  //content with images stripped. only tags listed will not be removed.                                                
            
              $summary_text = create_short_content($content_less_img,  $min_chars_to_skip, $paras_to_skip);
              
              echo '<span>'.$summary_text.'</span>';     
              echo '<p><a href="'.get_permalink($post->ID).'">Read More...</a></p><br/><br/>';   
           }           
        }                                  
      ?>
And this is the function I created that it calls:

Code: Select all

 
function create_short_content($post_content, $min_chars_to_skip, $paras_to_skip) 
{
        $char_skip_count = intval($min_chars_to_skip);
        $para_skip_count = intval($paras_to_skip);  
 
        // Skip a number of initial characters
        $skipped_chars = substr($post_content, 0, $char_skip_count);
        $unskipped_chars = substr($post_content, $char_skip_count);
 
        // Use regex-fu to break the post into paragraphs. This scheme
        // may fail on pathological combinations of <br> and
        // newline chars. It can also fail on nested block-level tags
        // (e.g. nested divs). So don't do that!
 
        // Pattern matching an HTML tag that indicates a paragraph
        $para_tag = '(?:p|pre|blockquote|div|ol|ul|h[1-6]|table)';
        // Pattern matching two consecutive newlines with optional space between
        $double_newline = "(?:\r\n *\r\n|\r *\r|\n *\n|<br\s*/?>\s*<br\s*/?>)";
        // Pattern matching optional whitespace
        $ws = '\s*';
        // Pattern matching paragraph body (must start at the beginning
        // of a paragraph, and be followed by a paragraph end)
        $body = '.+?';
        // Pattern matching the end of a paragraph
        $end = "(?:$double_newline|</$para_tag>|(?<=\W)(?=$ws<$para_tag\W))";
 
        // Get all the skipped paragraphs, leaving off the end of the final paragraph.
        // The regex finds: a para body; followed by (n-1) end+body pairs; followed by an end; followed by something.
        $para_skip_dec = $para_skip_count - 1;

        if (preg_match("!^($ws$body(?:$end$ws$body){".$para_skip_dec."})($end)$ws\S!is", $unskipped_chars, $matches))
        {
            // $matches[1] holds the paragraphs up to, but not including, the final paragraph end
            return $skipped_chars . $matches[1];
        }

        // Not enough paragraphs to cut, so return the post unchanged
        return $post_content;
}
 

Re: Trim text (post summary) without cutting hyperlink in half?

Posted: Thu Nov 19, 2009 3:04 pm
by gearu
Jonah Bron wrote:Hm, that's a tough one.

Okay, how about this:

Use preg_match to get the number of opening anchor and closing anchor tags there are in the substring you want.

Use while, and keep doing that, until the numbers match, adding one to the substr length each iteration.

Also, to keep from cutting in the middle of a tag (e.g. <a href="http://exam"), do the same thing with the substring you cut off (AND them in the same while block).
Thanks very much for the input. As per above, I managed to get something to work (and it does use preg_match). Unfortunately I don't fully understand how preg_match works; I still have a bit to learn about the complexities of regular expressions.
Your suggestion sounds like the correct approach, but because this website is for a client I decided I could not take too long to fix it!
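
For anyone else puzzling over preg_match, here is a small illustration of what it returns and how it fills `$matches`. The pattern and sample HTML are simplified stand-ins chosen for the example, not the pattern from the function above.

```php
<?php
$html = '<p>First paragraph.</p><p>Second paragraph.</p>';

// preg_match stops at the first match and returns 1 (found) or 0 (not found).
// $matches[0] is the whole match; $matches[1], $matches[2], ... are the
// parenthesised groups, numbered by their opening bracket.
$found = preg_match('!^(<p>.+?)(</p>)!is', $html, $matches);

// preg_match_all keeps going and returns the number of matches instead.
$count = preg_match_all('!</p>!i', $html, $all);
```

Here `$matches[1]` would hold the first paragraph without its closing tag, which is the same reason the function above reads `$matches[1]` to get the text before the final paragraph end.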

Feel free to take a look at the site http://www.rachael-king.com. Her new book is out now and just hit #2 in the New Zealand best sellers list :)