highlight_string not returning properly

Posted: Tue Dec 08, 2009 5:04 am
by Weiry
I've created a class which formats strings passed to it and highlights any code in them.
Recently I fixed a problem where there would be multiple printouts if a "$1" was found in the string. However, when I try to use the highlight_string() function to highlight any code, it returns something odd and I'm not exactly sure why.

The code in question is around lines 82-109 where the formatting is taking place.

Any help would be appreciated.
The Class:

Code:

class VNCode{
 
    #
    #   Encode Function
    #
    #   This is the only public function and
    #   is used to encode the desired string.
    #
    public function encode($string){
        // If the string is not empty
        if(!empty($string)){
        // Create the regex array for tags
            $search = array( 
                '/\[b\](.*?)\[\/b\]/is', 
                '/\[i\](.*?)\[\/i\]/is', 
                '/\[u\](.*?)\[\/u\]/is', 
                '/\[quote\](.*?)\[\/quote\]/is', 
                '/\[quote\=(.*?)\](.*?)\[\/quote\]/is',
                '/\[img\](.*?)\[\/img\]/is', 
                '/\[url\](.*?)\[\/url\]/is',
                '/\[url\=(.*?)\](.*?)\[\/url\]/is',
                '/\n/s',
                '/\t/s'); 
        // Create the replacement HTML array tags
            $replace = array( 
                '<strong>$1</strong>', 
                '<em>$1</em>', 
                '<u>$1</u>', 
                '<div class="quotecontent">$1</div>',
                '<div class="quotetitle">$1 wrote:</div><div class="quotecontent">$2</div>',
                '<img src="$1" />', 
                '<a href="$1">$1</a>', 
                '<a href="$1">$2</a>',
                '<br>',
                '&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;'); 
        // Convert any html characters to encoded characters
            $string = htmlspecialchars($string);
        // Create the code language array
            $languages = $this->getLanguages();
        // Create the code boxes
            foreach($languages as $lang){
                // Split each tag
                $codeArr = preg_split($lang['tag'], $string, -1, PREG_SPLIT_OFFSET_CAPTURE);
                // Remove anything other than the tags
                array_splice($codeArr, 0, 1);
                
                foreach($codeArr as $code){
                    // Replace the existing text with the new code box
                    $string = preg_replace($lang['regex'], $this->formatCode($code[0],$lang['lang']), $string);
                }
            }
        // Return the final encoded string
            return preg_replace($search, $replace, $string);
        }else{
            // Return false if no string was sent
            return false;
        }
    }
 
    #
    #   FormatCode Function
    #
    #   This function is responsible for replacing
    #   any specified code tags into code boxes.
    #   All tags are pre-specified.
    #
    private function formatCode($string,$type){
        if(!empty($string)){
            // Check each language available and set the appropriate language
            foreach($this->getLanguages() as $language){
                if(in_array($type, $language)){
                    $code = $language['lang'];
                }
            }
            // Split the code into lines
            $codeArr = preg_split('/\\n/s', $string);
            
            // Create the code box header
            $newCodeArr[] = "<div class='codebox'><div class='codeheader'>{$code}</div><div class='codeholder' style='max-height: 300px;'>";
            
            
            for($i = 0,$j = 1; $i < count($codeArr); $i++,$j++){
                // Replace any $ symbols with their HTML equivalent ( DO NOT REMOVE THIS LINE )
                $codeArr[$i] = preg_replace('/\$/','&#36;',$codeArr[$i]);
                if($i % 2){
                    // For each line of code, check for a close tag
                    if(preg_match('/\[\/(.*?)\]/i', $codeArr[$i], $matches, PREG_OFFSET_CAPTURE)){
                        $newCodeArr[] = "<div class='li2'>$j. ".preg_replace('/\[\/(.*?)\]/is', '', $codeArr[$i])."</div>";
                        break;
                    }
                    // If no close tag exists, store the new line of code.
                    $newCodeArr[] = "<div class='li2'>$j. {$codeArr[$i]}</div>";
                }else{
                    // For each line of code, check for a close tag
                    if(preg_match('/\[\/(.*?)\]/i', $codeArr[$i], $matches, PREG_OFFSET_CAPTURE)){
                        $newCodeArr[] = "<div class='li1'>$j. ".preg_replace('/\[\/(.*?)\]/is', '', $codeArr[$i])."</div>";
                        break;
                    }
                    // If no close tag exists, store the new line of code.
                    $newCodeArr[] = "<div class='li1'>$j. {$codeArr[$i]}</div>";
                }
                print html_entity_decode(highlight_string(htmlentities($codeArr[$i]), true));
                
            }
            // Close the code box
            $newCodeArr[] = "</div></div>";
            
            // Return the HTML code for the code box
            return implode($newCodeArr);
        }else{
            // Return false if no code was submitted
            return false;
        }
    }
    
    #
    #   getLanguages Function
    #
    #   This function defines languages and
    #   regular expressions used in order to
    #   determine the current tag being processed.
    #
    private function getLanguages(){
        return array(
                "php"       => array( "lang" => "php",  "tag" => '/\[php\]/sx',         "regex" => '/\[php\](.*?)\[\/php\]/is'),
                "html"      => array( "lang" => "html", "tag" => '/\[html\]/sx',        "regex" => '/\[html\](.*?)\[\/html\]/is'),
                "sql"       => array( "lang" => "sql",  "tag" => '/\[sql\]/sx',         "regex" => '/\[sql\](.*?)\[\/sql\]/is'),
                "code"      => array( "lang" => "code", "tag" => '/\[code\]/sx',        "regex" => '/\[code\](.*?)\[\/code\]/is'),
                "code=php"  => array( "lang" => "php",  "tag" => '/\[code\=php\]/sx',   "regex" => '/\[code\=php\](.*?)\[\/code\]/is'),
                "code=sql"  => array( "lang" => "sql",  "tag" => '/\[code\=sql\]/sx',   "regex" => '/\[code\=sql\](.*?)\[\/code\]/is'),
                "code=html" => array( "lang" => "html", "tag" => '/\[code\=html\]/sx',  "regex" => '/\[code\=html\](.*?)\[\/code\]/is')
            );
    }
};
When this line is executed:
print highlight_string($codeArr[$i], true);

I get the following output:

Code:

function encode(&#36;string){      &#36;search = array( '/\[code\](.*?)\[\/code\]/is', '/\[php\](.*?)\[\/php\]/is' );      &#36;replace = array( "<div class='codebox'><div class='codeheader'>Code: </div><div class='codeholder'>&#36;1</div></div>", &#36;this->colourCode('&#36;1') );      return preg_replace(&#36;search, &#36;replace, &#36;string); } function colourCode(&#36;string){     &#36;string = highlight_string(&#36;string);     return "<div class='codebox'><div class='codeheader'>Code: </div><div class='codeholder'>{&#36;string}</div></div>";

Re: highlight_string not returning properly

Posted: Tue Dec 08, 2009 5:14 am
by jackpf
Hmm...I can't really test your code since I'm at college atm, but it looks like you're encoding it twice.

So yeah...I'd try displaying the code at various points through the process to see where it's encoded the second time.

I see you're using htmlspecialchars() on line 37, and htmlentities() on line 102. But yeah, can't test atm.

You could also try setting the 4th argument (double_encode) of htmlentities()/htmlspecialchars() to false.
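
As a rough sketch of the double-encoding idea (the sample string and variable names are made up for illustration and are not taken from the class), this shows what a second encoding pass does and how the 4th argument avoids it:

```php
<?php
// Hypothetical sample input; not from the original class.
$raw = 'if ($a < $b) { echo "<b>"; }';

// First pass: special characters become entities.
$once = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// Second pass with default settings: the "&" of every existing entity is
// encoded again, so "&lt;" turns into "&amp;lt;" and shows up literally.
$twice = htmlentities($once, ENT_QUOTES, 'UTF-8');

// Second pass with the 4th argument (double_encode) set to false:
// entities that already exist are left alone.
$guarded = htmlentities($once, ENT_QUOTES, 'UTF-8', false);

echo $once, "\n", $twice, "\n", $guarded, "\n";
```

If the string in your class looks like `$twice` at some point, that is probably where the second encoding happens.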

EDIT
Wait, wtf 8O
It's displaying your code? Weird..
I'll have a look when I get home. :D

EDIT #2
I just tried it on my server and it seems to work ok. Doesn't highlight anything...but it doesn't return anything odd.

Re: highlight_string not returning properly

Posted: Tue Dec 08, 2009 6:26 am
by Weiry
Heh... yeah, sorry, I forgot to mention that I was actually feeding an older version of the code into the class itself :roll: :oops:

But you see my problem: the code works, but it won't highlight anything and I'm not sure how I can fix it :S I've been working on it for about 3 days now, so I'm not seeing things as clearly as I probably should be.

Basically, this is the input string I am using to test the code for highlighting and other problems.

Code:

[ php]function encode($string){
     $search = array( '/\[code\](.*?)\[\/code\]/is', '/\[php\](.*?)\[\/php\]/is' );
     $replace = array( "<div class='codebox'><div class='codeheader'>Code: </div><div class='codeholder'>$1</div></div>", $this->colourCode('$1') );
     return preg_replace($search, $replace, $string);
}
function colourCode($string){
    $string = highlight_string($string);
    return "<div class='codebox'><div class='codeheader'>Code: </div><div class='codeholder'>{$string}</div></div>";
}[ /php]
jackpf wrote: I'd try displaying the code at various points through the process to see where it's encoded the second time.
Is this a reference to the multiple printouts I was referring to? If so, I did mention that I fixed it :) I had to replace the $ symbol with its HTML equivalent before any preg_replace took place in the format function.
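
For anyone hitting the same thing, here is a minimal sketch of why a "$1" in the replacement text causes repeated output, and two ways around it. The sample post and variable names are assumptions for illustration only:

```php
<?php
// Hypothetical post whose code block itself contains "$1".
$post    = '[php]echo "$1";[/php]';
$pattern = '/\[php\](.*?)\[\/php\]/is';

// Replacement HTML that was built from the user's code.
$box = '<div class="codeholder">echo "$1";</div>';

// Problem: preg_replace() treats "$1" in the replacement as a backreference,
// so the captured text is pasted in a second time.
$naive = preg_replace($pattern, $box, $post);

// Workaround from this thread: swap "$" for its HTML entity first.
$escaped = str_replace('$', '&#36;', $box);
$fixed   = preg_replace($pattern, $escaped, $post);

// Alternative: a callback's return value is inserted literally,
// so no escaping is needed.
$viaCallback = preg_replace_callback($pattern, function ($m) {
    return '<div class="codeholder">' . $m[1] . '</div>';
}, $post);

echo $naive, "\n", $fixed, "\n", $viaCallback, "\n";
```

Note that a backslash followed by a digit (like \1) is treated as a backreference in the replacement string too, so the entity trick alone does not cover every case.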

Re: highlight_string not returning properly

Posted: Tue Dec 08, 2009 9:46 am
by jackpf
I have a function similar to this for my forum.

Although...it's a lot more...compact.

Code:

class parser
{
    public function parse($code)
    {
        $code = preg_replace_callback('/\[code(\=(.*?))?\](.*?)(\[\/code\]|$)/isS', array($this, 'parse_code'), $code);
        
        //...
        
        return $code;
    }
    private function parse_code($code)
    {
        list($code, $code_type) = array($code[3], $code[2]);
        
        if($code_type == 'php')
        {
            //escape backslashes
            $code = str_replace('\\', '\\\\', trim($code));
            
            //highlight!
            $code = highlight_string($code, true);
            
            //remove line breaks highlight_string() appends
            $code = str_replace(array('<br />', "\n"), '', $code);
        }
        else
        {
            //i have my own syntax highlighter i made to parse general code..
        }
        
        return $code;
    }
}
Something like that. :D
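I use preg_replace_callback(), which makes things a lot easier. You could also use the e modifier to do the same thing (although it was deprecated in PHP 5.5 and removed in PHP 7.0, so preg_replace_callback() is the safer choice)...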

But I dunno. Your code seems a bit...over complicated.

Re: highlight_string not returning properly

Posted: Wed Dec 09, 2009 9:26 am
by Weiry
jackpf wrote: But I dunno. Your code seems a bit...over complicated.
That could be due to the sheer number of different tags I'm allowing for, and the fact that the formatCode() function formats the code into an HTML string. I'm also doing line numbering for each line of code.

But anyway, I somewhat fixed the issue with no highlighting. It turns out I needed to highlight the code before I split it into an array of lines.
Here is the modification in the formatCode function:

Code:

// Highlight the string
$string = highlight_string(html_entity_decode($string),true);
// Split the code into lines
$codeArr = preg_split('/\\n/s', $string);
// Create the code box header
$newCodeArr[] = "<div class='codebox'><div class='codeheader'>{$code}</div><div class='codeholder' style='max-height: 300px;'>";
Now I still have one issue remaining with the highlight_string() function. I've noticed that if a person does not supply the appropriate tags to the function, it will not highlight the code, even if it is a valid code type.

So if I were to pass some PHP code to the function without the <?php tag, the function will not highlight it.
What would be the easiest way to bypass needing the <?php tag?

Re: highlight_string not returning properly

Posted: Wed Dec 09, 2009 11:50 am
by jackpf
Hmm...you could check if it's in there, if not, add it before you highlight it, then remove it afterwards.

That seems a bit long winded though.
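
A minimal sketch of that add-then-remove idea, in case it helps. The function name highlightPhpFragment() is made up, and the stripping regex is an assumption: the exact markup highlight_string() returns differs between PHP versions, so it allows either a <br /> or a newline after the escaped tag.

```php
<?php
// Hypothetical helper: highlight PHP code even when the opening tag is missing.
function highlightPhpFragment($code)
{
    $hasTag = strpos($code, '<?php') !== false;

    // Prepend an opening tag only when the fragment has none.
    $source = $hasTag ? $code : "<?php\n" . $code;

    // Second argument true: return the HTML instead of printing it.
    $html = highlight_string($source, true);

    if (!$hasTag) {
        // Remove the first escaped "<?php" we added, plus the line break
        // that follows it (assumed to be "<br />" or a newline).
        $html = preg_replace('/&lt;\?php(<br \/>|\n)?/', '', $html, 1);
    }

    return $html;
}

echo highlightPhpFragment('echo "hi";');
```

A fragment that already starts with <?php is passed through untouched, so the tag stays visible in that case.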

But I have to be somewhere!! I'll have a look tomorrow.

:D

Re: highlight_string not returning properly

Posted: Tue Dec 15, 2009 6:11 am
by Weiry
Sorry it's been a while, my damned ISP had me without internet for 4 days :banghead:

I think I have found some sort of a solution to my problem, although it's going to require far more coding and a rework of my existing code. I took an idea from a few code highlighters I have seen and decided that I shall try storing each tag in an array, along with its formatting styles etc.
I guess it's probably one of the only foolproof ways of making sure the code will highlight accordingly.

It's going to be a long while before it can be completed this way though. I suppose that as a single form of highlighting and displaying code boxes correctly, the current version would be almost fine as is. However, I'm looking for customization and the ability to easily plug in new languages.
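
Just to illustrate the direction, here is a rough sketch of what such an array-driven setup might look like. It is not the finished class: the array layout, the labels and the renderCodeBoxes() name are all placeholders I made up.

```php
<?php
// Hypothetical registry: tag name => header label and a callable that
// turns the raw code into HTML. Adding a language means adding one entry.
$languages = array(
    'php' => array(
        'label'     => 'PHP',
        'highlight' => function ($code) {
            return highlight_string("<?php\n" . $code, true);
        },
    ),
    'code' => array(
        'label'     => 'Code',
        'highlight' => function ($code) {
            return '<pre>' . htmlspecialchars($code, ENT_QUOTES, 'UTF-8') . '</pre>';
        },
    ),
);

function renderCodeBoxes($text, array $languages)
{
    // Build one pattern from the registered tag names, e.g. (php|code).
    $names   = implode('|', array_map('preg_quote', array_keys($languages)));
    $pattern = '/\[(' . $names . ')\](.*?)\[\/\1\]/is';

    return preg_replace_callback($pattern, function ($m) use ($languages) {
        $lang = $languages[strtolower($m[1])];
        $body = call_user_func($lang['highlight'], trim($m[2]));

        return "<div class='codebox'><div class='codeheader'>{$lang['label']}</div>"
             . "<div class='codeholder'>{$body}</div></div>";
    }, $text);
}

echo renderCodeBoxes('before [code]<b>x</b>[/code] after', $languages);
```

Since a callback does the replacing, the "$1" problem from earlier in the thread should not come up here.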

Thanks for the help.