Diff (using UNIX cmdline diff util)

Post by cravikiran »

When I was building my wiki-type site (Sourceworld), I needed a diff function. I could have gone to the trouble of coding an actual diff algorithm, but I decided to just use the UNIX command-line 'diff' util instead. The following diff() function runs it and formats its unified output as an HTML table. It needs exec() to be enabled and a writable diff/ directory for its temp files. Maybe this will help some of you out there in a similar situation.

Code:

//------------------------------------------------------------------------------
// Diff section
//------------------------------------------------------------------------------
function diff_section( &$diff, 
                       &$stat_removed, &$stat_added, 
                       &$removed, &$added ) {
  // if we came to context lines again and we have some added and removed 
  // lines, write those out
  if( $stat_removed != 0 || $stat_added != 0 ) {
    for( $i = 0; $i < max( $stat_removed, $stat_added ); $i++ ) {
      if( isset( $removed[$i] ) ) {
        $removed_class = "class=\"removed\""; 
        $removed_tok = "-";
      }
      else {
        $removed_class = ""; 
        $removed_tok="";
      }
      if( isset( $added[$i] ) ) {
        $added_class = "class=\"added\""; 
        $added_tok = "+";
      }
      else {
        $added_class = ""; 
        $added_tok = "";
      }
      $diff .= "<tr>" .
               "  <td>$removed_tok</td>" .
               "  <td $removed_class>" . 
                    htmlspecialchars( isset( $removed[$i] ) ? substr( $removed[$i], 1 ) : "" ) . 
               "  </td>" .
               "  <td>$added_tok</td>" .
               "  <td $added_class>" .
                    htmlspecialchars( isset( $added[$i] ) ? substr( $added[$i], 1 ) : "" ) . 
               "  </td>" .
               "</tr>";
    }

    $stat_removed = 0; // reset counters
    $stat_added   = 0;

    $removed = array(); // clear arrays
    $added = array();
  }
}
//------------------------------------------------------------------------------

//------------------------------------------------------------------------------
// Diff
//------------------------------------------------------------------------------
function diff( $new, $old, $new_date, $old_date ) {
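  // unique base name for the temp files; rand() alone can collide between requests
  $path = uniqid( rand() . "_", true );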

  $path_new = "diff/" . $path . "_new";
  $path_old = "diff/" . $path . "_old";

  // write tmp files
  if( !($file_new = fopen( $path_new, 'w' )) || 
      !($file_old = fopen( $path_old, 'w' )) ) return FALSE;
  // fwrite() returns FALSE on failure, never a negative number
  if( fwrite( $file_new, $new . "\n" ) === FALSE || 
      fwrite( $file_old, $old . "\n" ) === FALSE ) return FALSE;
  fclose( $file_new );
  fclose( $file_old );

  // execute diff and get output
  exec( "diff -U 3 " . escapeshellarg( $path_old ) . " " . 
        escapeshellarg( $path_new ), $output );
  
  // delete tmp files
  unlink( $path_new );
  unlink( $path_old );

  // take away a little "intro" text
  array_shift( $output ); // --- blah blah
  array_shift( $output ); // +++ blah blah

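  // some status variables
  $stat_removed = 0; // how many removed so far?
  $stat_added   = 0; // how many added so far?
  $removed = array(); // removed lines of the current section
  $added   = array(); // added lines of the current section
  $diff    = "";      // HTML rows built so far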

  // formatter
  foreach( $output as $line ) {
    // match diff section headers: @@ -4,5 +4,2 @@
    if( preg_match( "/@@ (-|\+)([0-9]+),?([0-9]*)? ?(\+[0-9]*)?,?([0-9]*)? @@/",
                    $line, $matches ) == 1 ) {
      diff_section( $diff, 
                    $stat_removed, $stat_added,
                    $removed, $added ); // write out removed and added entries

      $stat_removed = 0; // reset counters
      $stat_added   = 0;

      $diff .= "<tr class=\"line_start\">" .
               "  <td colspan=\"4\">" .
               "    <b>Starting at Line " . $matches[2] . ":</b>" .
               "  </td>" .
               "</tr>";
      continue;
    }

    // match removed lines: - Blah blah
    if( isset( $line[0] ) && $line[0] == '-' ) {
      $removed[$stat_removed] = $line; // record lines
      $stat_removed++;
      continue;
    }

    // match added lines: + Blah blah
    if( isset( $line[0] ) && $line[0] == '+' ) {
      $added[$stat_added] = $line; // record lines
      $stat_added++;
      continue;
    }

    diff_section( $diff, $stat_removed, $stat_added, $removed, $added );

    // no matches, we have context lines: Blah blah
    $diff .= "<tr>" .
             "  <td></td>" .
             "  <td class=\"context\">" . 
                  htmlspecialchars( substr( $line, 1 ) ) . 
             "  </td>" .
             "  <td></td>" .
             "  <td class=\"context\">" . 
                  htmlspecialchars( substr( $line, 1 ) ) . 
             "  </td>" .
             "</tr>";
  }

  // write out removed and added
  diff_section( $diff, $stat_removed, $stat_added, $removed, $added ); 

  // if there were no changes
  if( $diff == "" ) $diff = "<p>None</p>";
  else $diff = "<table class=\"diff\">\n" .
               "  <tr class=\"line_start\">\n" .
               "    <td colspan=\"2\">" .
               "      <b>Previous Revision</b><br />$old_date" .
               "    </td>" .
               "    <td colspan=\"2\">" .
               "      <b>This Revision</b><br />$new_date" .
               "    </td>" .
               "  </tr>" .
               "  $diff" .
               "</table>";

  return $diff;
}
//------------------------------------------------------------------------------

As you can see, the output is placed in a table, with CSS classes to format the different components. Here are the styles I had for these elements in Sourceworld:

Code:

/*---------------------------------------------------------------------------------
 * Diff elements
 *---------------------------------------------------------------------------------
 */
.diff
{
    margin-top: 15px;
    width: 100%;
    border-collapse: collapse;
    font-family: sans-serif, arial, georgia;
    font-size: x-small;
}
.diff td
{
    border: solid 1px #ffffff;
}
.context
{
    width: 50%;
    background-color: #f7f7f7;
}
.added
{
    width: 50%;
    background-color: #e0ffe0;
}
.removed
{
    width: 50%;
    background-color: #ffffe0;
}
.line_start
{
    font: small georgia, times, verdana, arial, sans-serif;
}
.line_start td
{
    padding-top: 10px;
    padding-left: 5px;
    padding-bottom: 5px;
}

For the most part, you should be able to customize the output just by changing the styles. If you use another diff util, or the same command with different options, you might get away with changing only the exec() line and the section-header regex. Enjoy!
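
If you do want to play with the diff options, here is a rough sketch of how the "run diff" step could be pulled out on its own. Both function names (run_unified_diff and parse_hunk_header) and the $context parameter are made up for this example and are not part of the code above. It assumes a diff binary that understands -U and a writable system temp directory, and it swaps the rand() file names for tempnam().

```php
<?php
// Hypothetical helper (not from the original post): run `diff -U <n>` on two
// strings and return the raw output lines, or false if something went wrong.
function run_unified_diff( $old, $new, $context = 3 ) {
  // tempnam() creates uniquely named files for us
  $path_old = tempnam( sys_get_temp_dir(), 'old' );
  $path_new = tempnam( sys_get_temp_dir(), 'new' );
  if( $path_old === false || $path_new === false ) {
    if( $path_old !== false ) unlink( $path_old );
    return false;
  }

  $ok = file_put_contents( $path_old, $old . "\n" ) !== false &&
        file_put_contents( $path_new, $new . "\n" ) !== false;

  $output = array();
  $status = 2;
  if( $ok ) {
    $cmd = 'diff -U ' . (int)$context . ' ' .
           escapeshellarg( $path_old ) . ' ' . escapeshellarg( $path_new );
    exec( $cmd, $output, $status );
  }

  // delete tmp files
  unlink( $path_old );
  unlink( $path_new );

  // diff exits with 0 (no differences), 1 (differences) or 2 (trouble)
  if( !$ok || $status > 1 ) return false;
  return $output;
}

// Hypothetical helper: read start lines and lengths from a section header
// such as "@@ -4,5 +4,2 @@". A missing length means 1.
function parse_hunk_header( $line ) {
  if( !preg_match( '/^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/',
                   $line, $m ) ) return false;
  return array(
    'old_start' => (int)$m[1],
    'old_count' => ( isset( $m[2] ) && $m[2] !== '' ) ? (int)$m[2] : 1,
    'new_start' => (int)$m[3],
    'new_count' => ( isset( $m[4] ) && $m[4] !== '' ) ? (int)$m[4] : 1,
  );
}
```

The header parser matters once you lower the context: with small hunks diff leaves out the length, so a header can look like "@@ -3 +3 @@" and the parser has to cope with that.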

Basically, the function is rather simple to use.
If you have two strings, one containing the new version of some text ($new) and one containing the old version ($old), plus the dates (formatted any way you like) as strings $new_date and $old_date, you would do the following:

$ret = diff( $new, $old, $new_date, $old_date );

Now $ret contains the diff, formatted nicely in a self-contained little table, and you can place it in an HTML page you output. It is "<p>None</p>" if nothing changed, and FALSE if the temp files could not be written.
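
To make the last step concrete, here is a minimal sketch of dropping the result into a page. The render_diff_page function and the diff.css file name are invented for this example; the only things taken from the code above are that diff() gives back an HTML fragment, "<p>None</p>" when nothing changed, or FALSE when the temp files could not be written.

```php
<?php
// Hypothetical wrapper: put the fragment returned by diff() into a full page.
// $diff_html is assumed to be HTML already, so it is not escaped again.
function render_diff_page( $title, $diff_html, $css_href = 'diff.css' ) {
  $body = ( $diff_html === FALSE )
        ? "<p>Could not compute the diff.</p>" // diff() failed
        : $diff_html;

  return "<!DOCTYPE html>\n" .
         "<html>\n<head>\n" .
         "  <title>" . htmlspecialchars( $title ) . "</title>\n" .
         "  <link rel=\"stylesheet\" href=\"" .
              htmlspecialchars( $css_href ) . "\" />\n" .
         "</head>\n<body>\n" .
         "  <h1>" . htmlspecialchars( $title ) . "</h1>\n" .
         "  " . $body . "\n" .
         "</body>\n</html>\n";
}

// In real use: $ret = diff( $new, $old, $new_date, $old_date );
$ret = "<p>None</p>"; // stand-in for what diff() returns when nothing changed
echo render_diff_page( "Changes to Main Page", $ret );
```

The stylesheet link is where the CSS rules from above would go; putting them in a style block in the head would work just as well.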

Hope it's useful.
Last edited by requinix on Mon Jun 24, 2013 3:37 pm, edited 1 time in total.
Reason: fixed some '{' and '}' encoding weirdness
Post by nigma »

Thanks for the snippet cravikiran.

I like your site, nice :)