Prettying XML output

TJ
Forum Newbie
Posts: 20
Joined: Thu Nov 03, 2005 10:22 pm
Location: Nottingham, UK

Post by TJ »

I just needed to pretty-format the output of DOMDocument->saveXML() for ease of reading; it comes back as a single string with no linefeeds or indentation.
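
To illustrate the problem described above, here is a minimal sketch. It assumes a small hand-built document (the element names simply mirror the test string used further down) and only shows what the default serialisation looks like; it is not taken from the original post.

```php
<?php
// Assumption: a tiny document built by hand for illustration only.
$doc = new DOMDocument('1.0', 'UTF-8');
$root = $doc->appendChild($doc->createElement('root'));
$outer = $root->appendChild($doc->createElement('this'));
$outer->appendChild($doc->createElement('is', 'a'));
$outer->appendChild($doc->createElement('test'));

// Serialising just the root element skips the XML declaration
$raw = $doc->saveXML($root);
echo $raw, "\n"; // everything arrives on one line, with no indentation
```

If you still hold the DOMDocument object, its formatOutput property may already be enough; the function in this post is for when all you have is the string.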

I thought I'd share the function in case others have need of something similar in the future; it took a while to get it settled, and it might need tweaking for your circumstances.

<?php
/**
 * Pretty-print an XML string, typically one returned from DOMDocument->saveXML()
 *
 * Ignores ?xml !DOCTYPE !-- tags (adjust regular expressions and pad/indent logic to change this)
 *
 * @param   string $xml the xml text to format
 * @param   boolean $debug set to get debug-prints of RegExp matches
 * @return  string formatted XML
 * @copyright TJ 2005
 * @license GNU Lesser General Public Licence version 2
 * @link kml.tjworld.net
*/
function prettyXML($xml, $debug=false) {
  // add marker linefeeds to aid the pretty-tokeniser
  // adds a linefeed between all tag-end boundaries
  $xml = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $xml);

  // now pretty it up (indent the tags)
  $tok = strtok($xml, "\n");
  $formatted = ''; // holds pretty version as it is built
  $pad = 0; // initial indent
  $matches = array(); // returns from preg_match()

  /* pre- and post- adjustments to the padding indent are made, so changes can be applied to
   * the current line or subsequent lines, or both
  */
  while($tok !== false) { // scan each line and adjust indent based on opening/closing tags

    // test for the various tag states
    if (preg_match('/.+<\/\w[^>]*>$/', $tok, $matches)) { // open and closing tags on same line
      if($debug) echo " =$tok= ";
      $indent=0; // no change
    }
    else if (preg_match('/^<\/\w/', $tok, $matches)) { // closing tag
      if($debug) echo " -$tok- ";
      $pad--; // outdent now
      $indent = 0; // reset, otherwise the previous line's value is re-applied below
    }
    else if (preg_match('/^<\w(?:[^>]*[^\/])?>.*$/', $tok, $matches)) { // opening tag (also matches one-letter names such as <a>)
      if($debug) echo " +$tok+ ";
      $indent=1; // don't pad this one, only subsequent tags
    }
    else {
      if($debug) echo " !$tok! ";
      $indent = 0; // no indentation needed
    }
    
    // pad the line with the required number of leading spaces
    $prettyLine = str_pad($tok, strlen($tok)+$pad, ' ', STR_PAD_LEFT);
    $formatted .= $prettyLine . "\n"; // add to the cumulative result, with linefeed
    $tok = strtok("\n"); // get the next token
    $pad += $indent; // update the pad size for subsequent lines
  }
  return $formatted; // pretty format
}

echo "\r\n" . prettyXML("<root><this><is>a</is><test /></this></root>", true);
?>
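
For anyone tweaking the regular expressions, it can help to look at the two stages separately. The sketch below is an illustration only: classifyLine() is a hypothetical helper that is not part of the function above, and its patterns are modelled on the three tag tests in prettyXML(). It first inserts the marker linefeeds, then prints which branch each line would take.

```php
<?php
// Hypothetical helper for illustration: labels a single line using
// patterns modelled on the tag tests in prettyXML().
function classifyLine($line) {
  if (preg_match('/.+<\/\w[^>]*>$/', $line)) return 'open+close'; // <is>a</is>
  if (preg_match('/^<\/\w/', $line)) return 'close';              // </this>
  if (preg_match('/^<\w[^>]*[^\/]>.*$/', $line)) return 'open';   // <this>
  return 'other'; // self-closing tags, ?xml, !DOCTYPE, comments
}

$xml = "<root><this><is>a</is><test /></this></root>";

// Stage 1: put a linefeed between every "><" boundary
$split = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $xml);

// Stage 2: show the label each resulting line receives
foreach (explode("\n", $split) as $line) {
  printf("%-10s %s\n", classifyLine($line), $line);
}
// Labels, in order: open, open, open+close, other, close, close
```

Only the 'open' and 'close' lines change the indent; 'open+close' and 'other' leave it alone.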
Last edited by TJ on Fri Nov 25, 2011 3:01 pm, edited 2 times in total.
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

seems to be more of a snippet

:)
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

Agreed. Moved to Code Snippets.
TJ
Forum Newbie
Posts: 20
Joined: Thu Nov 03, 2005 10:22 pm
Location: Nottingham, UK

Post by TJ »

Ahhh... ta very muchly... after 36-hours non-stop coding all the forums tend to look alike :lol: