Usage of htmlentities redisplaying form data

Discussions of secure PHP coding. Security in software is important, so don't be afraid to ask. And when answering: be anal. Nitpick. No security vulnerability is too small.

Moderator: General Moderators

matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Usage of htmlentities redisplaying form data

Post by matthijs »

A common feature of a web based form is that when a user makes a mistake a nice error message is displayed telling the user what data is missing and/or wrong. Of course the data which the user has already filled in must be prefilled.
Because I'm used to escape all output to html with htmlentities, I make my forms like this:

Code: Select all

<?php
$errormessage = '';
$clean = array();

if (isset($_POST['submit'])) {

// validate data, return errormessage if not valid



}
?>
<html>
<?php
if (isset($errormessage) && strlen($errormessage) >0) { echo $errormessage; }
?>

<form action="" method="post">

<input type="text" name="username" value="
<?php   
if (isset ($_POST['username'])) { echo htmlentities($_POST['username'], ENT_QUOTES, 'UTF-8') ; } 
?>
"  />

</form>
</html>
?>
However, in a thread some time ago someone questioned my use of htmlentities in this specific situation.
If someone fills in the form, gets one error and corrects that, the special characters in the data he entered the first time may have been converted. If he posts the form again these converted characters are posted to the processing script. That might not be what you want.

So my question: would it be better or safe to use echo $_POST['username'] (without the htmlentities) in this case, or does that open up the script for certain safety problems (XSS?, CSRF?). I know that by using GET as the action that will certainly be the case. But in this case it's POST data.

But Chris says the following about CSRF in his book: "there are known exploits for both GET and POST, so don't consider a strict use of POST to be adequate protection".

Any ideas?
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

well, without running entities, if someone used a double quote in the field, it would get truncated. That, I define as bad, both from the user point of view and from the developer point of view.
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

Good point! Hadn't thought about that. So that's 2- 0 for using htmlentities :)
User avatar
Maugrim_The_Reaper
DevNet Master
Posts: 2704
Joined: Tue Nov 02, 2004 5:43 am
Location: Ireland

Post by Maugrim_The_Reaper »

Me I don't re-add prefills except for simple data which is easily filtered and validated. If its alphanumeric htmlentities() is icing on the cake for DID - you can for go it in some cases like this. Complex data (usually anything needing a Regexp) I tend just to let the user refill themselves - why invite trouble?
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

Me I don't re-add prefills except for simple data which is easily filtered and validated
The thing is, if I have a "simple" form with a few text fields, a textarea, and a couple of checkboxes. I just cannot let the user refill all data. That would be so unusable. Probably half of them won't make the effort to try again.
Also, if I only fill in the 'simple' data, the logic gets even more complex for me. Then I have to a) refill the easy fields with the data that was ok b) tell the user which fields were wrong or missing c) tell them which fields to fill in again because I was too lazy to code the auto refilling in.... :wink:
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Post by John Cartwright »

I thought I would share this class for you, since you can define any tags and tag attributes that are permitted, and the rest converted to non harmful entities.

Code: Select all

<?php
/**
 *=--------------------------------------------------------------------------=
 * strip_tags.inc
 *=--------------------------------------------------------------------------=
 * This contains code to do some tag stripping that is both MBCS-safe and
 * more powerful than the default tag stripping available on the PHP 
 * strip_tags function.  This basically involves writing a little HTML
 * parsing state machine.  I'm not the best at this, but it seems to work
 * quite well, and isn't terribly inefficient.
 *
 * Author: marc, 2005-05-08
 *
 * Note that this class assumes all input is UTF-8.
 *
 * UNDONE: marc, 2005-05-08: add support for other character set encodings.
 */

/**
 * Copyright (c) 2005
 *      Marc Wandschneider. All Rights Reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. The name Marc Wandschneider may not be used to endorse or promote
 *    products derived from this software without specific prior
 *    written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
 * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
 * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
 * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
 * OF USE, DATA, OR PROFITS; DEATH OF YOUR CAT, DOG, OR FISH; OR
 * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
 * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
 * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
 * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

/**
 *=--------------------------------------------------------------------------=
 * StripTags
 *=--------------------------------------------------------------------------=
 * This class is how you use the tag stripping functionality.  In short, you
 * create an instance of it, passing to it an array keyed by HTML elements
 * that you would like to permit (or deny, based on the value of the second
 * parameter).  The VALUES of these indices are themselves arrays of permitted
 * attributes for each tag.  Thus, if we wanted to allow 'strong', 'em', 'a'
 * with href and title, and 'img' with src, border, and alt, we would 
 * create the object as follows:
 *
 * $tagsAndAttrs = array(
 *     'strong' => array(),    // no attrs allowed.
 *     'em' => array(),
 *     'a' => array('title', 'href'),
 *     'img' => array('src', 'border', 'alt')
 * );
 *
 * $st = new StripTags($tagsAndAttrs);
 *
 * Usage is then as simple as:
 *
 * $malicious = <<<EOS
 *   This is an eeveeeillll
 *   <script> document.location = "http://evilsite.url"; </script>
 *   string.
 * EOS;
 *
 * $fixed = $st->strip($malicious);
 */
class StripTags
{
  /**
   * RemoveColons Property:
   *
   * This property controls whether or not we should remove colons from
   * attribute values.  By default, we will remove any colons that do
   * not come right after an http, https, or ftp in an attribute
   * value.  This is very restrictive, but I haven't coded up a better
   * solution just yet ...
   */
  public $RemoveColons = TRUE;

  /**
   * List of tags and attributes.
   */
  protected $m_tagsAndAttrs;

  /**
   * Whether they are inclusive or exclusive lists 
   */
  protected $m_tagsInclusive;

  /**
   * These are used to maintain the state machine we use to parse through
   * strings.
   */
  protected $m_exploded;        // input string asploded
  protected $m_max;             // max size of m_exploded
  protected $m_x;               // current position
  protected $m_output;          // the string we're building.
  protected $m_currentTag;      // as we process attrs, the tag we're in.

  /**
   * Possible ways to end a tag -- SLASHGT /> and GT, >.
   */
  const SLASHGT = 0;
  const GT = 1;

  /**
   *=------------------------------------------------------------------------=
   * __construct
   *=------------------------------------------------------------------------=
   * Initialises a new instance of this class.  We require a list of tags and
   * attributes and a flag saying whether they are an inclusive or exclusive
   * set.
   *
   * Parameters:
   *    $in_tagsAndAttrs    - array keyed by tags, with values being arrays
   *                          of allowed attributes on those keys.
   *    $in_tagsincl        - [optional] are tags inclusive (only listed 
   *                          tags are allowed) or exclusive (all but those
   *                          tags are permitted).
   */
  public function __construct
  (
    $in_tagsAndAttrs,
    $in_tagsincl = TRUE
  )
  {
    if (!is_null($in_tagsAndAttrs) and !is_array($in_tagsAndAttrs))
      throw new InvalidArgumentException('$in_tagsAndAttrs');

    /**
     * save out the local vars, making sure that they have at least 
     * some value set.
     */
    $this->m_tagsAndAttrs = $in_tagsAndAttrs;
    if ($this->m_tagsAndAttrs === NULL)
      $this->m_tagsAndAttrs = array();
    $this->m_tagsInclusive = $in_tagsincl;
  }


  /**
   *=------------------------------------------------------------------------=
   * strip
   *=------------------------------------------------------------------------=
   * Removes evil baddie tags from the input string, excepting (or restricting
   * to) those tags specimified in the arguments to the constructor.
   *
   * Parameters:
   *      $in_string           - strip me please.
   * 
   *
   * Returns:
   *      stripped string.
   *
   * Notes:
   *      STRING IS ASSUMED TO BE UTF-8.
   */
  public function strip($in_string)
  {
    if ($in_string === NULL or $in_string == '')
      return '';

    /**
     * 1. explode the string into its constituent CHARACTERS (not bytes,
     *    which in UTF-8 are most certainly not the same thing).
     */
    $this->m_exploded = $this->explodeString($in_string);
    $this->m_max = count($this->m_exploded);

    /**
     * 2. Parse the string.  We will be quite robust about this, supporting
     *    arbitrary whitespace characters and > and < chars within attribute
     *    values (which is valid HTML, but prolly not valid XHTML).
     *    This will require setting up a bit of a state machine, which is a
     *    pain, but worth it.  Robustness is good.
     */
    $this->m_output = array();
    for ($this->m_x = 0; $this->m_x < $this->m_max; $this->m_x++)
    {
      if ($this->m_exploded[$this->m_x] != '<')
	$this->m_output[] = $this->m_exploded[$this->m_x];
      else
	$this->processTag();
    }

    return $this->rebuildString($this->m_output);
  }


  /**
   *=------------------------------------------------------------------------=
   * processTag
   *=------------------------------------------------------------------------=
   * We have encountered a tag in our string.  See if it's a valid tag and
   * process (out) any attributes within it.
   */
  protected function processTag()
  {
    /**
     * 1. Get the name of the tag and see if it's valid or not.
     */
    $this->m_x++;
    $tagName = $this->getTagName();
    if ($tagName === NULL)
      return ;                // there's nothing there!  

    if (!$this->isPermissibleTag($tagName))
    {
      $this->processEndOfTag();
      return;
    }
    else if (substr($tagName, 0, 1) == '/')
    {
      /**
       * If it's a closing tag, just consume everything up until the
       * closing tag character.
       */
      $this->processEndOfTag(FALSE);
      $fullTag = "<$tagName>";
      $this->m_output = array_merge($this->m_output,
				    $this->explodeString($fullTag));
    }
    else
    {
      /**
       * tag's valid.  go and get any attributes associated with it.
       */
      $this->m_currentTag = $tagName;
      $attrs = $this->processAttributes();
      $fullTag = "<$tagName";
      foreach ($attrs as $attr)
      {
        if ($attr['value'] != '')
        {
          $fullTag .= " {$attr['name']}=\""
            . $this->furtherProcess($attr['value']) . "\"";
        }
        else
          $fullTag .= " {$attr['name']}";
      }

      /**
       * figure out closing tag type and duplicate.
       */
      $tagType = $this->processEndOfTag();
      $fullTag .= ($tagType == StripTags::SLASHGT) ? '/>' : '>';
      $this->m_output = array_merge($this->m_output,
                                    $this->explodeString($fullTag));
    }
  }


  /**
   *=------------------------------------------------------------------------=
   * getTagName
   *=------------------------------------------------------------------------=
   * Given that we are positioned RIGHT after the opening < char, go and
   * find the name of the tag.  We will actually handle the case where we 
   * are given an empty tag, like < > or < />.
   *
   * Returns:
   *      string name or NULL indicating empty tag (or EOS)
   */
  protected function getTagName()
  {
    /**
     * skip over any space chars.
     */
    $this->consumeWhiteSpace();
    $tag = array();

    /**
     * Is it a closing tag??
     */
    if ($this->m_x < $this->m_max and $this->m_exploded[$this->m_x] == '/')
    {
      $tag[] = '/';
      $this->m_x++;
    }

    /**
     * now get anything until the next whitespace character or /> or >.
     */
    while ($this->m_x < $this->m_max
           and !$this->isSpaceChar($this->m_exploded[$this->m_x])
           and ($this->m_exploded[$this->m_x] != '>')
           and !($this->m_exploded[$this->m_x] == '/'
                 and $this->m_x + 1 < $this->m_max
                 and $this->m_exploded[$this->m_x + 1] == '>'))
    {
      $tag[] = $this->m_exploded[$this->m_x];
      $this->m_x++;
    }

    if (count($tag) == 0)
      return NULL;
    else
      return $this->rebuildString($tag);
  }


  /**
   *=------------------------------------------------------------------------=
   * isPermissibleTag
   *=------------------------------------------------------------------------=
   * Checks to see whether the given tag is valid or not given the user's
   * options to our constructor.
   *
   * Parameters:
   *      $in_tagName                   - tag name to check.
   *
   * Returns:
   *      TRUE == ok, FALSE == AAAIEEEE!!!
   */
  protected function isPermissibleTag($in_tagName)
  {
    /**
     * If it's a closing tag, remove the / for the purposes of this search.
     */
    if (substr($in_tagName, 0, 1) == '/')
      $check = substr($in_tagName, 1);
    else
      $check = $in_tagName;

    /**
     * Zip through all the tags in the array seeing if it is
     * valid.  We have to see if they gave us an inclusive or
     * exclusive list of permissible tags.
     */
    foreach ($this->m_tagsAndAttrs as $tag => $attrs)
    {
      $t = trim($tag);
      if ($this->m_tagsInclusive)
      {
        if ($t == $check)
          return TRUE;
      }
      else
      {
        if ($t == $check)
          return FALSE;
      }
    }

    return $this->m_tagsInclusive ? FALSE : TRUE;
  }



  /**
   *=------------------------------------------------------------------------=
   * processEndOfTag
   *=------------------------------------------------------------------------=
   * Skip all characters looking for the end of tag (> or />).  Unfortunately,
   * we cannnot simply zip through the string looking for these two 
   * character sequences, as they might be embedded within quotes.  We thus
   * have to manage a little state and remember whether or not we are in
   * quotes ...
   *
   * Parameters:
   *      $in_slashGTOk          - is /> allowed or only >  ??
   *
   * Returns:
   *      SLASHGT or GT, indicating which type of closing tag was found.
   */
  protected function processEndOfTag($in_slashGTOk = TRUE)
  {
    /**
     * This is not as simple as just looking for the next > character,
     * as that might be within an attribute string.  We will thus
     * have to maintain some state and make sure that we handle that
     * case properly.
     */
    $in_quote = FALSE;
    $quote_char = '';

    while ($this->m_x < $this->m_max)
    {
      switch ($this->m_exploded[$this->m_x])
      {
        case '\'':
          if ($in_quote and $quote_char == '\'')
            $in_quote = FALSE;
          else if (!$in_quote)
          {
            $in_quote = TRUE;
            $quote_char = '\'';
          }
          $this->m_x++;
          break;

        case '"':
          if ($in_quote and $quote_char == '"')
            $in_quote = FALSE;
          else if (!$in_quote)
          {
            $in_quote = TRUE;
            $quote_char = '"';
          }

          $this->m_x++;
          break;

        case '/':
          if (!$in_quote
              and ($this->m_x + 1 < $this->m_max)
              and $this->m_exploded[$this->m_x + 1] = '>'
              and $in_slashGTOk)
          {
            $this->m_x += 1;
            return StripTags::SLASHGT;
          }
          
          $this->m_x++;
          break;

        case '>':
          if (!$in_quote)
          {
            return StripTags::GT;
          }

          $this->m_x++;
          break;

        default:
          $this->m_x++;
          break;
      }
    }
  }


  /**
   *=------------------------------------------------------------------------=
   * processAttributes
   *=------------------------------------------------------------------------=
   * Given that we have a valid tag name, we are now going to go process its
   * attributes and see how we like them.  We will assume that all are in one
   * of the two following formats:
   *
   *  attribute = value   [valid is quoted string or single word]
   *  attribute           [attribute is sequence of non-space chars]
   *
   * Returns:
   *      an array of 'attrName' => 'attrValue' pairs.
   *
   * Note:
   *      the $m_x 'cursor' is pointing to the first space char right after
   *      the attr name.
   */
  protected function processAttributes()
  {
    $attrs = array();

    while (($attrDetails = $this->nextAttribute()) !== NULL)
    {
      if ($this->isPermissibleAttribute($attrDetails['name']))
        $attrs[] = $attrDetails;
    }

    return $attrs;
  }


  /**
   *=------------------------------------------------------------------------=
   * nextAttribute
   *=------------------------------------------------------------------------=
   * We are processing a tag.  Get the next attribute, or return NULL if there
   * are no mo'.
   *
   * Returns:
   *      an array with 'attrName' => 'attrValue' or NULL if there is not 
   *      another attribute.
   */
  protected function nextAttribute()
  {
    /**
     * skip over any space chars.
     */
    $this->consumeWhiteSpace();

    /**
     * 1. Attribute Name.
     *
     * Now get anything until the next whitespace character, = character,
     * end of tag (> or />), or end of buffer.
     */
    $attr = array();
    while ($this->m_x < $this->m_max
           and !$this->isSpaceChar($this->m_exploded[$this->m_x])
           and ($this->m_exploded[$this->m_x] != '=')
           and ($this->m_exploded[$this->m_x] != '>')
           and !($this->m_exploded[$this->m_x] == '/' 
                 and $this->m_x + 1 < $this->m_max
                 and $this->m_exploded[$this->m_x + 1] == '>'))
    {
      $attr[] = $this->m_exploded[$this->m_x];
      $this->m_x++;
    }

    /**
     * If it's at the end of of the tag or the end of the string, then
     * evidence suggests we only got an attribute name.
     */
    if ($this->m_x == $this->m_max
        or $this->m_exploded[$this->m_x] == '>'
        or $this->m_exploded[$this->m_x] == '/')
    {
      if (count($attr) > 0)
        return array('name' => $this->rebuildString($attr), 'value' => '');
      else
        return NULL;
    }

    /**
     * We got a space.  If there is an = sign ahead after only whitespaces,
     * then that will point to the value.  Otherwise, we only have an attr
     * name.
     */
    if ($this->isSpaceChar($this->m_exploded[$this->m_x]))
    {
      if (!$this->peekAheadInTag('=')) 
        return array('name' => $this->rebuildString($attr), 'value' => '');
    }

    /**
     * otherwise, if we're here, then we're at an equals sign.
     */
    $this->m_x++;
    $this->consumeWhiteSpace();
    if ($this->m_x == $this->m_max)
      return array('name' => $this->rebuildString($attr), 'value' => '');
    
    /**
     * 2. Attribute Value
     *
     * Now get anything until the next whitespace character,
     * end of tag (> or />), or end of buffer.  We have to be careful,
     * however, to be able to handle a string enclosed attribute
     * value, such as 'this is a value'.
     */
    $in_quote = FALSE;
    $quote_char = '';
    $value = array();
    if ($this->isQuoteChar($this->m_exploded[$this->m_x]))
    {
      $in_quote = TRUE;
      $quote_char = $this->m_exploded[$this->m_x];
      $this->m_x++;
    }

    /**
     * This is an annoying expression.  We want to skip characters IF:
     *
     * - we are in a quoted attr value and the current character is not
     *   the closing quote.
     * OR
     * - we are NOT in a quoted attr value and the current character is
     *   not:
     *    - EOS
     *    - >
     *    - />
     *    - white space
     *
     * In all cases, don't go past EOS.
     */
    while (($in_quote
            and $this->m_x < $this->m_max
            and $this->m_exploded[$this->m_x] != $quote_char)
           or (!$in_quote
               and  ($this->m_x < $this->m_max
                     and !$this->isSpaceChar($this->m_exploded[$this->m_x])
                     and ($this->m_exploded[$this->m_x] != '>')
                     and !($this->m_exploded[$this->m_x] == '/' 
                           and $this->m_x + 1 < $this->m_max
                           and $this->m_exploded[$this->m_x + 1] == '>'))))
    {
      $value[] = $this->m_exploded[$this->m_x];
      $this->m_x++;
    }

    if ($this->m_x < $this->m_max
        and $in_quote and $this->m_exploded[$this->m_x])
    {
      $this->m_x++;
    }

    /**
     * return the attribute name and value.
     */
    return array('name' => $this->rebuildString($attr), 
                 'value' => $this->rebuildString($value));
  }


  /**
   *=------------------------------------------------------------------------=
   * peekAheadInTag
   *=------------------------------------------------------------------------=
   * Looks to see if the NEXT NON-WHITESPACE character is the specified
   * character.
   *
   * Parameters:
   *      $in_char                      - character to look for.
   *
   * Returns:
   *      TRUE -- it is!  FALSE, it's not!
   *
   * Notes:
   *      IFF TRUE is returned, then $this->m_x is updated to point at this
   *      character.
   */
  protected function peekAheadInTag($in_char)
  {
    $x = $this->m_x;
    while ($x < $this->m_max and $this->isSpaceChar($this->m_exploded[$x])
           and $this->m_exploded[$x] != $in_char)
    {
      $x++;
    }

    if ($x == $this->m_max)
      return FALSE;
    else if ($this->m_exploded[$x] == $in_char)
    {
      $this->m_x = $x;
      return TRUE;
    }
    else
      return FALSE;
  }


  /**
   *=------------------------------------------------------------------------=
   * isPermissibleAttribute
   *=------------------------------------------------------------------------=
   * Checks to see whether the given attribute is valid or not given the
   * user's options to our constructor.
   *
   * Parameters:
   *      $in_attrName                   - attribute name to check.
   *
   * Returns:
   *      TRUE == ok, FALSE == AAAIEEEE!!!
   */
  protected function isPermissibleAttribute($in_attrName)
  {
    $attrs = $this->m_tagsAndAttrs[$this->m_currentTag];
    if ($attrs === NULL)
      $attrs = array();

    /**
     * Zip through all the attributes in the array seeing if it is
     * valid.  We have to see if they gave us an inclusive or
     * exclusive list of permissible attributes.
     */
    $check = strtolower($in_attrName);
    foreach ($attrs as $attr)
    {
      $t = strtolower(trim($attr));
      if ($t == $check)
        return TRUE;
    }

    return FALSE;
  }


  /**
   *=------------------------------------------------------------------------=
   * explodeString
   *=------------------------------------------------------------------------=
   * Takes the string given as input and asplodes it into its constituent
   * characters.
   *
   * Parameters:
   *      $in_string                    - asplode me please.
   *
   * Returns:
   *      Array of characters.
   *
   * Notes:
   *      We are assuming you have compiled PHP with mbstring turned on (Unix
   *      / OS X) or php_mbstring.dll enabled (Winders).  You should also have
   *      the following set up in your php.ini:
   *
   *      mbstring.language = Neutral
   *      mbstring.internal_encoding = UTF-8
   *      mbstring.http_output = UTF-8
   *      mbstring.func_overload = 7
   */
  protected function explodeString($in_string)
  {
    /**
     * UNDONE: marc, 2005-05-08: is there a faster way to do this?
     *         this is kinda sucky.
     *
     * We'll go ahead and call the mb_ functions here to make it clear
     * that we will not be testing this on non-mbcs installations.
     */
    $asploded = array();
    $len = strlen($in_string);
    $str = $in_string;
    for ($x = 0; $x < $len; $x++)
    {
      $asploded[] = substr($str, $x, 1);
    }

    return $asploded;
  }


  /**
   *=------------------------------------------------------------------------=
   * consumeWhiteSpace
   *=------------------------------------------------------------------------=
   * Sucks up characters as long as there are characters left in the string
   * AND they are whitespace characters.
   *
   * Notes:
   *      This function does not accept double-wide space characters such as
   *      those seen in Asian character sets.  We assume these are not valid
   *      in HTML documents.
   */
  protected function consumeWhiteSpace()
  {
    while ($this->m_x < $this->m_max 
           and $this->isSpaceChar($this->m_exploded[$this->m_x]))
      $this->m_x++;
  }


  /**
   *=------------------------------------------------------------------------=
   * isSpaceChar
   *=------------------------------------------------------------------------=
   * Is the next character a whitespace character.  We do NOT include
   * double wide space characters from Asian character sets.
   *
   * Parameters:
   *      $in_char                      - Character to examiminine.
   *
   * Returns:
   *      TRUE == it's a Space!   FALSE == it's not a space!
   */
  protected function isSpaceChar($in_char)
  {
    switch ($in_char)
    {
      case ' ':
      case "\t":
      case "\n":
      case "\r":
      case "\v":
        return TRUE;
      default:
        return FALSE;
    }
  }


  /**
   *=------------------------------------------------------------------------=
   * isQuoteChar
   *=------------------------------------------------------------------------=
   * Asks whether the given UTF-8 character is a quote character.  We only
   * char about ISO-8859 quote characters, namely " and '.
   *
   * Parameters:
   *      $in_char                 - check me please.
   *
   * Returns:
   *      TRUE, si, ees a quote.
   *      FALSE, no mang, ees no a quote.
   */
  protected function isQuoteChar($in_char)
  {
    switch ($in_char)
    {
      case '\'':
      case '"':
        return TRUE;
    }

    return FALSE;
  }


  /**
   *=------------------------------------------------------------------------=
   * rebuildString
   *=------------------------------------------------------------------------=
   * Takes an array of characters and reconstructs a string for it.
   *
   * Parameters:
   *      $in_charArray                  - array containing chars
   *
   * Returns:
   *      string for those .
   */
  protected function rebuildString($in_charArray)
  {
    return implode('', $in_charArray);
  }


  /**
   *=------------------------------------------------------------------------=
   * furtherProcess
   *=------------------------------------------------------------------------=
   * This function actually inspects the text of an attribute string, and 
   * takes further action to try and prevent XSS by removing colons:
   *
   * UNDONE: marcwan, 2005-06-08: I'd like to have a better/more robust
   *         solution to inline scripting attacks.  Must ponder more.
   *
   * Parameters:
   *     $in_attrString                 - the attribute string to process more
   *
   * Returns:
   *     processed (and hopefully safe) attribute string.
   *
   * Notes:
   *     this function does nothing if RemoveColons is set to FALSE
   */
  protected function furtherProcess($in_attrString)
  {
    if (!$this->RemoveColons)
      return $in_attrString;

    //
    // 1. strip out colon characters, unless the attr value begins
    // with http:...
    //
    if (!ereg('^(https:|http:|ftp:)', trim($in_attrString), $matches))
    {
      $str = $in_attrString;
      $prefix = '';
    }
    else
    {
      $str = substr(trim($in_attrString), strlen($matches[0]));
      $prefix = $matches[0];
    }

    $str = ereg_replace(':', '', $str);

    //
    // 2. strip out ANY representations of : characters ... i can't imagine
    //    that this is even remotely efficient.
    //
    $str = ereg_replace('(&#[0]*58;|&#[xX][0]*3[aA];|\\\\[xX][0]*3[aA];|\\\\[uU][0]*3[aA];)', '', $str);

    return $prefix.$str;
  }


  /**
   *=------------------------------------------------------------------------=
   * escapeDoubleQuotes
   *=------------------------------------------------------------------------=
   * Double quotes will be replaced by "
   *
   * Parameters:
   *      $in_string               - string to fix.
   *
   * Returns:
   *      fixed string.
   */
  protected function escapeDoubleQuotes($in_string)
  {
    return ereg_replace('"', '"', $in_string);
  }

}


?>
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

Jcart, thanks for sharing that class. Looks impressive. I'll definately study the code and see if I can use it. But for that I will have to solidly understand the code first. I've seen several filter classes and sometimes they aren't as foolproof as one might hope they are. It's not easy to really filter out all unwanted tags and tag attributes, isn't it?

Please don't consider my suspicious attitude as offensive, I do appreciate your help and I'll study your class. What do you thinkabout projects/classes like inputfilter,safehtml and kses?
User avatar
shiflett
Forum Contributor
Posts: 124
Joined: Sun Feb 06, 2005 11:22 am

Re: Usage of htmlentities redisplaying form data

Post by shiflett »

matthijs wrote:If someone fills in the form, gets one error and corrects that, the special characters in the data he entered the first time may have been converted. If he posts the form again these converted characters are posted to the processing script. That might not be what you want.
This is incorrect. Escaping preserves data in a different context. The key word here is preserves. Using htmlentities() as you are is a good practice - don't let anyone tell you otherwise. :-)

To make this clearer, view the following HTML in a browser:

Code: Select all

<form>
<input type="text" name="foo" value="<bar>" />
</form>
You should see the original, preserved value, not the escaped value.
matthijs wrote:So my question: would it be better or safe to use echo $_POST['username'] (without the htmlentities) in this case, or does that open up the script for certain safety problems (XSS?, CSRF?).
XSS, yes.
matthijs wrote:But Chris says the following about CSRF in his book: "there are known exploits for both GET and POST, so don't consider a strict use of POST to be adequate protection".
Cross-site request forgeries are a bit different. Protecting yourself from these types of attacks involves the use of a token in each form that you want to protect, so you can make it more difficult for someone to spoof a request from another person - the attacker would have to know someone else's token.

Hope that helps.
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

Thanks again for your clear explanation Chris. It seems I had the right ideas, but was a bit confused in some places. You clarified those points.

I'll reread the chapter about CSRF in your book and the new article from Ilia in php architect.

I also saw your post with your "question" today. Guess you answered it yourself now? :)

Anyway, your help - and of course also that from the others - is very much appreciated. It's a strange but pleasant feeling how people I don't know personally like you and others here on this forum help eachother.
User avatar
shiflett
Forum Contributor
Posts: 124
Joined: Sun Feb 06, 2005 11:22 am

Post by shiflett »

matthijs wrote:I also saw your post with your "question" today. Guess you answered it yourself now?
Well, my comment form still has not been exploited, so I think CSRF's lack of popularity is both good and bad. It's bad, because most web apps are vulnerable, but it's good, because not many people know how to exploit it. Now, if only it were easier to explain without demonstrating exploits...
matthijs wrote:Anyway, your help - and of course also that from the others - is very much appreciated. It's a strange but pleasant feeling how people I don't know personally like you and others here on this forum help eachother.
Thanks very much for that. :-)

The one weird thing about the open source community is that the more you try to help people, the more criticism you receive. Some people even presume to guess your personality and intentions from answers to questions, deciding that your intentions are disingenuous or self-serving. I've never really understood that, but friendly people like you make it easier to keep on keeping on. :-)
User avatar
Maugrim_The_Reaper
DevNet Master
Posts: 2704
Joined: Tue Nov 02, 2004 5:43 am
Location: Ireland

Post by Maugrim_The_Reaper »

Its not just an open source phenomenon - it happens everywhere ;). Some people just get their, eh, panties in a bunch over assumptions they make based on paranoid delusions.

I think they get suspicious about all this odd idea of helping people without getting paid somewhere along the line - hmph, capitalists...
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

The one weird thing about the open source community is that the more you try to help people, the more criticism you receive. Some people even presume to guess your personality and intentions from answers to questions, deciding that your intentions are disingenuous or self-serving.
I understand what you mean. It surprises me as well. But I think that for every one of those there are many many others who will know how it is for real. So to stay on topic, I guess it's a matter of filtering "input". But nonetheless I can imagine it's not nice to be critised in an unfair way, or worse then that.
Some people just get their, eh, panties in a bunch over assumptions they make based on paranoid delusions.
understand what you're saying, but I'll have to find out what that saying means :)
Ree
Forum Regular
Posts: 592
Joined: Fri Jun 10, 2005 1:43 am
Location: LT

Post by Ree »

Hm, actually I have never used htmlentities for the data to be output in text fields/areas as I never thought you could run something bad in a text field. ;) It seems like I should?
Roja
Tutorials Group
Posts: 2692
Joined: Sun Jan 04, 2004 10:30 pm

Post by Roja »

shiflett wrote:The one weird thing about the open source community is that the more you try to help people, the more criticism you receive. Some people even presume to guess your personality and intentions from answers to questions, deciding that your intentions are disingenuous or self-serving. I've never really understood that, but friendly people like you make it easier to keep on keeping on. :-)
Talk about getting meta-off-topic. :)

However, its a valid question, and its one I've heard before. (Heck, you can argue that my sig makes a similar statement!).

It doesn't seem to shock scientists who end up in the opensource community. The reason being is that it is the exact same process as academic validation. Anytime you put forth an opinion - controversial or not - it has to be scrutinized. Whether it is with constructive criticism, or argument, or flat out rude personal accusations, the attacks have to come.

Think of Einstein, and having to defend his theory. His theory, when first introduced, was widely thought to be completely inaccurate and - I quote - "Hopelessly elementary". Yet time has (for the most part) vindicated his findings.

Such is the nature of the opensource community. There are hundreds of thousands of developers - voices - all shouting loudly "I have an idea!". Not all of them can be right. In fact, the vast majority are wrong in the vast majority of cases. That means that plenty of people are going to line up to criticize.

As uncomfortable as it is to stand before a crowd of people yelling "YOU ARE STUPID!", thats exactly what it takes. You have to defend your position. If, during the course of doing so, you realize you are wrong, and that there is a better way - admit it. That allows the process and the position to improve over time. Thats how we end up with things like Input Filtering as a general concept. Instead of doing it on each and every variable, now we do it as a general class - because someone, somewhere asked "Shouldn't we centralize that?".

Humans have a serious problem with admitting being wrong. Especially when in public/social situations, and even moreso when they've put forward their opinion.

And thats exactly why the opensource method is so much stronger than proprietary, hide-behind-closed-doors programming.

You can't hide from being wrong. By extension, the code can't stay stupid for long (if it expects to keep users using it).

I apologize for going all meta, but I see the opensource community in a different light, and that comment deserved a response with some context.
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

shiflett wrote: The one weird thing about the open source community is that the more you try to help people, the more criticism you receive. Some people even presume to guess your personality and intentions from answers to questions, deciding that your intentions are disingenuous or self-serving. I've never really understood that, but friendly people like you make it easier to keep on keeping on. :-)
You forgot that you're on devnetwork instead of sitepoint.. A world of difference :)
Post Reply