[Challenge - Advanced] What color is the text?

Posted: Tue Aug 25, 2009 4:14 am
by onion2k
Write a script that can take the address of a webpage and the id of an HTML element on that page, and returns the HTML hex color of the element.

This sounds fairly simple, but it really, really isn't. To be honest, it's a massive amount of work so if you're going to take part you might want to collaborate with other people.

Given the id "mytext" all of these should return "#FF0000":

Code: Select all

<font id="mytext" color="#FF0000">Text</font>

Code: Select all

<font color="#FF0000"><h1 id="mytext">Text</h1></font>

Code: Select all

<h1 id="mytext" style="color: #FF0000">Text</h1>

Code: Select all

<style>#mytext { color: #FF0000; }</style>
<div id="mytext">Text</div>
If you can make it cope with external styles as well that'd be cool. And if it could cope with cascading styles properly.. woah!
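One detail the examples above don't show: a page can give the same colour as "#f00", "rgb(255, 0, 0)" or "red", so a script that must return a hex value probably needs a normalising step. Here is a minimal sketch of that step. The function name to_hex_color() is made up for illustration, and the named-colour table is deliberately tiny rather than the full list.

```php
<?php
// Hypothetical helper: turn a few common colour notations into #RRGGBB.
// Returns null for anything it does not recognise.
function to_hex_color(string $value): ?string
{
    $value = strtolower(trim($value));

    // Assumption: only a handful of named colours, not the complete set.
    $named = [
        'red'   => '#FF0000',
        'green' => '#008000',
        'blue'  => '#0000FF',
        'black' => '#000000',
        'white' => '#FFFFFF',
    ];
    if (isset($named[$value])) {
        return $named[$value];
    }

    // Short hex form: #abc becomes #AABBCC.
    if (preg_match('/^#([0-9a-f])([0-9a-f])([0-9a-f])$/', $value, $m)) {
        return strtoupper('#' . $m[1] . $m[1] . $m[2] . $m[2] . $m[3] . $m[3]);
    }

    // Full hex form only needs its case normalised.
    if (preg_match('/^#[0-9a-f]{6}$/', $value)) {
        return strtoupper($value);
    }

    // rgb(r, g, b) with integer components, clamped to 255.
    if (preg_match('/^rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)$/', $value, $m)) {
        return sprintf('#%02X%02X%02X', min(255, (int) $m[1]), min(255, (int) $m[2]), min(255, (int) $m[3]));
    }

    return null;
}
```

Percentage rgb() values and the rest of the named colours are left out here; they would slot in as extra branches.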

Re: [Challenge - Advanced] What color is the text?

Posted: Tue Aug 25, 2009 7:01 am
by klevis miho
That sounds really simple, but I haven't got a clue :)

Re: [Challenge - Advanced] What color is the text?

Posted: Tue Aug 25, 2009 7:26 am
by jayshields
Can you require that the requested page is well-formed, valid XML?

Re: [Challenge - Advanced] What color is the text?

Posted: Tue Aug 25, 2009 7:42 am
by onion2k
jayshields wrote:Can you require that the requested page is well-formed, valid XML?
Sure, if you like. If you do then it might be nice for the script to give the user a nice error message like "Can't parse the page" instead of returning nothing though.

Re: [Challenge - Advanced] What color is the text?

Posted: Tue Aug 25, 2009 8:06 am
by juma929
Hello,

:banghead: I have so much studying to do but now I can't keep my mind off building this! 8O.

Might have to give it a quick sneaky go, seems simple enough but as with most things, it's probably a lot more complicated than it seems at first glance.

Might have to discipline myself and not even start it :P

Re: [Challenge - Advanced] What color is the text?

Posted: Tue Aug 25, 2009 9:51 am
by jayshields
If someone does this cheap 'n' dirty and grabs a snapshot of the page, finds the x and y co-ords of the element and scrapes the colour out I will give them one English pound. Would be much more fun than messing around with parsers too :)

Re: [Challenge - Advanced] What color is the text?

Posted: Tue Aug 25, 2009 10:35 am
by Weirdan
jayshields wrote:finds the x and y co-ords of the element and scrapes the colour out
Finding coordinates is more complicated than finding color, and would require page parsing as well.

Re: [Challenge - Advanced] What color is the text?

Posted: Tue Aug 25, 2009 12:11 pm
by Darhazer
Pretty interesting actually. I will try to do something over the weekend :)

Re: [Challenge - Advanced] What color is the text?

Posted: Wed Aug 26, 2009 5:11 am
by jayshields
Weirdan wrote:
jayshields wrote:finds the x and y co-ords of the element and scrapes the colour out
Finding coordinates is more complicated than finding color, and would require page parsing as well.
Yeah exactly but it would probably end up being less complicated than parsing CSS and HTML files and getting it to work with inherited stuff in both HTML and CSS. Plus it would work with malformed mark-up.

It would require some of the logic that those live screenshot scripts/websites use.

Edit: I just thought of a relatively simple way to do this - does Lynx support different font colours? Or any text-based browser? The source of that will get you most of the way there.

Re: [Challenge - Advanced] What color is the text?

Posted: Wed Aug 26, 2009 5:20 am
by lorenzo-s
I think an outline can be:

1) search for the element (with regex find id="mytext")
2) search for style attribute, then for colour
3) if not found, search for #mytext in CSSs, then for colour
4) if not found, get class attribute for element, search for .classname in CSSs, then for colour
5) if not found, get element parent, and back to point 2) getting new ID
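Steps 1, 2 and the "go to the parent" step of the outline above could be sketched roughly like this. It uses PHP's DOM extension instead of a regex to find the element, and it skips the stylesheet steps entirely. The function name nearest_inline_color() is invented for the example.

```php
<?php
// Hypothetical sketch: return the first colour found on the element itself
// or on the closest ancestor, looking only at style="" and color="".
function nearest_inline_color(string $html, string $id): ?string
{
    $dom = new DOMDocument();
    // @ hides the warnings libxml raises for sloppy real-world markup.
    @$dom->loadHTML($html);

    $node = $dom->getElementById($id);
    while ($node instanceof DOMElement) {
        // Inline style first. The (?:^|;) anchor keeps "background-color"
        // from being mistaken for "color".
        $style = $node->getAttribute('style');
        if (preg_match('/(?:^|;)\s*color\s*:\s*([^;]+)/i', $style, $m)) {
            return trim($m[1]);
        }
        // Then the old presentational attribute, e.g. <font color="...">.
        if ($node->hasAttribute('color')) {
            return $node->getAttribute('color');
        }
        // Nothing here, so move up one level and try again.
        $node = $node->parentNode;
    }
    return null; // element not found, or no colour anywhere up the tree
}
```

Steps 3 and 4 (id and class rules in the stylesheets) would have to be checked at each level before moving to the parent, which is where most of the real work is.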

Re: [Challenge - Advanced] What color is the text?

Posted: Wed Aug 26, 2009 5:47 am
by jayshields
To do it properly there's an almost endless number of possibilities as to where an element could get its colour from. What about if JavaScript manipulates the colour of the text on page load depending on something like which browser you're using?

Re: [Challenge - Advanced] What color is the text?

Posted: Wed Aug 26, 2009 7:24 am
by onion2k
jayshields wrote:To do it properly there's an almost endless number of possibilities as to where an element could get its colour from. What about if JavaScript manipulates the colour of the text on page load depending on something like which browser you're using?
I think it's safe to ignore scripting. Everything else should be ok though.. you've really only got to examine the color attribute, style attribute, class and id, each recursively up the CSS or HTML tree. It's a lot of work, sure, but it's possible. And once you have that code in place it'd be easy to expand it out to examine other attributes at which point the code becomes extremely useful.

Re: [Challenge - Advanced] What color is the text?

Posted: Wed Aug 26, 2009 2:02 pm
by Benjamin
Here's how I would do it:

1. Parse the html file so that you can detect inline css and external style sheets.
2. Parse the css and use the results to populate a class.
3. Using this class (or classes) create methods that allow you to pull css color attributes for elements with specific classes or ids
4. Parse the html file. Starting with the html tag, retrieve the color attributes. These color attributes will cascade and be assigned to relevant child elements unless overridden. <font color="foo"> will be overridden by css. Inline css will override all.
5. When the element with the matching id is found, you'll have the color.

This is not as hard as it seems when broken down into pieces. The hardest part would be creating the css and html parsers. Keep in mind however, you only need to pay attention to color attributes.
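The override order in step 4 could be sketched like this, assuming the parsing has already been done and each element on the path from the html tag down to the target has been reduced to a small array. The function name cascade_color() and the array keys 'attr', 'rule' and 'inline' are invented for the example.

```php
<?php
// Hypothetical sketch of step 4: walk from the root element down to the
// target, letting each level override whatever colour was inherited.
// $chain is a list of arrays, one per element, outermost first, e.g.
//   ['attr' => '#00FF00']                       a <font color="...">
//   ['rule' => '#0000FF', 'inline' => '#FF0000'] stylesheet rule + style=""
function cascade_color(array $chain): ?string
{
    $color = null; // nothing inherited yet

    foreach ($chain as $level) {
        // Within one element, lowest priority first: presentational
        // attribute, then stylesheet rule, then inline style.
        foreach (['attr', 'rule', 'inline'] as $source) {
            if (!empty($level[$source])) {
                $color = $level[$source];
            }
        }
    }

    // Whatever survived to the last element is that element's colour.
    return $color;
}
```

An element with no colour of its own simply leaves the inherited value untouched, which is the cascading behaviour described above.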

My Entry

Posted: Thu Aug 27, 2009 11:10 pm
by Zoxive
I was bored enough today to try this out.

text_attr.php

Code: Select all

 
<?php
/**
 *  Text_Attr Class
 * 
 * @author  Zoxive - kyle@zoxive.com
 * @date    27th August 2009
 */
class Text_Attr{
 
  protected $doc    = NULL;
  protected $dom    = NULL;
  protected $url    = NULL;
 
  /**
   * Constructs a new Text_Attr object.
   * 
   * @param   string  HTML document
   * @param   string  URL of such document, if not local
   * @return void
  */
  public function __construct($doc = NULL, $url = NULL)
  {
    $this->doc      = $doc;
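    // loadHTML() is an instance method; the static call fails on current PHP.
    $this->dom      = new DOMDocument();
    @$this->dom->loadHTML($doc); // @ hides warnings about malformed markup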
    $this->url      = $url;
  }
 
  /**
   * Locates the Attribute inside the document
   *  
   * @param   string  name of HTML ID
   * @param   string  name of HTML/CSS Attr
   * @return  string  value of Attr
  */
  public function get_attr($selector, $attr)
  {
    if(
    // Inline html
    ($Result = $this->inline_attr($selector,$attr)) ||
    // Inline CSS
    ($Result = $this->inline_css_attr($selector,$attr)) ||
    // Parent html
    ($Result = $this->parent_attr($selector,$attr)) ||
    // Parent CSS
    ($Result = $this->parent_css_attr($selector,$attr)) ||
    // Embedded CSS
    ($Result = $this->embedded_css_attr($selector,$attr)) ||
    // Linked CSS
    ($Result = $this->linked_css_attr($selector,$attr))
    )
      return $Result;
    else
      return NULL;
  }
 
  /**
   * Searches the local dom for the Attr
   * 
   * @param   string  name of HTML ID
   * @param   string  name of HTML/CSS Attr
   * @return  string  value of Attr
  */
  protected function inline_attr($selector, $attr)
  {
    //return pq($selector)->attr($attr);
    $ele = $this->dom->getElementById($selector);
    if(!$ele) return NULL;
    if($ele->hasAttribute($attr))
      return $ele->getAttribute($attr);
  }
 
  /**
   * Looks at inline CSS for the Attr
   * 
   * @param   string  name of HTML ID
   * @param   string  name of HTML/CSS Attr
   * @return  string  value of Attr
  */
  protected function inline_css_attr($selector, $attr, $ele = NULL)
  {
    if(empty($ele))
      $ele = $this->dom->getElementById($selector);
    if(empty($ele)) return NULL;
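    if($ele->hasAttribute('style')){
      // split() no longer exists in PHP, and a style attribute may hold
      // several declarations, so check each one.
      foreach(explode(';',$ele->getAttribute('style')) as $declaration){
        $style = explode(':',$declaration,2);
        if(count($style) == 2 && strtolower($attr) == strtolower(trim($style[0])))
          return trim($style[1]);
      }
    }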
  }
 
  /**
   * Looks at all the parents for the attr
   * 
   * @param   string  name of HTML ID
   * @param   string  name of HTML/CSS Attribute
   * @return  string  value of Attr
  */
  protected function parent_attr($selector, $attr)
  {
    $ele = $this->dom->getElementById($selector);
    if(!$ele) return NULL; // no such ID in the document
    while($ele = $ele->parentNode){
      if($ele instanceof DOMElement && ($result = $ele->getAttribute($attr))) return $result;
    }
  }
 
  /**
   * Looks at all parents for a inline CSS Attr
   * 
   * @param   string  name of HTML ID
   * @param   string  name of HTML/CSS Attribute
   * @return  string  value of Attr
  */
  protected function parent_css_attr($selector, $attr)
  {
    $ele = $this->dom->getElementById($selector);
    if(!$ele) return NULL; // no such ID in the document
    while($ele = $ele->parentNode){
      if($ele instanceof DOMElement && ($result = $this->inline_css_attr($selector,$attr,$ele))) return $result;
    }
  }
 
  /**
   * Searches for an embedded style for the given ID
   *
   * @param   string  name of HTML ID
   * @param   string  name of HTML/CSS Attribute
   * @return  string  value of Attr
  */
  protected function embedded_css_attr($selector, $attr)
  {
    $css = $this->get_embedded_css();
    $css = $this->prep_css($css);
    
    return $this->css_array_id_value($css,$selector,$attr);
  }
 
  /**
   * Looks for attr in linked css files
   * 
   * @param   string  name of HTML ID
   * @param   string  name of HTML/CSS Attribute
   * @return  string  value of Attr
  */
  protected function linked_css_attr($selector, $attr)
  {
    $css = $this->get_linked_css();
    $css = explode("\n",$css);
    unset($css[count($css)-1]);
    if(empty($css)) return NULL; // no <link> elements, nothing to fetch
    $merged = '';
    if(!@file_get_contents($css[0])){
      //die('<strong>ERROR:</strong><br/>CSS finder isnt smart enough yet to find the location of the linked CSS files.');
        $domain = $this->find_url_root();
    }
    foreach($css as $each){
      if(isset($domain)){
        $merged.= file_get_contents($domain.'/'.$each)."\n"; 
      }else{
        $merged.= file_get_contents($each)."\n"; 
      }
    }
    $css = $this->prep_css($merged);
    return $this->css_array_id_value($css,$selector,$attr);
  }
 
  /**
   * Return Attr value from Css Array
   * 
   * @param   array   css array
   * @param   string  id value
   * @param   string  attr to get
   * @return  string  attr value
  */
  protected function css_array_id_value($css, $selector, $attr)
  {
    return isset($css['#'.$selector][$attr])? $css['#'.$selector][$attr] : NULL;
  }
 
  /**
   * Looks for linked CSS files
   * 
   * @return string names of CSS files
  */
  protected function get_linked_css()
  {
    $css = $this->dom->getElementsByTagName('link');
    $length = $css->length;
    
    $merged = '';
    for($i=0;$i<$length; $i++){
      $merged.= $css->item($i)->getAttribute('href') . "\n";
    }
    return $merged;
  }
 
  /**
   * Converts given CSS into an array
   *
   * @param   css     string  Css
   * @return  array   array of CSS
  */
  protected function prep_css($css)
  {
    $css_array = array();
    // Remove comments
    // Non-greedy, and the s flag lets a comment span several lines
    $css = preg_replace('/\/\*.*?\*\//s','',$css); 
 
    $css = explode('}',$css); 
    // Remove empty
    unset($css[count($css)-1]);
    foreach($css as $ea){
      $param_array = array();
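      $ea = explode('{',$ea,2); 
      if(count($ea) < 2) continue; // no declaration block
      $name = trim($ea[0]);
      $value = trim($ea[1]);
      foreach(explode(';',$value) as $parm){
        // Skip empty pieces instead of dropping the last one, so a final
        // declaration without a trailing semicolon is kept.
        if(trim($parm) === '') continue;
        // Limit of 2 keeps values that contain a colon (e.g. URLs) intact
        $param = explode(':',$parm,2);
        if(count($param) < 2) continue;
        $param_array[trim($param[0])] = trim($param[1]);
      }
      if(empty($param_array)) continue;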
      // Cascade
      // If it exists merge with the newest value dominate.
      if(isset($css_array[$name]))
        $css_array[$name] = array_merge($css_array[$name],$param_array);
      else
        $css_array[$name] = $param_array;
    }
    return $css_array;
  }
 
  /**
   * Grabs all <style> in the document
   * 
   * @return  string  all of the styles merged together
  */
  protected function get_embedded_css()
  {
    $css = $this->dom->getElementsByTagName('style');
    $length = $css->length;
    
    $merged = '';
    for($i=0;$i<$length; $i++){
      $merged.= $css->item($i)->nodeValue . "\n";
    }
    return $merged;
  }
 
  /**
   * Finds the root of the Document
   * If linked styles dont have the relative path, we have to guess
   *
   * @return string   url relative path
  */
  protected function find_url_root()
  {
    $url = addslashes(urldecode($this->url));
    $url = explode('/',$url); 
    unset($url[count($url)-1]);
    return implode('/',$url); // separator first; the reversed order was removed in PHP 8
  }
}
 
testcolor.php

Code: Select all

 
<?php
 
include('text_attr.php');
 
/*
$url = 'example5.html';
$selector = 'mytext';
*/
 
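// addslashes() does not escape URLs or HTML, so take the raw values
// and escape them only when echoing.
$url = isset($_GET['url']) ? $_GET['url'] : '';
$selector = isset($_GET['id']) ? $_GET['id'] : '';
 
// Note: fetching an arbitrary user-supplied URL is only acceptable for a demo.
$html = file_get_contents($url);
 
$text = new Text_Attr($html,$url);
 
echo '<strong>URL:</strong> ' . htmlspecialchars($url);
echo "\n<br/>\n";
echo '<strong>ID:</strong> ' . htmlspecialchars($selector);
echo "\n<br/>\n<br/>\n";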
 
var_dump($text->get_attr($selector, 'color'));
 
Examples found at : http://textcolor.zoxive.com/examples.html (I just made this CNAME a few hours ago, so it may not work yet for you..)

Few things..
My code is pretty sloppy and I took a few shortcuts (explodes and such) which result in some inaccuracies down the line. I haven't really coded anything from scratch recently. Most of my recent work has been done inside an MVC framework environment.

I did not validate any of the data coming in, since this most likely will not be used in the way my examples are. Cascading styles *kind of* work, as far as a new attribute being added under the same ID goes. It does not handle multiple names/ids in one rule yet, e.g. #myid, #someother { color:#FF0000; }. I believe I can add the functionality by having a regex search through my CSS array, or by rewriting how it currently finds values in the CSS.
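For the grouped-selector case, one possible approach is to split each rule's selector list on commas before comparing. This is a rough standalone sketch, not part of the class above. The function name css_color_for_id() is invented for the example, and it ignores specificity, @media blocks and anything other than plain #id selectors.

```php
<?php
// Hypothetical sketch: find the colour set for "#id" in a stylesheet,
// including rules with grouped selectors like "#myid, #someother { ... }".
function css_color_for_id(string $css, string $id): ?string
{
    // Strip comments; the s flag lets a comment span several lines.
    $css = preg_replace('/\/\*.*?\*\//s', '', $css);

    $found = null;
    // Each match is one "selectors { declarations }" rule.
    preg_match_all('/([^{}]+)\{([^{}]*)\}/', $css, $rules, PREG_SET_ORDER);

    foreach ($rules as $rule) {
        // Split the selector list so every name in a group is checked.
        $selectors = array_map('trim', explode(',', $rule[1]));
        if (!in_array('#' . $id, $selectors, true)) {
            continue;
        }
        foreach (explode(';', $rule[2]) as $declaration) {
            $parts = explode(':', $declaration, 2);
            if (count($parts) === 2 && strtolower(trim($parts[0])) === 'color') {
                // A later matching rule overwrites an earlier one.
                $found = trim($parts[1]);
            }
        }
    }
    return $found;
}
```

Letting the later rule win is only a stand-in for real cascading, since it treats every matching rule as equally specific.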

I spent about 4 hours on this today 8O, maybe tomorrow I'll get around to messing with real cascading.

Edit: Found a typo in the code

Re: [Challenge - Advanced] What color is the text?

Posted: Fri Aug 28, 2009 2:34 am
by onion2k
Nice. Very nice.