[Challenge - Advanced] What color is the text?

onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

[Challenge - Advanced] What color is the text?

Post by onion2k »

Write a script that can take the address of a webpage and the id of an HTML element on that page, and returns the HTML hex color of the element.

This sounds fairly simple, but it really, really isn't. To be honest, it's a massive amount of work so if you're going to take part you might want to collaborate with other people.

Given the id "mytext" all of these should return "#FF0000":

Code: Select all

<font id="mytext" color="#FF0000">Text</font>

Code: Select all

<font color="#FF0000"><h1 id="mytext">Text</h1></font>

Code: Select all

<h1 id="mytext" style="color: #FF0000">Text</h1>

Code: Select all

<style>#mytext { color: #FF0000; }</style>
<div id="mytext">Text</div>

If you can make it cope with external styles as well that'd be cool. And if it could cope with cascading styles properly.. woah!
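
Since the expected answer is always a hex string like "#FF0000", whatever value a script finds (a name, a short hex code, an rgb() triple) would still need normalising. A minimal sketch of that step, assuming a helper called `to_hex_color` (a made-up name) and only a handful of named colours:

```php
<?php
// Hypothetical helper: turn a CSS/HTML colour value into "#RRGGBB", or null if unrecognised.
function to_hex_color($value)
{
    $value = strtolower(trim($value));
    // A deliberately tiny table; the full list of named colours is much longer.
    $named = array('red' => '#FF0000', 'green' => '#008000', 'blue' => '#0000FF',
                   'black' => '#000000', 'white' => '#FFFFFF');
    if (isset($named[$value])) {
        return $named[$value];
    }
    // #rgb shorthand: double each digit.
    if (preg_match('/^#([0-9a-f])([0-9a-f])([0-9a-f])$/', $value, $m)) {
        return strtoupper('#' . $m[1] . $m[1] . $m[2] . $m[2] . $m[3] . $m[3]);
    }
    // Already a six-digit hex value: just normalise the case.
    if (preg_match('/^#[0-9a-f]{6}$/', $value)) {
        return strtoupper($value);
    }
    // rgb(r, g, b) with integer components, clamped to 255.
    if (preg_match('/^rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)$/', $value, $m)) {
        return sprintf('#%02X%02X%02X', min(255, (int) $m[1]), min(255, (int) $m[2]), min(255, (int) $m[3]));
    }
    return null;
}

echo to_hex_color('#f00'), "\n";
```

Percentages, rgba(), hsl() and the rest of the named colours are left out here; they would slot in as extra branches.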
klevis miho
Forum Contributor
Posts: 413
Joined: Wed Oct 29, 2008 2:59 pm
Location: Albania

Re: [Challenge - Advanced] What color is the text?

Post by klevis miho »

That sounds really simple, but I haven't got a clue :)
jayshields
DevNet Resident
Posts: 1912
Joined: Mon Aug 22, 2005 12:11 pm
Location: Leeds/Manchester, England

Re: [Challenge - Advanced] What color is the text?

Post by jayshields »

Can you require that the requested page is well-formed, valid XML?
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: [Challenge - Advanced] What color is the text?

Post by onion2k »

jayshields wrote:Can you require that the requested page is well-formed, valid XML?
Sure, if you like. If you do, then it might be nice for the script to give the user a proper error message like "Can't parse the page" instead of returning nothing, though.
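
A rough sketch of that error path, assuming the page is required to be well-formed XML and that PHP's DOM extension is available. The function name `load_strict_page` is made up for illustration:

```php
<?php
// Hypothetical helper: returns a DOMDocument, or throws with a readable message.
function load_strict_page($markup)
{
    // Collect libxml errors instead of letting them surface as PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $ok = ($markup !== '') && $dom->loadXML($markup);
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        $detail = $errors
            ? trim($errors[0]->message) . ' (line ' . $errors[0]->line . ')'
            : 'empty document';
        throw new RuntimeException("Can't parse the page: " . $detail);
    }
    return $dom;
}

try {
    // The <p> is never closed, so this is not well-formed XML.
    load_strict_page('<div id="mytext"><p>unclosed</div>');
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```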
juma929
Forum Commoner
Posts: 72
Joined: Wed Jun 17, 2009 9:41 am

Re: [Challenge - Advanced] What color is the text?

Post by juma929 »

Hello,

:banghead: I have so much studying to do but now I can't keep my mind off building this! 8O.

Might have to give it a quick sneaky go, seems simple enough but as with most things, it's probably a lot more complicated than it seems at first glance.

Might have to discipline myself and not even start it :P
jayshields
DevNet Resident
Posts: 1912
Joined: Mon Aug 22, 2005 12:11 pm
Location: Leeds/Manchester, England

Re: [Challenge - Advanced] What color is the text?

Post by jayshields »

If someone does this cheap 'n' dirty and grabs a snapshot of the page, finds the x and y co-ords of the element and scrapes the colour out I will give them one English pound. Would be much more fun than messing around with parsers too :)
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Re: [Challenge - Advanced] What color is the text?

Post by Weirdan »

jayshields wrote:finds the x and y co-ords of the element and scrapes the colour out
Finding coordinates is more complicated than finding color, and would require page parsing as well.
Darhazer
DevNet Resident
Posts: 1011
Joined: Thu May 14, 2009 3:00 pm
Location: HellCity, Bulgaria

Re: [Challenge - Advanced] What color is the text?

Post by Darhazer »

Pretty interesting actually. I will try to do something over the weekend :)
jayshields
DevNet Resident
Posts: 1912
Joined: Mon Aug 22, 2005 12:11 pm
Location: Leeds/Manchester, England

Re: [Challenge - Advanced] What color is the text?

Post by jayshields »

Weirdan wrote:
jayshields wrote:finds the x and y co-ords of the element and scrapes the colour out
Finding coordinates is more complicated than finding color, and would require page parsing as well.
Yeah exactly but it would probably end up being less complicated than parsing CSS and HTML files and getting it to work with inherited stuff in both HTML and CSS. Plus it would work with malformed mark-up.

It would require some of the logic that those live screenshot scripts/websites use.

Edit: I just thought of a relatively simple way to do this - does Lynx support different font colours? Or any text-based browser? The source of that will get you most of the way there.
lorenzo-s
Forum Commoner
Posts: 43
Joined: Tue Aug 25, 2009 12:25 pm

Re: [Challenge - Advanced] What color is the text?

Post by lorenzo-s »

I think an outline could be:

1) search for the element (with a regex, find id="mytext")
2) search for a style attribute on it, then for the colour
3) if not found, search for #mytext in the stylesheets, then for the colour
4) if not found, get the element's class attribute, search for .classname in the stylesheets, then for the colour
5) if not found, get the element's parent and go back to point 2) using the parent's ID
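
The outline above can be sketched roughly as follows. This swaps the regex lookup for PHP's DOM extension, and assumes the stylesheet rules have already been parsed into an array keyed by selector. `find_color` and the shape of `$rules` are assumptions for illustration only:

```php
<?php
// Hypothetical helper following the outline: style attribute, #id rule, .class rule, then the parent.
function find_color(DOMDocument $dom, $id, array $rules)
{
    $xpath = new DOMXPath($dom);
    // Step 1: find the element by its id attribute (assumes $id contains no double quote).
    $node = $xpath->query('//*[@id="' . $id . '"]')->item(0);

    while ($node instanceof DOMElement) {
        // Step 2: inline style attribute ("background-color" must not match).
        if (preg_match('/(?:^|;)\s*color\s*:\s*([^;]+)/i', $node->getAttribute('style'), $m)) {
            return trim($m[1]);
        }
        // Step 3: a rule for the element's id.
        $key = '#' . $node->getAttribute('id');
        if (isset($rules[$key]['color'])) {
            return $rules[$key]['color'];
        }
        // Step 4: a rule for any of the element's classes.
        $classes = preg_split('/\s+/', trim($node->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($classes as $class) {
            if (isset($rules['.' . $class]['color'])) {
                return $rules['.' . $class]['color'];
            }
        }
        // Step 5: nothing found here, so move up to the parent and repeat.
        $node = $node->parentNode;
    }
    return null;
}

$dom = new DOMDocument();
$dom->loadHTML('<div class="warning"><h1 id="mytext">Text</h1></div>');
$rules = array('.warning' => array('color' => '#FF0000'));
echo find_color($dom, 'mytext', $rules), "\n";
```

Taking the nearest match while walking upwards ignores specificity and source order, so this is only a first approximation of real cascading.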
jayshields
DevNet Resident
Posts: 1912
Joined: Mon Aug 22, 2005 12:11 pm
Location: Leeds/Manchester, England

Re: [Challenge - Advanced] What color is the text?

Post by jayshields »

To do it properly there's an almost endless number of possibilities as to where an element could get its colour from. What about if JavaScript manipulates the colour of the text on page load depending on something like which browser you're using?
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: [Challenge - Advanced] What color is the text?

Post by onion2k »

jayshields wrote:To do it properly there's an almost endless number of possibilities as to where an element could get its colour from. What about if JavaScript manipulates the colour of the text on page load depending on something like which browser you're using?
I think it's safe to ignore scripting. Everything else should be ok though.. you've really only got to examine the color attribute, style attribute, class and id, each recursively up the CSS or HTML tree. It's a lot of work, sure, but it's possible. And once you have that code in place it'd be easy to expand it out to examine other attributes at which point the code becomes extremely useful.
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Re: [Challenge - Advanced] What color is the text?

Post by Benjamin »

Here's how I would do it:

1. Parse the html file so that you can detect inline css and external style sheets.
2. Parse the css and use the results to populate a class.
3. Using this class (or classes), create methods that let you pull CSS color attributes for elements with specific classes or IDs.
4. Parse the html file. Starting with the html tag, retrieve the color attributes. These color attributes will cascade and be assigned to relevant child elements unless overridden. <font color="foo"> will be overridden by css. Inline css will override all.
5. When the element with the matching id is found, you'll have the color.

This is not as hard as it seems when broken down into pieces. The hardest part would be creating the css and html parsers. Keep in mind however, you only need to pay attention to color attributes.
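
The override order in step 4 can be sketched like this, assuming the stylesheet has already been reduced to an array of rules keyed by "#id". The function names and the `$rules` shape are made up for illustration:

```php
<?php
// Hypothetical helper: the colour one element sets for itself, or null if it sets none.
// Later sources win: color attribute < stylesheet rule < inline style.
function own_color(DOMElement $el, array $rules)
{
    $color = null;
    if ($el->hasAttribute('color')) {
        $color = $el->getAttribute('color');
    }
    $key = '#' . $el->getAttribute('id');
    if (isset($rules[$key]['color'])) {
        $color = $rules[$key]['color'];
    }
    if (preg_match('/(?:^|;)\s*color\s*:\s*([^;]+)/i', $el->getAttribute('style'), $m)) {
        $color = trim($m[1]);
    }
    return $color;
}

// Walks from the root element down to the target so each ancestor's colour
// cascades to its children unless a descendant overrides it.
function cascaded_color(DOMDocument $dom, $id, array $rules)
{
    $xpath = new DOMXPath($dom);
    // Assumes $id contains no double quote.
    $target = $xpath->query('//*[@id="' . $id . '"]')->item(0);
    $path = array();
    for ($n = $target; $n instanceof DOMElement; $n = $n->parentNode) {
        array_unshift($path, $n); // the root element ends up first
    }
    $color = null;
    foreach ($path as $el) {
        $own = own_color($el, $rules);
        if ($own !== null) {
            $color = $own;
        }
    }
    return $color;
}

$dom = new DOMDocument();
$dom->loadHTML('<div style="color: #0000FF"><font id="mytext" color="#00FF00">Text</font></div>');
$rules = array('#mytext' => array('color' => '#FF0000'));
echo cascaded_color($dom, 'mytext', $rules), "\n";
```

Only id selectors are looked up here; class and element selectors would need their own branches in `own_color`.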
Zoxive
Forum Regular
Posts: 974
Joined: Fri Apr 01, 2005 4:37 pm
Location: Bay City, Michigan

My Entry

Post by Zoxive »

I was bored enough today to try this out.

text_attr.php

Code: Select all

 
<?php
/**
 *  Text_Attr Class
 * 
 * @author  Zoxive - kyle@zoxive.com
 * @date    27th August 2009
 */
class Text_Attr{
 
  protected $doc    = NULL;
  protected $dom    = NULL;
  protected $url    = NULL;
 
  /**
   * Constructs a new Text_Attr object.
   * 
   * @param   string  HTML document
   * @param   string  URL of such document, if not local
   * @return void
  */
  public function __construct($doc = NULL, $url = NULL)
  {
    $this->doc      = $doc;
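    $this->dom      = new DOMDocument();
    // loadHTML() must be called on an instance; @ hides warnings about sloppy markup
    if(!empty($doc)) @$this->dom->loadHTML($doc);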
    $this->url      = $url;
  }
 
  /**
   * Locates the Attribute inside the document
   *  
   * @param   string  name of HTML ID
   * @param   string  name of HTML/CSS Attr
   * @return  string  value of Attr
  */
  public function get_attr($selector, $attr)
  {
    if(
    // Inline html
    ($Result = $this->inline_attr($selector,$attr)) ||
    // Inline CSS
    ($Result = $this->inline_css_attr($selector,$attr)) ||
    // Parent html
    ($Result = $this->parent_attr($selector,$attr)) ||
    // Parent CSS
    ($Result = $this->parent_css_attr($selector,$attr)) ||
    // Embedded CSS
    ($Result = $this->embedded_css_attr($selector,$attr)) ||
    // Linked CSS
    ($Result = $this->linked_css_attr($selector,$attr))
    )
      return $Result;
    else
      return NULL;
  }
 
  /**
   * Searches the local dom for the Attr
   * 
   * @param   string  name of HTML ID
   * @param   string  name of HTML/CSS Attr
   * @return  string  value of Attr
  */
  protected function inline_attr($selector, $attr)
  {
    //return pq($selector)->attr($attr);
    $ele = $this->dom->getElementById($selector);
    if(!$ele) return NULL;
    if($ele->hasAttribute($attr))
      return $ele->getAttribute($attr);
  }
 
  /**
   * Looks at inline CSS for the Attr
   * 
   * @param   string  name of HTML ID
   * @param   string  name of HTML/CSS Attr
   * @return  string  value of Attr
  */
  protected function inline_css_attr($selector, $attr, $ele = NULL)
  {
    if(empty($ele))
      $ele = $this->dom->getElementById($selector);
    if(empty($ele)) return NULL;
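    if($ele->hasAttribute('style')){
      // split() was removed in PHP 7; check every "property: value" declaration
      foreach(explode(';', $ele->getAttribute('style')) as $decl){
        $style = explode(':', $decl, 2);
        if(count($style) == 2 && strtolower($attr) == strtolower(trim($style[0])))
          return trim($style[1]);
      }
    }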
  }
 
  /**
   * Looks at all the parents for the attr
   * 
   * @param   string  name of HTML ID
   * @param   string  name of HTML/CSS Attribute
   * @return  string  value of Attr
  */
  protected function parent_attr($selector, $attr)
  {
    $ele = $this->dom->getElementById($selector);
    // Unknown id: nothing to walk up from
    if(!$ele) return NULL;
    while($ele = $ele->parentNode){
      if($ele instanceof DOMElement && ($result = $ele->getAttribute($attr))) return $result;
    }
  }
 
  /**
   * Looks at all parents for a inline CSS Attr
   * 
   * @param   string  name of HTML ID
   * @param   string  name of HTML/CSS Attribute
   * @return  string  value of Attr
  */
  protected function parent_css_attr($selector, $attr)
  {
    $ele = $this->dom->getElementById($selector);
    // Unknown id: nothing to walk up from
    if(!$ele) return NULL;
    while($ele = $ele->parentNode){
      if($ele instanceof DOMElement && ($result = $this->inline_css_attr($selector,$attr,$ele))) return $result;
    }
  }
 
  /**
   * Searches for an embedded style for the given ID
   *
   * @param   string  name of HTML ID
   * @param   string  name of HTML/CSS Attribute
   * @return  string  value of Attr
  */
  protected function embedded_css_attr($selector, $attr)
  {
    $css = $this->get_embedded_css();
    $css = $this->prep_css($css);
    
    return $this->css_array_id_value($css,$selector,$attr);
  }
 
  /**
   * Looks for attr in linked css files
   * 
   * @param   string  name of HTML ID
   * @param   string  name of HTML/CSS Attribute
   * @return  string  value of Attr
  */
  protected function linked_css_attr($selector, $attr)
  {
    $css = $this->get_linked_css();
    $css = explode("\n",$css);
    unset($css[count($css)-1]);
    // No <link> elements at all: nothing to fetch
    if(empty($css)) return NULL;
    $merged = '';
    if(!@file_get_contents($css[0])){
      //die('<strong>ERROR:</strong><br/>CSS finder isnt smart enough yet to find the location of the linked CSS files.');
        $domain = $this->find_url_root();
    }
    foreach($css as $each){
      if(isset($domain)){
        $merged.= file_get_contents($domain.'/'.$each)."\n"; 
      }else{
        $merged.= file_get_contents($each)."\n"; 
      }
    }
    $css = $this->prep_css($merged);
    return $this->css_array_id_value($css,$selector,$attr);
  }
 
  /**
   * Return Attr value from Css Array
   * 
   * @param   array   css array
   * @param   string  id value
   * @param   string  attr to get
   * @return  string  attr value
  */
  protected function css_array_id_value($css, $selector, $attr)
  {
    return isset($css['#'.$selector][$attr])? $css['#'.$selector][$attr] : NULL;
  }
 
  /**
   * Looks for linked CSS files
   * 
   * @return string names of CSS files
  */
  protected function get_linked_css()
  {
    $css = $this->dom->getElementsByTagName('link');
    $length = $css->length;
    
    $merged = '';
    for($i=0;$i<$length; $i++){
      $merged.= $css->item($i)->getAttribute('href') . "\n";
    }
    return $merged;
  }
 
  /**
   * Converts given CSS into an array
   *
   * @param   css     string  Css
   * @return  array   array of CSS
  */
  protected function prep_css($css)
  {
    $css_array = array();
    // Remove comments
    // Non-greedy, and "s" lets "." span newlines so multi-line comments go too
    $css = preg_replace('/\/\*.*?\*\//s','',$css);
 
    $css = explode('}',$css); 
    // Remove empty
    unset($css[count($css)-1]);
    foreach($css as $ea){
      $param_array = array();
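      $ea = explode('{',$ea,2);
      // Skip fragments without a "{" (stray text between rules)
      if(count($ea) < 2) continue;
      $name = trim($ea[0]);
      $value = trim($ea[1]);
      $ea_atr = explode(';',$value);
      foreach($ea_atr as $parm){
        // Limit of 2 keeps values that contain ":" (e.g. url(http://...)) intact
        $param = explode(':',$parm,2);
        // Skip empty pieces rather than dropping the last declaration,
        // which was lost when a rule had no trailing ";"
        if(count($param) < 2) continue;
        $param_array[trim($param[0])] = trim($param[1]);
      }
      if(empty($param_array)) continue;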
      // Cascade
      // If the selector already exists, merge so that the newest value wins.
      if(isset($css_array[$name]))
        $css_array[$name] = array_merge($css_array[$name],$param_array);
      else
        $css_array[$name] = $param_array;
    }
    return $css_array;
  }
 
  /**
   * Grabs all <style> in the document
   * 
   * @return  string  all of the styles merged together
  */
  protected function get_embedded_css()
  {
    $css = $this->dom->getElementsByTagName('style');
    $length = $css->length;
    
    $merged = '';
    for($i=0;$i<$length; $i++){
      $merged.= $css->item($i)->nodeValue . "\n";
    }
    return $merged;
  }
 
  /**
   * Finds the root of the Document
   * If linked styles dont have the relative path, we have to guess
   *
   * @return string   url relative path
  */
  protected function find_url_root()
  {
    $url = addslashes(urldecode($this->url));
    $url = explode('/',$url); 
    unset($url[count($url)-1]);
    // Separator comes first; the reversed argument order was removed in PHP 8
    return implode('/',$url);
  }
}
 
testcolor.php

Code: Select all

 
<?php
 
include('text_attr.php');
 
/*
$url = 'example5.html';
$selector = 'mytext';
*/
 
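// addslashes() does not make input safe and can corrupt the URL; read the raw values
$url = isset($_GET['url']) ? $_GET['url'] : '';
$selector = isset($_GET['id']) ? $_GET['id'] : '';
 
// Note: fetching an arbitrary user-supplied URL is only acceptable for a demo
$html = file_get_contents($url);
 
$text = new Text_Attr($html,$url);
 
// Escape user input before echoing it back into HTML
echo '<strong>URL:</strong> ' . htmlspecialchars($url);
echo "\n<br/>\n";
echo '<strong>ID:</strong> ' . htmlspecialchars($selector);
echo "\n<br/>\n<br/>\n";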
 
var_dump($text->get_attr($selector, 'color'));
 
Examples can be found at: http://textcolor.zoxive.com/examples.html (I only made this CNAME a few hours ago, so it may not work for you yet.)

A few things..
My code is pretty sloppy and I took a few shortcuts (explodes and such) which result in some inaccuracies down the line. I haven't really coded anything from scratch recently. Most of my recent work has been done inside an MVC framework environment.

I did not validate any of the data coming in, since this most likely will not be used in the way my examples are. Cascading styles *kind of* work, as long as it's just a new attribute being added under the same ID. It can't handle multiple names/ids in one rule yet, e.g. #myid, #someother { color:#FF0000; }. I believe I can add that by having a regex search through my CSS array, or by rewriting how it currently finds values in the CSS.
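
One way the grouped-selector case might be handled is to copy each rule to every selector in its comma-separated group while the array is being built. A minimal sketch, assuming a stand-alone function named `css_to_array` (a made-up name, not part of the class above):

```php
<?php
// Hypothetical variant of the CSS-to-array step that copies a rule to every
// selector in a comma-separated group.
function css_to_array($css)
{
    $rules = array();
    // Strip /* ... */ comments, including multi-line ones.
    $css = preg_replace('/\/\*.*?\*\//s', '', $css);

    // Each match is "selectors { declarations }"; nested blocks such as @media are not handled.
    preg_match_all('/([^{}]+)\{([^{}]*)\}/', $css, $blocks, PREG_SET_ORDER);
    foreach ($blocks as $block) {
        $declarations = array();
        foreach (explode(';', $block[2]) as $decl) {
            $parts = explode(':', $decl, 2);
            if (count($parts) === 2) {
                $declarations[strtolower(trim($parts[0]))] = trim($parts[1]);
            }
        }
        foreach (explode(',', $block[1]) as $selector) {
            $selector = trim($selector);
            if ($selector === '') {
                continue;
            }
            $existing = isset($rules[$selector]) ? $rules[$selector] : array();
            // Later declarations for the same selector override earlier ones.
            $rules[$selector] = array_merge($existing, $declarations);
        }
    }
    return $rules;
}

$rules = css_to_array("#myid, #someother { color:#FF0000; }\n#myid { font-weight: bold }");
print_r($rules['#myid']);
```

With the array built this way, the existing lookup by '#'.$selector should work unchanged for grouped rules; descendant selectors such as "div #myid" would still be missed.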

I spent about 4 hours on this today 8O, maybe tomorrow I'll get around to messing with real cascading.

Edit: Found a typo in the code
Last edited by Zoxive on Fri Aug 28, 2009 1:47 pm, edited 1 time in total.
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: [Challenge - Advanced] What color is the text?

Post by onion2k »

Nice. Very nice.