Progenic CAPTCHA solver
  • sangf
    Posts: 203
    here's a short something i wrote over the last couple of days after seeing how Progenic has quite a weak CAPTCHA system. this code can return the text successfully at a decent rate (not 100%, but i'll get to that). this script has a few dependencies so it's not ideal for a distributed (a.k.a. botting) solution. it uses the following projects:

    Python Imaging Library (PIL)
    1.1.7 - http://www.pythonware.com/products/pil/
    this can be installed on debian based systems with the package python-imaging.

    Tesseract OCR
    http://code.google.com/p/tesseract-ocr/
    a really decent open source OCR tool. i used this as the engine to parse the text from a cleaned image - this does the hard stuff, basically.

    Python-tesseract
    https://github.com/hoffstaetter/python-tesseract
    a wrapper for the above, in Python. this doesn't provide a C interface using ported tesseract libraries, unfortunately. however, it's a good alternative, it just means having an external dependency (even if distributed with something like Py2Exe).
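
    since the script only works when all three pieces are present, a quick check like the one below might save some head scratching. it's a rough sketch, not part of the original script: `check_dependencies` is a made-up helper, it assumes Python 3.4+ (for `importlib.util.find_spec`) rather than the 2.7.1 the solver was written under, and the module names `PIL` and `tesseract` and the binary name `tesseract` are assumptions about how things are installed.

```python
import importlib.util
import shutil


def check_dependencies(modules=("PIL", "tesseract"), binaries=("tesseract",)):
    """Report which Python modules and external programs can be found.

    The default names are assumptions about a typical install, not facts
    taken from the solver script itself.
    """
    report = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be imported
        report["module:" + name] = importlib.util.find_spec(name) is not None
    for name in binaries:
        # shutil.which returns None when the program is not on PATH
        report["binary:" + name] = shutil.which(name) is not None
    return report
```

    calling `check_dependencies()` returns a dict such as `{'module:PIL': ..., 'binary:tesseract': ...}` with True/False values, so anything missing shows up before the solver is run.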

    Source
    note: Python-tesseract not included until i can be bothered to make a zip.

    #! /usr/bin/env python
    #
    # Progenic CAPTCHA solver by fgnas
    # Developed under Python 2.7.1
    #
    # Requires:
    # PIL 1.1.7 - http://www.pythonware.com/products/pil/ (python-imaging)
    # Tesseract OCR - http://code.google.com/p/tesseract-ocr/
    #
    # Included in this distribution:
    # Python-tesseract - https://github.com/hoffstaetter/python-tesseract
    #
    # File: progenic_captcha.py

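    import os
    import tesseract
    try:
        from PIL import Image  # package-style import
    except ImportError:
        import Image  # flat module layout used by some PIL installs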

    # Main API management interface to solve a basic CAPTCHA
    class captcha_manager:
        _captcha_data_handle = None
        _captcha_file_handle = None
        _captcha_temp_file = './__tmp_progenic_captcha__.png'

        # Constructor: optional filename to open image automagically
        def __init__(self, captcha_file = None):
            if captcha_file:
                self.open(captcha_file)

        # Open image file, setup image data ready to be manipulated
        def open(self, captcha_file):
            if not os.path.exists(captcha_file):
                raise IOError('File not found')
            if os.path.exists(self._captcha_temp_file):
                try:
                    os.remove(self._captcha_temp_file)
                except Exception:
                    raise IOError('Temporary file could not be removed')

            self._captcha_file_handle = Image.open(captcha_file)
            if self._captcha_file_handle.mode != 'RGB':
                self._captcha_file_handle = self._captcha_file_handle.convert('RGB')
            self._captcha_data_handle = self._captcha_file_handle.load()

        # Read CAPTCHA text, return as string (empty string on fail)
        def read(self, explicit_save = False):
            cleaner = captcha_cleaner()
            cleaner.set_data(self._captcha_data_handle, self._captcha_file_handle.size)
            # Thresholds are sums of the three channels: 3 * 0x09 = 27, 3 * 0x52 = 246
            cleaner.clean(3 * 0x09, 3 * 0x52)
            crop_extents = cleaner.get_crop_extents(5)
            cleaner.destroy()
            self._captcha_file_handle = self._captcha_file_handle.crop(crop_extents)
            if explicit_save:
                self._captcha_file_handle.save(self._captcha_temp_file, 'PNG')
            return tesseract.image_to_string(self._captcha_file_handle)

        # Destructor: explicit object cleanup
        def destroy(self):
            # Reset rather than del, so this is safe even if open() never ran
            self._captcha_data_handle = None
            self._captcha_file_handle = None

    # Image cleaner class, provides methods to make the CAPTCHA OCR friendly
    class captcha_cleaner:
        _captcha_x = None
        _captcha_y = None
        _captcha_data_handle = None

        # Constructor: bounding box is stored as [[min_x, min_y], [max_x, max_y]]
        def __init__(self):
            # Per-instance list; a class-level list would be shared between cleaners
            self._captcha_metadata = [None, None]

        # Gets the extents for the bounding box around text from pixels which are known to be text
        def _add_metadata(self, x, y):
            meta = self._captcha_metadata
            if not meta[0]:
                meta[0] = [x, y]
            else:
                if x < meta[0][0]:
                    meta[0][0] = x
                if y < meta[0][1]:
                    meta[0][1] = y
            if not meta[1]:
                meta[1] = [x, y]
            else:
                if x > meta[1][0]:
                    meta[1][0] = x
                if y > meta[1][1]:
                    meta[1][1] = y

        # Sets up the cleaner object with the input data it needs before processing
        def set_data(self, captcha_data, dimensions):
            if len(dimensions) == 2:
                self._captcha_data_handle = captcha_data
                self._captcha_x = dimensions[0]
                self._captcha_y = dimensions[1]
            else:
                raise ValueError('Invalid image dimensions')

        # Core cleaning method, attempts to remove background noise and leave clean text on a white background
        def clean(self, text_threshold, edge_threshold):
            if (not self._captcha_data_handle) or (not self._captcha_x) or (not self._captcha_y):
                raise ValueError('No image data to clean')
            # Walk the image column by column
            for x in range(self._captcha_x):
                for y in range(self._captcha_y):
                    pixel_data = self._captcha_data_handle[x, y]
                    pixel_colour = pixel_data[0] + pixel_data[1] + pixel_data[2]
                    if pixel_colour > edge_threshold:
                        # Bright pixel: background or noise
                        pixel_data = (255, 255, 255)
                    elif pixel_colour > text_threshold:
                        # In-between pixel: keep it only if it touches a text pixel
                        edge_detected = False
                        for _x, _y in ((x - 1, y + 1), (x - 1, y - 1), (x - 1, y), (x, y - 1), (x, y + 1), (x + 1, y + 1), (x + 1, y - 1), (x + 1, y)):
                            # Skip neighbours outside the image (index 0 is valid)
                            if (_x >= self._captcha_x) or (_x < 0) or (_y >= self._captcha_y) or (_y < 0):
                                continue
                            _pixel_data = self._captcha_data_handle[_x, _y]
                            # Sum the channels so the value is comparable with text_threshold
                            _pixel_colour = _pixel_data[0] + _pixel_data[1] + _pixel_data[2]
                            if _pixel_colour <= text_threshold:
                                edge_detected = True
                                break
                        if not edge_detected: pixel_data = (255, 255, 255)
                    else:
                        # Dark pixel: treat as text and grow the bounding box
                        pixel_data = (0, 0, 0)
                        self._add_metadata(x, y)
                    self._captcha_data_handle[x, y] = pixel_data

        # Returns crop extents with padding using data retrieved by _add_metadata()
        def get_crop_extents(self, padding = 5):
            if None in self._captcha_metadata:
                raise ValueError('No text pixels found')
            x1, y1 = self._captcha_metadata[0]
            x2, y2 = self._captcha_metadata[1]
            if x1 - padding < 0: x1 = 0
            else: x1 -= padding
            if y1 - padding < 0: y1 = 0
            else: y1 -= padding
            if x2 + padding >= self._captcha_x: x2 = self._captcha_x - 1
            else: x2 += padding
            if y2 + padding >= self._captcha_y: y2 = self._captcha_y - 1
            else: y2 += padding
            return (x1, y1, x2, y2)

        # Destructor: explicit object cleanup
        def destroy(self):
            # Reset rather than del, so this is safe even if nothing was set
            self._captcha_data_handle = None
            self._captcha_metadata = [None, None]
            self._captcha_x = None
            self._captcha_y = None

    if __name__ == '__main__':
        captcha = captcha_manager()
        captcha.open('progenic_captcha.jpg')
        print(captcha.read(True))
        captcha.destroy()


    or view in its highlighted beauty: http://codepad.org/kgGrvwwP
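
    for anyone who wants to follow the cleaning step without PIL installed, here's a rough sketch of the same two-threshold idea on a plain list of rows. `clean_grid`, `WHITE`, `BLACK` and the grid layout are made up for illustration; the default thresholds are the summed values the script passes to clean() (3 * 0x09 and 3 * 0x52). unlike the script, which edits pixels in place, this version reads from the original grid and returns a copy, so results can differ slightly near edges.

```python
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)


def clean_grid(grid, text_threshold=3 * 0x09, edge_threshold=3 * 0x52):
    """Return a cleaned copy of grid, a list of rows of (r, g, b) tuples."""
    height = len(grid)
    width = len(grid[0]) if height else 0
    cleaned = []
    for y in range(height):
        row = []
        for x in range(width):
            brightness = sum(grid[y][x])
            if brightness > edge_threshold:
                # Bright pixel: background or noise, wipe it
                row.append(WHITE)
            elif brightness > text_threshold:
                # In-between pixel: keep it only if one of its 8 neighbours is text
                touches_text = any(
                    sum(grid[ny][nx]) <= text_threshold
                    for ny in range(max(0, y - 1), min(height, y + 2))
                    for nx in range(max(0, x - 1), min(width, x + 2))
                    if (nx, ny) != (x, y)
                )
                row.append(grid[y][x] if touches_text else WHITE)
            else:
                # Dark pixel: treat as text
                row.append(BLACK)
        cleaned.append(row)
    return cleaned
```

    the point of the middle band is to keep the soft, anti-aliased outline of each letter while dropping grey noise that isn't attached to any text.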

    How to use:
    this can be used in external Python scripts quite easily, assuming all dependencies are accounted for, and that 'progenic_captcha.py' and 'tesseract.py' can be found by Python.

    from progenic_captcha import captcha_manager
    captcha = captcha_manager()
    captcha.open('CAPTCHA_Image.jpg')
    print(captcha.read())
    captcha.destroy()


    Example results:
    here's the test data i used, and also the results of the script output. as you can see, it isn't entirely accurate.. but it does seem to be sufficient, and i'm pleased with the results. however, i didn't test it with enough data to provide any real statistics, so make of this what you will.

    [spoiler]http://i.imgur.com/OMQ3o.jpg
    RLTG

    http://i.imgur.com/R9Ucw.jpg
    BFVF

    http://i.imgur.com/E7lJd.jpg
    DTGLI

    http://i.imgur.com/lrNQe.jpg
    QGHQ

    http://i.imgur.com/hU5HA.jpg
    swf”

    http://i.imgur.com/RFLkJ.jpg
    JCJB

    http://i.imgur.com/gQGba.jpg
    THFC

    Example of a cleaned CAPTCHA image, and it's a good job i posted this; i just noticed a typo in the padding code when i was curious about why the bottom wasn't padded by 5px.
    http://i.imgur.com/cfJNb.png
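
    handling all four sides by hand is exactly where a padding typo like that creeps in. a more compact way to get the same padded, clamped box is sketched below. `padded_box` is a hypothetical helper, not something from the script; it takes the text pixel coordinates directly instead of the running min/max the cleaner keeps.

```python
def padded_box(points, width, height, padding=5):
    """Bounding box (x1, y1, x2, y2) of points, grown by padding and clamped.

    points is a sequence of (x, y) pixel coordinates known to be text;
    width and height are the image dimensions.
    """
    if not points:
        raise ValueError("no text pixels found")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # max/min do the clamping, so every side is treated the same way
    x1 = max(min(xs) - padding, 0)
    y1 = max(min(ys) - padding, 0)
    x2 = min(max(xs) + padding, width - 1)
    y2 = min(max(ys) + padding, height - 1)
    return (x1, y1, x2, y2)
```

    for example, text spanning (10, 3) to (40, 20) in a 100x24 image gives (5, 0, 45, 23): the top and bottom are clamped to the image, the left and right get the full 5px.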

    [spoiler]
    http://images.wikia.com/touhou/images/e/e1/Yukkuri_MarisaReimu.png
    [/spoiler]
    [/spoiler]
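
    to turn a bigger test run into actual statistics, something like the scoring helper below could be used. it's only a sketch: `score` is a made-up function, it assumes the correct text for each image has been typed in by hand, and it compares case-insensitively, which may or may not match how the site checks answers.

```python
def score(results):
    """Score OCR output against hand-labelled answers.

    results is an iterable of (expected, actual) strings. Returns a tuple
    (exact_match_rate, per_character_rate), both between 0.0 and 1.0.
    """
    results = list(results)
    if not results:
        raise ValueError("no results to score")
    exact = 0
    char_hits = 0
    char_total = 0
    for expected, actual in results:
        # Assumption: whitespace and letter case do not matter
        expected = expected.strip().upper()
        actual = actual.strip().upper()
        exact += expected == actual
        char_total += len(expected)
        # Count characters that match position by position
        char_hits += sum(1 for e, a in zip(expected, actual) if e == a)
    per_char = char_hits / float(char_total) if char_total else 0.0
    return exact / float(len(results)), per_char
```

    the per-character rate is handy alongside the exact-match rate, since a near miss (one wrong letter out of four) says something different about the cleaner than total garbage does.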


    power to the spaghetti D:
  • Xin
    Posts: 3,251
    That is elite :) good job bro, maybe we can build in proxy support and beat this thing ;)
    Xin
  • +1 Looking really good ...Maybe could be extended when you get it accurate enough for the captcha project...
  • Sh3llc0d3
    Posts: 1,910
    Nice job sangf :)
  • sangf
    Posts: 203
    thanks~~ if anyone knows the name of this CAPTCHA system, let me know! i'm sure i've seen these types elsewhere, and it'd certainly make for a more fitting name.