CAPTCHAs are usually analyzed with neural networks. That is a good approach, but it can be overkill in simple cases. The much shorter algorithm presented below can produce sufficient results for uncomplicated CAPTCHAs.
In this algorithm, an image of an unknown letter is compared with samples of known letters; the letter in the most similar sample is probably also the letter in the analyzed image. It is implemented as a Python script, used as follows:
    bash-3.2$ python cracker.py test1.png
    e
    bash-3.2$ python cracker.py test2.png
    p
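To make the comparison concrete, here is a minimal sketch of the same idea on plain nested lists instead of image files. The tiny 3x3 "images" and the names `pixel_difference`, `closest_letter` and `SAMPLES` are made up for illustration; the real script works on PNG files.

```python
def pixel_difference(a, b):
    """Sum of absolute per-pixel differences between two equally sized grids."""
    return sum(
        abs(pa - pb)
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def closest_letter(unknown, samples):
    """Return the letter whose sample differs least from the unknown image."""
    return min(samples, key=lambda letter: pixel_difference(unknown, samples[letter]))


# Hypothetical 3x3 grayscale samples (0 = black, 255 = white).
SAMPLES = {
    "i": [[255, 0, 255], [255, 0, 255], [255, 0, 255]],
    "l": [[0, 255, 255], [0, 255, 255], [0, 0, 0]],
}

# An "i" with one slightly gray pixel in the middle.
unknown = [[255, 0, 255], [255, 30, 255], [255, 0, 255]]

print(closest_letter(unknown, SAMPLES))  # prints "i"
```

The unknown image differs from the "i" sample by only 30 in total, and from the "l" sample by 1500, so "i" wins.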
The script can't be used directly on a raw CAPTCHA: first, small artifacts have to be removed from the CAPTCHA; second, each letter has to be stored in a separate image.
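These two preparation steps are not part of the script, so the following is only a rough sketch of one possible way to do them. It assumes the CAPTCHA has already been reduced to a grid of 0s (background) and 1s (ink), treats any connected group of fewer than `min_size` ink pixels as an artifact, and assumes letters are separated by at least one empty column. The function names and the `min_size` parameter are hypothetical.

```python
def remove_small_artifacts(grid, min_size=3):
    """Erase 4-connected groups of ink pixels (1s) smaller than min_size."""
    height, width = len(grid), len(grid[0])
    cleaned = [row[:] for row in grid]
    seen = set()
    for y in range(height):
        for x in range(width):
            if grid[y][x] != 1 or (x, y) in seen:
                continue
            # Flood fill to collect one connected component.
            stack, component = [(x, y)], []
            seen.add((x, y))
            while stack:
                cx, cy = stack.pop()
                component.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and grid[ny][nx] == 1 and (nx, ny) not in seen:
                        seen.add((nx, ny))
                        stack.append((nx, ny))
            if len(component) < min_size:
                for cx, cy in component:
                    cleaned[cy][cx] = 0
    return cleaned


def split_letters(grid):
    """Cut the grid at empty columns and return one sub-grid per letter."""
    width = len(grid[0])
    letters, start = [], None
    for x in range(width + 1):
        has_ink = x < width and any(row[x] for row in grid)
        if has_ink and start is None:
            start = x  # a letter begins here
        elif not has_ink and start is not None:
            letters.append([row[start:x] for row in grid])
            start = None  # the letter ended just before this column
    return letters


# Two "letters" with a single stray pixel between them.
captcha = [
    [1, 1, 0, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 0, 1, 1],
]
cleaned = remove_small_artifacts(captcha)
letters = split_letters(cleaned)
print(len(letters))  # prints 2
```

Letters that touch each other would not be separated by this column-based split, so real CAPTCHAs may need something smarter.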
The script below requires a samples directory containing one sample image per lowercase letter (samples/a.png, samples/b.png and so on). A sample set and this CAPTCHA breaker can be downloaded from my GitHub (CaptachaCracker directory).
    import os
    import string
    import sys

    from PIL import Image, ImageChops

    if __name__ == "__main__":
        # Path of the single-letter image to recognize.
        input_path = sys.argv[1]
        base = Image.open(input_path).convert('L')  # 8-bit grayscale

        class Fit:
            letter = None
            difference = 0

        best = Fit()

        for letter in string.ascii_lowercase:
            current = Fit()
            current.letter = letter

            # Load the sample and scale it to the size of the input image.
            sample_path = os.path.join('samples', letter + '.png')
            sample = Image.open(sample_path).convert('L').resize(base.size)
            difference = ImageChops.difference(base, sample)

            # Sum the absolute per-pixel differences.
            for x in range(difference.size[0]):
                for y in range(difference.size[1]):
                    current.difference += difference.getpixel((x, y))

            # Keep the sample with the smallest total difference.
            if not best.letter or best.difference > current.difference:
                best = current

        print(best.letter)
I was surprised that this task can be done in less than 50 lines! Of course, it's not good enough to break complicated CAPTCHAs, but those aren't an easy task even for more sophisticated algorithms.