CAPTCHAs are usually analyzed with neural networks. That is a good approach, but it can be overcomplicated for simple cases. The much shorter algorithm presented below can produce sufficient results for uncomplicated CAPTCHAs.
In this algorithm an image with an unknown letter is compared with samples of known letters; the letter in the most similar sample is probably also the letter in the analyzed image. It was implemented as a Python script; its usage is presented below:
bash-3.2$ python cracker.py test1.png
e
bash-3.2$ python cracker.py test2.png
p
It can't be used directly on a raw CAPTCHA: first, small artifacts have to be removed from the CAPTCHA; second, each letter should be stored in a separate image.
The script below requires a samples directory with samples of letters. A sample set and this CAPTCHA breaker can be downloaded from my GitHub (CaptachaCracker directory).
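The two preprocessing steps above could be sketched roughly as follows. This is only an illustration, not the code from my repository: it assumes the CAPTCHA has already been thresholded into a nested list of 0/1 values (1 = ink), that artifacts are isolated specks, and that letters are separated by at least one empty column. The names remove_specks and split_letters are hypothetical.

```python
def remove_specks(grid, min_neighbors=1):
    """Clear ink pixels that have fewer than min_neighbors ink neighbours."""
    height, width = len(grid), len(grid[0])
    cleaned = [row[:] for row in grid]
    for y in range(height):
        for x in range(width):
            if not grid[y][x]:
                continue
            # Count ink pixels among the (up to) 8 surrounding pixels.
            neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width and grid[ny][nx]:
                        neighbors += 1
            if neighbors < min_neighbors:
                cleaned[y][x] = 0
    return cleaned


def split_letters(grid):
    """Return (start, end) column ranges (end exclusive) that contain ink."""
    width = len(grid[0])
    has_ink = [any(row[x] for row in grid) for x in range(width)]
    ranges, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a letter begins
        elif not ink and start is not None:
            ranges.append((start, x))      # a letter ends at an empty column
            start = None
    if start is not None:
        ranges.append((start, width))      # letter touching the right edge
    return ranges
```

Each returned range could then be cut out of the original picture, for example with Image.crop((start, 0, end, height)), and saved as a separate file for the script. Letters that touch or overlap would need a smarter segmentation than this.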
import os
import string
import sys

from PIL import Image, ImageChops


class Fit:
    letter = None
    difference = 0


if __name__ == "__main__":
    input_path = sys.argv[1]
    # Work in 8-bit grayscale so every pixel is a single intensity value.
    base = Image.open(input_path).convert('L')

    best = Fit()
    for letter in string.ascii_lowercase:
        current = Fit()
        current.letter = letter

        # Scale each sample to the size of the analyzed image before comparing.
        sample_path = os.path.join('samples', letter + '.png')
        sample = Image.open(sample_path).convert('L').resize(base.size)

        # Sum the per-pixel absolute differences; a lower sum is a closer match.
        difference = ImageChops.difference(base, sample)
        for x in range(difference.size[0]):
            for y in range(difference.size[1]):
                current.difference += difference.getpixel((x, y))

        if not best.letter or best.difference > current.difference:
            best = current

    print(best.letter)
I was surprised that this task can be done in less than 50 lines! Of course it's not good enough to break complicated CAPTCHAs, but those aren't an easy task for more complicated algorithms either.
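The core of the comparison can also be isolated from the file handling, which makes it easy to try without any image files. The sketch below is an illustration under assumptions, not part of the original script: it takes flat sequences of grayscale values (0-255) of equal length, and the names total_difference and best_match are made up for this example.

```python
def total_difference(pixels_a, pixels_b):
    """Sum of absolute per-pixel differences between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(pixels_a, pixels_b))


def best_match(unknown, samples):
    """Return the letter whose sample differs least from the unknown image.

    samples maps a letter to its pixel sequence; on a tie the letter that
    comes first in the mapping wins.
    """
    return min(samples, key=lambda letter: total_difference(unknown, samples[letter]))


# Tiny made-up 2x2 "images", flattened row by row.
samples = {
    'a': [0, 0, 255, 255],
    'b': [255, 255, 0, 0],
}
guess = best_match([10, 0, 250, 255], samples)
```

Here guess is 'a', because the unknown image differs from that sample by only 15 in total. The measure is sensitive to shifts and scaling, so it only works when the letters are cropped consistently.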
wow!!!
Great blog. The post is very informative and very useful.
keep blogging.
image decoding
OCR software can be used as well. The problem with recognition of the letters is that there isn't a good way to recognise cambered letters (popular "fish eye" effects in Google CAPTCHA or Open Captcha).
IMHO OCR is overkill for most CAPTCHAs; in addition, it will probably be too slow.
@Anonymous, do you know open source libraries that would be good for this?
I heard about a project where a DSL (in Hasjkell) was used to program the route of a satellite.
*Haskell
@spainman, sounds cool! I will try to find more info about this project :)
Hi, where is the sample directory on GitHub?
Hi Alair,
it's here:
https://github.com/RobertGawron/snippets/tree/master/CaptachaCracker/samples
- Robert
You can use this captcha solver service for better captcha type support https://www.captchasolutions.com/
Hi,
yes, thanks for the link; for commercial use it would be much more suitable.
Bob