Making captcha easier to break

Many sites use images with text that the user has to retype to prove that they are a human and not a spambot. Many of these images (called captchas) contain small lines, dots and other noise that make them harder for spambots to analyze. In this post I will show how to easily remove this noise from a captcha.

I used Python and its PIL library to process the captchas. The first step is to convert the image to greyscale (this makes further work easier) and blur it (this makes small objects less visible). PIL's BLUR filter is a bit poor for that, but the SMOOTH_MORE filter works great (though it needs to be applied twice to blur the image enough). The next step is to check every pixel: if its value is lower than a certain constant, the pixel becomes black; otherwise it becomes white. This constant can be set from the command line.
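To see why blurring followed by thresholding removes small noise, here is a minimal sketch on a made-up test image. It assumes Python 3 with the Pillow fork of PIL; the `clean` helper and the `demo` image are hypothetical and only for illustration. It uses `Image.point` for the threshold step, which is one possible shortcut and not necessarily how the script itself does it.

```python
from PIL import Image, ImageFilter


def clean(img, pass_factor):
    """Greyscale, blur twice, then threshold to pure black and white."""
    img = img.convert('L')
    for _ in range(2):
        img = img.filter(ImageFilter.SMOOTH_MORE)
    # Image.point maps every possible pixel value (0-255) through the function.
    return img.point(lambda value: 0 if value < pass_factor else 255)


# Hypothetical test image: white background, one stray dot, one solid block.
demo = Image.new('L', (30, 30), 255)
demo.putpixel((5, 5), 0)          # single-pixel noise
demo.paste(0, (12, 12, 23, 23))   # 11x11 black block standing in for a letter

result = clean(demo, 90)
dot_value = result.getpixel((5, 5))      # where the noise pixel was
block_value = result.getpixel((17, 17))  # centre of the block
```

The blur averages the lone dot with its white neighbours, so its value rises above the threshold and it turns white, while the centre of the block is surrounded by black pixels and stays black.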

Below is an example of how to use the script (the first argument is the path to the image, the second is the constant mentioned above), followed by some sample results and the source code.

rgawron@vk1004:~/noiseReduction$ python noise.py captcha2.jpg 90
[Sample images: results for pass_factor = 100, 90, 100 and 130]

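import os
import sys

# Pillow (the maintained PIL fork) needs the package-qualified import;
# the old top-level "import Image" no longer works.
from PIL import Image, ImageFilter


def prepare_image(img):
    """Transform image to greyscale and blur it"""
    # Convert first: palette ('P') images cannot be filtered directly.
    if 'L' != img.mode:
        img = img.convert('L')
    # One pass of SMOOTH_MORE is too weak, so apply it twice.
    img = img.filter(ImageFilter.SMOOTH_MORE)
    img = img.filter(ImageFilter.SMOOTH_MORE)
    return img


def remove_noise(img, pass_factor):
    """Turn every pixel of a greyscale image black or white (in place)"""
    for column in range(img.size[0]):
        for line in range(img.size[1]):
            value = remove_noise_by_pixel(img, column, line, pass_factor)
            img.putpixel((column, line), value)
    return img


def remove_noise_by_pixel(img, column, line, pass_factor):
    """Return 0 (black) for pixels darker than pass_factor, else 255 (white)"""
    if img.getpixel((column, line)) < pass_factor:
        return 0
    return 255


if __name__ == "__main__":
    input_image = sys.argv[1]
    pass_factor = int(sys.argv[2])

    # Prefix only the file name, so that a path such as
    # some_dir/captcha.jpg is saved as some_dir/out_captcha.jpg.
    directory, filename = os.path.split(input_image)
    output_image = os.path.join(directory, 'out_' + filename)

    img = Image.open(input_image)
    img = prepare_image(img)
    img = remove_noise(img, pass_factor)
    img.save(output_image)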
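
Choosing pass_factor by hand for every kind of captcha can get tedious. One common way to estimate a threshold automatically is Otsu's method, which picks the grey level that best separates the dark pixels from the light ones. The sketch below is not part of the original script: `otsu_threshold` is a hypothetical helper, it assumes Python 3 with Pillow, and it may not give good results on every captcha.

```python
from PIL import Image


def otsu_threshold(img):
    """Estimate a black/white threshold with Otsu's method.

    Returns a value usable as pass_factor: pixels below it count as dark.
    """
    hist = img.convert('L').histogram()   # 256 pixel counts, one per grey level
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    weight_dark = 0   # number of pixels with value <= level
    sum_dark = 0      # sum of their grey values
    for level, count in enumerate(hist):
        weight_dark += count
        sum_dark += level * count
        if weight_dark == 0:
            continue              # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break                 # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance (up to a constant factor).
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_level = variance, level
    # Pixels <= best_level are dark, so the "<" test needs best_level + 1.
    return best_level + 1


# Hypothetical two-tone image: left half dark grey (40), right half light grey (200).
sample = Image.new('L', (10, 10), 200)
sample.paste(40, (0, 0, 5, 10))
estimated = otsu_threshold(sample)
```

In the script, the estimate would replace the command-line constant, for example by calling `otsu_threshold(img)` on the result of `prepare_image(img)` and passing the returned value to `remove_noise`.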

12 comments:

  1. it looks really good, but for each type of captcha you need to tweak pass_factor parameter. it would be good to calculate it automatically. also impressively small code ;)

  2. I cannot remove the line noise because all of the line noise is 255. Is there any way to solve this?

    Replies
    1. Hi,

      Try running this line more times:
      img = img.filter(ImageFilter.SMOOTH_MORE)

      -Bob

  3. This comment has been removed by the author.

  4. This comment has been removed by the author.

  5. I am not able to remove the horizontal line from my captcha.

    Replies
    1. Please provide your email so that I can share my samples with you. It would be great if you could help me with this, because all of my images have a similar horizontal line.

  6. When I run this code the captcha comes out completely blank. How do I fix this?

    Replies
    1. Change the numeric argument passed from command line (python noise.py captcha2.jpg 90 <--- this one)
