Many sites use images with text that the user has to retype to prove that they are a human and not a spambot. A lot of those images (called captchas) contain small lines, dots and other noise to make them harder for spambots to analyze. In this post I will show how to easily remove this noise from a captcha.
I used Python and its PIL library to process the captchas. The first step is to convert the image to greyscale (this makes further work easier) and blur it (this makes small objects less visible). PIL's BLUR filter does not work very well for this, but the SMOOTH_MORE filter works great (although it has to be applied twice to blur the image enough). The next step is to compare every pixel with a constant threshold: if its value is lower than the constant, the pixel becomes black, otherwise it becomes white. This constant is set from the command line.
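To see why blurring before thresholding removes small dots but keeps the letters, here is a small pure-Python sketch (no PIL needed) that applies the same 5x5 weights as PIL's ImageFilter.SMOOTH_MORE twice and then thresholds. The function names are my own, border pixels are handled by clamping coordinates, and rounding uses Python's round, so values near the image edges may differ slightly from what PIL produces.

```python
# 5x5 weights of PIL's ImageFilter.SMOOTH_MORE; they sum to 100.
SMOOTH_MORE = [
    [1, 1, 1, 1, 1],
    [1, 5, 5, 5, 1],
    [1, 5, 44, 5, 1],
    [1, 5, 5, 5, 1],
    [1, 1, 1, 1, 1],
]
SCALE = 100


def smooth(pixels):
    """One smoothing pass over a 2-D list of grey values (0-255).

    Assumption: out-of-range neighbours are clamped to the nearest edge
    pixel, which may not match PIL's own edge handling.
    """
    height, width = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0
            for dy in range(-2, 3):
                for dx in range(-2, 3):
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    total += SMOOTH_MORE[dy + 2][dx + 2] * pixels[yy][xx]
            row.append(round(total / SCALE))
        out.append(row)
    return out


def threshold(pixels, pass_factor):
    """Black (0) below pass_factor, white (255) otherwise."""
    return [[0 if p < pass_factor else 255 for p in row] for row in pixels]


def clean(pixels, pass_factor, passes=2):
    """Blur `passes` times, then threshold."""
    for _ in range(passes):
        pixels = smooth(pixels)
    return threshold(pixels, pass_factor)


# 30x30 white image with a one-pixel noise dot and a thick 11x11 black block
size = 30
img = [[255] * size for _ in range(size)]
img[3][3] = 0
for y in range(12, 23):
    for x in range(12, 23):
        img[y][x] = 0

result = clean(img, pass_factor=90)
print(result[3][3])    # → 255 (the dot is gone)
print(result[17][17])  # → 0 (the block centre stays black)
```

After two passes the isolated dot only darkens its pixel to about 200, which is above pass_factor = 90, so it turns white, while the middle of the thick block is still 0 and stays black. Wider or denser noise survives more easily, which is why some captchas may need extra smoothing passes.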
Below is an example of how to use the script (the first argument is the path to the image, the second is the constant mentioned above), some sample results, and the source code.
```
rgawron@vk1004:~/noiseReduction$ python noise.py captcha2.jpg 90
```

[Sample images: pass_factor = 100, pass_factor = 90, pass_factor = 100, pass_factor = 130]
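```python
import os
import sys

from PIL import Image, ImageFilter


def prepare_image(img):
    """Transform image to greyscale and blur it"""
    # Convert first: PIL cannot filter palette ('P') images such as many GIFs
    if 'L' != img.mode:
        img = img.convert('L')
    # One SMOOTH_MORE pass is not enough to fade small noise, so apply it twice
    img = img.filter(ImageFilter.SMOOTH_MORE)
    img = img.filter(ImageFilter.SMOOTH_MORE)
    return img


def remove_noise(img, pass_factor):
    """Threshold every pixel: black below pass_factor, white otherwise"""
    for column in range(img.size[0]):
        for line in range(img.size[1]):
            value = remove_noise_by_pixel(img, column, line, pass_factor)
            img.putpixel((column, line), value)
    return img


def remove_noise_by_pixel(img, column, line, pass_factor):
    if img.getpixel((column, line)) < pass_factor:
        return 0
    return 255


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: python noise.py IMAGE PASS_FACTOR")
    input_image = sys.argv[1]
    # Save next to the input, e.g. captcha2.jpg -> out_captcha2.jpg
    directory, filename = os.path.split(input_image)
    output_image = os.path.join(directory, 'out_' + filename)
    pass_factor = int(sys.argv[2])

    img = Image.open(input_image)
    img = prepare_image(img)
    img = remove_noise(img, pass_factor)
    img.save(output_image)
```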
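The first comment on this post suggests calculating pass_factor automatically. One standard way to do that (not part of the original script) is Otsu's method, which picks the grey level that maximizes the between-class variance of the "dark" and "light" pixels in the histogram. Below is a minimal pure-Python sketch, assuming a 256-bin histogram such as the one PIL's Image.histogram() returns for an 'L' image; otsu_threshold is a hypothetical helper name.

```python
def otsu_threshold(histogram):
    """Pick a pass_factor with Otsu's method.

    `histogram` holds 256 pixel counts. Pixels with value < returned
    threshold are the "dark" class, matching remove_noise_by_pixel.
    """
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    weight_dark = 0  # number of pixels below the candidate threshold
    sum_dark = 0     # sum of their grey levels
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        # grey level t - 1 moves into the dark class
        weight_dark += histogram[t - 1]
        sum_dark += (t - 1) * histogram[t - 1]
        weight_light = total - weight_dark
        if weight_dark == 0 or weight_light == 0:
            continue
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # between-class variance, up to a constant factor
        var = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


# toy histogram: 100 dark pixels at level 40, 300 light pixels at level 200
hist = [0] * 256
hist[40] = 100
hist[200] = 300
print(otsu_threshold(hist))  # → 41
```

In the script above this could replace the command-line argument, e.g. `pass_factor = otsu_threshold(img.histogram())` right after `prepare_image`. Otsu's method assumes the histogram has two reasonably separated peaks (text and background), so on very noisy captchas it may still need manual tweaking.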
It looks really good, but for each type of captcha you need to tweak the pass_factor parameter. It would be good to calculate it automatically. Also impressively small code ;)

Yes, it's a good idea!
Amazing stuff...

Thanks!
I cannot remove the line noise because all the line noise is 255. Is there any way to solve it?

Hi,
try running this line more times:
img = img.filter(ImageFilter.SMOOTH_MORE)
-Bob
I am not able to remove the horizontal line from my captcha.

Please provide your email so that I can share my samples with you... It would be great if you could help me with this, because all of my images have a similar type of horizontal line.

When I run this code the captcha comes out completely blank. How do I fix this?
Change the numeric argument passed on the command line (python noise.py captcha2.jpg 90 <--- this one).
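A completely blank (all-white) result means every pixel of the blurred image is at or above pass_factor, so nothing is turned black. One quick way to choose a value is to check what fraction of pixels each candidate would blacken. The helpers below are a hypothetical sketch that works on a 256-bin histogram (for example from PIL's Image.histogram() on the greyscale, blurred image).

```python
def black_fraction(histogram, pass_factor):
    """Fraction of pixels with value < pass_factor, i.e. the ones remove_noise makes black."""
    total = sum(histogram)
    if total == 0:
        return 0.0
    # clamp at 0 so a negative pass_factor does not wrap around in the slice
    return sum(histogram[:max(pass_factor, 0)]) / total


def report(histogram, candidates=(50, 90, 130, 170, 210)):
    """Print the black fraction for a few candidate pass_factor values."""
    for pass_factor in candidates:
        print(pass_factor, round(black_fraction(histogram, pass_factor), 3))


# toy histogram: 25 dark pixels at level 60, 75 light pixels at level 200
hist = [0] * 256
hist[60] = 25
hist[200] = 75
report(hist)
# 50 0.0
# 90 0.25
# 130 0.25
# 170 0.25
# 210 1.0
```

A fraction of 0.0 corresponds to a blank white output and 1.0 to an all-black one, so a usable pass_factor lies somewhere in between.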