
“A Low-cost Attack on a Microsoft CAPTCHA”

“A Low-cost Attack on a Microsoft CAPTCHA”, the annual ACM Computer and Communications Security Conference (2008), by Jeff Yan and Ahmad Salah El Ahmad. Presentation by Kathleen Stoeckle. Outline: Overview of CAPTCHA, Related Work, The MSN CAPTCHA, Microsoft CAPTCHA Segmentation Attack.


Presentation Transcript


  1. “A Low-cost Attack on a Microsoft CAPTCHA” The annual ACM Computer and Communications Security Conference (2008) By Jeff Yan and Ahmad Salah El Ahmad Presentation by Kathleen Stoeckle

  2. Outline • Overview of CAPTCHA • Related Work • The MSN CAPTCHA • Microsoft CAPTCHA Segmentation Attack • Results & Analysis • Strengths and Weaknesses

  3. Overview of CAPTCHA

  4. CAPTCHA • Completely Automated Public Turing test to tell Computers and Humans Apart • Primitive CAPTCHAs were developed in 1997 by Andrei Broder, Martin Abadi, Krishna Bharat, and Mark Lillibridge. • Luis von Ahn and Manuel Blum coined the term “CAPTCHA” and improved the method in 2000. http://www.searchenginepeople.com/blog/page/16

  5. Google CAPTCHA http://www.codinghorror.com/blog/images/google-error-were-sorry-rate-limiter-captcha.png

  6. Yahoo CAPTCHA http://www.carolisayazakuser.com/helppage/1aghostloginauthbox1a.gif

  7. MSN CAPTCHA http://www.msn.com

  8. Text-Based CAPTCHAs • The most widely used type of CAPTCHA • Distort text images to make them unrecognizable to pattern-recognition programs. • Popular because: • The task is intuitive (character recognition) • Few localization issues (Roman characters are widely recognized) • Strong security potential

  9. Good CAPTCHAs • Robust • Human-friendly

  10. Character Recognition • Computers excel at recognizing characters, even when the characters are distorted. • If the positions of the characters in a CAPTCHA are known, breaking the scheme reduces to a pattern-recognition problem. • When the positions are not known, computer programs have difficulty solving the CAPTCHA.

  11. Recognition rate for individual characters under different distortions

  12. Segmentation • Segmentation: identifying the locations of the characters in the right order. • Challenging for both handwriting recognition and computer vision. • Traditionally both computationally expensive and difficult, since every character in the challenge must be located.

  13. “State of the Art” CAPTCHAs • The robustness of a CAPTCHA must rely on the difficulty of segmentation rather than recognition. • If a text-based CAPTCHA is reduced to the challenge of recognizing individual characters, the scheme is effectively broken.

  14. Purpose of Paper • Yan and El Ahmad’s paper examines the security of the Microsoft CAPTCHA. • This scheme was designed by an interdisciplinary Microsoft team, which established the principle of “segmentation resistance.” • By attacking this CAPTCHA, the authors aim to determine how the MSN scheme and similar CAPTCHAs can be improved. • The paper shows how the Microsoft CAPTCHA was broken by the authors’ algorithms running on a desktop computer with a 1.86 GHz Intel Core 2 CPU and 2 GB RAM.

  15. Related Work

  16. E-Z Gimpy and Gimpy Broken by Mori and Malik: • E-Z Gimpy (92% success) • Gimpy (33% success) • Object-recognition algorithms http://www.cs.sfu.ca/~mori/research/gimpy

  17. E-Z Gimpy and Gimpy Broken by Moy et al.: • E-Z Gimpy (99% success) • 4-letter Gimpy-r (78% success) • Used distortion estimation techniques http://www.cs.sfu.ca/~mori/research/gimpy

  18. Other Work… • Chellapilla and Simard attacked visual CAPTCHAs (4.89% to 66.2% success). • Yan and El Ahmad defeated CAPTCHAs generated on Captchaservice.org (almost 100% success). • Accomplished by counting the pixels of segmented characters. • Examined robustness from a security angle. • Simple pattern-recognition analysis. • PWNtcha: a website that demonstrates the weaknesses and inefficiencies of CAPTCHAs. Broke visual CAPTCHAs (49% to 100% success).

  19. The MSN CAPTCHA

  20. The MSN CAPTCHA Challenge • Each challenge consists of 8 characters. • Only upper-case letters and digits are used. • Text is dark blue and the background is light gray. • Warping is used to distort characters. • Random arcs of different thickness are used as an anti-segmentation measure.

  21. Warping • Local • Small ripples, waves, and elastic deformations along the pixels of the character. • Global • Character-level elastic deformations to foil template-matching algorithms.

  22. Warping, cont’d Local Global

  23. Random Arcs • Thick Foreground Arcs • Same color as characters • As thick as the characters • Non-intersecting • Thin Foreground Arcs • Same color as characters • As thick as the thinnest parts of characters • Intersecting • Thin Background Arcs • Thin • Same color as background • Cut through characters

  24. Low-Cost Segmentation Attack On Microsoft CAPTCHA

  25. Low-Cost Segmentation Attack • Goal: Segment Microsoft CAPTCHA challenges. • Identify and remove random arcs • Identify all character locations in the right order. • Accomplishes this by: • Dividing each challenge into 8 ordered segments.


  29. Attack in 7 Steps • Binarization • Fixing Broken Characters • Vertical Segmentation • Color Filling Segmentation • Thick Arc Removal • Locating Connected Characters • Segment Connected Characters

  30. Step 1: Binarization • Convert a color challenge to a two-color image using a threshold method. • High intensity → White • Low intensity → Black
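A minimal sketch of the thresholding idea, assuming images are lists of rows of RGB tuples. The grayscale weights (ITU-R BT.601) and the threshold of 128 are assumptions for illustration; the slides do not say how intensity is computed or which threshold the authors chose. Output pixels use 1 for black (foreground) and 0 for white (background).

```python
# Convention for this sketch: 1 = foreground (black), 0 = background (white).
HYPOTHETICAL_THRESHOLD = 128  # assumption: the slides do not give the threshold value


def to_gray(r, g, b):
    """Approximate luminance using ITU-R BT.601 weights (an assumed choice)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def binarize(rgb_image, threshold=HYPOTHETICAL_THRESHOLD):
    """Low-intensity pixels become black (1); high-intensity pixels become white (0)."""
    return [[1 if to_gray(*px) < threshold else 0 for px in row] for row in rgb_image]


if __name__ == "__main__":
    dark_blue, light_gray = (0, 0, 139), (211, 211, 211)  # MSN-like text and background
    print(binarize([[dark_blue, light_gray], [light_gray, dark_blue]]))  # [[1, 0], [0, 1]]
```

Dark blue text falls well below the threshold and light gray lies well above it, so a single global threshold plausibly separates them for this scheme.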

  31. Step 2: Fixing Broken Characters • Keep each character as a single entity. • Prevent small portions of characters, cut off by thin background arcs, from being removed later as arcs.

  32. Step 2 • Find background-color pixels whose left and right neighbors both have the foreground color. • Find background-color pixels whose top and bottom neighbors both have the foreground color. • Convert the pixels identified above to the foreground color.
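The three rules on this slide can be sketched directly on a binary image (1 = foreground, 0 = background, lists of rows). The single-pass, read-from-original design is an assumption; the slides do not say whether newly filled pixels may trigger further filling.

```python
def fix_broken_characters(img):
    """Fill background pixels sandwiched between two foreground pixels.

    Decisions are read from the input image and written to a copy, so a newly
    filled pixel never triggers further filling within the same pass.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                continue  # already foreground
            # Rule 1: left and right neighbors are both foreground.
            left_right = 0 < x < w - 1 and img[y][x - 1] == 1 and img[y][x + 1] == 1
            # Rule 2: top and bottom neighbors are both foreground.
            top_bottom = 0 < y < h - 1 and img[y - 1][x] == 1 and img[y + 1][x] == 1
            if left_right or top_bottom:
                out[y][x] = 1  # Rule 3: convert to foreground
    return out
```

Only one-pixel gaps are closed, e.g. `[[1, 0, 1]]` becomes `[[1, 1, 1]]` while `[[1, 0, 0, 1]]` is left unchanged, which matches cuts made by thin background arcs.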

  33. Step 2

  34. Step 2

  35. Step 3: Vertical Segmentation Segmentation method: divide the challenge vertically into chunks (divide and conquer).
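One plausible way to cut a binary image vertically into chunks is a column projection: a cut falls at every column that contains no foreground pixels. This is a sketch under that assumption; the slide does not state the exact cutting criterion. The function name is hypothetical.

```python
def vertical_chunks(img):
    """Return (start, end) column ranges, end exclusive, of runs of non-empty columns.

    img is a binary image (1 = foreground, 0 = background) given as a list of rows.
    """
    w = len(img[0])
    col_has_ink = [any(row[x] for row in img) for x in range(w)]
    chunks, start = [], None
    for x, ink in enumerate(col_has_ink):
        if ink and start is None:
            start = x                  # a chunk begins
        elif not ink and start is not None:
            chunks.append((start, x))  # an empty column ends the chunk
            start = None
    if start is not None:
        chunks.append((start, w))      # chunk touching the right border
    return chunks
```

For `[[1, 1, 0, 0, 1], [0, 1, 0, 1, 0]]` the empty column 2 separates two chunks, `(0, 2)` and `(3, 5)`. Characters joined by arcs stay in one chunk, which is why the later steps are still needed.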

  36. Step 4: Color Filling Segmentation (CFS) • CFS is applied to each chunk from Step 3. • Find every connected component, or “object”, in each chunk.

  37. Step 4 • CFS Algorithm: • Detect a foreground pixel and trace all pixels connected to it. This creates an object. • Locate a foreground pixel outside the object and trace its connected pixels to identify the next object. • The process essentially amounts to color filling each object: the number of colors used equals the number of objects.

  38. Step 4 8-connectivity: each pixel has 8 neighbors.

  39. Step 4 A color fill is applied to each chunk, regardless of the number of objects in the chunk.
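The CFS description maps onto a standard connected-component labeling by flood fill. The sketch below uses breadth-first search with 8-connectivity and gives each object its own label ("color"). The function name and the list-of-rows image format are assumptions; the authors' own implementation is not shown in the slides.

```python
from collections import deque

# 8-connectivity: every pixel has 8 neighbors.
NEIGHBORS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def color_filling_segmentation(img):
    """Label each 8-connected foreground object; return (labels, number_of_objects).

    labels[y][x] is 0 for background, otherwise the 1-based "color" of its object.
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    n = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not labels[y][x]:
                n += 1                        # new object, new color
                labels[y][x] = n
                queue = deque([(y, x)])
                while queue:                  # trace every pixel connected to it
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS_8:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = n
                            queue.append((ny, nx))
    return labels, n
```

With 8-connectivity, two diagonally touching pixels belong to the same object. In the image `[[1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 1, 0]]` this yields 2 objects, whereas 4-connectivity would yield 3.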

  40. Step 5: Thick Arc Removal Thick Arc Characteristics: • Pixel count: generally small • Location: close to or intersecting the image border • Shape: thick arcs do not contain circles, unlike characters such as A, B, D, P, Q, 4, 6, 8 and 9 • Interplay between shape and location: thick arcs correlate with geometric shape: • Tall but narrow near the start of the CAPTCHA • Wide and short near the middle

  41. Step 5 Thick Arc Removal Algorithm • Circle Detection • Scan objects without circles for distinctive features. • Relative Position Checking • Detection of Remaining Arcs

  42. Step 5-1: Circle Detection • Draw a bounding box around an object. • Starting from the box border, fill the background around the object with a color different from both the foreground and background colors. • Scan the box for pixels that still have the background color. If any are found, the object contains a circle.
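A minimal sketch of circle detection, assuming an object is given as a set of (y, x) foreground coordinates, for example taken from the CFS labels. The box is padded by one pixel so the fill can flow around the whole object. The fill uses 4-connectivity, which is an assumption: a loop drawn with diagonal (8-connected) steps would let an 8-connected fill leak inside it. Any box pixel that is neither part of the object nor reached by the fill is enclosed, which means the object contains a circle.

```python
from collections import deque


def has_circle(pixels):
    """Return True if the object (a set of (y, x) pixels) encloses background pixels."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    y0, x0 = min(ys) - 1, min(xs) - 1    # pad the bounding box by one pixel
    h, w = max(ys) - y0 + 2, max(xs) - x0 + 2
    filled = {(0, 0)}                    # local (0, 0) lies in the padding, outside the object
    queue = deque([(0, 0)])
    while queue:                         # fill the background around the object
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in filled
                    and (ny + y0, nx + x0) not in pixels):
                filled.add((ny, nx))
                queue.append((ny, nx))
    # Pixels that are neither object nor filled are still "background color": a circle.
    return len(filled) + len(pixels) < h * w
```

A 3×3 ring (an "O"-like shape) returns `True`, and so does a four-pixel diamond built from diagonal steps. An "L"-shaped object returns `False`.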

  43. Step 5-2: Scan non-circle objects for distinctive features Pixel Checking • Characters generally have a pixel count of over 50. • Any object of 50 pixels or fewer is removed as an arc.

  44. Step 5-3: Relative Position Checking This step is applied to all chunks with more than one object. Premise: the relative positions of objects distinguish arcs from characters. Characters are always closer to the baseline. Characters are juxtaposed horizontally, never vertically.

  45. Step 5-4: Detection of Remaining Arcs • Count the number of remaining arcs in the image. • Remaining arcs are generally the first and last objects in the current image. • Check the first and last objects using these rules: 1) If one object contains a circle, the other is removed. 2) If neither object contains a circle, the one with fewer pixels is removed.
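The two rules can be sketched on a left-to-right list of objects that carry a pixel count and a circle flag. Several details are assumptions not stated on the slide: the `Obj` record and function names are hypothetical, the trimming loop runs while more than 8 objects remain, ties in pixel count drop the last object, and the case where both ends contain circles is left undecided.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    pixels: int
    has_circle: bool


def remove_end_arc(objects):
    """Apply the slide's rules to the first and last objects; None if undecided."""
    first, last = objects[0], objects[-1]
    if first.has_circle != last.has_circle:
        # Rule 1: exactly one of the two contains a circle; remove the other one.
        return objects[:-1] if first.has_circle else objects[1:]
    if not first.has_circle:
        # Rule 2: neither contains a circle; remove the one with fewer pixels.
        # (A tie removes the last object, an arbitrary choice for this sketch.)
        return objects[1:] if first.pixels < last.pixels else objects[:-1]
    return None  # both contain circles: the stated rules do not decide


def detect_remaining_arcs(objects, expected=8):
    # Hypothetical trigger: keep trimming while there are more objects than characters.
    while len(objects) > expected:
        trimmed = remove_end_arc(objects)
        if trimmed is None:
            break
        objects = trimmed
    return objects
```

With nine objects where only the first lacks a circle, Rule 1 removes that first object and leaves the eight characters.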

  46. Step 6: Locating Connected Characters n = number of objects in an image. If n < 8, at least one object contains two or more connected characters. Properties of the MSN challenge: 1. There are 8 characters in an image. 2. Connected characters are joined horizontally, not vertically, and thus form wider objects. 3. A segmented chunk wider than 35 pixels contains more than one character. The number of chunks, the width of each chunk, and the number of objects in a chunk are used to guess which chunks contain connected characters.
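The slide says that object count and widths are combined to "guess" which objects hold connected characters, but it does not give the exact rule. The sketch below is a hypothetical heuristic that uses only the two stated facts (8 characters, more than one character if wider than 35 pixels). Each of the 8 − n missing characters is assigned to the wide object with the most width per character assigned so far.

```python
MAX_SINGLE_WIDTH = 35  # from the slide: wider objects hold more than one character
TOTAL_CHARS = 8        # from the slide: every MSN challenge has 8 characters


def estimate_char_counts(widths, total=TOTAL_CHARS, max_single=MAX_SINGLE_WIDTH):
    """Guess how many characters each object (given by its pixel width) contains.

    Hypothetical heuristic, not the authors' rule: each missing character goes to
    the object wider than max_single with the most width per assigned character.
    """
    counts = [1] * len(widths)
    missing = total - len(widths)  # n < 8 means some objects hold several characters
    candidates = [i for i, w in enumerate(widths) if w > max_single]
    for _ in range(max(missing, 0)):
        if not candidates:
            break  # no object is wide enough; leave the guess incomplete
        best = max(candidates, key=lambda i: widths[i] / counts[i])
        counts[best] += 1
    return counts
```

For object widths `[20, 70, 25, 22, 24, 50]` (n = 6), the two missing characters go first to the 70-pixel object (70 per character) and then to the 50-pixel one (50 > 70/2), giving `[1, 2, 1, 1, 1, 2]`.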

  47. Step 6: Locating Connected Characters

  48. Step 7: Segment Connected Characters • Find the width of an object by locating its leftmost and rightmost pixels. • Divide the object vertically into c parts of equal width, where c = the number of characters it contains.
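The equal-width split reduces to simple arithmetic on the object's leftmost and rightmost columns. The sketch below (function name hypothetical) uses integer division, so when the width is not a multiple of c the parts differ by at most one pixel. The slide does not say how the authors round.

```python
def split_connected(left, right, c):
    """Split columns left..right (inclusive) into c vertical parts of (nearly) equal width.

    Returns a list of (start, end) column pairs, both inclusive.
    """
    width = right - left + 1
    parts = []
    for i in range(c):
        start = left + (i * width) // c
        end = left + ((i + 1) * width) // c - 1
        parts.append((start, end))
    return parts
```

A 60-pixel object spanning columns 10 to 69 with c = 2 splits into `(10, 39)` and `(40, 69)`. A 10-pixel object with c = 3 gives parts of widths 3, 3 and 4.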

  49. Results & Analysis

  50. Results of the 7-Step Attack • Success rate: 91% (91 of 100 challenges) and 92% of 500 random challenges • Attack speed: implemented in Java on a 1.86 GHz Intel Core 2 CPU with 2 GB RAM
