

“A Low-cost Attack on a Microsoft CAPTCHA”

The annual ACM Computer and Communications Security Conference (2008)

By Jeff Yan and Ahmad Salah El Ahmad

Presentation by Kathleen Stoeckle

Outline
  • Overview on CAPTCHA
  • Related Work
  • The MSN CAPTCHA
  • Microsoft CAPTCHA Segmentation Attack
  • Results & Analysis
  • Strengths and Weaknesses

CAPTCHA

  • Completely Automated Public Turing Test to tell Computers and Humans Apart
  • Primitive CAPTCHAs were developed in 1997 by Andrei Broder, Martin Abadi, Krishna Bharat, and Mark Lillibridge.
  • Luis von Ahn and Manuel Blum coined the term “CAPTCHA” and improved the method in 2000.

http://www.searchenginepeople.com/blog/page/16

Google CAPTCHA

http://www.codinghorror.com/blog/images/google-error-were-sorry-rate-limiter-captcha.png

Yahoo CAPTCHA

http://www.carolisayazakuser.com/helppage/1aghostloginauthbox1a.gif

MSN CAPTCHA

http://www.msn.com

Text-Based CAPTCHAs
  • Most widely used CAPTCHA
  • Distort text images in order to make them unrecognizable to pattern-recognition programs.
  • Popular because:
    • Task is intuitive (character recognition)
    • Few localization issues (Roman characters are easily recognized)
    • Strong security potential
Good CAPTCHAs
  • Robust
  • Human Friendly
Character Recognition
  • Computers excel at recognizing characters, even when the characters are distorted.
  • If the positions of the characters in a CAPTCHA are known, breaking the scheme is a matter of pattern recognition.
  • When the positions are not known, computer programs have difficulty solving the challenge.
Figure: Recognition rate for individual characters under different distortions

Segmentation
  • Segmentation – Locating each character in the challenge, in the right order.
  • Challenging for both handwriting recognition and computer vision.
  • Traditionally, both computationally expensive and difficult when taking into account all characters in the challenge.
“State of the Art” CAPTCHAs
  • Robustness of CAPTCHA must rely on segmentation rather than recognition.
  • If a text-based CAPTCHA is reduced to the challenge of recognizing individual characters, then the scheme is effectively broken.
Purpose of Paper

Yan and El Ahmad’s paper examines the security of the Microsoft CAPTCHA.

  • This scheme was designed by an interdisciplinary Microsoft team, which also established the principle of “segmentation resistance”.
  • By attacking this CAPTCHA, the authors’ goal is to determine how the MSN scheme and similar CAPTCHAs can be improved.
  • The paper shows how the Microsoft CAPTCHA was broken with a desktop computer with a 1.86 GHz Intel Core 2 CPU and 2 GB RAM using their algorithms.
E-Z Gimpy and Gimpy

Broken by Mori and Malik:

  • E-Z Gimpy (92% success)
  • Gimpy (33% success)
  • Object-recognition algorithms

http://www.cs.sfu.ca/~mori/research/gimpy

E-Z Gimpy and Gimpy

Broken by Moy et al:

  • E-Z Gimpy (99% success)
  • 4-letter Gimpy-r (78% success)
  • Used distortion elimination techniques

http://www.cs.sfu.ca/~mori/research/gimpy

Other Work…
  • Chellapilla and Simard attacked visual CAPTCHAs (4.89% to 66.2% success)
  • Yan and El Ahmad defeated CAPTCHAs generated on Captchaservice.org (Almost 100% success).
    • Accomplished by counting pixels of segmented characters.
    • Examined robustness from security angle.
    • Simple pattern-recognition analysis.
  • PWNtcha – a website that demonstrates the weaknesses and inefficiencies of CAPTCHAs. Broke visual CAPTCHAs (49% to 100% success).
The MSN CAPTCHA Challenge
  • Each challenge consists of 8 characters.
  • Only upper case letters and digits are used.
  • Text is dark blue and background is light gray.
  • Warping is used to distort characters.
  • Random arcs of different thicknesses are used as the anti-segmentation measure.
Warping
  • Local
    • Small ripples, waves and elastic deformations along the pixels of the character.
  • Global
    • Character-level, elastic deformations to foil template matching algorithms.
Random Arcs
  • Thick Foreground Arcs
    • Same color as characters
    • As thick as the characters
    • Non-intersecting
  • Thin Foreground Arcs
    • Same color as characters
    • As thick as the thinnest parts of characters
    • Intersecting
  • Thin Background Arcs
    • Thin
    • Same color as background
    • Cut through characters

Low-Cost Segmentation Attack on the Microsoft CAPTCHA

Low-Cost Segmentation Attack
  • Goal: Segment Microsoft CAPTCHA challenges.
  • Identify and remove random arcs
  • Identify all character locations in the right order.
  • Accomplishes this by:
    • Dividing each challenge into 8 ordered segments.
Attack in 7 Steps
  • Binarization
  • Fixing Broken Characters
  • Vertical Segmentation
  • Color Filling Segmentation
  • Thick Arc Removal
  • Locating Connected Characters
  • Segment Connected Characters
Step 1: Binarization
  • Convert a color challenge to a two-color image using a threshold method.
  • High intensity → White
  • Low intensity → Black
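A minimal sketch of the thresholding step, assuming the challenge has already been reduced to grayscale intensities in the range 0 to 255. The function name `binarize` and the cutoff of 128 are illustrative assumptions; the slides do not give the threshold the authors used.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 intensities) to 0/1 pixels.

    1 = foreground (low intensity, shown as black),
    0 = background (high intensity, shown as white).
    The default threshold of 128 is an assumed value.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


challenge = [
    [230, 228, 40, 231],
    [229, 35, 38, 227],
]
print(binarize(challenge))  # [[0, 0, 1, 0], [0, 1, 1, 0]]
```

The later sketches reuse this convention: 1 for foreground pixels, 0 for background pixels.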
Step 2: Fixing Broken Characters
  • Keep character as a single entity.
  • Prevent small portions of characters from being removed by an arc.
Step 2
  • Find background color pixels that have left and right neighbors with foreground color.
  • Find background color pixels that have top and bottom neighbors with foreground color.
  • Convert pixels identified above to foreground color.
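The three bullets above can be sketched as a single pass over a binary image (1 = foreground, 0 = background). The name `fix_broken` is hypothetical, and reading neighbors from the unmodified input so that repairs do not cascade is an assumption, since the slides do not say either way.

```python
def fix_broken(img):
    """Return a copy of img in which every background pixel that sits
    between two foreground pixels (left/right or top/bottom) is turned
    into a foreground pixel."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 0:
                continue
            # Neighbors are read from the original image, not from `out`.
            horiz = 0 < x < w - 1 and img[y][x - 1] == 1 and img[y][x + 1] == 1
            vert = 0 < y < h - 1 and img[y - 1][x] == 1 and img[y + 1][x] == 1
            if horiz or vert:
                out[y][x] = 1
    return out


print(fix_broken([[1, 0, 1]]))  # [[1, 1, 1]]
```

Only one-pixel gaps are bridged; a gap of two or more background pixels is left alone.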
Step 3: Vertical Segmentation

Segmentation method – Divide challenge vertically into chunks.

Divide and Conquer
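The slide does not say how the cut positions are chosen. One common way to divide an image vertically, used here purely as an assumption, is to cut at columns that contain no foreground pixels. The name `vertical_chunks` is hypothetical.

```python
def vertical_chunks(img):
    """Return (start, end) column ranges, end exclusive, for each run of
    adjacent columns that contain at least one foreground pixel (1)."""
    w = len(img[0])
    has_ink = [any(row[x] for row in img) for x in range(w)]
    chunks, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                  # a new chunk begins
        elif not ink and start is not None:
            chunks.append((start, x))  # an empty column closes the chunk
            start = None
    if start is not None:
        chunks.append((start, w))      # chunk runs to the right edge
    return chunks


img = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
]
print(vertical_chunks(img))  # [(0, 2), (4, 5)]
```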

Step 4: Color Filling Segmentation (CFS)
  • CFS applied to each chunk (Step 3)
  • Find every connected component or “object” in each chunk.
Step 4
  • CFS Algorithm:
    • Detect a foreground pixel and trace it to all connected foreground pixels. This creates an object.
    • Locate a foreground pixel outside of the object and trace its connected pixels to identify the next object.
    • The process essentially amounts to color filling each object. The number of colors used = the number of objects.
Step 4

8-connectivity: each pixel has 8 neighbors (horizontal, vertical, and diagonal).

Step 4

A color fill is applied to each chunk, regardless of number of objects in the chunk.
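Color filling as described in Step 4 amounts to connected-component labeling with 8-connectivity. The sketch below is one way to write it, not the authors' Java code; `color_fill` and `NEIGHBORS_8` are hypothetical names, and the input is a binary chunk with 1 for foreground.

```python
from collections import deque

# The 8 neighbor offsets: horizontal, vertical, and diagonal.
NEIGHBORS_8 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dy, dx) != (0, 0)]


def color_fill(img):
    """Give each 8-connected foreground object its own color (1, 2, ...).

    Returns (labels, count): labels has the same shape as img, with 0 for
    background, and count is the number of objects found.
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or labels[y][x] != 0:
                continue
            count += 1                      # unvisited foreground pixel: new object
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:                    # trace all connected pixels
                cy, cx = queue.popleft()
                for dy, dx in NEIGHBORS_8:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and img[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


chunk = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]
labels, count = color_fill(chunk)
print(count)   # 2
print(labels)  # [[1, 0, 0, 2], [0, 1, 0, 2], [0, 0, 0, 0]]
```

The two pixels on the left touch only diagonally, so they form one object under 8-connectivity; under 4-connectivity they would be two.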

Step 5: Thick Arc Removal

Thick Arc Characteristics:

  • Pixel Count – Generally small
  • Location – Close to or intersecting with image border.
  • Shape – Thick arcs do not contain circles (closed loops), unlike characters such as A, B, D, P, Q, 4, 6, 8 and 9.
  • Interplay between shape and location – The geometric shape of a thick arc correlates with its position:
    • Tall but narrow near the start of the CAPTCHA
    • Wide and short near the middle
Step 5

Thick Arc Removal Algorithm

  • Circle Detection
  • Scan objects without circles for distinct features.
  • Relative Position Checking
  • Detection of Remaining Arcs
Step 5-1: Circle Detection
  • Draw a bounding box around an object.
  • Color-fill the background outside the object, within the box, using a color different from both foreground and background.
  • Scan the box for pixels that still have the background color. If any are found, a circle has been detected.
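A minimal sketch of circle detection, under the assumption that the fill starts from the edges of the bounding box, so any background pixel it cannot reach must be enclosed by the object. The name `has_circle` and the use of 4-connectivity for the background fill are assumptions, not details from the slides.

```python
def has_circle(obj):
    """obj is a 0/1 mask cropped to the object's bounding box.

    Returns True if some background pixel cannot be reached from the
    edge of the box, i.e. the object encloses a closed loop.
    """
    h, w = len(obj), len(obj[0])
    FILL = 2  # a third color, distinct from background (0) and foreground (1)
    grid = [row[:] for row in obj]
    # Seed the fill with every background pixel on the edge of the box.
    stack = [(y, x) for y in range(h) for x in range(w)
             if (y in (0, h - 1) or x in (0, w - 1)) and grid[y][x] == 0]
    while stack:
        y, x = stack.pop()
        if grid[y][x] != 0:
            continue
        grid[y][x] = FILL
        # 4-connectivity, so the fill cannot slip through diagonal gaps
        # in an 8-connected stroke.
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] == 0:
                stack.append((ny, nx))
    # Any pixel still holding the background color is enclosed.
    return any(px == 0 for row in grid for px in row)


ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
arc = [[1, 1, 1],
       [1, 0, 0],
       [1, 0, 0]]
print(has_circle(ring), has_circle(arc))  # True False
```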
Step 5-2: Scan non-circle objects for distinctive features

Pixel Checking

  • Characters generally have a pixel count of over 50.
  • Any object with 50 pixels or fewer is removed as an arc.
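The pixel-count rule can be sketched as a filter over a labeled image, where each object carries its own color number and 0 is background. The 50-pixel limit comes from the slide; the names `small_arc_colors` and `colors_with_circle` are hypothetical, and exempting objects that contain a circle follows the slide title, which limits this check to non-circle objects.

```python
from collections import Counter

ARC_MAX_PIXELS = 50  # limit quoted on the slide


def small_arc_colors(labels, colors_with_circle=frozenset()):
    """Return the colors of objects that have 50 pixels or fewer and do
    not contain a circle; these are treated as arcs and removed."""
    sizes = Counter(c for row in labels for c in row if c != 0)
    return {c for c, n in sizes.items()
            if n <= ARC_MAX_PIXELS and c not in colors_with_circle}


# Toy labeled image: object 1 has 60 pixels, object 2 has 10, object 3 has 5.
labels = [
    [1] * 60,
    [2] * 10 + [0] * 50,
    [3] * 5 + [0] * 55,
]
print(sorted(small_arc_colors(labels)))  # [2, 3]
```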
Step 5-3: Relative Position Checking

This step is applied to all chunks with more than one object.

Premise: The positions of objects relative to one another help tell arcs and characters apart. Characters are always closer to the baseline. Characters are horizontally juxtaposed, but never vertically.

Step 5-4: Detection of Remaining Arcs
  • Count the number of remaining arcs in the image.
  • Remaining arcs are generally the first and last object in the current image.
  • Check first and last objects using these rules:

1) If one object contains a circle, the other is removed.

2) If neither object contains a circle, the one with fewer pixels is removed.
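The two rules can be written as a small decision function. This is a sketch: the name `pick_arc` and the dictionary layout are hypothetical, and returning None when both objects contain a circle (or when the pixel counts tie) is an assumption, because the slide does not cover those cases.

```python
def pick_arc(first, last):
    """Decide which of the first and last objects to remove as an arc.

    Each argument is a dict with 'pixels' (int) and 'circle' (bool).
    Returns 'first', 'last', or None when the two rules do not decide.
    """
    if first["circle"] != last["circle"]:
        # Rule 1: exactly one object has a circle, so the other is the arc.
        return "last" if first["circle"] else "first"
    if not first["circle"]:
        # Rule 2: neither has a circle, so the smaller object is the arc.
        if first["pixels"] == last["pixels"]:
            return None
        return "first" if first["pixels"] < last["pixels"] else "last"
    return None  # both contain circles: not covered by the slide


print(pick_arc({"pixels": 120, "circle": True},
               {"pixels": 90, "circle": False}))   # last
print(pick_arc({"pixels": 60, "circle": False},
               {"pixels": 90, "circle": False}))   # first
```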

Step 6: Locating Connected Characters

n = number of objects in an image

If n < 8, at least one object contains two or more connected characters.

MSN Challenge

1. Each image contains 8 characters.

2. Connected characters are joined horizontally, not vertically, and thus form wider objects.

3. A segmented chunk contains more than one character if the chunk is wider than 35 pixels.

The number of chunks, width of chunks, and number of objects in a chunk are used to guess which chunks contain connected characters.
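A minimal sketch of the width test, using the 8-character length and the 35-pixel limit quoted on this slide. The name `find_merged` is hypothetical, and skipping the check when 8 or more objects are present is an assumption. The slide also mentions using the number of chunks and objects to make the guess; that part is not modeled here.

```python
EXPECTED_CHARS = 8     # characters per MSN challenge, from the slide
MAX_SINGLE_WIDTH = 35  # widest chunk assumed to hold a single character


def find_merged(widths):
    """Return the indices of chunks likely to hold connected characters.

    widths lists the pixel width of each remaining chunk, left to right.
    """
    if len(widths) >= EXPECTED_CHARS:
        return []  # enough objects for 8 characters; nothing flagged
    return [i for i, w in enumerate(widths) if w > MAX_SINGLE_WIDTH]


# Seven objects were found, so at least two characters are connected.
print(find_merged([20, 22, 48, 19, 21, 23, 20]))  # [2]
```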

Step 7: Segment Connected Characters
  • Find the width of an object by determining its leftmost and rightmost pixels.
  • Vertically divide the object into c parts of the same width, where c = number of characters.
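The even split can be sketched with integer arithmetic. The name `split_evenly` is hypothetical, and when the width is not a multiple of c the parts differ by at most one column, which is an implementation choice rather than something stated on the slide.

```python
def split_evenly(left, right, c):
    """Split the columns left..right (inclusive) into c side-by-side parts.

    Returns a list of (start, end) column ranges with end exclusive.
    """
    width = right - left + 1
    bounds = [left + i * width // c for i in range(c + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


# An object spanning columns 10..49 (40 pixels wide) that holds 2 characters.
print(split_evenly(10, 49, 2))  # [(10, 30), (30, 50)]
```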

Results of 7 Step Attack

Segmentation Success Rate: 91% (91 out of 100 challenges)

92% of 500 random challenges

Attack Speed:

Implemented in Java

1.86 GHz Intel Core 2 CPU and 2 GB RAM


Implications: A “state of the art” machine can achieve at least a 95% success rate for recognizing individual characters in MSN scheme. This is a conservative estimate.

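Overall success rate for breaking the MSN CAPTCHA: about 61% (≈ 0.92 × 0.95^8), i.e. the segmentation success rate multiplied by the recognition rate for each of the 8 characters.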
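The arithmetic behind the 61% figure can be checked directly. The variable names are illustrative, and multiplying the rates treats the recognition of each character as independent, which is the assumption implied by the formula on the slide.

```python
segmentation_rate = 0.92  # share of challenges segmented correctly
per_char_rate = 0.95      # estimated recognition rate per character
num_chars = 8             # characters per MSN challenge

# All 8 characters must be recognized after a successful segmentation.
overall = segmentation_rate * per_char_rate ** num_chars
print(f"{overall:.1%}")  # 61.0%
```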

Problems with the Attack

1. Failure of Arc Removal

2. Failure of Approximation

3. Failure of Segmentation of Connected Characters

Arc Removal and “Approximation” = 72.8%/82.5%

Defense Against this Attack
  • Let characters touch/overlap
  • Juxtapose characters in any direction in order to make it harder to tell characters and arcs apart.
  • Randomly vary the width of the characters
Strengths and Weaknesses of the MSN Scheme

Strengths:

Good usability – characters generally recognizable

Weaknesses:

Security – vulnerable to simple segmentation attack.

Usability – many characters are still unrecognizable.

Considerations

Size matters

Both character size and string size. The longer the string, the more security it provides.

String Length

  • If random characters are used, the longer the string, the lower the usability.
  • If dictionary words are used, the longer the string, the greater the usability.

Fixed Length

Aids segmentation attack, but can improve usability.

Strengths
  • Paper explained how the attack worked very clearly.
  • Good analysis of both successes and failures of the attack.
  • Included areas for future study, which is useful.
Weaknesses
  • Organization
  • Despite the clear explanation, the paper included no insight on how the authors created the algorithm.
  • Much of the attack rests on the premise that the CAPTCHA has 8 letters. As CAPTCHAs and attacks evolve, this could create a false sense of security if CAPTCHAs use a random number of letters, and it makes the attack less applicable for analyzing other CAPTCHA schemes.
References
  • A Low-cost Attack on a Microsoft CAPTCHA, Jeff Yan, Ahmad Salah El Ahmad, CCS 2008. http://homepages.cs.ncl.ac.uk/jeff.yan/msn_draft.pdf
  • http://en.wikipedia.org/wiki/Captcha
  • http://www.searchenginepeople.com/blog/page/16
  • http://www.codinghorror.com/blog/images/google-error-were-sorry-rate-limiter-captcha.png
  • http://www.virtualblight.com/articles/wp-content/uploads/2008/03/captcha_1_4.jpg