AN IMPROVED AUDIO

AN IMPROVED AUDIO Jenn Tam jdtam@cs.cmu.edu Computer Science Dept. Carnegie Mellon University SOAPS 2008, Pittsburgh, PA

WHAT ARE CAPTCHAS? • CAPTCHAs are tests generated by computers and generally passable by humans but not current computer programs.

THE PROBLEM WITH CURRENT AUDIO CAPTCHAS • In some cases the human passing rate is only 70%! • To make the CAPTCHAs secure, noise was injected into the audio files making it harder for both computers and humans to pass.

ARE CURRENT AUDIO CAPTCHAS SECURE? • A CAPTCHA is considered broken once a program can pass it 5% of the time. • Since the current audio CAPTCHAs use a limited vocabulary, it was possible for us to collect enough data to train a system that could pass the current audio CAPTCHAs more than 45% of the time.
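The 5% criterion above can be written as a small check. This is a minimal sketch, not the evaluation code behind the slides: the names `pass_rate` and `is_broken` are hypothetical, and it assumes a CAPTCHA counts as passed only when the program's answer matches the expected answer exactly.

```python
def pass_rate(predictions, answers):
    """Fraction of CAPTCHAs whose predicted solution equals the expected one."""
    if not answers:
        raise ValueError("need at least one CAPTCHA to evaluate")
    solved = sum(p == a for p, a in zip(predictions, answers))
    return solved / len(answers)


def is_broken(rate, threshold=0.05):
    """Apply the slide's rule: broken once a program passes 5% of the time."""
    return rate >= threshold


# Toy data (made up for illustration): two of three answers are correct.
rate = pass_rate(["1234", "77", "9"], ["1234", "78", "9"])
print(is_broken(rate))
```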

HOW DID WE TEST THE CURRENT AUDIO CAPTCHAs? • Selected three different types of audio CAPTCHAs: Google, reCAPTCHA, and Digg. • Collected 1000 CAPTCHAs per type of audio CAPTCHA to use for training and testing. • Created an ASR system using machine learning techniques.

THE ALGORITHM • Given the .wav file of an audio CAPTCHA: • Segmentation: select the portions of the audio that are most likely digits/letters. • Recognition: extract features from the segment, classify the segment as digit/letter or noise, and output the label. • Stop once a maximum number of segments have been classified.

ALGORITHM DETAILS - SEGMENTATION • CAPTCHAs were manually labeled and segmented. We created training segments using this information. • For testing, we chose the highest energy peaks in the test CAPTCHA and selected fixed-size segments roughly centered at the peaks.
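The test-time segmentation step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: energy is taken as the sum of squared samples over non-overlapping frames, each chosen region is masked so the next peak comes from elsewhere, and the names `frame_energies` and `pick_segments` as well as all parameter values are hypothetical.

```python
def frame_energies(samples, frame_len):
    """Sum of squared amplitudes for each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def pick_segments(samples, frame_len=4, seg_frames=3, max_segments=2):
    """Return (start, end) sample ranges centered on the highest-energy frames.

    Segments near the edges of the audio are clipped, so they are only
    roughly centered on their peak.
    """
    energies = frame_energies(samples, frame_len)
    if not energies:
        return []
    half = seg_frames // 2
    segments = []
    for _ in range(max_segments):
        peak = max(range(len(energies)), key=energies.__getitem__)
        if energies[peak] <= 0:
            break  # only silence or already-used audio remains
        lo = max(0, peak - half)
        hi = min(len(energies), peak + half + 1)
        segments.append((lo * frame_len, hi * frame_len))
        for i in range(lo, hi):
            energies[i] = 0.0  # assumption: mask the region before the next pick
    return sorted(segments)


# Toy signal: two bursts separated by silence.
audio = [0] * 8 + [5] * 4 + [0] * 8 + [9] * 4 + [0] * 4
print(pick_segments(audio))  # prints [(4, 16), (16, 28)]
```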

ALGORITHM DETAILS - FEATURES • We used three popular techniques for extracting features from speech to derive 5 sets of features from the audio: • Mel-frequency cepstral coefficients (MFCC) • Perceptual linear prediction (PLP) • Relative spectral transform with PLP (RASTA-PLP)

ALGORITHM DETAILS - AdaBoost • Used decision stumps as weak classifiers. • For each type of audio CAPTCHA, we created enough classifiers to label a segment as a digit, letter, or noise.
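For readers unfamiliar with boosting over decision stumps, here is a minimal sketch of standard binary AdaBoost in plain Python. It is not the system from the slides: it trains one classifier with labels +1/-1 (for example "this digit" versus "everything else"), and building one such classifier per label is an assumption about how "enough classifiers" might be arranged. All function names and the toy data are hypothetical.

```python
import math


def stump_predict(stump, row):
    """A stump thresholds one feature and answers +1 or -1."""
    feature, threshold, polarity = stump
    return polarity if row[feature] >= threshold else -polarity


def train_stump(X, y, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    best_err, best_stump = None, None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(w for row, label, w in zip(X, y, weights)
                          if stump_predict(stump, row) != label)
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    return best_err, best_stump


def adaboost(X, y, rounds=5):
    """Return a list of (alpha, stump) pairs trained on labels in {+1, -1}."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, stump = train_stump(X, y, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified examples, lower the rest.
        weights = [w * math.exp(-alpha * label * stump_predict(stump, row))
                   for row, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy feature vectors (made up): feature 0 separates the two classes.
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
y = [-1, -1, 1, 1]
model = adaboost(X, y)
print(predict(model, [0.85, 3.0]), predict(model, [0.15, 3.0]))  # prints 1 -1
```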

ALGORITHM DETAILS - SVM • Created a single multiclass classifier using all the training segments (from 900 CAPTCHAs).

ALGORITHM DETAILS - k-NN • Created 5 classifiers, one for each of the feature sets. • Used Euclidean distance as the distance metric. • Cross-validation gave k=1.
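A k-NN classifier with Euclidean distance, plus a way to pick k by cross-validation, can be sketched as below. The slide does not say which cross-validation scheme was used or how the 5 classifiers were combined, so this sketch assumes leave-one-out validation on a single feature set; the names `knn_predict`, `loo_accuracy`, and `choose_k` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training vectors closest to `query`."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy: classify each point using all the others."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)


def choose_k(data, candidates=(1, 3, 5)):
    """Pick the candidate k with the best leave-one-out accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(data, k))


# Toy 2-D feature vectors (made up) for one digit and for noise.
data = [
    ((0.0, 0.0), "7"), ((0.1, 0.0), "7"),
    ((1.0, 1.0), "noise"), ((1.1, 1.0), "noise"),
]
print(choose_k(data, candidates=(1, 3)))  # prints 1
print(knn_predict(data, (0.9, 0.9), k=1))  # prints noise
```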

THE ALGORITHM • Input: audio CAPTCHA as an audio file. • Segmentation: find the highest energy peak and extract a fixed-size segment centered at that peak. • Recognition: extract features from the segment, give the segment to the classifier, and obtain a label. • Stop extracting segments once all segments have been labeled or a maximum solution size is reached.
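The loop on this slide can be sketched as follows. This is a toy skeleton, not the authors' pipeline: the segmenter, feature extractor, and classifier are passed in as functions, and the stand-ins used here (`peak_segment`, `toy_features`, `toy_classify`) are hypothetical placeholders for real energy-based segmentation, MFCC/PLP features, and a trained classifier. Zeroing out a segment after it is labeled is also an assumption.

```python
def solve_captcha(samples, next_segment, extract_features, classify, max_len=8):
    """Label segments until none remain or `max_len` characters are found."""
    remaining = list(samples)
    found = []  # (start position, label) pairs
    while len(found) < max_len:
        span = next_segment(remaining)
        if span is None:
            break  # every segment has been labeled
        start, end = span
        label = classify(extract_features(remaining[start:end]))
        if label != "noise":
            found.append((start, label))
        remaining[start:end] = [0] * (end - start)  # do not revisit this audio
    # Report the labels in the order they occur in the audio.
    return "".join(label for _, label in sorted(found))


def peak_segment(samples, width=3):
    """Toy segmenter: fixed-width window around the largest sample."""
    if not samples:
        return None
    peak = max(range(len(samples)), key=lambda i: abs(samples[i]))
    if samples[peak] == 0:
        return None
    start = max(0, peak - width // 2)
    return start, min(len(samples), start + width)


def toy_features(segment):
    """Toy feature: the peak amplitude of the segment."""
    return max(abs(x) for x in segment)


def toy_classify(feature):
    """Toy classifier: two known amplitudes map to digits, the rest is noise."""
    return {9: "4", 7: "2"}.get(feature, "noise")


audio = [0, 7, 0, 0, 3, 0, 0, 9, 0]
print(solve_captcha(audio, peak_segment, toy_features, toy_classify))  # prints 24
```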

ANALYSIS OF CURRENT AUDIO CAPTCHAs • Used three machine learning techniques to perform ASR on the CAPTCHAs: • AdaBoost • Support Vector Machines (SVM) • k-Nearest Neighbor (k-NN)

THE GOAL • Make a secure audio CAPTCHA which will be easier for a human to pass and harder for a computer to pass. • Equate solving a CAPTCHA with doing some useful work. • In other words, create an audio reCAPTCHA.

WHAT IS reCAPTCHA? • reCAPTCHA helps digitize text on which OCR fails by using the text as its CAPTCHA. • Since millions of people solve CAPTCHAs each day, millions of words get digitized each day!

THE AUDIO RECAPTCHA • Takes advantage of the human ability to understand words through context. • Will help transcribe digital audio on which ASR systems fail. • The audio being used was originally recorded with the intention that it should be easily understood by humans.

HOW WILL IT WORK? • Start with a database of phrases with known transcriptions. • Give the user adjacent phrases to transcribe as the CAPTCHA. • Check the user's solution against the database to determine the result of the test. • Store the rest of the solution as a transcription.

[Figure: an audio clip divided into Segment #1, Segment #2, and Segment #3. Text shown: "That was the shot that killed"; "Harry Lime. He died in a"; "Harry Lime he died in a sewer beneath Vienna".]
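The check described on the "HOW WILL IT WORK?" slide can be sketched with the Harry Lime phrase. This is a minimal sketch under stated assumptions: the known phrase must match exactly after lowercasing and removing punctuation, it may sit at the start or the end of the user's answer, and the names `normalize` and `grade` are hypothetical. A deployed system would likely tolerate small transcription errors, which this sketch does not.

```python
import re


def normalize(text):
    """Lowercase and drop punctuation so 'Lime.' matches 'lime'."""
    return re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split()


def grade(user_answer, known_transcript):
    """Return (passed, leftover_words).

    `passed` is True when the known phrase appears at the start or end of
    the answer; `leftover_words` is the remainder, a candidate transcription
    of the adjacent phrase whose text is not yet known.
    """
    answer = normalize(user_answer)
    known = normalize(known_transcript)
    n = len(known)
    if n and answer[:n] == known:
        return True, answer[n:]
    if n and answer[-n:] == known:
        return True, answer[:-n]
    return False, []


passed, leftover = grade(
    "Harry Lime he died in a sewer beneath Vienna",
    "Harry Lime. He died in a",
)
print(passed, leftover)  # prints True ['sewer', 'beneath', 'vienna']
```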

ANALYSIS OF SECURITY • Speaker-independent recognition is difficult. • Open vocabularies make it even more difficult for ASR systems. • AM broadcasts and .mp3 compression cause the loss of important data needed for automatic analysis.

CONCLUSION • CAPTCHAs need to be more accessible, yet remain secure and not too difficult for humans. • Deploy audio reCAPTCHA through the reCAPTCHA site. • Help make knowledge captured in audio available in text form.

ACKNOWLEDGEMENTS • Dr. Luis von Ahn, CMU • Dr. Manuel Blum, CMU • Dr. Roni Rosenfeld, CMU • David Huggins-Daines, CMU • Jiri Simsa, CMU • Sean Hyde, CMU
