
Keystroke Biometric Identification Studies on Long-Text Input

Mary Villani, DPS 2006, Fall 2008


  1. Keystroke Biometric Identification Studies on Long-Text Input Mary Villani DPS 2006 Fall 2008

  2. Objective • Determine the viability of the keystroke biometric for long-text input of about 600 keystrokes • Two independent variables: • Entry mode – copy and free text • Keyboard type – desktop and laptop

  3. Secondary & Tertiary Goals • Compare subjects who are aware they are being observed with those who are unaware • Identify recognition patterns based on subject demographics (e.g., handedness, gender, age, language)

  4. Biometrics / Biometric Technologies • Biometrics • Identifying an individual based on his or her distinguishing characteristics, or the science of identifying or verifying the identity of a person based on physiological or behavioral characteristics [Bolle et al.] • Biometric Technologies • Automated methods of verifying or recognizing the identity of a living person based on physiological or behavioral characteristics [Miller, B]

  5. Keystroke Biometrics • Identification of a person by their personal typing style or keystroke pattern • Each individual has a characteristic typing ability that is unique [Bolle] • Typing biometrics is the analysis of a user’s keystroke patterns [Conn et al]

  6. Advantages of Keystroke Biometric • Keyboards commonly used • Not intrusive • Inexpensive • Can Frequently Re-authenticate the User

  7. Literature Search • Copy Task • Long Text • Early studies • About 5 major studies (Gaines, Umphress, Leggett) • Short Entry / Password Hardening • At least 16 studies, the biggest focus • Some noted that the longer the word, the higher the accuracy • Product: BioPassword • Free Text • Song (continuous monitoring) • Gunetti & Picardi (August 2005)

  8. Literature Search • Features Extracted • Mostly means and standard deviations of key press and transition (digraph) times • Some trigraphs, 4-graphs, 6-graphs • Most pre-process to remove errors or outliers • Some applied a difficulty factor or keyboard zones; some factored in length and overall percentages

  9. Literature Search • Classification Approaches • KNN / Euclidean Distance • Most popular, simplest, highest accuracy; common complaint is long processing time • Fuzzy Logic • Neural networks / genetic algorithms • Bayesian classification • Combinations thereof

  10. Contribution • Copy Under Non-Ideal Conditions • Copy Compared to Free-Text • Desktop Compared to Laptop • Features & Fallback • Length for Free-Text • Impact of Outlier & Distance of Outlier • Optimized Performance Parameters – Free Text

  11. Gleaned From Initial Experiments & the Literature • More features extracted from raw data yielded better results • When a key did not occur, falling back to the mean and standard deviation of all keys degraded performance • Increasing the number of participants degraded performance but made the experiment more robust

  12. These Experiments • Increasing feature set to 239 from 58 • Staying with long passage • Must use a larger participant pool for validity of study • Testing entry in non-ideal conditions • Different input type (free text vs. copy) • Different keyboard type (desktop vs. laptop) • User awareness level (aware vs. unaware)

  13. Updated Feature Set

  14. Keystroke Biometric System Components • Data Capture Applet • Feature Extractor • Pattern Classifier

  15. Data Capture

  16. Login Screen

  17. First Part of Demographic Questionnaire

  18. Second Part of Demographic Questionnaire • Note: subject is asked to sign off for IRB approval # 16

  19. Subjects Could Choose Keyboard and Task • The feature files were named based on their choices and entry #

  20. Copy Task Entry Mode

  21. Free-Text Entry Mode

  22. Sample Raw Feature Data • Sample raw feature data file: Hello World

  23. 239 Feature Measurements • 78 Key Press Duration Measures (39 means and 39 standard deviations) • 70 Key Transition Type 1 Measures (35 means and 35 standard deviations) • 70 Key Transition Type 2 Measures (35 means and 35 standard deviations) • 21 Other Measures (percentages and rates)

  24. Type 1 and 2 Transition Measures
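The transcript does not carry the figure that defines the two transition types, so the following is only a minimal sketch under stated assumptions: type 1 is taken as the time from releasing one key to pressing the next, and type 2 as the time from pressing one key to pressing the next. The event layout `(key, press_ms, release_ms)` and the function names are hypothetical.

```python
from statistics import mean, pstdev

def timing_samples(events):
    """Collect raw timing samples from (key, press_ms, release_ms) events.

    Assumed definitions (not confirmed by the slides):
      duration = release - press of the same key
      type 1   = press of second key - release of first key (may be negative)
      type 2   = press of second key - press of first key
    """
    durations, type1, type2 = {}, {}, {}
    for key, press, release in events:
        durations.setdefault(key, []).append(release - press)
    # Pair each keystroke with the one that follows it.
    for (k1, p1, r1), (k2, p2, _r2) in zip(events, events[1:]):
        pair = k1 + k2
        type1.setdefault(pair, []).append(p2 - r1)
        type2.setdefault(pair, []).append(p2 - p1)
    return durations, type1, type2

def summarize(samples):
    """Reduce each sample list to a (mean, standard deviation) pair."""
    return {name: (mean(v), pstdev(v)) for name, v in samples.items()}

events = [("h", 0, 80), ("e", 100, 170), ("l", 150, 260)]
durations, type1, type2 = timing_samples(events)
print(summarize(durations))
print(type1, type2)
```

Under these assumed definitions a type 1 value goes negative whenever the second key is pressed before the first is released, as happens for the "el" pair in the sample events.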

  25. Key Press Duration Features and Fallback Hierarchy • What to do when a key is not used often • Hierarchy tree for the 39 duration features (each oval), each represented by a mean and a standard deviation.

  26. Key Transition Features and Fallback Hierarchy • Hierarchy tree for the 35 transition features (each oval), each represented by a mean and a standard deviation for each of the type 1 and type 2 transitions.

  27. Fallback for Few Samples • Mean and standard deviation are computed with fallback when the number of samples n(i) is less than k_fallback-threshold • Similar to NLP “backoff” statistics for n-grams
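The fallback equation itself did not survive in the transcript. As an illustration only, one common way to realize such a backoff is a weighted blend of a key's own statistic with that of its parent node in the hierarchy. In this sketch only the threshold idea comes from the slide; the blend formula, the parameter `k_weight`, and the default values are assumptions.

```python
def fallback_mean(n_i, mean_i, mean_parent, k_threshold=5, k_weight=1.0):
    """Return a mean for feature i, backing off toward its parent node.

    n_i          number of samples observed for feature i
    mean_i       mean of those samples (ignored when n_i == 0)
    mean_parent  mean of the parent node in the fallback hierarchy
    k_threshold  stands in for the slide's fallback threshold (value assumed)
    k_weight     hypothetical weight given to the parent statistic
    """
    if n_i >= k_threshold:
        return mean_i  # enough samples: use the feature's own mean
    # Too few samples: blend own mean with the parent's, weighted by counts.
    return (n_i * mean_i + k_weight * mean_parent) / (n_i + k_weight)

print(fallback_mean(10, 100.0, 120.0))  # enough samples, own mean kept
print(fallback_mean(1, 100.0, 120.0))   # one sample, pulled toward parent
print(fallback_mean(0, 0.0, 120.0))     # no samples, parent mean used
```

With this choice the result moves smoothly from the parent's mean (no samples) toward the feature's own mean as samples accumulate; the same pattern would apply to the standard deviation.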

  28. Two Preprocessing Steps • Outlier removal • Remove samples > 2σ from µ • Prevents feature skewing from pauses • Standardization • Scales to range 0-1 to give roughly equal weight to each measure
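The two preprocessing steps can be sketched as follows. This is a minimal illustration, assuming a single outlier-removal pass per feature and min-max scaling of each feature across all samples; the slides do not say whether removal is repeated or exactly how the 0-1 range is obtained, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def remove_outliers(samples, n_sigma=2.0):
    """Drop samples farther than n_sigma standard deviations from the mean.

    Single pass (an assumption); long pauses show up as large values
    and are removed so they do not skew the feature.
    """
    if len(samples) < 2:
        return list(samples)
    mu, sigma = mean(samples), pstdev(samples)
    return [x for x in samples if abs(x - mu) <= n_sigma * sigma]

def min_max_scale(values):
    """Scale one feature's values into the range 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature carries no information
    return [(v - lo) / (hi - lo) for v in values]

durations_ms = [100, 105, 95, 102, 98, 100, 900]  # 900 ms is a pause
print(remove_outliers(durations_ms))
print(min_max_scale([10, 20, 30]))
```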

  29. Pattern Classifier • Nearest Neighbor Classifier using Euclidean Distance
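A nearest neighbor classifier with Euclidean distance can be sketched in a few lines. The feature vectors here are short toy vectors rather than the full feature set, and the names `euclidean` and `nearest_neighbor` are illustrative, not taken from the study's software.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(query, training):
    """Return the label of the training vector closest to the query.

    training is a list of (label, vector) pairs.
    """
    label, _vector = min(training, key=lambda item: euclidean(query, item[1]))
    return label

training = [("alice", [0.1, 0.2]), ("bob", [0.9, 0.8])]
print(nearest_neighbor([0.2, 0.1], training))
```

Every query is compared against every training vector, which matches the processing-time complaint noted for this approach in the literature search.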

  30. Experimental Design • Six Main Experiments per Six Arrows

  31. Experimental Design: Keyboards (independent variable 1) • Desktop Keyboards – mostly (~100%) Dell desktops in a classroom environment • Laptop Keyboards – about 90% Dell laptops, some IBM, HP, Apple (greater variety of laptop than desktop keyboards)

  32. Experimental Design: Input Modes (independent variable 2) • Copy Task Input – specified text of about 600 keystrokes + corrections • Free Text Input – creation of arbitrary emails (at least 600 keystrokes)

  33. Data Collection

  34. Subject Participation

  35. Participation By Experiment • Each subject entered 5 texts in at least two quadrants • A total of 36 participated in all four quadrants • [Diagram: quadrants formed by keyboard (Desktop, Laptop) and input mode (Copy, Free Text), joined by arrows numbered 1-6; subject counts shown: 52, 40, 47, 93, 41, 40]

  36. Five Sub-Experiments for Each of the Six Arrows • a. Training & testing on data in quadrant at first end of arrow (leave-one-out procedure) • b. Training & testing on data in quadrant at second end of arrow (leave-one-out procedure) • c. Combining data at each arrow end (leave-one-out procedure) • d. Training on first end – testing on second • e. Training on second end – testing on first • [Diagram: an arrow annotated with the labels a, b, c, and d & e]
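The five sub-experiments come down to two evaluation patterns: leave-one-out within a data set (a, b, c) and training on one set while testing on another (d, e). The sketch below assumes a nearest neighbor classifier over toy two-dimensional vectors; the data and function names are made up for illustration and are not results from the study.

```python
import math

def classify(query, training):
    """Nearest neighbor by Euclidean distance over (label, vector) pairs."""
    return min(training, key=lambda item: math.dist(query, item[1]))[0]

def leave_one_out_accuracy(samples):
    """Hold out each sample in turn and classify it against the rest."""
    correct = 0
    for i, (label, vector) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if classify(vector, rest) == label:
            correct += 1
    return correct / len(samples)

def cross_accuracy(train, test):
    """Train on one quadrant's samples and test on another's."""
    correct = sum(classify(vector, train) == label for label, vector in test)
    return correct / len(test)

first_end = [("a", [0.0, 0.0]), ("a", [0.1, 0.0]), ("b", [1.0, 1.0]), ("b", [0.9, 1.0])]
second_end = [("a", [0.2, 0.1]), ("a", [0.1, 0.1]), ("b", [0.8, 0.9]), ("b", [0.9, 0.9])]

print("a:", leave_one_out_accuracy(first_end))
print("b:", leave_one_out_accuracy(second_end))
print("c:", leave_one_out_accuracy(first_end + second_end))
print("d:", cross_accuracy(first_end, second_end))
print("e:", cross_accuracy(second_end, first_end))
```

Leave-one-out needs at least two samples per subject in the set being evaluated; otherwise the held-out sample has no same-subject neighbor to match.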

  37. Results Experiment 1 • 36 subjects participated in all quadrants

  38. Results Experiment 2 • 36 subjects participated in all quadrants

  39. Results Experiment 3 • 36 subjects participated in all quadrants

  40. Results Experiment 4 • 36 subjects participated in all quadrants

  41. Results Experiment 5 • 36 subjects participated in all quadrants

  42. Results Experiment 6 • 36 subjects participated in all quadrants

  43. 36 Subject Summary

  44. All Subject Summary • Supports 36 Subject Results

  45. Conclusions • Best accuracies for same keyboard and same input mode • Accuracy dropped significantly for different keyboards or for different input modes • Accuracy for different input modes better than accuracy for different keyboards • Accuracy for copy mode somewhat better than accuracy for free-text mode • Accuracy decreased as the number of subjects increased

  46. Long-Text Input Applications • Identify the author of inappropriate email and possibly even IM • Authenticate the student taking online exams

  47. Future Work • Authentication • We’ve collected longitudinal data • Try more sophisticated classifiers • Neural Networks • Support Vector Machines • Explore the data with data mining • Identify patterns cross referencing demographics • Aware/Unaware with Better Management

  48. Future Research Continued • Observe keystroke patterns of the same person over time (e.g., at ages 15, 20, 25, 30) • Compare those who learned computer usage at a very young age with those who learned in adult life, or people at different typing levels • Identify letters and letter pairs that contribute more to the accuracy level • More work on free text and how it differs from copy

  49. Unaware / Aware

  50. Demographic Studies • Compared 1st half of 93-participant copy text to 2nd half • Train on one, test on the other, matching on • Gender • Age • Language • Handedness • Ultimately wacky results • Code was changed at the very end to accommodate this, so we don’t trust that the programs are processing correctly • Moved to Future Work
