
Charles Tappert Seidenberg School of CSIS, Pace University




  1. Keystroke Biometric Identification and Authentication on Long-Text Input: Summary of eight years of research in this area. Charles Tappert, Seidenberg School of CSIS, Pace University

  2. DPS Biometric Dissertations • Completed • Keystroke Biometric (long text input) • Identification: feasibility study – Mary Curtin 2006 • Identification: desk/laptop + copy/free text – Mary Villani 2006 • Identification: touch-type feature/fallback hierarchy – Mark Ritzmann 2007 • Authentication: kNN ROC curve derivation methods – Robert Zack 2010 • Authentication: statistical fallback for missing/incomplete info – Steve Kim 2013 • Stylometry + Keystroke Biometric (long text input) • Authentication of online test-takers – John Stewart 2012 • In Progress • Keystroke Biometric (short and long text input) • Authentication: text/spreadsheet/browser/keypad input – Ned Bakelman • Various System Improvements – Vinnie Monaco (MS thesis, PhD dissertation) • Voiceprint • Common passphrase approach: “My name is” – Jonathan Leet • Future Work • Mouse Movement • Biometrics on Handhelds • Fusion Methods to Combine Biometrics Keystroke Biometric Studies

  3. References
     • J.V. Monaco, J.C. Stewart, S. Cha, and C.C. Tappert, "Behavioral Biometric Verification of Student Identity in Online Course Assessment and Authentication of Authors in Literary Works," Proc. IEEE 6th Int. Conf. Biometrics, Wash. D.C., Sep 2013.
     • N. Bakelman, J.V. Monaco, S. Cha, and C.C. Tappert, "Keystroke Biometric Studies on Password and Numeric Keypad Input," Proc. 2013 European Intelligence and Security Informatics Conf., Sweden, Aug 2013.
     • J.V. Monaco, N. Bakelman, S. Cha, and C.C. Tappert, "Recent Advances in the Development of a Long-Text-Input Keystroke Biometric Authentication System for Arbitrary Text Input," Proc. European Intell. and Sec. Inform. Conf., Sweden, Aug 2013.
     • J.V. Monaco, N. Bakelman, S. Cha, and C.C. Tappert, "Developing a Keystroke Biometric System for Continual Authentication of Computer Users," Proc. European Intell. and Sec. Inform. Conf., Denmark, Aug 2012, pp 210-216.
     • J.C. Stewart, J.V. Monaco, S. Cha, and C.C. Tappert, "An Investigation of Keystroke and Stylometry Traits," Proc. Int. Joint Conf. Biometrics (IJCB 2011), Wash. D.C., Oct 2011. Summary of Stewart’s dissertation.
     • C.C. Tappert, S. Cha, M. Villani, and R.S. Zack, "A Keystroke Biometric System for Long-Text Input," Int. J. Info. Security and Privacy (IJISP), Vol 4, No 1, 2010, pp 32-60. Best summary of keystroke system.
     • R.S. Zack, C.C. Tappert, and S.-H. Cha, "Performance of a Long-Text-Input Keystroke Biometric Authentication System Using an Improved k-Nearest-Neighbor Classification Method," Proc. IEEE 4th Int. Conf. Biometrics: Theory, Apps, and Systems (BTAS 2010), Washington, D.C., Sep 2010. Summary of Zack’s dissertation.
     • C.C. Tappert, M. Villani, and S. Cha, "Keystroke Biometric Identification and Authentication on Long-Text Input," pp 342-367, Chapter 16 in Behavioral Biometrics for Human Identification: Intelligent Applications, edited by Liang Wang and Xin Geng, Medical Information Science Reference, 2010.
     • M. Villani, C.C. Tappert, G. Ngo, J. Simone, H. St. Fort, and S. Cha, "Keystroke Biometric Recognition Studies on Long-Text Input under Ideal and Application-Oriented Conditions," Proc. CVPR 2006 Workshop on Biometrics, New York, NY, June 2006. Summary of Villani’s dissertation.

  4. Introduction: Build a Case for Usefulness of Study • Validate importance of study – applications • Define keystroke biometric • Appeal of keystroke over other biometrics • Previous work on the keystroke biometric • No direct study comparisons on same data • Feature measurements • Make case for using: data over the internet, long text input, free (arbitrary) text input • Extends previous work by authors • Summary of scope and methodology • Summary of paper organization

  5. Introduction: Validate importance of study – applications • Internet authentication application • Authenticate (verify) student test-takers • Internet identification application • Identify perpetrators of inappropriate email • Internet security for other applications • Important as more businesses move toward e-commerce

  6. Introduction: Define Keystroke Biometric • The keystroke biometric is one of the less-studied behavioral biometrics • Based on the idea that typing patterns are unique to individuals and difficult to duplicate

  7. Introduction: Appeal of Keystroke Biometric • Not intrusive – data captured as users type • Users type frequently for business/pleasure • Inexpensive – keyboards are common • No special equipment necessary • Can continue to check ID with keystrokes after initial authentication • As users continue to type

  8. Introduction: Previous Work on Keystroke Biometric • One early study goes back to typewriter input • Identification versus authentication • Most studies were on authentication • Two commercial products on hardening passwords • Few on identification (more difficult problem) • Short versus long text input • Most studies used short input – passwords, names • Few used long text input – copy or free text • Other keystroke problems studied • One study detected fatigue, stress, etc. • Another detected ID change via monitoring

  9. Introduction: No Direct Study Comparisons on Same Data • No comparisons on a standard data set • (desirable, and available for many biometric and pattern recognition problems) • Rather, researchers collect their own data • Nevertheless, the literature is optimistic about the keystroke biometric's potential for security

  10. Introduction: Feature Measurements • Features derived from raw data • Key press times and key release times • Each keystroke provides a small amount of data • Data varies across keyboards, conditions, and entered texts • Using long text input allows • Use of good (statistical) feature measurements • Generalization over keyboards, conditions, etc.

  11. Introduction: Make Case for Using • Data over the internet • Required by applications • Long text input • More and better features • Higher accuracy • Free text input • Required by applications • Predefined copy texts unacceptable

  12. Introduction: Extends Previous Work by Authors • Previous keystroke identification study • Ideal conditions • Fixed text and • Same keyboard for enrollment and testing • Less ideal conditions • Free text input • Different keyboards for enrollment and testing

  13. Introduction: Summary of Scope and Methodology • Determine distinctiveness of keystroke patterns • Two application types • Identification (1-of-n problem) • Authentication (yes/no problem) • Two independent variables (4 data quadrants) • Keyboard type – desktop versus laptop • Entry mode – copy versus free text

  14. Keystroke Biometric System Components • Raw keystroke data capture • Feature extraction • Classification for identification • Classification for authentication

  15. Keystroke Biometric System: Raw Keystroke Data Capture

  16. Keystroke Biometric System: Raw Keystroke Data Capture

  17. Keystroke Biometric System: Feature Extraction • Mostly statistical features • Averages and standard deviations • Key press times • Transition times between keystroke pairs • Individual keys and groups of keys – hierarchy • Percentage features • Percentage use of non-letter keys • Percentage use of mouse clicks • Input rates – average time/keystroke

  18. Keystroke Biometric System: Feature Extraction. A two-key sequence (th) showing the two transition measures
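As a rough illustration of the statistical features described on slides 17 and 18, the sketch below computes per-key duration statistics and two transition measures for each two-key sequence. The exact definitions are assumptions, since the figure is not reproduced in the transcript: here `t1` is taken as release-to-press time and `t2` as press-to-press time. The function names and the event format are hypothetical.

```python
from statistics import mean, pstdev

def summarize(values):
    """Return (mean, population standard deviation) of a list of times."""
    return (mean(values), pstdev(values))

def keystroke_features(events):
    """events: list of (key, press_ms, release_ms) tuples in typing order.

    Assumed definitions (not confirmed by the slides):
      duration = release - press of one key
      t1       = press of second key - release of first key (may be negative)
      t2       = press of second key - press of first key
    """
    durations, t1, t2 = {}, {}, {}
    for key, press, release in events:
        durations.setdefault(key, []).append(release - press)
    for (k1, p1, r1), (k2, p2, _r2) in zip(events, events[1:]):
        pair = k1 + k2
        t1.setdefault(pair, []).append(p2 - r1)
        t2.setdefault(pair, []).append(p2 - p1)
    return {
        "duration": {k: summarize(v) for k, v in durations.items()},
        "t1": {k: summarize(v) for k, v in t1.items()},
        "t2": {k: summarize(v) for k, v in t2.items()},
    }

# Two occurrences of the sequence "th" typed with made-up timings.
events = [("t", 0, 80), ("h", 120, 190), ("t", 300, 390), ("h", 400, 470)]
feats = keystroke_features(events)
```

Long text input matters here because each key or key pair needs several occurrences before its mean and standard deviation become stable.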

  19. Keystroke Biometric System: Feature Extraction. Hierarchy tree for the 39 duration categories

  20. Keystroke Biometric System: Feature Extraction. Hierarchy tree for the 35 transition categories

  21. Keystroke Biometric System: Feature Extraction • Fallback procedure for few/missing samples • When the number of samples is less than a fallback threshold, take the weighted average of the key’s mean and the fallback mean
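The fallback rule on slide 21 can be sketched as follows. The slide does not state the weights, so this sketch assumes the key's own mean is weighted by its sample count and the fallback (parent-category) mean by a fixed weight; `threshold` and `fallback_weight` are hypothetical parameter names and values.

```python
def fallback_mean(key_mean, n_samples, parent_mean,
                  threshold=5, fallback_weight=1.0):
    """Blend a key's mean with its fallback (parent category) mean.

    Assumption: weights are the sample count and a fixed fallback weight.
    With enough samples the key's own mean is used unchanged.
    """
    if n_samples >= threshold:
        return key_mean
    if n_samples == 0:
        # No observations at all: rely entirely on the fallback mean.
        return parent_mean
    return ((n_samples * key_mean + fallback_weight * parent_mean)
            / (n_samples + fallback_weight))

# A key seen only 3 times is pulled toward its parent category's mean.
blended = fallback_mean(100.0, 3, 80.0)
```

With this weighting, the influence of the fallback mean shrinks smoothly as more samples of the key are observed.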

  22. Keystroke Biometric System: Feature Extraction • Two preprocessing steps • Outlier removal • Remove duration and transition times > threshold • Feature standardization • Convert features into the range 0-1
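A minimal sketch of the two preprocessing steps, under stated assumptions: outliers are taken to be values more than `n_sigma` standard deviations from the mean, removed over a limited number of passes (the later slides vary both the number of passes and the sigma distance), and standardization is a clamped min-max mapping. The bounds `lo` and `hi` and the default parameter values are hypothetical.

```python
from statistics import mean, pstdev

def remove_outliers(times, n_sigma=2.0, max_passes=3):
    """Repeatedly drop values farther than n_sigma std devs from the mean."""
    kept = list(times)
    for _ in range(max_passes):
        if len(kept) < 2:
            break
        mu, sigma = mean(kept), pstdev(kept)
        if sigma == 0:
            break
        filtered = [t for t in kept if abs(t - mu) <= n_sigma * sigma]
        if len(filtered) == len(kept):
            break  # nothing removed on this pass, so later passes are no-ops
        kept = filtered
    return kept

def standardize(x, lo, hi):
    """Map x into [0, 1] by min-max scaling, clamping values outside [lo, hi]."""
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))

# One long pause (500 ms) among otherwise consistent durations.
cleaned = remove_outliers([100, 102, 98, 101, 99, 500])
```

Clamping keeps every feature in the 0-1 range even when a test value falls outside the bounds seen during enrollment.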

  23. Keystroke Biometric System: Classification for Identification • Nearest neighbor using Euclidean distance • Compare a test sample against the training samples, and the author of the nearest training sample is identified as the author of the test sample
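The identification rule on slide 23 is a plain 1-nearest-neighbor search. The sketch below assumes each sample is already a standardized feature vector of equal length; the data layout and names are hypothetical.

```python
import math

def nearest_neighbor(test_vec, training):
    """training: list of (author, feature_vector) pairs.

    Returns the author of the closest training sample and its distance.
    """
    best_author, best_dist = None, math.inf
    for author, vec in training:
        d = math.dist(test_vec, vec)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_author, best_dist = author, d
    return best_author, best_dist

training = [("alice", [0.1, 0.2]), ("bob", [0.8, 0.9])]
author, distance = nearest_neighbor([0.15, 0.25], training)
```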

  24. Keystroke Biometric System: Classification for Authentication • Cha’s vector-distance (dichotomy) model
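The slide names the dichotomy model without detail. As commonly described, the model turns the many-class problem into a two-class one by working on distances between pairs of samples: pairs from the same person form a "within" class and pairs from different people a "between" class. The sketch below shows only that transformation, using element-wise absolute differences as an assumed distance; it is not the authors' implementation.

```python
from itertools import combinations

def dichotomy_pairs(samples):
    """samples: list of (author, feature_vector) pairs.

    Returns (label, difference_vector) for every pair of samples, where the
    label is "within" for same-author pairs and "between" otherwise.
    """
    pairs = []
    for (a1, v1), (a2, v2) in combinations(samples, 2):
        diff = [abs(x - y) for x, y in zip(v1, v2)]
        pairs.append(("within" if a1 == a2 else "between", diff))
    return pairs

samples = [("a", [0.1, 0.5]), ("a", [0.2, 0.4]), ("b", [0.9, 0.1])]
pairs = dichotomy_pairs(samples)
```

To authenticate, the difference vector between a claimed user's enrolled sample and a new sample would be classified as "within" (accept) or "between" (reject) by any two-class classifier trained on such pairs.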

  25. Experimental and Data Collection Design • Two independent variables • Keyboard type • Desktop – all Dell • Laptop – 90% Dell + IBM, Compaq, Apple, HP, Toshiba • Input mode • Copy task – predefined text • Free text input – e.g., arbitrary email

  26. Experimental and Data Collection Design

  27. Subjects and Data Collection • Subjects provided samples in at least two quadrants • Five samples per quadrant per subject • Summary of subject demographics

  28. Experimental Results • Identification experimental results • Authentication experimental results • Longitudinal study results • System hierarchical model and parameters • Hierarchical fallback model • Outlier parameters • Number of enrollment samples • Input text length • Probability distributions of statistical features

  29. Experimental Results: Identification Experimental Results. Identification performance under ideal conditions (same keyboard type and input mode, leave-one-out procedure)

  30. Experimental Results: Identification Experimental Results. Identification performance under non-ideal conditions (train on one file, test on another)

  31. Experimental and Data Collection Design

  32. Experimental Results: Authentication Experimental Results. Authentication performance under ideal conditions (weak enrollment: train on 18 subjects and test on 18 different subjects)

  33. Experimental Results: Longitudinal Study Results • Identification – 13 subjects at 2-week intervals • Average 6 arrow groups: 90% -> 85% -> 83% • Authentication – 13 subjects at 2-week intervals • Average 6 arrow groups: 90% -> 87% -> 85% • Identification – 8 subjects at 2-year interval • Average 6 arrow groups: 84% -> 67% • Authentication – 8 subjects at 2-year interval • Average 6 arrow groups: 94% -> 92% • (all above results under non-ideal conditions)

  34. Experimental Results: System hierarchical model and parameters. Touch-type hierarchy tree for durations (Mark Ritzmann)

  35. Experimental Results: System hierarchical model and parameters. Identification accuracy versus outlier removal passes

  36. Experimental Results: System hierarchical model and parameters. Identification accuracy versus outlier removal distance (sigma)

  37. Experimental Results: System hierarchical model and parameters. Identification accuracy versus enrollment samples

  38. Experimental Results: System hierarchical model and parameters. Identification accuracy versus input text length

  39. Experimental Results: System hierarchical model and parameters. Distributions of “u” duration times for each entry mode

  40. Conclusions • Results are important and timely as more people become involved in the applications of interest • Authenticating online test-takers • Identifying senders of inappropriate email • High performance (accuracy) results if • 2 or more enrollment samples/user • Users use same keyboard type

  41. ROC Curves (Robert Zack, 2010). ROC curves from the kNN classifier with k=21: method m-kNN (left), method wm-kNN (center), and method hd-kNN (right).

  42. FAR and FRR versus threshold. Closed 14-14 system, kNN classifier with k=21: FAR and FRR versus threshold for method m-kNN (left), wm-kNN (center), hd-kNN (right).
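To make the FAR/FRR-versus-threshold plots concrete, the sketch below computes both error rates for a generic distance-based verifier and approximates the equal error rate (EER) by sweeping the threshold. It is not one of the three kNN ROC derivation methods named on the slides (m-kNN, wm-kNN, hd-kNN), whose details are not given in this transcript; the score lists are made-up illustrative values.

```python
def far_frr(genuine, impostor, threshold):
    """Scores are distances: a claim is accepted when distance <= threshold.

    FAR = fraction of impostor attempts accepted.
    FRR = fraction of genuine attempts rejected.
    """
    far = sum(d <= threshold for d in impostor) / len(impostor)
    frr = sum(d > threshold for d in genuine) / len(genuine)
    return far, frr

def approx_eer(genuine, impostor):
    """Sweep observed scores as thresholds; return (eer, threshold) where
    FAR and FRR are closest, with eer taken as their average."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

genuine = [0.1, 0.2, 0.3, 0.6]   # distances for true-user attempts
impostor = [0.4, 0.7, 0.8, 0.9]  # distances for impostor attempts
eer, threshold = approx_eer(genuine, impostor)
```

Raising the threshold lowers FRR and raises FAR, which is the trade-off the two curves on the slide depict.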

  43. Conclusions (Robert Zack, Authentication Study, 2010) • Keystroke password performance – approximately 10% EER • See extensive study by Killourhy & Maxion, 2009 • Advertised performance of commercial products is exaggerated • Keystroke long-text performance – approximately 1% EER • Reasonable considering powerful statistical features • Closed system better than open system performance • Three ROC curve derivation methods developed for kNN procedure • All are two-parameter methods – k plus a threshold

  44. Online Test-Taker Authentication (John Stewart, 2011) • Best Keystroke Performance – 0.55% EER • Closed system of 30 students • Best Previous Keystroke Performance – 1.0% EER • Closed system of 14 students (Robert Zack, 2010) • Best Stylometry Performance – approximately 30.0% EER • Keystroke biometric operates at the automatic motor control level • Because stylometry operates at a higher cognitive word/syntax level, longer text passages are required for reasonable performance • This hypothesis was verified on much longer texts of short novels

  45. Keystroke Data Capture Systems • Java Applet • Mary Curtin, Mary Villani, Mark Ritzmann, Robert Zack, Vinnie Monaco/Ned Bakelman (EISIC paper) • JavaScript (Vinnie Monaco) • John Stewart / Vinnie Monaco • Fimbel Open Source Keylogger • Ned Bakelman / Vinnie Monaco • Should we develop our own keylogger?

  46. Continual Authentication of Computer Users (EISIC 2013 Conference Paper) • Motivation – The technology is applicable to a wide range of government, private company, and academic applications worldwide • For example, to detect intruders, the U.S. Government wants to continually authenticate all government computer users, both military and non-military • U.S. DARPA 2010 and 2012 Requests for Proposals • Requirement – detect intruder within minutes • Current study focuses on this fast detection application • Authentication of students taking online tests • U.S. Higher Education Opportunity Act of 2008 EISIC 2013

  47. Continual Burst Authentication Strategy: Assumptions • Most computer users tend to have bursts of input activity interspersed with periods of inactivity while doing other things • The application is designed for typical business or government office computer usage • Note: it would be interesting to determine the frequency and duration of bursts of computer input activity in typical office environments

  48. Continuous vs Continual Authentication with Data Capture Windows • Continuous (ongoing) burst authentication • Continual burst authentication with pauses [Figure: timelines for Burst 1, Burst 2, and Burst 3 showing 1 min data capture windows, intervals labeled 8 min, 5 min, 30 min, and 10 min, and the pause threshold]

  49. Continual Burst Strategy after Pauses Reduces Frequency of Authentications • Avoids capture of excessive quantities of data • Reduces need for excessive computing resources • Reduces false alarm rate • Still provides sufficient data for continual training of the biometric system

  50. Two Important Time Periods for Continual Burst Authentication • Length of the data capture window • Short enough to catch an intruder before significant harm is caused • On the order of minutes – DARPA • Long enough to make an accurate detection and reduce false alarms • Length of the pause • Must be shorter than entry time of intruder • Long enough to reduce authentication rate • Note: periods of little computer activity cause long pauses
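One way to picture the two time periods is a scheduler that opens a fixed-length data capture window whenever typing resumes after a pause longer than the pause threshold. The sketch below is an illustration under that assumption, not the system described in the EISIC paper; the function name and the parameter values (a 5-minute pause threshold and a 1-minute window) are hypothetical.

```python
def capture_windows(timestamps, pause_threshold, window_len):
    """timestamps: sorted keystroke times in seconds.

    Opens a capture window at the first keystroke and at every keystroke
    that follows a pause longer than pause_threshold. Returns a list of
    (start, end) windows; one authentication would run per window.
    """
    windows = []
    last = None
    for t in timestamps:
        if last is None or t - last > pause_threshold:
            windows.append((t, t + window_len))
        last = t
    return windows

# Three bursts of typing separated by long pauses.
keystrokes = [0, 10, 20, 600, 610, 700, 2600]
windows = capture_windows(keystrokes, pause_threshold=300, window_len=60)
```

A shorter pause threshold triggers more authentications, while a longer one leaves more time for an intruder to sit down unnoticed, which is the trade-off listed on the slide.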
