AIA 2007

AIA 2007 ENHANCED PASSWORD AUTHENTICATION THROUGH KEYSTROKE TYPING CHARACTERISTICS Ozlem Guven(1), Selim Akyokus(1), Mitat Uysal(1), Aykut Guven(2) (1)Department of Computer Engineering, Dogus University, Istanbul, Turkey {oguven,sakyokus,muysal}@dogus.edu.tr (2)IDEA Teknoloji Inc., Istanbul, Turkey aguven@ideateknoloji.com The IASTED International Conference on Artificial Intelligence and Applications AIA 2007 February 12 – 14, 2007 Innsbruck, Austria

Outline • Biometric Security Systems • Keystroke Pattern Recognition Systems • Keystroke Timing Information • Capturing Keystroke Dynamics Data • A Statistical Modeling Approach for Keystroke Recognition • Experimental Results • Conclusion

Biometric Security Systems • One of the most active research fields in computer security is the development of more secure methods for authenticating user access by biometric means. • Biometrics is a relatively new discipline that concerns the use of a person’s physiological or behavioral characteristics for the automatic identification of that person. • There are many types of biometric security systems, based on methods such as face recognition, fingerprint recognition, iris recognition, handwriting recognition, and so on.

Biometric Security Systems • A biometric security system is a pattern recognition system that compares a feature data set obtained from a person with a template data set stored in a database. • The configuration of a biometric security system changes according to the chosen biometric feature, but there are some basic procedural functions that every system must include.

A General Biometric Security System Architecture • [Figure: a general biometric security system architecture. Enrollment path: biometric sensor, input data, feature extractor, template database. Identification/verification path: biometric sensor, input data, feature extractor, feature matcher (classifier), access granted/denied.] • The enrollment part is responsible for registering people’s characteristics in the biometric template database. • The identification/verification part of a biometric security system is responsible for identifying/verifying individuals at the point of access by using a classifier.

Keystroke Pattern Recognition Systems • Keystroke dynamics biometric systems analyze the way a user types at a terminal by monitoring keyboard events. • Keystroke dynamics refers to the timing information or pattern collected about the way a user types on a computer keyboard. • Keystroke dynamics is known by a few different names: keyboard dynamics, keystroke analysis, typing biometrics, and typing rhythms. • Biometric security systems based on keystroke dynamics use this information for user authentication, since every user has a different typing pattern.

Keystroke Timing Information • Keystroke dynamics include several different measurements that can be taken when the user presses keys on the keyboard. • Possible measurements include: • Latency between consecutive keystrokes, • Duration of the keystroke (hold time), • Overall typing speed, • Frequency of errors (how often the user has to use the backspace key), • The habit of using additional keys on the keyboard, for example entering numbers with the numeric pad, • The order in which the user releases keys when typing capital letters (whether the shift key or the letter key is released first), • The force used when hitting keys while typing (requires a special keyboard). • Most keystroke recognition systems do not employ all of these features. Most applications measure only the latencies between consecutive keystrokes or the durations of keystrokes.
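The two most commonly used measurements, hold time and latency, can be sketched as follows. This is a minimal illustration under stated assumptions: the `(key, press_time_ms, release_time_ms)` tuple layout and the function names are hypothetical, and latency is taken as the time between successive key presses.

```python
from typing import List, Tuple

# Assumed layout for one keystroke: (key, press_time_ms, release_time_ms).
KeyEvent = Tuple[str, float, float]

def hold_times(events: List[KeyEvent]) -> List[float]:
    """Duration each key was held down (release time minus press time)."""
    return [release - press for _, press, release in events]

def latencies(events: List[KeyEvent]) -> List[float]:
    """Time between successive key presses (press-to-press)."""
    return [events[i + 1][1] - events[i][1] for i in range(len(events) - 1)]

# Toy timings, invented for illustration only.
events = [("p", 0.0, 90.0), ("a", 150.0, 230.0), ("s", 310.0, 400.0)]
print(hold_times(events))  # [90.0, 80.0, 90.0]
print(latencies(events))   # [150.0, 160.0]
```

A string of n keystrokes yields n hold times but only n - 1 latencies.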

Capturing Keystroke Dynamics Data • When typing on a keyboard, both key press and key release events generate hardware interrupts. • Keystroke dynamics information can be captured easily by using these interrupts. • Capturing keystroke dynamics data does, however, have a few complications. Several keys can be pressed at the same time, or the user may press the next key before releasing the previous one. • Another very important problem is that people's typing skills vary enormously. • A beginner may type very slowly with one finger in a “hunt-and-peck” style, while a professional typist may type on the order of ten times faster. • Typing also depends on the mood of the typist at the time of typing, on what is being typed, and on the type of keyboard used. • There are many factors to be taken into account when designing a keystroke dynamics recognition system.
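One way to cope with overlapping keys is to pair every release event with the earliest unmatched press of the same key. The sketch below assumes a raw event stream of `(key, kind, time_ms)` tuples with `kind` equal to `"down"` or `"up"`; that layout and the `pair_events` name are illustrative assumptions, not part of the study.

```python
from collections import defaultdict, deque
from typing import List, Tuple

RawEvent = Tuple[str, str, float]     # (key, "down" or "up", time_ms), assumed layout
Stroke = Tuple[str, float, float]     # (key, press_time_ms, release_time_ms)

def pair_events(raw: List[RawEvent]) -> List[Stroke]:
    """Pair each release with the earliest unmatched press of the same key."""
    pending = defaultdict(deque)      # key -> queue of unmatched press times
    strokes: List[Stroke] = []
    for key, kind, t in raw:
        if kind == "down":
            pending[key].append(t)
        elif kind == "up" and pending[key]:
            strokes.append((key, pending[key].popleft(), t))
        # A release with no matching press is ignored.
    strokes.sort(key=lambda s: s[1])  # order keystrokes by press time
    return strokes

# "b" is pressed at 70 ms, before "a" is released at 95 ms.
raw = [("a", "down", 0.0), ("b", "down", 70.0), ("a", "up", 95.0), ("b", "up", 160.0)]
print(pair_events(raw))  # [('a', 0.0, 95.0), ('b', 70.0, 160.0)]
```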

Capturing Keystroke Dynamics Data • Generally, each user has a different typing pattern. • The figure plots the keystroke latencies of a password entered by one user over 10 trials. As the figure shows, the user has a typing pattern in which the latencies between successive key hits stay very close to each other from trial to trial.

Keystroke Pattern Recognition Systems • Keystroke dynamics recognition systems can be used for both verification (is this person who they claim to be?) and identification (who is this person?). • Identification involves comparing the acquired keystroke information against the templates of all users in the database. • Verification involves comparison with only the templates corresponding to the claimed identity. • These systems have the advantage of requiring neither specially designed devices nor complex software. • Keystroke recognition systems are usually used to harden or strengthen the login-password verification process.

Login-password Verification Process • A typical and very common example of verification is a user logging on to a computer at work. • The user is asked for a username and password. The system then finds the matching username in the database and verifies that the entered password matches the one stored with that username. • Anyone who knows a username together with its password can access the computer system, and passwords are often quite easy to guess. • People tend to use passwords such as birthdays or pet names, which have a direct relationship with the person, or normal dictionary words. In most cases, such passwords are easily guessed by trying the likely candidates. • Keystroke recognition systems harden or strengthen the password verification process by comparing the captured keystroke dynamics information with the user’s templates stored in a template database. • The system accepts or rejects the login depending on whether the entered information matches the stored template.
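The hardened login can be read as two checks in sequence: the password must match, and then the typing rhythm must match. The following is a minimal sketch, not the authors' implementation. The `accounts` layout, the `login` and `toy_matcher` names, and the fixed-tolerance rule are hypothetical, and the unsalted SHA-256 digest is a simplification (a real system would use a salted, deliberately slow password hash).

```python
import hashlib
import hmac
from typing import Callable, Dict, List

# Hypothetical matcher signature: (test latencies, stored template) -> accept?
Matcher = Callable[[List[float], dict], bool]

def login(username: str, password: str, latencies: List[float],
          accounts: Dict[str, dict], matcher: Matcher) -> bool:
    """Accept only if the password AND the typing rhythm both match."""
    account = accounts.get(username)
    if account is None:
        return False
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    # Step 1: ordinary password check (constant-time comparison).
    if not hmac.compare_digest(digest, account["password_sha256"]):
        return False
    # Step 2: keystroke dynamics check against the stored template.
    return matcher(latencies, account["template"])

def toy_matcher(latencies: List[float], template: dict) -> bool:
    """Placeholder rule: every latency within a fixed tolerance of its mean."""
    means = template["means"]
    return len(latencies) == len(means) and all(
        abs(t - m) < template["tolerance_ms"] for t, m in zip(latencies, means)
    )

# Toy account with invented numbers (latencies in milliseconds).
accounts = {
    "alice": {
        "password_sha256": hashlib.sha256(b"s3cret!!").hexdigest(),
        "template": {"means": [150.0, 160.0], "tolerance_ms": 30.0},
    }
}

print(login("alice", "s3cret!!", [155.0, 150.0], accounts, toy_matcher))  # True
print(login("alice", "s3cret!!", [300.0, 150.0], accounts, toy_matcher))  # False
```

With this arrangement, knowing the password alone is no longer sufficient: the second call supplies the correct password but is rejected because of its timing.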

Classification Methods in Keystroke Recognition • Many methods are used in keystroke dynamics recognition systems, including: • Statistical methods, including t-tests [8], means and standard deviations [9,10], and the non-weighted and weighted probability algorithms [10], • Machine learning or data mining methods, including nearest neighbor classifiers with different distance metrics such as Euclidean and Mahalanobis [11,12,13], neural networks [14,15,16], k-means [12], Bayesian classification [12,17], and decision trees [18], • Fuzzy classification methods [19,20], genetic algorithms, and support vector machines [21].

A Statistical Modeling Approach for Keystroke Recognition • In this study, we use a keystroke recognition model whose architecture resembles a neural network. • The model has a layered structure with weighted connections, like a neural network. • Normally, the weights of a neural network are adjusted by a learning technique that minimizes the difference between the target output and the predicted output. • In this study, the weights of the layered network structure are instead determined by statistical methods.

Training Phase • The average and standard deviation of each keystroke latency are determined for each user from the training dataset. • $P_{u,k}$ and $\sigma_{u,k}$ are the average and standard deviation of the kth keystroke latency for user u.
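The training step amounts to computing one mean and one standard deviation per latency position. A minimal sketch, assuming each training trial is a list of latencies in milliseconds; the `train_template` name is hypothetical, and the use of the sample standard deviation is an assumption, since the slides do not say which form was used.

```python
from statistics import mean, stdev
from typing import List, Tuple

def train_template(trials: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Return (means, stds), one entry per latency position, for one user.

    trials: one latency vector per password entry; all vectors have equal length.
    """
    columns = list(zip(*trials))              # group the kth latency of every trial
    means = [mean(col) for col in columns]    # P_{u,k}
    stds = [stdev(col) for col in columns]    # sigma_{u,k}, sample form (assumption)
    return means, stds

# Three toy trials of a 3-character password (2 latencies each); invented numbers.
trials = [[100.0, 200.0], [110.0, 220.0], [120.0, 210.0]]
means, stds = train_template(trials)
print(means)  # [110.0, 210.0]
```

At least two trials per user are needed, because the sample standard deviation is undefined for a single value.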

Testing Phase • In the testing stage, the keystroke latencies entered by a user in one trial form the test pattern for that user. • The latencies obtained in a trial are compared with the user’s templates (averages and standard deviations) stored in the template database by using our matching algorithm. • The user is authorized to enter the computer system if the pattern matches the template, and is rejected otherwise.

The Matching Algorithm • Our algorithm uses keystroke latencies (the time between successive key hits) as the measure that differentiates users. • The averages and standard deviations of the keystroke latencies determine the weights of a layered network structure. • The layered network structure is used for comparing and identifying keystroke rhythms. Because it resembles a neural network, we sometimes call it a neural-network-like structure.

The Layered Network Structure • [Figure: the layered network structure. Each input $T_{t,i}$ feeds a summation (Σ) node weighted by $P_{u,i}$ and $\sigma_{u,i}$; a product (Π) node combines the node outputs into a single (0, 1) decision.] • Each node computes $O_i = T_{t,i} - P_{u,i}$ and tests whether $-2\sigma_{u,i} < O_i < 2\sigma_{u,i}$, for $i = 1, \dots, k$. • $T_{t,k}$ is the kth keystroke latency entered by user u at trial t; together these latencies form the test pattern for that user. • The weights $P_{u,k}$ and $\sigma_{u,k}$ are the average and standard deviation of the kth keystroke latency for user u. • The layered network structure compares the latencies of each login with the stored averages and tests whether each one falls within two standard deviations of its average reference latency. • If all of the latencies pass this test, the input for that password string is considered valid.
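The two-standard-deviation test can be sketched in a few lines. This is an illustrative reading of the slide, not the authors' MATLAB code: each latency yields a 0/1 output, and a product over those outputs gives the final decision. The function names and the `k` parameter (the threshold multiplier, 2 in the study) are assumptions.

```python
from typing import List

def node_outputs(test: List[float], means: List[float], stds: List[float],
                 k: float = 2.0) -> List[int]:
    """One 0/1 output per latency: 1 if -k*sigma < T - P < k*sigma, else 0."""
    return [1 if -k * s < (t - p) < k * s else 0
            for t, p, s in zip(test, means, stds)]

def accept(test: List[float], means: List[float], stds: List[float],
           k: float = 2.0) -> bool:
    """Product node: accept only if every per-latency output is 1."""
    if not (len(test) == len(means) == len(stds)):
        raise ValueError("test pattern and template must have the same length")
    result = 1
    for output in node_outputs(test, means, stds, k):
        result *= output
    return result == 1

# Toy template with invented numbers (milliseconds).
means = [150.0, 160.0, 140.0]
stds = [10.0, 12.0, 8.0]

print(node_outputs([155.0, 170.0, 150.0], means, stds))  # [1, 1, 1]
print(accept([155.0, 170.0, 150.0], means, stds))        # True
print(node_outputs([155.0, 190.0, 150.0], means, stds))  # [1, 0, 1]
print(accept([155.0, 190.0, 150.0], means, stds))        # False
```

Because the outputs are multiplied, a single latency outside its band is enough to reject the whole attempt.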

Biometric Classifier Performance Metrics • Three metrics are typically used to describe biometric classifier performance. • False rejection rate (FRR): the percentage of valid (genuine) user attempts identified as imposters. It determines how often a valid user is not verified successfully. • False acceptance rate (FAR): the percentage of imposter access attempts identified as valid users. It determines how often an imposter can successfully bypass the security system. • Equal error rate (EER): the crossover point at which FRR equals FAR. • FRR and FAR trade off against each other: adjusting the decision threshold to lower one error rate raises the other. • The EER point, where FAR = FRR, is the best operating point for most common biometric applications. • The decision threshold parameters used in biometric recognition algorithms should therefore be adjusted toward the EER crossover point. • In our study, the threshold parameter is chosen as 2σ. Experiments were run with threshold values of σ, 2σ, and 3σ, and they show that 2σ produces the best results [10,13].
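Given a list of accept/reject decisions, the two error rates follow directly from their definitions. A minimal sketch with invented decision lists; the function names are hypothetical, and the rates are returned as fractions rather than percentages.

```python
from typing import Sequence

def false_rejection_rate(genuine_accepted: Sequence[bool]) -> float:
    """Fraction of genuine attempts that the system rejected."""
    rejected = sum(1 for accepted in genuine_accepted if not accepted)
    return rejected / len(genuine_accepted)

def false_acceptance_rate(imposter_accepted: Sequence[bool]) -> float:
    """Fraction of imposter attempts that the system accepted."""
    accepted = sum(1 for was_accepted in imposter_accepted if was_accepted)
    return accepted / len(imposter_accepted)

# Toy decisions: True means the attempt was accepted.
genuine = [True] * 8 + [False] * 2      # 2 of 10 genuine attempts rejected
imposters = [False] * 9 + [True] * 1    # 1 of 10 imposter attempts accepted

print(false_rejection_rate(genuine))     # 0.2
print(false_acceptance_rate(imposters))  # 0.1
```

Widening the threshold (for example from 2σ to 3σ) accepts more attempts of both kinds, which lowers FRR and raises FAR.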

Experimental Results • This study uses a dataset consisting of the keystroke latencies of passwords for 16 users. • The dataset was collected by Aykut Guven and Ibrahim Sogukpinar in the study reported in [22]. • All passwords are 8 characters long. • For each password entry, 7 keystroke latencies are recorded in the dataset. • A MATLAB program was written to test our model on this dataset. • In the learning phase, the averages and standard deviations $P_{u,k}$ and $\sigma_{u,k}$ of the keystroke latencies are determined for each user from the training dataset, where u = 1, 2, ..., 16 is the user number and k = 1, 2, ..., 7 is the latency number. • Then the test datasets are applied to the neural-statistical algorithm. • The recognition rate (RR) and false rejection rate (FRR) are computed for each user. • The recognition rate is the percentage of attempts by authorized users that are accepted by the system, and FRR is the percentage of attempts by authorized (valid) users that are identified as imposter attempts.

FRR (False Rejection Ratio) Results for All Users • As can be seen from the table, user 13 has the lowest recognition rate (72.12%) and the highest FRR (27.88%), which is the worst case in the test results. • The best result is obtained for user 15, with a recognition rate of 94.22% and an FRR of 5.78%. • Averaged over the 16 users, the overall system achieves a recognition rate of 83% and an FRR of 17%. These results are compatible with another study done by Fabian Monrose and Aviel D. Rubin.

FAR (False Acceptance Ratio) Results for Each User • As seen in the table, the FAR performance of the system is within reasonable limits for all users except users 4, 5, 9, and 14. • Excluding these users, the average FAR is 10%. • When the keystroke latencies of users 4, 5, 9, and 14 are analyzed, we see that these users' latencies have large standard deviations because of their typing behavior. • Across all 16 users, the average FAR is 26%.

Discussion • This kind of variation in the results is normal, since each user has different typing skills. • Each user has a different typing pattern, depending on characteristics such as typing speed, the mood of the typist at the time of typing, and the work being done.

Discussion • The figure shows the average of the mean keystroke latencies and of 2 × the standard deviations for each of the 16 users. • The high FAR results from users who type slowly and have inconsistent typing behavior. • As can be seen from the figure, some users, such as users 5 and 14 in our data set, have large averages and standard deviations. • These large averages and deviations form a wide band of accepted keystroke latencies, which lets imposters through under the approach used in this study. • It can be concluded that any method based on averages and standard deviations in a similar way can be expected to produce a high FAR for users who type slowly or inconsistently.
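The effect of a wide band can be illustrated with two made-up templates. Under the 2σ rule, the accepted interval for one latency is (P − 2σ, P + 2σ), so its width is 4σ. The numbers and function names below are hypothetical and only show why a large σ admits imposter latencies that a small σ would reject.

```python
from typing import Tuple

def acceptance_band(mean: float, std: float, k: float = 2.0) -> Tuple[float, float]:
    """Open interval of latencies accepted for one position: (P - k*sigma, P + k*sigma)."""
    return (mean - k * std, mean + k * std)

def inside(latency: float, band: Tuple[float, float]) -> bool:
    """True if the latency falls strictly inside the band."""
    return band[0] < latency < band[1]

# Invented templates (milliseconds): a steady typist and a slow, variable typist.
steady_band = acceptance_band(150.0, 10.0)     # (130.0, 170.0), width 40 ms
variable_band = acceptance_band(400.0, 150.0)  # (100.0, 700.0), width 600 ms

imposter_latency = 250.0
print(inside(imposter_latency, steady_band))    # False
print(inside(imposter_latency, variable_band))  # True
```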

Conclusion • Biometric security systems based on keystroke dynamics can be a considerably effective way to enhance password-based authentication when accessing a computer system. • The approach used in this study compares the latencies of a password at each login and tests whether they fall within two standard deviations of the average reference latency. If all of the latencies pass this test, the input for that password string is considered valid. • The experimental results obtained in this study yield satisfactory FRR and FAR values for most of the users in our data set. • We tried to improve the FRR and FAR values with preprocessing methods such as outlier removal and normalization (min-max, z-score). Applying these preprocessing methods had little effect on the performance of the system.
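For reference, the preprocessing steps named above can be sketched as follows. The slides do not give the exact procedures used, so these are the textbook forms under stated assumptions: the function names are hypothetical, the sample standard deviation is used, and the outlier rule (drop values more than `k` standard deviations from the mean) is one common choice rather than the study's confirmed rule.

```python
from statistics import mean, stdev
from typing import List

def min_max(values: List[float]) -> List[float]:
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # constant input: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values: List[float]) -> List[float]:
    """Center on the mean and scale by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]

def remove_outliers(values: List[float], k: float = 2.0) -> List[float]:
    """Drop values more than k sample standard deviations from the mean (assumed rule)."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= k * s]

print(min_max([10.0, 20.0, 30.0]))  # [0.0, 0.5, 1.0]
# Toy latencies with one hesitation at 300 ms.
print(remove_outliers([100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 300.0]))
# [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0]
```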

Conclusion • In keystroke recognition, there is no common keystroke dynamics data set that everyone can use to make a comparative evaluation of their methodologies. • Currently, we are working on an experiment to collect a new keystroke data set with a large number of users. We plan to make this data set publicly available on the Internet. • As future work, we plan to implement different classification algorithms and methods on this data set and make a comparative evaluation of them.

AIA 2007 THANKS http://www.akyokus.com/Presentations/
