
Sashank Narain Amirali Sanatinia Guevara Noubir College of Computer and Information Science

Single-stroke Language-Agnostic Keylogging using Stereo-Microphones and Domain Specific Machine Learning. Sashank Narain, Amirali Sanatinia, Guevara Noubir. College of Computer and Information Science, Northeastern University. Motivation. Side channel attacks escape the security model





Presentation Transcript


  1. Single-stroke Language-Agnostic Keylogging using Stereo-Microphones and Domain Specific Machine Learning. Sashank Narain, Amirali Sanatinia, Guevara Noubir. College of Computer and Information Science, Northeastern University

  2. Motivation • Side channel attacks escape the security model • Academically pioneered by Paul Kocher (1996) • Timing, power analysis, sound • Global proliferation of smartphones • Estimated 1.75 billion smartphones in 2014 • Used for many day-to-day and business operations • Trusted for sensitive information • Personally Identifiable Information (PII) • Credit card numbers, passwords, location information • Easy target of direct and indirect privacy breaches

  3. Outline • Problem & General Attack Scenario • Android Sensors for Keystroke Inference • Related Attacks • Challenges in Keystroke Inference • Our Approach • Using Signal Processing, Designing a Meta-Algorithm • Evaluation Results • Mitigation Techniques

  4. The Problem • Sensors on smartphones bypass security mechanisms • Accelerometer, Compass, Gyroscope • Not sandboxed • Do not require explicit permissions • Indirectly leak sensitive information • GPS, Camera & Microphones • Require coarse explicit permissions, but the permission descriptions are generic • Users may ignore permissions • Directly leak sensitive information • Can be accessed at any time

  5. Attack Scenario • Adversary lures victim to install Trojan app • e.g., ‘To-do’ app that supports speech recognition • App records sensor data when user types in Trojan app • Builds training models from collected data • On the phone / On a central server • App invokes service that waits for sensitive activity to start • e.g., Your Favorite Bank Login Page • App records sensor data during the sensitive activity • Generates predictions from the recorded sensitive data using the training models

  6. Motion Sensors in Android • Easy to build apps using these APIs • Java methods in Sensor class of Android SDK • C++ functions in sensor.h header of Android NDK • Fixed three-dimensional co-ordinate system • Relative to device • Sensitive to minute motion such as keystrokes • (Figure: Android Co-ordinate System)

  7. Accelerometer • Measures Linear Acceleration + Gravity • Defined as TYPE_ACCELEROMETER • Or obtain sensor fusion data measuring Linear Acceleration • Defined as TYPE_LINEAR_ACCELERATION • Extremely sensitive to motion and very noisy • High-pass filter removes gravity • Low-pass filter removes noise • Used for initial experiments, discarded later on • Gyroscope more stable for Keystroke Inference
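The gravity-removal step on this slide can be sketched as follows. This is a minimal single-axis illustration, assuming a first-order exponential low-pass filter that tracks gravity, with the high-pass output obtained by subtraction. The function name `remove_gravity` and the smoothing factor `alpha` are illustrative choices, not values from the presentation.

```python
def remove_gravity(samples, alpha=0.8):
    """Return the high-pass (linear acceleration) part of one accelerometer axis.

    A slow exponential average tracks gravity; subtracting it leaves the
    fast-changing motion component. `alpha` is an assumed smoothing factor.
    """
    if not samples:
        return []
    gravity = samples[0]  # start the gravity estimate at the first reading
    linear = []
    for value in samples:
        gravity = alpha * gravity + (1.0 - alpha) * value  # low-pass step
        linear.append(value - gravity)                      # high-pass output
    return linear


# A resting device (about 9.81 m/s^2) with one short spike, e.g. a tap.
readings = [9.81, 9.81, 9.81, 11.0, 9.81, 9.81]
print(remove_gravity(readings))
```

The resting samples map to values near zero, while the spike stands out as a positive excursion.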

  8. Gyroscope • Measures rate of rotation in radians / sec • Defined as TYPE_GYROSCOPE • Good for inference • Sensitive to motion but not very noisy • Similar pattern for same keys and different for other keys on x/y axes • (Figure: Similarity between two taps of Character ‘Q’ and two taps of Character ‘V’)

  9. Gyroscope (cont.) • To compute rotation: Incremental Angle of Rotation ≈ Rate of Rotation × Sampling Time (dT) • Challenge: Gyroscope Bias & Bias Drift require correction
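The integration rule on this slide can be written out as a short sketch. It assumes a constant bias estimated as the mean reading while the device is at rest, which is one simple correction and not necessarily the one used by the authors. The function names are hypothetical.

```python
def estimate_bias(rest_rates):
    """Estimate a constant gyroscope bias as the mean rate measured at rest."""
    return sum(rest_rates) / len(rest_rates)


def integrate_rotation(rates, dt, bias=0.0):
    """Accumulate angle (radians) from rate samples (rad/s) taken every `dt` seconds.

    Each step adds (rate - bias) * dt, i.e. the incremental angle of rotation.
    """
    angle = 0.0
    angles = []
    for rate in rates:
        angle += (rate - bias) * dt
        angles.append(angle)
    return angles


bias = estimate_bias([0.02, 0.02, 0.02])           # readings while stationary
angles = integrate_rotation([0.12] * 10, 0.01, bias)
print(angles[-1])
```

Without the bias term, the constant offset would be integrated along with the signal and the angle would drift steadily over time.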

  10. Stereo-Microphones • Microphone arrays commonplace in modern smartphones • Used for audio enhancements e.g., noise suppression • HTC One series supports stereo recording • Ideal for inference • Keystrokes on a soft keyboard can be recorded by microphones • Different amplitudes and time delay for unique keystrokes • Fixed time delay at two microphones for same keys (8 samples for ‘Q’, 15 for ‘V’) • (Figure: Sound waves for Character ‘Q’ and ‘V’ taps)
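One common way to measure the per-key delay between two channels is cross-correlation. The sketch below is an assumption about how such a delay could be estimated, since the slides do not say which method the authors used. The function name `estimate_delay` and the synthetic pulses are illustrative.

```python
def estimate_delay(first, second, max_lag):
    """Return the lag (in samples) at which `second` best matches `first`.

    A positive result means the sound reached the second microphone later.
    Only lags in [-max_lag, max_lag] are searched.
    """
    def score(lag):
        total = 0.0
        for i, value in enumerate(first):
            j = i + lag
            if 0 <= j < len(second):
                total += value * second[j]  # cross-correlation at this lag
        return total

    return max(range(-max_lag, max_lag + 1), key=score)


# Synthetic tap: the same short pulse, arriving 8 samples later on channel 2.
left = [0.0] * 64
right = [0.0] * 64
left[20], left[21] = 1.0, 0.5
right[28], right[29] = 1.0, 0.5
print(estimate_delay(left, right, max_lag=19))
```

The search range of 19 samples matches the largest delay the slides report for the HTC One at 48 kHz.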

  11. Stereo-Microphones (cont.) • Delay in tap detection between two microphones (M1, M2) for a tap at position T: Number of Samples = (Distance(T, M1) – Distance(T, M2)) × Sampling Rate / Speed of Sound • For the HTC One • Distance between microphones: 0.134 m • Maximum supported sampling rate: 48 kHz • Speed of sound in air: 340 m/s • Difference ranges from +19 to -19 samples • For future devices with a higher sampling rate • Example sampling rate: 192 kHz • Difference of up to ±75 samples for a tap close to one microphone
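The delay formula can be checked numerically with the figures given on this slide. The sketch assumes the extreme case of a tap directly at one microphone, so the path difference equals the full microphone spacing. The function name is illustrative.

```python
def delay_in_samples(dist_m1, dist_m2, sampling_rate_hz, speed_of_sound=340.0):
    """Delay between the two microphones, in samples, for a tap at given distances (m)."""
    return (dist_m1 - dist_m2) * sampling_rate_hz / speed_of_sound


MIC_SPACING_M = 0.134  # HTC One microphone distance from the slide

for rate in (48_000, 192_000):
    # Assumed worst case: tap right at M2, so Distance(T, M1) is the full spacing.
    extreme = delay_in_samples(MIC_SPACING_M, 0.0, rate)
    print(rate, round(extreme, 1))
```

This gives about 18.9 samples at 48 kHz and about 75.7 samples at 192 kHz, in line with the slide's figures of 19 and 75.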

  12. Related Work (Attacks) • First work by Cai & Chen 2011 • Demonstrated feasibility of inference using the Orientation sensor • Developed Android application called ‘TouchLogger’ • Accuracy tested on number-only keypad in Landscape mode • Successful inference accuracy of 70% on 3 data-sets

  13. Related Work (cont.) • Owusu et al. 2012 • QWERTY in Landscape mode, Area Inference • Developed Android app called ‘ACCessory’ • Data-sets on HTC ADR 6300 phone from 4 users • Successfully inferred 6 character passwords • 6 passwords out of 99 in 4.5 trials • Estimated 59 passwords out of 99 in 215 trials • Xu, Bai & Zhu 2012 • Lock screen password and numbers during call • E.g., Credit Card and PIN numbers • Used two sensors, Accelerometer for tap detection & Orientation for inference • Developed Android app called ‘TapLogger’ • Data-sets on HTC Aria and Google Nexus (One) phones from 3 users • Achieved: 50% for 1 guess and high accuracy for top 3

  14. Related Work (cont.) • Aviv et al. 2012 • PIN numbers and pattern passwords inference • Used the Accelerometer sensor for inference • Data-sets on Nexus One, G2, Nexus S and Droid Incredible from 24 users in two settings • Controlled (Seated) and Uncontrolled (Walking) • Accuracy of 43% and 73% on PIN and pattern passwords respectively, within 5 attempts

  15. Related Work (cont.) • Miluzzo et al. 2012 • QWERTY in Landscape mode and Icon in Portrait mode inference • Used Accelerometer and Gyroscope sensors combined with Ensemble learning • Presented a framework called ‘TapPrints’ • Datasets on Google Nexus S, Samsung Galaxy Tab 10.1, iPhone 4 • Icon locations inferred with 79% and 65% accuracy for the iPhone and Google Nexus S, resp. • Characters inferred with 65% accuracy • Some icons or characters inferred with accuracy of up to 90% and 80%, respectively

  16. Challenges • Gyroscope • Noise • Typing with trembling hands • Typing in different environments e.g., inside a car • Soft Touch • User taps too soft to induce vibrations • Gyroscope Drift and Bias • Stereo-Microphones • Noise • Typing in an environment with a lot of background noise • Typing in different environments with different noise levels • Soft Touch • User tap sounds don’t reach microphones

  17. Our Approach • Use a combination of sensors • Accelerometer (initially) + Gyroscope + Stereo-Microphones • Use signal processing and richer data instead of features • Complementary filter combining Accelerometer and Gyroscope, and bandpass filter, to remove Gyroscope drift and noise • Bandpass filter [1.5 - 3.5 kHz] to reduce audio noise • (Figures: Microphones Filtering; Gyroscope Filtering)
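A complementary filter of the kind named on this slide can be sketched in a few lines. This is a generic textbook form for one rotation axis, assuming an accelerometer-derived angle is already available per sample. The blending weight of 0.98 and the function signature are assumptions, not parameters from the presentation.

```python
def complementary_filter(gyro_rates, accel_angles, dt, weight=0.98, initial_angle=0.0):
    """Fuse gyroscope rates (rad/s) with accelerometer-derived angles (rad).

    The gyroscope term follows fast changes; the small accelerometer term
    slowly pulls the estimate back, which counteracts gyroscope drift.
    """
    angle = initial_angle
    fused = []
    for rate, accel_angle in zip(gyro_rates, accel_angles):
        angle = weight * (angle + rate * dt) + (1.0 - weight) * accel_angle
        fused.append(angle)
    return fused


# With no rotation reported, the estimate converges to the accelerometer angle.
fused = complementary_filter([0.0] * 500, [1.0] * 500, dt=0.01)
print(round(fused[-1], 3))
```

A weight closer to 1 trusts the gyroscope more and corrects drift more slowly.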

  18. Our Approach (cont.) • Use a specialized multi-level Meta-Algorithm • Use several machine learning algorithms and combine results • Create training models for individual characters • Create training models for specific keyboard areas • Make predictions on areas, then on individual keys in area • (Figure: Area Division)

  19. Elementary Algorithms • Machine learning algorithms • Supervised classification • Selected: Decision Trees (DT), Naïve Bayes (NB), k-Nearest Neighbor (k-NN) • Not selected: Hidden Markov Models, Support Vector Machines, Random Forest, Neural Networks

  20. The Meta-Algorithm • (Figure: Area Selection → Individual Models → Area Models → Voting Models)
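The two-level idea (predict an area first, then a key inside it, combining several classifiers by vote) can be illustrated as below. The slides name the stages but not their exact logic, so this is one possible reading rather than the authors' algorithm. The classifiers here are hypothetical toy callables standing in for trained DT, NB, and k-NN models.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def predict_key(sample, area_models, key_models_by_area):
    """Vote on the keyboard area, then vote on the key using that area's models."""
    area = majority_vote([model(sample) for model in area_models])
    key = majority_vote([model(sample) for model in key_models_by_area[area]])
    return area, key


# Toy stand-ins for trained classifiers; a real system would use fitted models.
area_models = [
    lambda s: "left" if s["delay"] < 10 else "right",
    lambda s: "left" if s["delay"] < 12 else "right",
    lambda s: "right",
]
key_models_by_area = {
    "left": [lambda s: "Q", lambda s: "W", lambda s: "Q"],
    "right": [lambda s: "P", lambda s: "P", lambda s: "O"],
}

print(predict_key({"delay": 8}, area_models, key_models_by_area))
```

Restricting the second vote to models trained for one area reduces the number of candidate keys each classifier has to separate.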

  21. Comparison to Previous Work • Uses stereo-microphones for keystroke inference • Combines sensor and acoustic data for keystroke inference • Uses richer processed sensor and audio data instead of extracting features • Uses a multi-layer multi-algorithm approach based on the specifics of the Android keyboard • Addresses smaller keyboard dimensions e.g., standard QWERTY keyboard, exceeding 90% prediction accuracy • Demonstrates end-to-end attack feasibility

  22. Evaluation System • Hardware • HTC One (Android 4.4), Samsung S2 & Tab 8 (Android 4.1) • No modifications to OS • Evaluation Application • Collects datasets for training and evaluation • Custom keyboard for training with same layout as standard keyboard • QWERTY & Numerical; Portrait & Landscape • Datasets • 7 participants • 5 in office; 2 in restaurant (-2 unusable)

  23. Evaluation Metrics • Performance of Meta-Algorithm • Of different sensors for different areas • As compared to elementary use of algorithms • End-to-end Attack • For sensor data collected by Trojan app from sensitive apps

  24. Evaluation (Meta-Algorithm) • Gyroscope results are location dependent • Areas further from the gyroscope result in more rotation, so they are easier to infer • Microphone results are typically location independent • Inference is mostly based on the speed of sound • The two could be combined to boost inference accuracy • When neither data source is noisy • E.g., HTC One QWERTY

  25. Evaluation (cont.) (Meta-Algorithm) • Substantial increase in accuracy in comparison to elementary use of algorithms • (Figures: accuracy of samples using elementary algorithms; accuracy of samples using the Meta-Algorithm)

  26. Evaluation (cont.) (Meta-Algorithm) • Possible to achieve > 90% for QWERTY keyboard • Possible to achieve > 95% for Number keyboard • Some sample sets between 44-56% • Noise > 70 dB • Gyroscope Drift • Soft Touch sets < 20%

  27. Evaluation (End-to-End Attack) • Collected on banking app with fake numbers • Every UI page is known as an activity • Trojan queries for the foreground activity every 5 s • 100 four-digit PINs • 376 out of 400 digits predicted correctly (94%) • 84 predicted completely correctly • 100 sixteen-digit credit card numbers • 1467 out of 1600 digits predicted correctly (91.5%) • 52 predicted completely correctly

  28. Mitigation Techniques • Sensors bypass Android security model (Sandboxing and Permissions) • Gyroscope sensor • Is not sandboxed • Does not require explicit permissions • Microphones • Require explicit permissions, but the permission descriptions are generic • No dynamic control • One Technique: Blocking • Obtain lock on mutually exclusive sensors and hardware • Invoke the Microphones or Camera to deny access to other apps • FlaskDroid [Bugiel et al. 2013]

  29. Mitigation Techniques (cont.) • Alternative Technique: Limiting Access • Blocking is ineffective against the Gyroscope sensor • Access to it is not mutually exclusive • Observation: sampling rate affects inference capability • Solution: Reduce the sampling rate for background apps to a low but acceptable level
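The rate-limiting idea can be illustrated with a simple decimation step applied to the sensor stream before it reaches a background app. This is only a conceptual sketch: the function name and the rates are hypothetical, and it keeps every n-th sample without the anti-aliasing filter a real implementation would likely need.

```python
def limit_sampling_rate(samples, source_hz, target_hz):
    """Reduce a sensor stream from `source_hz` to roughly `target_hz` by decimation."""
    if target_hz >= source_hz:
        return list(samples)                   # nothing to reduce
    step = round(source_hz / target_hz)        # keep one sample out of every `step`
    return list(samples)[::step]


stream = list(range(100))                      # one second of data at 100 Hz
background_view = limit_sampling_rate(stream, source_hz=100, target_hz=20)
print(len(background_view))
```

At the lower rate a short keystroke spans fewer samples, which leaves less detail for an inference model to work with.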

  30. Conclusions • Stereo-microphone + gyroscope keylogging predictions can exceed 90% accuracy • Implications of mobile phone sensors for privacy are still not well understood • Need for better privacy models in devices loaded with side channels • Mitigations at all layers of the stack
