
On the Semantic Patterns of Passwords and their Security Impact



Presentation Transcript


  1. On the Semantic Patterns of Passwords and their Security Impact • Rafael Veras, Christopher Collins, Julie Thorpe • University of Ontario Institute of Technology • Presenter: Kyle Wallace

  2. A Familiar Scenario… User Name: CoolGuy90 Password: “What should I pick as my new password?”

  3. A Familiar Scenario… “Musical!Snowycat90”

  4. A Familiar Scenario… • But how secure is “Musical!Snowycat90” really? (18 chars) • “Musical” – Dictionary word, possibly related to hobby • “!” – Filler character • “Snowy” – Dictionary word, attribute to “cat” • “cat” – Dictionary word, animal, possibly pet • “90” – Number, possibly truncated year of birth • 15/18 characters are related to dictionary words! Why do we pick the passwords that we do?

  5. Password Patterns? • “Even after half a century of password use in computing, we still do not have a deep understanding of how people create their passwords” –Authors • Are there ‘meta-patterns’ or preferences that can be observed across how people choose their passwords? • Do these patterns/preferences have an impact on security?

  6. Contributions • Use NLP to segment, classify, and generalize semantic categories • Describe most common semantic patterns in RockYou database • A PCFG that captures structural, semantic, and syntactic patterns • Evaluation of security impact, comparison with previous studies

  7. Contributions • Use NLP to segment, classify, and generalize semantic categories • Describe most common semantic patterns in RockYou database • A PCFG that captures structural, semantic, and syntactic patterns • Evaluation of security impact, comparison with previous studies

  8. Segmentation • Decomposition of passwords into constituent parts • Passwords contain no whitespace characters (usually) • Passwords contain filler characters (“gaps”) between segments • Ex: crazy2duck93^ -> {crazy, duck} & {2,93^} • Issue: What about strings that parse multiple ways?

  9. Coverage • Prefer fewer, smaller gaps to larger ones • Ex: Anyonebarks98 (13 characters long)
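
The coverage idea can be sketched as a small dynamic program. This is only an illustration, not the authors' implementation: the word list `WORDS`, the function name `best_split`, and the tie-break (prefer fewer word segments) are assumptions made for the example.

```python
# Hypothetical mini word list; the talk uses a trimmed COCA word list instead.
WORDS = {"a", "s", "any", "one", "anyone", "bark", "barks"}


def best_split(password, words):
    """Return (covered_chars, word_segments) for the highest-coverage split.

    Characters not covered by a dictionary word are treated as gaps.
    Ties on coverage are broken by preferring fewer word segments
    (an assumption made for this sketch).
    """
    text = password.lower()
    n = len(text)
    # best[i] = (covered, -segment_count, segments) for the suffix text[i:]
    best = [None] * (n + 1)
    best[n] = (0, 0, [])
    for i in range(n - 1, -1, -1):
        # Option 1: treat text[i] as a gap character.
        candidates = [best[i + 1]]
        # Option 2: start a dictionary word at position i.
        for j in range(i + 1, n + 1):
            word = text[i:j]
            if word in words:
                covered, neg_count, segs = best[j]
                candidates.append((covered + len(word), neg_count - 1, [word] + segs))
        best[i] = max(candidates, key=lambda c: (c[0], c[1]))
    covered, _, segments = best[0]
    return covered, segments


print(best_split("Anyonebarks98", WORDS))
```

With this word list, "any one barks" and "anyone bark s" cover the same 11 characters as "anyone barks", so the tie-break is what selects the two-segment split.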

  10. Splitting Algorithm • Source corpora: Raw word list • Taken from COCA (Corpus of Contemporary American English) • Trimmed version of COCA: • 3 letter words: Frequency of 100+ • 2 letter words: Top 37 • 1 letter words: a, I • Also collected list of names, cities, surnames, months, and countries

  11. Splitting Algorithm • Reference Corpus: Collection of N-Grams, where N=3 (Full COCA) • N-Gram: Sequence of tokens (words) • Ex: “I love my cats” • Unigrams: I, love, my, cats (4) • Bigrams: I love, love my, my cats (3) • Trigrams: I love my, love my cats (2)
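
The n-gram counts in the example above can be reproduced with a few lines. The helper name `ngrams` is chosen for this sketch only.

```python
def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "I love my cats".split()
for n in (1, 2, 3):
    grams = ngrams(tokens, n)
    print(n, len(grams), grams)
```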

  12. Common Words

  13. Part-of-Speech Tagging • Necessary step for semantic classification • Ex: “love” is a noun (my true love) and a verb (I love cats) • Given a sequence of segments, returns a part-of-speech tag for each • Gap segments are not tagged

  14. Semantic Classification • Assigns a semantic classifier to each password segment • Only assigned to nouns and verbs • WordNet: A graph of concepts expressed as a set of synonyms • “Synsets” are arranged into hierarchies, more general at top • Fall back to source corpora for proper nouns • Tag with female name, male name, surname, country, or city

  15. Semantic Classification Tags represented as word.pos.#, where # is the WordNet ‘sense’

  16. Semantic Generalization • Where in the synset hierarchy should we represent a word? • Utilize a tree cut model on synset tree • Goal: Optimize between parameter & data description length

  17. W=1000 (gold), W=5000 (red), W=10000(blue)

  18. Contributions • Use NLP to segment, classify, and generalize semantic categories • Describe most common semantic patterns in RockYou database • A PCFG that captures structural, semantic, and syntactic patterns • Evaluation of security impact, comparison with previous studies

  19. Classification • RockYou leak (2009) contained over 32 million passwords • Effect of generalization can be seen in a few cases (in blue) • Some generalizations better than others (Ex: ‘looted’ vs ‘bravo100’) • Some synsets are not generalized (in red) • Ex: puppy.n.01 -> puppy.n.01

  20. Summary of Categories • Love (6, 7) • Places (3, 13) • Sexual Terms (29, 34, 54, 69) • Royalty (25, 59, 60) • Profanity (40, 70, 72) • Animals (33, 36, 37, 92, 96, 100) • Food (61, 66, 76, 82, 93) • Alcohol (39) • Money (46, 74) • *Some categories expanded from two letter acronyms • +Some categories contain noise from names dictionary

  21. Top 100 Semantic Categories

  22. Contributions • Use NLP to segment, classify, and generalize semantic categories • Describe most common semantic patterns in RockYou database • A PCFG that captures structural, semantic, and syntactic patterns • Evaluation of security impact, comparison with previous studies

  23. Probabilistic Context-Free Grammar • A CFG whose productions have associated probabilities • A vocabulary set (terminals) • A variable set (non-terminals) • A start variable • A set of rules (terminals + non-terminals) • A set of probabilities on rules, such that $\sum_{\alpha} P(A \to \alpha) = 1$ for every non-terminal $A$

  24. Semantic PCFG • In the authors’ PCFG: • The vocabulary (terminals) comprises the source corpora and learned gap segments • The variable set (non-terminals) is the set of all semantic and syntactic categories • Every rule either expands the start variable into a sequence of non-terminals or rewrites a non-terminal as a terminal • This grammar is regular (described by a finite automaton)

  25. Sample PCFG • Training data: • iloveyou2 • ihatedthem3 • football3 • Rules that expand the start variable are base structures • Grammar can generate passwords • Probability of a password is the product of all rule probabilities • Ex: P(youlovethem2) = 0.0103125
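
A minimal sketch of "probability of a password is the product of all rule probabilities", using the three training passwords above. The segmentation and the tag names (`PP`, `VB.love`, `VBD`, `NN`, `number`) are invented for illustration and are not the authors' tag set, so the value printed here is not expected to match the slide's figure.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Hypothetical hand-tagged training data: (segment, tag) pairs per password.
TRAINING = [
    [("i", "PP"), ("love", "VB.love"), ("you", "PP"), ("2", "number")],
    [("i", "PP"), ("hated", "VBD"), ("them", "PP"), ("3", "number")],
    [("football", "NN"), ("3", "number")],
]


def learn(training):
    """Estimate rule probabilities by relative frequency."""
    structures = Counter(tuple(tag for _, tag in pw) for pw in training)
    terminals = defaultdict(Counter)
    for pw in training:
        for segment, tag in pw:
            terminals[tag][segment] += 1
    total = sum(structures.values())
    p_structure = {s: Fraction(c, total) for s, c in structures.items()}
    p_terminal = {
        tag: {seg: Fraction(c, sum(counts.values())) for seg, c in counts.items()}
        for tag, counts in terminals.items()
    }
    return p_structure, p_terminal


def probability(tagged_password, p_structure, p_terminal):
    """Multiply the base-structure rule by one terminal rule per segment."""
    structure = tuple(tag for _, tag in tagged_password)
    p = p_structure.get(structure, Fraction(0))
    for segment, tag in tagged_password:
        p *= p_terminal.get(tag, {}).get(segment, Fraction(0))
    return p


p_structure, p_terminal = learn(TRAINING)
guess = [("you", "PP"), ("love", "VB.love"), ("them", "PP"), ("2", "number")]
print(probability(guess, p_structure, p_terminal))
```

Note that "youlovethem2" never appears in the training data, yet it receives a non-zero probability because each of its rules was observed separately.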

  26. RockYou Base Structures (Top 50)

  27. Contributions • Use NLP to segment, classify, and generalize semantic categories • Describe most common semantic patterns in RockYou database • A PCFG that captures structural, semantic, and syntactic patterns • Evaluation of security impact, comparison with previous studies

  28. Building a Guess Generator • Cracking attacks consist of three steps: • Generate a guess • Hash the guess using the same algorithm as target • Check for matches in the target database • Most popular methods (using John the Ripper program) • Word lists (from previous breaks) • Brute force (usually after exhausting word lists)
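
The three steps above can be sketched as a simple loop. SHA-1 without a salt is an assumption made for this example; the function names `sha1_hex` and `crack` are hypothetical.

```python
import hashlib


def sha1_hex(password):
    """Hash a guess with the (assumed) algorithm used by the target database."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def crack(guesses, target_hashes):
    """Return {hash: plaintext} for every target hash matched by a guess."""
    remaining = set(target_hashes)
    cracked = {}
    for guess in guesses:              # step 1: generate a guess
        digest = sha1_hex(guess)       # step 2: hash it like the target
        if digest in remaining:        # step 3: check for a match
            cracked[digest] = guess
            remaining.discard(digest)
        if not remaining:
            break
    return cracked


targets = {sha1_hex("iloveyou2"), sha1_hex("notguessed")}
found = crack(["football3", "iloveyou2"], targets)
print(found)
```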

  29. Guess Generator • At a high level: • Output terminals in highest probability order • Iteratively replaces higher probability terminals with lower probability ones • Uses priority queue to maintain order • Will this produce the same list of guesses every time?

  30. Guess Generator Example • Suppose only one base structure: • Initialized with most probable terminals: “I love Susie’s cat” • Pop first guess off queue (“IloveSusiescat”) • Replace first segment: “youloveSusiescat” • Replace second segment: “IhateSusiescat” • Replace third segment: “IloveBobscat” • Replace fourth segment: “IloveSusiesdog”
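
A minimal sketch of the priority-queue idea for a single base structure, assuming each segment has a list of terminals sorted by descending probability. The probabilities in `SLOTS` are made up, and the `seen` set used to avoid duplicates is a simplification, not necessarily how the authors' generator works.

```python
import heapq

# Hypothetical terminals per segment, each sorted by descending probability.
SLOTS = [
    [("I", 0.6), ("you", 0.4)],
    [("love", 0.7), ("hate", 0.3)],
    [("Susies", 0.6), ("Bobs", 0.4)],
    [("cat", 0.8), ("dog", 0.2)],
]


def generate_guesses(slots):
    """Yield (guess, probability) in non-increasing probability order."""

    def prob(indices):
        p = 1.0
        for slot, i in zip(slots, indices):
            p *= slot[i][1]
        return p

    start = (0,) * len(slots)          # most probable terminal in every slot
    heap = [(-prob(start), start)]     # heapq is a min-heap, so negate
    seen = {start}
    while heap:
        neg_p, indices = heapq.heappop(heap)
        yield "".join(slot[i][0] for slot, i in zip(slots, indices)), -neg_p
        # Replace one segment at a time with its next less probable terminal.
        for k in range(len(slots)):
            if indices[k] + 1 < len(slots[k]):
                nxt = indices[:k] + (indices[k] + 1,) + indices[k + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-prob(nxt), nxt))


guesses = list(generate_guesses(SLOTS))
for guess, p in guesses[:5]:
    print(guess, round(p, 4))
```

In this sketch ties are broken by the index tuple stored in the heap, so the same grammar always yields the same guess order. The `seen` set grows with the number of queued guesses, which hints at the memory cost such a generator can incur.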

  31. Mangling Rules • Passwords aren’t always strictly lowercase • Beardog123lol • bearDOG123LoL • BearDog123LoL • Three types of rules: • Capitalize first word segment • Capitalize whole word segment • CamelCase on all segments • Any others?
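
The three rule types can be sketched as functions over an already segmented password. The function names and the choice to skip non-alphabetic (gap) segments are assumptions for this example, and the outputs are not meant to reproduce every capitalization shown on the slide.

```python
def capitalize_first(segments):
    """Capitalize the first letter of the first word segment."""
    out = list(segments)
    for i, seg in enumerate(out):
        if seg.isalpha():
            out[i] = seg.capitalize()
            break
    return "".join(out)


def uppercase_segment(segments, index):
    """Uppercase one whole word segment, chosen by position."""
    out = list(segments)
    out[index] = out[index].upper()
    return "".join(out)


def camel_case(segments):
    """Capitalize the first letter of every word segment."""
    return "".join(seg.capitalize() if seg.isalpha() else seg for seg in segments)


segments = ["bear", "dog", "123", "lol"]
print(capitalize_first(segments))      # Beardog123lol
print(uppercase_segment(segments, 1))  # bearDOG123lol
print(camel_case(segments))            # BearDog123Lol
```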

  32. Comparison to Weir Approach • Authors’ approach seen as an evolution of Weir • Weir contains far fewer non-terminals (less precise estimates) • Weir does not learn semantic rules (fewer overall terminals) • Weir treats grammar and dictionary input separately • Authors’ semantic classification needs to be re-run for changes

  33. Password Cracking Experiments • Considered 5 methods: • Semantic approach w/o mangling rules • Semantic approach w/ custom mangling rules • Semantic approach w/ JtR’s mangling rules • Weir approach • Wordlist w/ JtR’s default rules + incremental brute force • Attempted to crack LinkedIn and MySpace leaks

  34. Experiment 1: RockYou vs LinkedIn • 5,787,239 unique passwords • Results: • Semantic outperforms non-semantic versions • Weir approach is worst (semantic approach shows a 67% improvement over it) • Authors’ approach is more robust against differing demographics

  35. Experiment 2: RockYou vs MySpace • 41,543 unique passwords • Results: • Semantic approach outperforms all • No-rules performs best • Weir approach is worst (semantic approach shows a 32% improvement over it) • Passwords were phished, quality lowered?

  36. Experiment 3: Maximum Crack Rate • Since the method is based on a grammar, a grammar recognizer can be built to check which passwords it can generate • Results: • Semantic equivalent to brute force, with fewer guesses • Weir approach generates fewer guesses and guesses 30% fewer passwords

  37. Experiment 3: Time to Maximum Crack • Fit non-linear regression to sample of guess probabilities • Results: • Semantic method has a lower guess rate (guesses/second) • Grammar is much larger than Weir method

  38. Issues with Semantic Approach • Further study needed into performance bottlenecks • Though semantic method is more efficient (more hits per guess) • Approach requires a significant amount of memory • Workaround involves probability threshold for adding to queue • Duplicates could be produced due to ambiguous splits • Ex: (one, go) vs (on, ego)
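
The ambiguous-split example can be made concrete by enumerating every way a string decomposes into dictionary words. The word list and the function name `all_splits` are assumptions for this sketch; it only illustrates why two different derivations can emit the same guess.

```python
def all_splits(text, words):
    """Return every way to split text entirely into dictionary words."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in words:
            for rest in all_splits(text[end:], words):
                results.append([head] + rest)
    return results


WORDS = {"on", "one", "go", "ego"}
splits = all_splits("onego", WORDS)
print(splits)

# Both derivations produce the same string, so a generator that tracks
# segment sequences rather than output strings can emit it twice.
print({"".join(s) for s in splits})
```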

  39. Conclusions • There are underlying semantic patterns in password creation • These semantics can be captured in a probabilistic grammar • This grammar can be used to efficiently generate probable passwords • This generator shows (up to) a 67% improvement over previous efforts

  40. Thank you! Questions?
