Targeted Online Password Guessing: An Underestimated Threat

Ding Wang, Zijian Zhang, Ping Wang (Peking University, China); Jeff Yan (Lancaster University, UK); Xinyi Huang (Fujian Normal University, China). ACM CCS 2016

Password

Password authentication is ubiquitously used

The question we aim to answer • Given some info about the victim, how can an attacker guess her password online in the fewest attempts? • It can be split into 7 sub-questions: 1. Given some demographic info about the victim, how to guess her password online in the fewest attempts? 2. Given one password leaked from the victim at one account, how to guess her password at another account in the fewest attempts? ……

Outline • The problem • Explication of personal information and security model • Understanding user behavior • Our approach • TarGuess: a unified attacking framework • Seven targeted cracking algorithms • Experimental results • Conclusion

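The password "curse" • The password "curse": memorable vs. guess-resistant • "Memorable" requires passwords to be as short, regular and simple as possible • "Guess-resistant" requires passwords to be as long, irregular and complex as possible • Hundreds of alternative schemes • Graphical password authentication • Biometric authentication • Multi-factor authentication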

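A comparison of alternative schemes • Result of the comparison: no authentication scheme achieves all the advantages of the text password scheme; each gains on some points and loses on others.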

Password is likely to keep its place • For the foreseeable future, password authentication will remain the dominant authentication method.

Password security • Security can only be achieved under some attacker model. • There are two broad classes of attackers against passwords [NIST SP800-118]. • Guessing attacker (relevant to password strength) • e.g., brute-force guessing, dictionary guessing • Capture attacker (irrelevant to password strength) • e.g., phishing, keylogging, sniffing, password replacing, etc.

Password guessing attacker • She needs to guess the real password from a set of candidate ones. • Classification • Four types • Targeted online guessing is becoming more and more realistic.

Targeted online password guessing • Trawling online password guessing • The attacker generates a single list of guesses for all users, and thus will not be effective. • Targeted online password guessing • The attacker generates a list of guesses for one target (user), but how effective this kind of attacker will be is largely unknown.

Why are targeted online password guessing attacks realistic threats? • An inherent conflict: online guessing vs. DoS • If the number of failed attempts allowed is small, DoS will be serious; • If the number of failed attempts allowed is increased, online guessing will be serious; • "…… the verifier SHALL effectively limit online attackers to 100 consecutive failed attempts on a single account in any 30 day period ……" [NIST SP800-63-2, NIST SP800-118] • Our results show that even when the number of allowed failed guesses is as low as 100 per month, the attacker's online guessing success rate is still far higher than previously expected. • Personal info is readily available.
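The online guessing vs. DoS conflict can be made concrete with a small throttling sketch in Python. Everything here is an assumption for illustration (the class name `FailureThrottle`, the per-account sliding window, counting all failures rather than only consecutive ones); the slide only quotes the 100-failures-per-30-days limit.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 30 * 24 * 3600  # 30-day window, as in the quoted NIST text
MAX_FAILURES = 100               # failed attempts tolerated per window


class FailureThrottle:
    """Hypothetical per-account limiter on failed login attempts."""

    def __init__(self, max_failures=MAX_FAILURES, window=WINDOW_SECONDS):
        self.max_failures = max_failures
        self.window = window
        self._failures = defaultdict(deque)  # account -> failure timestamps

    def _prune(self, account, now):
        # Drop failures that have fallen out of the sliding window.
        queue = self._failures[account]
        while queue and now - queue[0] >= self.window:
            queue.popleft()
        return queue

    def allowed(self, account, now):
        """Return True if another login attempt is permitted at time `now`."""
        return len(self._prune(account, now)) < self.max_failures

    def record_failure(self, account, now):
        self._prune(account, now).append(now)
```

A small `max_failures` lets anyone lock the victim out by failing on purpose (DoS); a large one gives a targeted guesser more attempts.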

Why is there little research on targeted online password guessing? • Subjective reasons • Lack of real-world password data with personal info • Involves some recent advancements in interdisciplinary knowledge (e.g., statistics, NLP) • Objective reasons • It is a challenging problem to design targeted online guessing algorithms. • The number of guesses allowed is small, e.g., <1000 • There are multiple dimensions of info that can be used by the attacker. How does the attacker prioritize his password guesses?

It is difficult to prioritize the guesses per user • People's password choices vary greatly from one another. • Many people have their own password composition strategies. [CHI'16, SOUPS'15] • Users' personal info is highly heterogeneous. • Users employ a diversified set of transformation rules to modify passwords for cross-site reuse. • Users' transformation rules are often context-dependent. • Some PII (e.g., name, birthday and hobby), as shown, can be directly used as password components, while others (e.g., gender and education) cannot.

Current perceptions about targeted online password guessing • Easy to launch • Personal info is easy to acquire; • Anyone with access to the network can launch it. • Easy to resist using current security mechanisms like lockout and throttling. • "…… online guessing can be readily addressed by throttling the rate of login attempts permitted……" [NIST SP800-63-3, 2016] • How to characterize targeted (online) guessing attackers?

How to characterize targeted attackers? • This question touches on 2 of the 5 hard problems in information security posed by the NSA. 2015: http://cps-vo.org/group/hotsos/cfp 2010: http://cps-vo.org/node/6056

Outline • The problem • Explication of personal information and security model • Understanding user behavior • Our approach • TarGuess: a unified attacking framework • Seven targeted cracking algorithms • Experimental results • Conclusion

Explication of personal information • Three interchangeably used terms • Personal information (PI) • Personally identifiable information (PII) • Demographic information • Sometimes, their definitions vary greatly across situations, laws and regulations. • Generally, a user's personal information is "any information relating to" this user, and thus PI is broader than PII.

Classification of personal information in the case of password cracking • Personal information (PI) • Personally identifiable information (PII) • 1) Type-1 PII: explicit role, e.g., birthday • 2) Type-2 PII: implicit role, e.g., gender • User identification credentials • e.g., sister passwords, PINs • Other kinds of personal data (not considered)

System architecture • We consider the most generic case: client/server (C/S).

Security model • We assume that all the public info (e.g., leaked password lists and site policies) is available to the attacker. • We define a series of attacking scenarios based on the varied types of users' personal info given to the attacker. • We consider 3 kinds of personal info • Type-1 PII, Type-2 PII, sister password • A total of 7 attacking scenarios
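The count of 7 is consistent with taking every non-empty combination of the three kinds of personal info (2^3 - 1 = 7). A minimal Python sketch of that reading follows; the enumeration itself is an assumption, since the slide only states the total.

```python
from itertools import combinations

# The three kinds of personal info named on the slide.
INFO_TYPES = ("Type-1 PII", "Type-2 PII", "Sister password")


def attacking_scenarios(info_types=INFO_TYPES):
    """List every non-empty subset of the info types the attacker may hold."""
    return [
        combo
        for size in range(1, len(info_types) + 1)
        for combo in combinations(info_types, size)
    ]


scenarios = attacking_scenarios()
for number, combo in enumerate(scenarios, start=1):
    print(number, " + ".join(combo))
```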

Security model (2) • We mainly consider the 4 most typical types of attacking scenarios • With TarGuess-I~IV, all 7 targeted guessing scenarios can be tackled.

Real-world password datasets • Five Chinese datasets, five English ones • A total of 95.83 million

Real-world personal info datasets • Three Chinese ones, one English • We get 7 PII-associated password datasets by matching email with the password datasets.

Users love to choose popular passwords • Statistics from the 1990s found that the most commonly used password was 12345; 20 years later, humanity has advanced by one digit: 123456.

How popular and unpopular are user-chosen passwords? • Passwords follow Zipf's law, satisfying the 20/50 or 20/80 rule. • 8.21% of users choose the top-100 passwords, while 40% of users choose passwords that occur only once.
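Both figures on this slide are simple frequency statistics over a password list. A minimal Python sketch of how such numbers could be computed is shown here; the function name `popularity_stats` and the toy list are illustrative assumptions, not the datasets used in the talk.

```python
from collections import Counter


def popularity_stats(passwords, k):
    """Return (share of accounts using a top-k password,
               share of accounts whose password occurs only once)."""
    counts = Counter(passwords)
    total = len(passwords)
    top_k_accounts = sum(count for _, count in counts.most_common(k))
    singleton_accounts = sum(1 for count in counts.values() if count == 1)
    return top_k_accounts / total, singleton_accounts / total


# Toy data only: one entry per account.
toy = ["123456"] * 5 + ["password"] * 3 + ["a1", "b2"]
top_share, singleton_share = popularity_stats(toy, k=1)
```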

Users love to reuse passwords: survey results • 77% of users reuse (or modify) an existing password.

Users love to reuse passwords: empirical evidence • We find passwords from the same user by matching email. • 34.02%∼51.11% of Chinese users' sister password pairs are identical, while this figure for English users is 6.25%∼21.96%. • Among the non-identical password pairs, 70% are not very similar. • Most users modify passwords in a non-trivial way.
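Sorting sister password pairs into identical, similar and dissimilar needs some similarity measure. The slide does not say which one was used, so the Python sketch below substitutes `difflib.SequenceMatcher` from the standard library and an arbitrary 0.8 threshold; both are assumptions for illustration.

```python
from difflib import SequenceMatcher


def similarity(pw_a, pw_b):
    """Similarity score in [0, 1]; a stand-in metric, not the one in the talk."""
    return SequenceMatcher(None, pw_a, pw_b).ratio()


def classify_pair(pw_a, pw_b, threshold=0.8):
    """Label a sister password pair (threshold is a hypothetical cut-off)."""
    if pw_a == pw_b:
        return "identical"
    if similarity(pw_a, pw_b) >= threshold:
        return "similar"
    return "dissimilar"
```

For example, `classify_pair("wang123", "wang1234")` counts as similar under this threshold, while a pair sharing no characters is dissimilar.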

Users love to build passwords using their own type-1 PII • Popular Type-1 PII in passwords • name, birthday, email prefix, user name.

Type-2 PII also shows its impact • Gender and age show tangible impact.

TarGuess: A framework for targeted online password guessing • TarGuess is proposed to model various targeted online guessing scenarios • 3 phases: preparing, training and guessing

Our four primary formal models • TarGuess-I~IV • With TarGuess-I~IV, all 7 targeted guessing scenarios can be tackled.

TarGuess-I: Public info + Type-1 PII • Based on probabilistic context-free grammars (PCFG) • Key idea: type-based PII matching/segmentation • We suggest this idea for the first time.

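Context-free grammars (CFG) • Formal definition: a context-free grammar is a four-tuple $G = (\Sigma, V, R, S)$: • $\Sigma$: the set of terminals; • $V$: the set of nonterminals (disjoint from $\Sigma$); • $R$: the set of productions (grammar rules) $A \to \beta$, where $A \in V$ and $\beta \in (\Sigma \cup V)^*$; • $S \in V$: the start symbol. • The left-hand side of a rule is always a nonterminal; the right-hand side may contain both terminals and nonterminals.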

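Probabilistic context-free grammars (PCFG) • Compared with a CFG, every rule $A \to \beta$ in a PCFG is assigned a probability $P(A \to \beta) \in [0,1]$, such that $\sum_{\beta} P(A \to \beta) = 1$ for each nonterminal $A$.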
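The normalization condition can be checked mechanically. The Python sketch below stores a toy PCFG as a dictionary from (left-hand side, right-hand side) to probability and verifies that the probabilities of all rules sharing a left-hand side sum to 1. The rules and most of the numbers are made up for illustration; `is_normalized` is a hypothetical helper.

```python
import math
from collections import defaultdict

# Toy PCFG: (left-hand side, right-hand side) -> probability.
RULES = {
    ("S", ("L4", "D4")): 0.2,
    ("S", ("L8", "D3")): 0.8,
    ("L4", ("love",)): 0.25,
    ("L4", ("wang",)): 0.75,
    ("D4", ("1314",)): 0.2,
    ("D4", ("2016",)): 0.8,
    ("L8", ("password",)): 1.0,
    ("D3", ("123",)): 1.0,
}


def is_normalized(rules, tol=1e-9):
    """Check 0 <= P <= 1 for every rule and sum(P) == 1 per left-hand side."""
    totals = defaultdict(float)
    for (lhs, _rhs), prob in rules.items():
        if not 0.0 <= prob <= 1.0:
            return False
        totals[lhs] += prob
    return all(math.isclose(t, 1.0, abs_tol=tol) for t in totals.values())
```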

PCFG-based password cracking model • Originally designed to characterize trawling guessing attackers. [IEEE S&P'09, IEEE S&P'14] • Key idea: parse passwords into letter (L), digit (D) and symbol (S) segments, and learn the probabilities of the base structures and of the L, D and S segments from real password datasets. • E.g., password123 → L8D3, and one gets P(password123) = P(L8D3) * P(L8 → password) * P(D3 → 123)
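The parsing step can be sketched in a few lines of Python: group consecutive characters of the same class and label each run with its class and length. The helper names (`char_class`, `parse_lds`, `base_structure`) are illustrative, and treating every non-alphanumeric character as a symbol is an assumption.

```python
from itertools import groupby


def char_class(ch):
    """Map a character to L (letter), D (digit) or S (symbol)."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    return "S"


def parse_lds(password):
    """Split a password into (tag, text) segments, e.g. ('L8', 'password')."""
    segments = []
    for cls, run in groupby(password, key=char_class):
        text = "".join(run)
        segments.append((f"{cls}{len(text)}", text))
    return segments


def base_structure(password):
    """Concatenate the segment tags, e.g. 'password123' -> 'L8D3'."""
    return "".join(tag for tag, _ in parse_lds(password))
```

Counting how often each base structure and each segment string occurs in a training set then gives the probabilities used in the formula.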

PCFG-based password cracking model (2) • P(love1314) = P(L4D4) * P(L4 → love) * P(D4 → 1314) = 0.2 * 0.25 * 0.2 = 0.01
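The arithmetic on this slide can be reproduced with a short Python sketch. The three probabilities are the ones shown above; the table layout and the function `pcfg_probability` are assumptions, and anything missing from the toy tables is given probability 0.

```python
import math

# Probabilities taken from the slide's worked example.
STRUCTURE_PROB = {"L4D4": 0.2}
SEGMENT_PROB = {("L4", "love"): 0.25, ("D4", "1314"): 0.2}


def pcfg_probability(structure, segments):
    """P(password) = P(structure) * product of P(tag -> text) over segments."""
    prob = STRUCTURE_PROB.get(structure, 0.0)
    for segment in segments:
        prob *= SEGMENT_PROB.get(segment, 0.0)
    return prob


p_love1314 = pcfg_probability("L4D4", [("L4", "love"), ("D4", "1314")])
print(round(p_love1314, 4))
```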

PCFG-based password cracking model (3) • Suitable for trawling guessing • Essentially, it only exploits users' weakness for choosing popular passwords. • Does not take into account user PII and password reuse. • Unsuitable for targeted guessing

TarGuess-I: targeted PCFG • To capture PII semantics, besides the L, D, S tags as with PCFG, we introduce a number of type-based PII tags: • 1) N for name; • 2) B for birthday; • 3) E for email prefix; • 4) A for user name; • 5) I for national ID number; • 6) P for phone number; • ……

TarGuess-I: targeted PCFG (2) • PCFG: wang.123 → L4S1D3 • TarGuess-I: wang.123 → N3S1D3 • For each type-based PII tag, the subscript number stands for a particular sub-type of usage of that kind of PII, not the length matched (unlike the L, D, S tags). • 1) N1∼N7: N1 for the usage of full name, N2 for the abbr. of full name, N3 for family name ……. • 2) B1∼B10: B1 for birthday in YMD format, B2 for birthday in MDY format, …… • 3) E1∼E3: • 4) A1∼A3: • 5) I1∼I2: • 6) P1∼P3:
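The wang.123 → N3S1D3 example can be reproduced with a small Python sketch of type-based PII tagging. The greedy left-to-right, longest-match-first strategy is an assumption chosen for brevity and is not claimed to be the matching procedure of TarGuess-I; the given name and birthday in the usage example are hypothetical values.

```python
from itertools import groupby


def char_class(ch):
    """Map a character to L (letter), D (digit) or S (symbol)."""
    return "L" if ch.isalpha() else "D" if ch.isdigit() else "S"


def lds_tags(text):
    """Tag a run of non-PII characters with L/D/S plus length."""
    return [f"{cls}{len(list(run))}" for cls, run in groupby(text, key=char_class)]


def tag_password(password, pii):
    """Replace substrings equal to the user's PII with their tags.

    `pii` maps a tag (e.g. 'N3') to the user's concrete string (e.g. 'wang').
    """
    # Longest PII strings first, so a full name wins over the family name.
    candidates = sorted(
        ((text, tag) for tag, text in pii.items() if text),
        key=lambda item: -len(item[0]),
    )
    tags, plain, i = [], "", 0
    while i < len(password):
        match = next(
            ((text, tag) for text, tag in candidates if password.startswith(text, i)),
            None,
        )
        if match:
            tags += lds_tags(plain) + [match[1]]
            plain = ""
            i += len(match[0])
        else:
            plain += password[i]
            i += 1
    return "".join(tags + lds_tags(plain))


# Hypothetical target: N1 full name, N3 family name, B1 birthday in YMD.
target = {"N1": "wangding", "N3": "wang", "B1": "19900101"}
```

Note that the subscript on a PII tag is the sub-type, so N3 stays N3 whatever the length of the family name, while L, D and S still carry lengths.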

TarGuess-I: targeted PCFG (3) • Training phase

TarGuess-I: targeted PCFG (4) • Guess generation phase • The language generated by the grammar.
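Guess generation can be pictured as expanding each learned structure, filling PII tags with the target's own strings and L/D/S tags with learned segments, then ranking by probability. The Python sketch below does this by brute force over toy tables; all probabilities, table names and the function `generate_guesses` are illustrative assumptions, and a practical generator would likely use a priority queue instead of enumerating everything.

```python
import re
from itertools import product

# Hypothetical trained model: structure and segment probabilities.
STRUCTURES = {"N3D3": 0.5, "N3S1D3": 0.3, "L4D4": 0.2}
SEGMENTS = {
    "D3": {"123": 0.7, "520": 0.3},
    "S1": {".": 1.0},
    "L4": {"love": 1.0},
    "D4": {"1314": 1.0},
}


def generate_guesses(pii, limit):
    """Return the `limit` most probable (guess, probability) pairs for a target.

    `pii` maps PII tags (e.g. 'N3') to the target's strings; structures that
    need a PII item the attacker lacks produce no guesses.
    """
    guesses = []
    for structure, p_structure in STRUCTURES.items():
        options = []
        for tag in re.findall(r"[A-Z]\d+", structure):
            if tag in pii:
                options.append([(pii[tag], 1.0)])  # PII tag: target's own value
            else:
                options.append(list(SEGMENTS.get(tag, {}).items()))
        for combo in product(*options):
            prob = p_structure
            for _, p in combo:
                prob *= p
            guesses.append(("".join(text for text, _ in combo), prob))
    guesses.sort(key=lambda guess: -guess[1])
    return guesses[:limit]


top3 = [guess for guess, _ in generate_guesses({"N3": "wang"}, limit=3)]
```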

Comparison with existing algorithms • A comparison of TarGuess-I (and its variants) with Personal-PCFG [20], trained on 50% of the 12306 dataset and tested on the remaining 50%. • 1) TarGuess-I and Personal-PCFG use six kinds of the 12306 type-1 PII; • 2) TarGuess-I′ eliminates phone # and NID; • 3) TarGuess-I′′ further eliminates email and user name; • 4) TarGuess-I′′′ further eliminates birthday. • TarGuess-I cracks 37.11%∼73.33% more passwords than Personal-PCFG.

TarGuess-II: Public info + Sister PW • Key idea: password reuse behaviors are context-dependent. • Training phase: given one password pair (PWA, PWB) in the training set,

TarGuess-II (2)

Comparing TarGuess-II with existing algorithms • Comparing TarGuess-II∼IV and Das et al.'s algorithm, trained on the 66,573 non-identical PW pairs of 126→CSDN and tested on the 30,8045 non-identical password pairs of Dodonew→CSDN. • Besides a sister password, TarGuess-III uses four types of 51job type-1 PII and TarGuess-IV further uses the gender info. • TarGuess-II outperforms Das et al.'s algorithm by 111.06%.

TarGuess-III: Sister password + Type-1 PII • To solve this attacking scenario, we only need to introduce the type-based PII tags into TarGuess-II. • Insert {N1∼N7, B1∼B10, A1, A2, A3; E1, E2, E3; P1, P2; I1, I2, I3} into V of the probabilistic context-free grammar.
