
Taming False Alarms from a Domain-Unaware C Analyzer by a Statistical Post Analysis

Taming False Alarms from a Domain-Unaware C Analyzer by a Statistical Post Analysis. Yungbum Jung, Jaehwang Kim, Jaeho Shin, Kwangkeun Yi. Programming Research Lab., Seoul National University.


Presentation Transcript


  1. Taming False Alarms from a Domain-Unaware C Analyzer by a Statistical Post Analysis • Yungbum Jung, Jaehwang Kim, Jaeho Shin, Kwangkeun Yi • Programming Research Lab., Seoul National University

  2. Motivation: an Industry’s Challenge • In 2004, a company’s SQA dept. asked us for a C buffer-overrun static analyzer that • must be sound • must have a reasonable cost • must be domain-unaware • Our path • Sound analyzer: drive the cost-accuracy balance to a limit • Statistical filter: sift out inevitable false alarms and rank alarms by their probability of being true

  3. Outline • Airac, Our Analyzer • Internals • Performance • Statistical Analysis • Symptoms • Models • Bayesian Analysis • Linear Logistic Regression • Sifting out, Ranking

  4. Airac • Array Index Range Analyzer for C • Our static analyzer • Is an abstract interpreter • Does numerical interval analysis • Is sound • in the sense of detecting all possible buffer overruns • Covers full ANSI C + some GNU extensions

  5. Abstraction • α: Set of concrete machine transition traces → Map from program points to abstract states (PgmPt → State) • Usual abstraction for stateful programs

  6. Abstract Domains • Machine = State × PgmPt • State = Stk × Mem × Dmp • Mem = Addr → Val • Val = Interval × 2^Addr × 2^Array • Addr = PgmVar + AllocSite + AllocSite × Field • Array = AllocSite × Base × Size • AllocSite = PgmPt • [a, b] ∈ Interval = Base = Size ...

  7. Techniques Used • Accuracy improvement by • narrowing after widening • flow-sensitivity • context pruning (limited to linear expressions) • static inlining (parameterized) • static loop unrolling (parameterized) • Cost reduction by • careful worklist order: lazy at join points • selective join/compare • stack obviation
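
The "narrowing after widening" item can be illustrated with a minimal sketch of an interval analysis for a simple counting loop. This is not Airac's implementation: the function names, the tuple representation of intervals, and the restriction to loops of the form `for (i = 0; i < bound; i++)` with `bound >= 1` are all assumptions made for illustration.

```python
from typing import Tuple

INF = float("inf")
Interval = Tuple[float, float]  # (low, high), both ends inclusive


def join(a: Interval, b: Interval) -> Interval:
    """Least upper bound of two intervals."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(old: Interval, new: Interval) -> Interval:
    """Standard interval widening: any bound that grows jumps to infinity."""
    lo = old[0] if new[0] >= old[0] else -INF
    hi = old[1] if new[1] <= old[1] else INF
    return (lo, hi)


def narrow(old: Interval, new: Interval) -> Interval:
    """Standard interval narrowing: only infinite bounds are refined."""
    lo = new[0] if old[0] == -INF else old[0]
    hi = new[1] if old[1] == INF else old[1]
    return (lo, hi)


def analyze_counting_loop(bound: int):
    """Intervals of i for: for (i = 0; i < bound; i++) { ... }, bound >= 1.

    Returns (interval at loop head, interval inside body, interval after loop).
    """
    def step(head: Interval) -> Interval:
        guarded = (head[0], min(head[1], bound - 1))   # i < bound holds
        incremented = (guarded[0] + 1, guarded[1] + 1)  # i++
        return join((0, 0), incremented)                # merge with entry i = 0

    head: Interval = (0, 0)
    while True:  # ascending iteration with widening until stable
        widened = widen(head, step(head))
        if widened == head:
            break
        head = widened
    head = narrow(head, step(head))  # one descending step recovers precision

    body = (head[0], min(head[1], bound - 1))
    after = (max(head[0], bound), head[1])
    return head, body, after
```

For `bound = 10`, widening first pushes the loop-head interval to `[0, +inf]`, and the narrowing step brings it back to `[0, 10]`, which yields `[0, 9]` inside the body and `[10, 10]` after the loop.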

  8. Stack Obviation • Size of Stk is proportional to program size • Most of the analysis time = join + compare • OK to skip join/compare for Stk • if changes of Stk are always reflected on Mem • By simple syntactic transformation • e1 ? e2 : e3 ⇒ { if (e1) t = e2 else t = e3; t } • e[f()] ⇒ t = f(); e[t] • 3 to 5 times speedup

  9. Error Recovery During Analysis
       int a[10], i, j;
       for (i = 0; i < 10; i++) {
         a[i] = 2 * i;
       }
       j = a[i];
       a[i] = … …
     • Buffer overrun at j = a[i], since i is [10, 10] after the loop • Optimistic assumption used to continue the analysis: i is [0, 9], j is [0, 18]
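
The recovery step on this slide can be sketched as a bounds check that raises an alarm and then continues with an in-bounds index interval. The function name `check_access` and the exact recovery rule (intersect with the valid range, or fall back to the whole valid range when there is no overlap) are assumptions for illustration; the slide only states the resulting intervals.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (low, high), both ends inclusive


def check_access(index: Interval, size: int) -> Tuple[bool, Interval]:
    """Check a[index] for an array of `size` elements.

    Returns (alarm, recovered_index): `alarm` is True when the index interval
    may fall outside [0, size - 1]; `recovered_index` is the optimistic
    interval the analysis continues with.
    """
    lo, hi = index
    alarm = lo < 0 or hi > size - 1
    if lo > size - 1 or hi < 0:
        # Assumed rule: no overlap with the valid range, so assume any valid index.
        recovered = (0, size - 1)
    else:
        # Assumed rule: keep only the in-bounds part of the interval.
        recovered = (max(lo, 0), min(hi, size - 1))
    return alarm, recovered


# The slide's example: after the loop, i is [10, 10] and a has 10 elements.
alarm, i_after = check_access((10, 10), 10)
# a[i] = 2 * i was stored for i in [0, 9], so the elements lie in [0, 18].
j = (2 * i_after[0], 2 * i_after[1])
```

Under these assumptions the alarm is raised, `i_after` is `(0, 9)`, and `j` is `(0, 18)`, matching the intervals shown on the slide.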

  10. Warnings about Performance • Assume typeful C programs • arrays must be used as the same type as declared • Artificial semantics after errors • e.g. overrun, null dereference • No side effects for library functions • If there is no main(), then • analyze procedures in their defined order • No alarms about buffers whose size is top • Top value for free variables

  11. Performance 1/2 • Performed on a Linux 2.6 box with a Pentium 4 3.2 GHz, 4 GB RAM

  12. Performance 2/2

  13. Statistical Post Analysis • We collect • Samples of true and false alarms • Symptoms of each alarm • From them, compute the trueness of alarms • i.e. the probability of an alarm being true given its symptoms • With trueness we can • Sift out false alarms • Report truer alarms first

  14. Symptoms • Syntactic symptoms • AfterLoop, AfterBranch, AfterReturn, InNestedLoopBody, InNestedBranchBody • InLoopCond, InBranchCond, InFunParam, InNestedFunParam, InRightOfAnd • Semantic symptoms • JoinN, NotNarrowed, ComplexData, InCyclicCallChain • Prunning, PassedValue, ConstantVariable, ConstantIndex, ConstantArrayConstantIndex • Result symptoms • TopIndex, HalfInfiniteIndex • FiniteOffsetFiniteArray, FiniteIndex • Common sense + shallow inside info [9, 10]

  15. Bayesian Analysis • For each alarm, we compute its conditional probability of being true given its symptoms • Numbers from “learning samples” • Estimated using a Monte Carlo method • We assume symptoms occur independently (naïve Bayesian filtering)
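
A minimal sketch of naive Bayesian filtering over binary symptom vectors follows. It assumes Laplace-smoothed frequency counts in place of the Monte Carlo estimation the slide mentions, and the names `train` and `trueness` as well as the sample data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[int], bool]  # (binary symptom vector, alarm is true?)
Model = Tuple[Dict[bool, float], Dict[bool, List[float]]]


def train(samples: List[Sample]) -> Model:
    """Estimate P(class) and P(symptom_k present | class) from learning samples."""
    n_sym = len(samples[0][0])
    counts = {True: [0] * n_sym, False: [0] * n_sym}
    totals = {True: 0, False: 0}
    for symptoms, is_true in samples:
        totals[is_true] += 1
        for k, present in enumerate(symptoms):
            counts[is_true][k] += present
    # Laplace smoothing (an assumption here) avoids zero probabilities.
    prior = {c: (totals[c] + 1) / (len(samples) + 2) for c in (True, False)}
    likelihood = {
        c: [(counts[c][k] + 1) / (totals[c] + 2) for k in range(n_sym)]
        for c in (True, False)
    }
    return prior, likelihood


def trueness(model: Model, symptoms: Sequence[int]) -> float:
    """P(alarm is true | symptoms), assuming symptoms occur independently."""
    prior, likelihood = model
    score = {}
    for c in (True, False):
        p = prior[c]
        for k, present in enumerate(symptoms):
            q = likelihood[c][k]
            p *= q if present else (1.0 - q)
        score[c] = p
    return score[True] / (score[True] + score[False])


# Hypothetical learning set with two symptoms.
model = train([([1, 0], True), ([1, 0], True), ([0, 1], False), ([0, 1], False)])
```

With this toy learning set, an alarm showing only the first symptom gets a trueness of 0.9, and one showing only the second gets 0.1.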

  16. Sifting Out Threshold • User’s knob: his/her risk ratio (Rs/Rr) • Minimize risk expectation • Risk expectation of an alarm with probability p when • Silencing = Rs × p • Reporting = Rr × (1 – p) • We silence if Rs × p < Rr × (1 – p) • Hence, sift out when p < Rr / (Rr + Rs) = 1 / (1 + Rs/Rr)
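
The threshold rule above translates directly into code. The function names are hypothetical; the arithmetic follows the slide's formulas.

```python
def sift_threshold(risk_silencing: float, risk_reporting: float) -> float:
    """Threshold on trueness p below which an alarm is sifted out: Rr / (Rr + Rs)."""
    return risk_reporting / (risk_reporting + risk_silencing)


def should_silence(p: float, risk_silencing: float, risk_reporting: float) -> bool:
    """Silence when the expected risk of silencing is below that of reporting."""
    return risk_silencing * p < risk_reporting * (1.0 - p)
```

For a user who considers a silenced true alarm three times as costly as a reported false one (Rs = 3 × Rr), `sift_threshold(3, 1)` gives 0.25, so only alarms with trueness below 0.25 are silenced.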

  17. Experiments • With alarms from • Parts of the Linux kernel • Programs in algorithm textbooks • Learning and testing • 50%/50% randomly chosen • Repeated 15 times

  18. Sifting Out Alarms • Rs = 3 × Rr, so threshold = 0.25 • 74.84% of false alarms filtered out :-) • 31.40% of true alarms were also swept out :-(

  19. Ranking Alarms • Show user “truer” alarms first • 15.17% of false alarms are mixed up until the user sees 50% of the true alarms
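
One way to compute a ranking figure of this kind is sketched below. It assumes the percentage is taken over all false alarms in the test set, which the slide does not state explicitly; the function name and the sample data are hypothetical.

```python
import math
from typing import List, Tuple


def false_mixed_until(alarms: List[Tuple[float, bool]], fraction: float = 0.5) -> float:
    """Share of all false alarms shown before `fraction` of the true alarms are seen.

    `alarms` holds (trueness, is_true) pairs; alarms are shown in order of
    decreasing trueness.
    """
    ranked = sorted(alarms, key=lambda a: a[0], reverse=True)
    n_true = sum(1 for _, is_true in ranked if is_true)
    n_false = len(ranked) - n_true
    if n_true == 0 or n_false == 0:
        return 0.0
    needed = math.ceil(fraction * n_true)
    seen_true = seen_false = 0
    for _, is_true in ranked:
        if seen_true >= needed:
            break
        if is_true:
            seen_true += 1
        else:
            seen_false += 1
    return seen_false / n_false


# Hypothetical ranked alarms: 4 true, 2 false.
demo = [(0.9, True), (0.8, False), (0.7, True), (0.6, True), (0.2, False), (0.1, True)]
```

In this toy set the user reaches half of the true alarms after the third entry, having seen one of the two false alarms, so `false_mixed_until(demo)` is 0.5. A lower value means a better ranking.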

  20. Binary Logistic Regression • Trueness of an alarm given its binary symptom vector • Generalized linear model • Coefficients from learning set • For example,
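
As a hedged illustration of the generalized linear model named on this slide, the trueness of an alarm can be computed as the logistic function of a linear combination of its binary symptoms. The coefficient values below are hypothetical placeholders, not the ones learned in the authors' experiments.

```python
import math
from typing import Sequence


def logistic_trueness(intercept: float,
                      coefficients: Sequence[float],
                      symptoms: Sequence[int]) -> float:
    """Trueness = 1 / (1 + exp(-(b0 + sum_k b_k * x_k))) for binary symptoms x_k."""
    if len(coefficients) != len(symptoms):
        raise ValueError("one coefficient per symptom is required")
    z = intercept + sum(b * x for b, x in zip(coefficients, symptoms))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for two symptoms: the first raises trueness,
# the second lowers it.
example = logistic_trueness(0.0, [1.0, -2.0], [1, 0])
```

Unlike the naive Bayesian model, the coefficients here are fitted jointly from the learning set, so no independence assumption between symptoms is built into the formula.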

  21. Bayesian vs. Logistic Regression 1/2 • With threshold 0.25, the share of alarms that can be sifted out • Bayesian: 74.84% of false, 31.40% of true alarms • Logistic Regression: 90.05% of false, 20.85% of true alarms

  22. Bayesian vs. Logistic Regression 2/2 • Share of false alarms mixed up until the user sees 50% of true alarms • Bayesian: 15.17% • Logistic Regression: 4.10% • Conjecture: the logistic regression model respects symptom dependency?

  23. Related Work • Buffer overrun detection • ARCHER [Xie, Chou & Engler 2003] • SPLINT [Zitser, Lippmann & Leek 2004] • CSSV [Dor, Rodeh & Sagiv 2003] • ASTRÉE [Cousot et al. 2005, 2003] • Slide labels on these tools: unsound, require annotation, domain-aware • Statistical approach • Z-ranking [Kremenek & Engler 2003] • Error Correlation [Kremenek et al. 2004]

  24. Conclusion • Our “sound” static analyzer, Airac, is realistic • False alarms are inevitable in a domain-unaware situation • Statistical approaches helped • viable approach to handle false alarms • natural symptoms seem to work • orthogonal to other static analysis techniques • generic, depends on learning set

  25. Thank you • Questions? • Demo available at • http://ropas.snu.ac.kr/airac
