
August 8 – 11, 2013

ICFP PROGRAMMING CONTEST. Michal Moskal and Nikhil Swamy Research in Software Engineering ( riSE ) Microsoft Research, Redmond. August 8 – 11, 2013. Organize the Contest? Who, me?! No thanks!. That's a shame … because …. The contest is in Rude Health!.


Presentation Transcript


  1. ICFP PROGRAMMING CONTEST • Michal Moskal and Nikhil Swamy • Research in Software Engineering (RiSE) • Microsoft Research, Redmond • August 8 – 11, 2013

  2. Organize the Contest? Who, me?! No thanks! That's a shame … because …

  3. The contest is in Rude Health! • More than 550 teams registered to participate • You have the undivided attention of more than 1000 expert programmers for 72 hours! (mostly) • Wow! 72k programmer hours! That's a really valuable resource! • Organize the contest? Hell yeah!

  4. What question do we want 1000 expert programmers to answer? • Traditionally: Which is the best programming language?

  5. WHICH is the best PROGRAMMING LANGUAGE? • Boring! • The answer is easy:

  6. WHICH is the best PROGRAMMING LANGUAGE? • The question is a bit bogus • It depends on the programmer • Expert programmers can use whatever and do well • Even ASM has placed well in past ICFPCs • It depends on the task • Winning team this year used 6 languages for different sub-tasks

  7. WHICH is the best PROGRAMMING LANGUAGE? • Let's not focus so much on this question …

  8. Question this year: • What's up with program synthesis?

  9. Can we calibrate research on program synthesis against what an army of crack programmers can do?

  10. Calibrating program synthesis • Synthesis of loop-free programs; Gulwani et al.; PLDI 2011 • Uses an SMT solver to synthesize bit-vector programs • Scales to 16 instructions in at most 45 minutes • Applications to super-optimization etc. • Big improvement over prior tools • Sketch (2006): Solar-Lezama et al., scales to 8 instructions • AHA (2002): scales to 6-8 instructions

  11. Calibrating program synthesis • Synthesis of loop-free programs; Gulwani et al.; PLDI 2011 • Uses an SMT solver to synthesize bit-vector programs • Scales to 16 instructions in at most 45 minutes • 16 instructions is quite a lot! SMT solvers are cool! • Naïvely, search space = ~10^16 • But, is that it?

  12. August 8 – 11, 2013 • 300+ teams wrote tools to synthesize bit-vector programs • We evaluated these tools on a set of 1,800 benchmark problems • Our main goal: • How would the top-teams fare against the best SMT solutions? • A (not-so-)secret hope: • Some of the best teams would end up using SMT solvers

  13. The program synthesis game • GAME: I have a secret program A. Can you guess what it is? You have 5 minutes. • PLAYER: Hmm. Ok, so what is A(11) and A(12) then? • GAME: Since you ask so nicely: A(11)=12 and A(12)=13 • PLAYER: Ah. I bet A = λx. x+1 • GAME: Let me check … (query.smt2: A ≈ λx. x+1 ?) No! Counterexample: A(9) <> (λx.x+1) 9. Nope. A(9)=9. • PLAYER: Can you tell me what A(16), A(42), A(128) are? • GAME: A(16)=17, A(42)=43, A(128)=129. • PLAYER: Ah ha! I guess A = λx. if x & 1 = 0 then x else x + 1 • GAME: Let me check … (query.smt2: A ≈ λx. if x&1=0…?) Yes! Yep! That's right! You score one point.

  14. Punch line • The winning teams were amazing! • Main goal: Calibration • Winners were synthesizing programs 40 instructions long! • Our reference SMT-based solutions maxed out at 15-16 • Recall: the difficulty is exponential in the problem size • Secret hope: SMT usage • Many top-10 teams tried SMT, but all opted for hand-tuned, brute-force search, with lots of smart pruning heuristics • Winning team parallelized the search and used 1000 hours of compute time on Amazon EC2
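The "brute-force search with pruning" idea can be illustrated with a very small enumerative synthesizer. This is only a sketch under stated assumptions, not any team's actual code: it covers constants, the variable `x`, and the unary and binary operators (no `if0` or `fold`), and the function name `synthesize` and the tuple encoding of expressions are made up for illustration. The pruning shown is one common heuristic: two expressions that produce the same outputs on all example inputs are treated as interchangeable, and only the first (smallest) one is kept.

```python
from itertools import product

MASK = (1 << 64) - 1  # all values are 64-bit vectors

OP1 = {
    "not": lambda a: ~a & MASK,
    "shl1": lambda a: (a << 1) & MASK,
    "shr1": lambda a: a >> 1,
    "shr4": lambda a: a >> 4,
    "shr16": lambda a: a >> 16,
}
OP2 = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "plus": lambda a, b: (a + b) & MASK,
}


def synthesize(examples, max_size=6):
    """Return the smallest expression (as nested tuples) matching all
    (input, output) examples, or None if none is found up to max_size."""
    inputs = [x for x, _ in examples]
    target = tuple(y for _, y in examples)
    seen = {}     # output signature -> first (smallest) expression producing it
    by_size = {}  # size -> list of (expression, signature) kept at that size

    for size in range(1, max_size + 1):
        level = []

        def consider(expr, sig):
            # Pruning: drop anything observationally equal to a kept expression.
            if sig not in seen:
                seen[sig] = expr
                level.append((expr, sig))

        if size == 1:
            consider(0, tuple(0 for _ in inputs))
            consider(1, tuple(1 for _ in inputs))
            consider("x", tuple(v & MASK for v in inputs))
        else:
            for name, f in OP1.items():
                for e, sig in by_size[size - 1]:
                    consider((name, e), tuple(f(v) for v in sig))
            for name, f in OP2.items():
                for left in range(1, size - 1):
                    right = size - 1 - left
                    for (a, sa), (b, sb) in product(by_size[left], by_size[right]):
                        consider((name, a, b),
                                 tuple(f(u, v) for u, v in zip(sa, sb)))
        by_size[size] = level
        if target in seen:
            return seen[target]
    return None
```

Calling `synthesize([(11, 12), (12, 13), (0, 1)])` searches for an expression that agrees with the answers from the game dialogue. A match on the examples is only a candidate; it still has to be checked against the secret program.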

  15. 40 vs. 16! What's up with that? • Elegant general-purpose formulations in terms of constraint solving: relatively easy to code up and obtain decent results • But hand-tuned solutions are going to do better … MUCH BETTER • If you really want to super-optimize something: smart search for 1000 hours is cheap!

  16. 1. Need to decide equivalence effectively • PLAYER: Ah ha! I guess A = λx. if x & 1 = 0 then x else x + 1 • GAME: (query.smt2: A ≈ λx. e ?) • If No! Counterexample: A(17) <> (λx.e) 17, the reply is: No dice! A(17)=18. • If Yes!, the reply is: Yep! That's right! You score one point.
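The contest server decided equivalence with Z3. As a stand-in that needs only the standard library, the sketch below looks for a counterexample by testing corner cases and random 64-bit inputs. This is a different, weaker technique: it can refute a guess, but a `None` result is not a proof of equivalence the way an SMT answer is. The function name `find_counterexample` and the representation of programs as Python callables are assumptions made for illustration.

```python
import random

MASK = (1 << 64) - 1  # 64-bit vectors


def find_counterexample(secret, guess, trials=10_000, seed=0):
    """Return an input on which the two programs differ, or None if
    no difference was found among the inputs tried."""
    rng = random.Random(seed)
    # Corner cases first: they catch many off-by-one and overflow differences.
    corner = [0, 1, 2, MASK, MASK - 1] + [1 << i for i in range(64)]
    candidates = corner + [rng.getrandbits(64) for _ in range(trials)]
    for x in candidates:
        if secret(x) != guess(x):
            return x
    return None
```

For example, with `secret = lambda x: x if x & 1 == 0 else (x + 1) & MASK` and `guess = lambda x: (x + 1) & MASK`, the function returns an input on which the two disagree, which plays the role of the counterexample in the dialogue.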

  17. \BV: Functions on 64-bit vectors • p ::= λx.e • e ::= 0 | 1 | x | op1 e | e op2 e | if0 e then e else e | fold e e λx y.e • op1 ::= not | shl1 | shr1 | shr4 | shr16 • op2 ::= and | or | xor | plus • Z3 implements a decidable theory of bit-vectors • So, equivalence checking on \BV programs is decidable … • But, it's NP-hard and can be quite expensive
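To make the grammar concrete, here is a minimal interpreter sketch. The tuple encoding and the names `evaluate` and `run` are illustrative choices, not part of the contest. The slide does not spell out the semantics of `fold`; the version below assumes it walks the eight bytes of its first argument from least significant to most significant, binding the first lambda variable to the byte and the second to the accumulator, so treat that part as an assumption.

```python
MASK = (1 << 64) - 1  # all values are 64-bit vectors

OP1 = {
    "not": lambda a: ~a & MASK,
    "shl1": lambda a: (a << 1) & MASK,
    "shr1": lambda a: a >> 1,
    "shr4": lambda a: a >> 4,
    "shr16": lambda a: a >> 16,
}
OP2 = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "plus": lambda a, b: (a + b) & MASK,
}


def evaluate(e, env):
    """Evaluate expression e (int constant, variable name, or tagged tuple)."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return env[e]
    tag = e[0]
    if tag in OP1:
        return OP1[tag](evaluate(e[1], env))
    if tag in OP2:
        return OP2[tag](evaluate(e[1], env), evaluate(e[2], env))
    if tag == "if0":
        # if0 c then t else f: take t when c evaluates to zero.
        return evaluate(e[2] if evaluate(e[1], env) == 0 else e[3], env)
    if tag == "fold":
        # ASSUMPTION: bytes are consumed least-significant first.
        _, src, init, (byte_var, acc_var), body = e
        value, acc = evaluate(src, env), evaluate(init, env)
        for i in range(8):
            byte = (value >> (8 * i)) & 0xFF
            acc = evaluate(body, {**env, byte_var: byte, acc_var: acc})
        return acc
    raise ValueError(f"unknown expression: {e!r}")


def run(program, x):
    """Apply a program (var, body), i.e. λvar.body, to the 64-bit input x."""
    var, body = program
    return evaluate(body, {var: x & MASK})
```

The player's guess from the game dialogue would be written as `("x", ("if0", ("and", "x", 1), "x", ("plus", "x", 1)))` in this encoding.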

  18. 2. Need to SCALE to millions of requests • PLAYER: Ah ha! I guess A = λx. if x & 1 = 0 then x else x + 1 • GAME: (query.smt2: A ≈ λx. e ?) • If No! Counterexample: A(17) <> (λx.e) 17, the reply is: No dice! A(17)=18. • If Yes!, the reply is: Yep! That's right! You score one point.

  19. Elastic scaling on the Windows Azure Cloud We were set up to run Z3 on up to 128 cores on Azure

  20. Throttling requests • Each team was assigned an authorization token • Tokens were distributed in a pre-registration phase • (loud complaints about this!) • Each token allowed a team to make 5 requests per 20 seconds • Z3 was given 20 seconds to decide equivalence, but typically completed in less than 5 seconds
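A per-token limit of 5 requests per 20 seconds can be sketched with a sliding window. The slides do not say how the contest server implemented throttling (sliding window, fixed window, or something else), so the class below, including the name `Throttle` and its `allow` method, is a hypothetical illustration. The current time is passed in explicitly so the behaviour is easy to test.

```python
from collections import defaultdict, deque


class Throttle:
    """Sliding-window rate limiter keyed by authorization token."""

    def __init__(self, limit=5, window=20.0):
        self.limit = limit            # max requests per window
        self.window = window          # window length in seconds
        self.history = defaultdict(deque)  # token -> timestamps of accepted requests

    def allow(self, token, now):
        """Return True and record the request if the token is under its limit."""
        recent = self.history[token]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True
```

In a real service `now` would come from a monotonic clock such as `time.monotonic()`, and rejected requests would get an error response rather than being queued.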

  21. Peak: 40 requests/second on 23 cores

  22. Z3 handled a million requests • Z3 received approx. 1 million requests over the weekend • Successfully decided all except ~300 in less than 20 seconds (many in just milliseconds) • Timeouts did not contribute to score • But, scores were adjusted slightly after the end of the competition • No team's position changed

  23. 3. Need to generate ~100K problem instances • GAME: I have a secret program A. Can you guess what it is? You have 5 minutes.

  24. 1400 randomly generated problems assigned to each team • Categorized by size and whether or not the program contains fold • In total: 70 categories • Low barrier to entry: 300 problems are really easy to solve • Increasing difficulty • With some cleverness, about 800 could be solved • Remaining 300 are super-hard (at least for us)

  25. 1400 randomly generated problems assigned to each team • Categorized by size and whether or not the program contains fold • In total: 70 categories • Contestants needed to balance risk vs. reward • A large random program may be semantically equivalent to a small one • But, also a bit noisy

  26. +400 bonus problems built from hard nuggets • Exactly the same 400 assigned to all teams • Aim to differentiate the best teams • Randomly generate 1000s of nuggets {p1, …, pn}, each of size 14 • Use Z3 to prove that there exists no program of size 12 or less equivalent to any of the nuggets • Build larger programs from nuggets: if0 pi then pj else pk
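The last step, combining three nuggets into one larger program, can be sketched as follows. This shows only the composition; the Z3 minimality proof is not reproduced. The tuple encoding, the names `size` and `build_bonus`, and the size metric (one unit per constant, variable, or operator) are assumptions for illustration, and the slides do not state how the organizers measured size.

```python
import random


def size(e):
    """ASSUMED size metric: 1 per leaf and 1 per operator node."""
    if isinstance(e, (int, str)):
        return 1
    return 1 + sum(size(arg) for arg in e[1:])


def build_bonus(nuggets, rng):
    """Pick three distinct nuggets pi, pj, pk and build if0 pi then pj else pk."""
    cond, then_branch, else_branch = rng.sample(nuggets, 3)
    return ("if0", cond, then_branch, else_branch)
```

With real nuggets of size 14 each, a composed program would have size 1 + 3 × 14 = 43 under this assumed metric. A usage example with toy nuggets: `build_bonus([("not", "x"), ("shl1", "x"), ("and", "x", 1), ("plus", "x", 1)], random.Random(0))`.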

  27. What we used • Z3, F#, TypeScript, JavaScript, TouchDevelop, and Windows Azure are great tools for organizing a programming contest!

  28. winners

  29. Judges' Prize: kuma- • Yusuke Endoh and Nayuko Watanabe are an extremely cool bunch of hackers! • We were particularly impressed by your compact and elegant Ruby code and are surprised that a scripting language could perform well enough to be competitive at this computationally intensive task. That's great validation for the new generational GC produced by you and other Ruby implementers. Congratulations! • RGenGC was developed by Koichi Sasada • Awarded $250

  30. Lightning division winner: ITF • C++ is very suitable for rapid prototyping. • Kojiro Izuka, Hiroshi Maeda, Ryosuke Kayanagi • University of Tsukuba, Japan • Awarded $250

  31. 3rd place: Hack the Loop • C#, C++, bash, awk, sed, and Excel are not too shabby • Pavel Egorov, Andrew Kostousov, Alexey Mogilnikov, Sergey Azovskov, Alexey Buslavyev, Kseniya Zhagorina, Denis Dublennyh, Eugeny Klyukin, Maxim Sannikov, Vladislav Isenbaev • SKB Kontur, QRGL, Facebook • Russian Federation • Awarded $250: DECLINED! • Our team decided not to claim our prize. We would be glad if our prize will go to the needs of orphans, homeless children, functional programmers in need or other type of charity.

  32. 2nd place: F5 Attackers • C++ and Python are fine programming tools for many applications • Noriyuki Futatsugi, Takashi Nakamura, Tai Fukuzawa, Nobuaki Tanaka, Takaaki Hiragushi • Fixstars Corporation and University of Tsukuba • Japan • Awarded $500

  33. Winner: Unagi—The Synthesis • Java, C#, C++, PHP, Ruby, and Haskell are programming tools of choice for discriminating hackers • Takuya Akiba, Yoichi Iwata, Kentaro Imajo, Toshiki Kataoka, Naohiro Takahashi, Hiroaki Iwami • University of Tokyo, Google, Keio University and AtCoder • Japan • Awarded $1000 • Thanks to SIGPLAN, John Tristan and Greg Morrisett for managing all the issues related to prizes

  34. unagi's solution: Score 1696/1800 • Brute force + pruning + multiple strategies in parallel, running in the EC2 cloud • ~(~x)=x • ~(if0 x (~y) z) = if0 x y ~z • ((x<<1)>>1)<<1 = x<<1 • ((x>>1)<<1)>>1=x>>1 • (x>>4)>>1=(x>>1)>>4 • y>>16=0 (where y is a left variable of fold) • y&x=x&y • x&x=x • x&~0=x • x&0=0 • (y&(x&z))=x&(y&z) • ~x&x=0 • 1&(x<<1)=0 • x^~y=~(x^y) • if0 constant x y = x (or y) • if0 x y y = y • if0 x 0 x = x • if0 x x y = if0 x 0 y
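Identities like these let a search skip candidates that are equivalent to something already tried. The sketch below applies a handful of the listed rules bottom-up to bring expressions into a canonical form. It is not Unagi's code: the tuple encoding, the name `simplify`, and the choice to canonicalize commutative `and` by sorting operands are illustrative assumptions, and only a subset of the rules is covered.

```python
ALL_ONES = ("not", 0)  # the expression ~0


def simplify(e):
    """Rewrite an expression (nested tuples) using a few of the listed rules."""
    if not isinstance(e, tuple):
        return e  # constants 0/1 and variable names are already canonical
    op, *args = e
    args = [simplify(a) for a in args]

    if op == "not":
        inner = args[0]
        if isinstance(inner, tuple) and inner[0] == "not":
            return inner[1]                      # ~(~x) = x
    elif op == "and":
        a, b = sorted(args, key=repr)            # y&x = x&y: fix an operand order
        if a == b:
            return a                             # x&x = x
        if 0 in (a, b):
            return 0                             # x&0 = 0
        if a == ALL_ONES:
            return b                             # x&~0 = x
        if b == ALL_ONES:
            return a
        args = [a, b]
    elif op == "if0":
        cond, then_branch, else_branch = args
        if then_branch == else_branch:
            return then_branch                   # if0 x y y = y
        if cond == 0:
            return then_branch                   # if0 constant x y = x (or y)
        if cond == 1:
            return else_branch
        if then_branch == 0 and else_branch == cond:
            return cond                          # if0 x 0 x = x
    return (op, *args)
```

A search would call `simplify` on each candidate and discard it when the result has been seen before. For instance, `simplify(("and", "y", "x"))` and `simplify(("and", "x", "y"))` return the same tuple, so only one of the two needs to be explored.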

  35. We aren't quite done with this yet • Lots of data to analyze • Many different strategies employed, but many similar ones too • Can we reverse engineer/categorize strategies from logs? • Many other program synthesizers around (including several in RiSE) • Tune them up and run them against this problem set

  36. Looking ahead • 72K PROGRAMMER-HOURS IS A VALUABLE RESOURCE • LET'S MAKE GOOD USE OF IT! • WHAT QUESTIONS COULD WE ASK IN THE FUTURE? • CROWD-SOURCED PROGRAM DEVELOPMENT/BUG-FINDING? • INVARIANT DISCOVERY? • SEARCHING FOR INTERPOLANTS? • …?
