
SAT and CSP competitions & benchmark libraries: some lessons learnt?




  1. SAT and CSP competitions & benchmark libraries: some lessons learnt? Toby Walsh, NICTA & UNSW, Sydney, Australia

  2. What’s the best way to benchmark systems?

  3. Outline • Benchmark libraries • Founding CSPLib.org • Competitions • SAT competition judge • TPTP competition judge • …

  4. Why? • Why did I set up CSPLib.org? • I needed problems against which to benchmark my latest inference techniques • Zebra and random problems don’t cut it! • I thought it would help unify and advance the CP community

  5. Random problems • +ve • Easy to generate • Hard (if chosen from the phase transition) • Impossible to cheat • If you can solve 1000-variable random 3SAT problems at l/n = 4.2, I’ll be impressed (a generator sketch follows below)
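A minimal sketch (Python, added here for illustration rather than taken from the talk) of the standard fixed-clause-length generator behind this slide: draw roughly 4.2 · n clauses over n variables, each clause built from three distinct variables, each negated with probability 1/2. The function names are made up; only the 4.2 clause-to-variable ratio comes from the slide.

```python
import random

def random_3sat(n_vars, ratio=4.2, seed=0):
    """Random 3-SAT in the fixed-clause-length model: int(ratio * n_vars)
    clauses, each with three distinct variables negated with probability 1/2."""
    rng = random.Random(seed)
    clauses = []
    for _ in range(int(ratio * n_vars)):
        chosen = rng.sample(range(1, n_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in chosen])
    return clauses

def to_dimacs(n_vars, clauses):
    """Render the instance in DIMACS CNF (the lo-tech format shown later)."""
    lines = ["p cnf %d %d" % (n_vars, len(clauses))]
    lines += [" ".join(map(str, c)) + " 0" for c in clauses]
    return "\n".join(lines)

if __name__ == "__main__":
    # 1000 variables at clause/variable ratio 4.2, i.e. in the hard region
    print(to_dimacs(1000, random_3sat(1000)))
```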

  6. Random problems • -ve • Lack the structure found in real-world problems • Unrepresentative • E.g. random 3SAT instances have either many solutions or none • Different methods work well on them • Random SAT: forward-looking algorithms • Industrial SAT: backward-looking algorithms

  7. Why? • Thesis: every mature field has a benchmark library • Deduction started in 1960s • TPTP set up in 1993 • SAT started in 1960s • SAT DIMACS challenge in 1992 • SATLib set up in 1999 • CP started in 1970s • CSPLib set up in 1998

  8. Why? • Thesis: every mature field has a benchmark library • Spatial and temporal reasoning started in early 80s (or before?) • It’s been approximately 30 years so it’s about time you guys set one up!

  9. Benchmark libraries • CSPLib.org • Over 35k unique visitors • Still not everything I’d want it to be • But state of the art for experimentation is now much better than it was • I haven’t seen a zebra for a very long time

  10. An ideal library • Desiderata taken from: • CSPLib: a benchmark library for constraints, Proc. CP-99

  11. An ideal library • Location • On the web and easy to find • TPTP.org • CSPLib.org • SATLib.org • QBFLib.org • … • http://elib.zib.de/pub/mp-testdata/tsp/tsplib/tsplib.html • http://mat.gsia.cmu.edu/COLOR/instances.html

  12. An ideal library • Easy to use • Tools to make benchmarking as painless as possible • tptp2X, … • Diverse • To help prevent over-fitting

  13. An ideal library • Large • Growing continuously • Again helps to prevent over-fitting • Extensible • To new problems or domains

  14. An ideal library • Complete • One stop for your problems • Topical • For instance, it should report current best solutions found

  15. An ideal library • Independent • Not tied to a particular solver or proprietary input language • Mix of difficulties • Hard and easy problems • Solved and open problems • With perhaps even a difficulty index?

  16. An ideal library • Accurate • It should be trusted • Used • A valued resource for the community

  17. Problem format • Lo-tech or hi-tech?

  18. Lo-tech formats • DIMACS format used in SATLib
      c a simple example
      p cnf 3 2
      1 -1 0
      1 2 3 0
  This represents: x v -x, x or y or z

  19. Lo-tech formats • DIMACS format used in SATLib • +ve • All programming languages can read integers! • Small amount of extensibility built in (e.g. QBF) • -ve • Larger extensions are problematic (e.g. beyond CNF to arbitrary Boolean circuits)
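To make the "+ve" point concrete (any language can read this format with a few lines of integer parsing), here is a hedged Python sketch of a DIMACS CNF reader; the function name and details are mine, not part of SATLib or the DIMACS tools.

```python
def parse_dimacs(path):
    """Read a DIMACS CNF file: skip 'c' comment lines, read the 'p cnf'
    header, then collect 0-terminated integer clauses."""
    n_vars, clauses, current = 0, [], []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("c"):
                continue                       # blank line or comment
            if line.startswith("p"):
                n_vars = int(line.split()[2])  # "p cnf <#vars> <#clauses>"
                continue
            for lit in map(int, line.split()):
                if lit == 0:                   # 0 terminates a clause
                    clauses.append(current)
                    current = []
                else:
                    current.append(lit)
    return n_vars, clauses

# e.g. parse_dimacs("example.cnf") -> (3, [[1, -1], [1, 2, 3]])
```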

  20. Hi-tech formats • CP competition
  <instance>
    <presentation name="4-queens"
                  description="This problem involves placing 4 queens on a chessboard"
                  nbSolutions="at least 1"
                  format="XCSP1.1 (XML CSP Representation 1.1)" />
    <domains nbDomains="1">
      <domain name="dom0" nbValues="4" values="1..4" />
    </domains>
    <variables nbVariables="4">
      <variable name="X0" domain="dom0" />
      …
    </variables>
    <relations nbRelations="3">
      <relation name="rel0" domain="dom0 dom0" nbConflicts="10"
                conflicts="(1,1)(1,2)(2,1)(2,2)(2,3)(3,2)(3,3)(3,4)(4,3)(4,4)" />
      …
    </relations>
    <constraints nbConstraints="6">
      <constraint name="C0" scope="X0 X1" relation="rel0" />
      …

  21. Hi-tech formats • XML • +ve • Easy to extend • Parsing tools can be provided • -ve • Complex and verbose • Computers can parse terse structures easily
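As a rough illustration of "parsing tools can be provided": standard XML tooling (Python's xml.etree here) can pull the pieces out of a well-formed XCSP 1.1 instance like the 4-queens fragment above. The attribute names follow that fragment; the function name and everything else are assumptions, not the competition's official parsers.

```python
import xml.etree.ElementTree as ET

def load_xcsp(path):
    """Extract domains, variables, extensional relations and constraints
    from a well-formed XCSP 1.1 instance (attributes as in the 4-queens
    fragment above)."""
    root = ET.parse(path).getroot()            # the <instance> element
    domains = {d.get("name"): d.get("values") for d in root.find("domains")}
    variables = {v.get("name"): v.get("domain") for v in root.find("variables")}
    relations = {r.get("name"): r.get("conflicts") for r in root.find("relations")}
    constraints = [(c.get("name"), c.get("scope"), c.get("relation"))
                   for c in root.find("constraints")]
    return domains, variables, relations, constraints
```

Even so, the "-ve" on the next slide stands: this reader still leaves the real work (expanding "1..4" domains, decoding the conflict tuples) to the solver.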

  22. No-tech formats • CSPLib • Problems are specified in natural language • No agreement at that time for an input language • One focus was on how you model a problem • Today there is more consensus on modelling languages like Zinc

  23. No-tech formats • CSPLib • Problems are specified in natural language • But you can still provide in one place • Input data • Results • Code • Parsers …

  24. Getting problems • Submit them yourself • Initially, you must do this so the library has some critical mass the first time people look at it • But it becomes tiresome and unrepresentative to do so continually • Ask at every talk • Tried for several years but it (almost) never worked

  25. Getting problems • Need some incentive • Offer money? • Price of entry for the competition? • If you have a competition, users will submit problems that their solver is good at?

  26. Competitions

  27. Libraries + Competitions • You can have a library without a competition • But you can’t have a competition without a library

  28. Libraries + Competitions • Libraries then competition • TPTP then CASC • Easy and safe! • Libraries and competition • Planning • RoboCup • …

  29. Increasing complexity • Constraints • 1st year, binary extensional • 2nd year, limited number of globals • 3rd year, unlimited • Planning • Increasing complexity • Time, metrics, uncertainty, …

  30. Benefits • Gets ideas implemented • Rewards engineering • Progress needs both science and engineering! • Puts it all together

  31. Benefits • Gives greater importance to important low-level issues • In SAT: • Watched literals • VSIDS • …

  32. Benefits • Witness the progress in SAT • 1985, 10s of variables • 1995, 100s of variables • 2005, 1000s of variables • … • Not just Moore’s law at play!

  33. Pitfalls • Competitions require lots of work • Organizers get limited (academic) reward • One solution is to also organize competition special issues

  34. Pitfalls • Competitions encourage incremental improvements • Don’t have them too often! • You may get stuck in a local minimum • E.g. MDPs for speech recognition • Give out a best new solver prize?

  35. The Chaff story • Industrial problems, SAT & UNSAT instances • 2008, 1st MiniSAT (son of zChaff) • 2007, 1st RSAT (son of MiniSAT) • 2006, 1st MiniSAT • 2005, 1st SatELite GTI (MiniSAT+preprocessor) • 2004, 1st zChaff (Forklift from 2003 was better) • 2003, 1st Forklift • 2002, 1st zChaff

  36. Other issues • Man-power • Organizers • One is not enough? • Judges • All rules need interpretation • Compute-power • Find a friendly cluster

  37. Other issues • Multiple tracks • SAT/UNSAT • Random/industrial/crafted • … • Certified/uncertified (with or without certificates)

  38. Other issues • Holding problems back if possible • Release some problems so competitors can ensure solver compliance • But hold most back so competition is blind!

  39. Other issues • Multiple phases • Too many solvers for all to compete with long timeouts • First phase to test correctness • Second phase to throw out the slow solvers (which cost you many timeouts) • Third phase to differentiate between the better solvers

  40. Other issues • Reward function • <#completed, average time, …> • solution purse + speed purse (a scoring sketch follows below) • Points for each problem divided between those solvers that solve it • Getting buy-in from competitors • It will (and should) evolve over time!
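The actual competition formulas varied from year to year; the sketch below is only a hedged illustration of the "solution purse + speed purse" idea on this slide: each problem's solution purse is split equally among the solvers that solve it, and its speed purse is split in proportion to how fast they were. The purse sizes and the 1/time weighting are assumptions, not the real rules.

```python
from collections import defaultdict

def purse_scores(results, solution_purse=1000.0, speed_purse=1000.0):
    """results[problem][solver] = solve time in seconds (> 0); a solver
    absent from a problem's dict did not solve it within the timeout."""
    scores = defaultdict(float)
    for problem, times in results.items():
        if not times:
            continue                              # nobody solved it: purses unspent
        equal_share = solution_purse / len(times)
        inv_total = sum(1.0 / t for t in times.values())
        for solver, t in times.items():
            scores[solver] += equal_share                          # solution purse
            scores[solver] += speed_purse * (1.0 / t) / inv_total  # speed purse
    return dict(scores)

# Example: solverA solves both problems (quickly), solverB only the first
print(purse_scores({"p1": {"solverA": 2.0, "solverB": 8.0},
                    "p2": {"solverA": 30.0}}))
```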

  41. Other issues • Prizes • Give out many! • Good for people’s CVs • Good motivator for future years

  42. Other issues • Open or closed source? • Open to share progress • Closed to get the best • Last year’s winner • Condition of entry • To see that progress is being made!

  43. Other issues • Smallest unsolved problem • Give a prize! • Timing • Run during the conference • Creates a buzz so people enter next year • Get a slot in program to discuss results • Get a slot in banquet to give out prizes

  44. Conclusions • Benchmark libraries • When an area is several decades old, why wouldn’t you have one? • Competitions • Designed well, held not too frequently, & with buy-in from the community, why wouldn’t you?

  45. Questions • Disagreements • Other opinions • Different experiences • …
