Improving the Automatic Evaluation of Problem Solutions in Programming Contests

Pedro Ribeiro and Pedro Guerreiro

Presentation Overview
  • Automatic Evaluation: Past and Present
    • The case of IOI
  • A possible path for improving evaluation
    • Developing only a function (not a complete program)
    • Abstract Input/Output
    • Repeat the same function call (+ clock precision)
    • No hints on expected complexity
    • Examine runtime behaviour as tests increase in size
  • Some preliminary results
  • Conclusions
Programming Contests
  • All programming contests need an efficient and fair way of distinguishing submitted solutions

(Automatic) Evaluation

  • What do we evaluate?
    • Correctness: does the program produce correct answers for all instances of the problem?
    • Efficiency: does it do it fast enough? Does it have the necessary time and memory complexity?
Programming Contests
  • Classic way of evaluating
    • Set of pre-defined tests (inputs)
    • Run the program with the tests and check the output (a checker sketch follows below)
  • The IOI has been doing this in almost the same way since the beginning, with two major advances:
    • Manual evaluation → Automatic evaluation
    • Individual tests → Grouped tests
  • Although the IOI has 3 different types of tasks, the core of the event is still batch tasks
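
As an illustration of this classic scheme, here is a minimal sketch of a token-by-token output checker; the file handling and the checker itself are ours, not part of the presentation:

    // Minimal sketch of a classic output checker (illustrative): compare the
    // contestant's output against the expected output token by token, so
    // whitespace differences are ignored.
    #include <fstream>
    #include <iostream>
    #include <string>

    int main(int argc, char* argv[]) {
        if (argc != 3) { std::cerr << "usage: checker expected produced\n"; return 2; }
        std::ifstream expected(argv[1]), produced(argv[2]);
        std::string a, b;
        while (expected >> a) {
            if (!(produced >> b) || a != b) { std::cout << "WRONG ANSWER\n"; return 1; }
        }
        if (produced >> b) { std::cout << "WRONG ANSWER\n"; return 1; }  // extra output
        std::cout << "ACCEPTED\n";
        return 0;
    }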
Programming Contests
  • Correctness: almost a “black art”
    • “Program testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence” (Dijkstra)
  • Efficiency:
    • Typically, judges create a set of model solutions of different complexities
    • Tests are designed so that the model solutions achieve the planned number of points
    • Considerable amount of tuning (environment)
    • Considerable amount of manpower needed
    • More difficult to introduce new languages
Ideas: Single function
  • Solve the problem by writing a specific function (as opposed to a complete program)
  • Motivation:
    • Concentrate on the core algorithm (fewer distractions)
    • Can be used in earlier stages of learning
    • Opportunities for new ways of testing (more control over the submitted code)
  • It is already done on other types of contests:
    • TopCoder
    • Teaching Environments (Ribeiro and Guerreiro, 2008)
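
For example, a function-only task might look like this (a hypothetical task and signature, not one from the presentation); the contestant submits just the function, with no main() and no I/O:

    // Hypothetical function-only task: "return the largest value in the
    // sequence". This is the entire submission.
    #include <vector>

    int max_value(const std::vector<int>& a) {
        int best = a[0];                         // assume a non-empty sequence
        for (int x : a) if (x > best) best = x;
        return best;
    }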
Ideas: I/O Abstraction
  • The Input and Output should be “abstract” and not specific to a language
  • How to do it:
    • Input already in memory, passed as function arguments (simple forms, no complex data structures)
    • Output as the function return value(s)
  • Motivation:
    • Fewer information-processing details
    • Less complicated problem statements
    • We can measure the time spent in the solution (not in I/O)
    • More balanced performance between languages
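
A minimal grader-side sketch of this abstraction (all names are ours): the grader owns all I/O, and the contestant's function receives its input in memory and returns its answer.

    // Grader sketch: the judge reads the input, calls the contestant's
    // function with in-memory arguments and writes its return value.
    // solve() is a hypothetical contestant-provided function.
    #include <iostream>
    #include <vector>

    int solve(const std::vector<int>& data);   // implemented by the contestant

    int main() {
        int n;
        std::cin >> n;
        std::vector<int> data(n);
        for (int& x : data) std::cin >> x;     // the grader does the parsing...
        std::cout << solve(data) << '\n';      // ...the contestant only computes
        return 0;
    }

Because parsing happens in the grader, only the call to solve() needs to be timed, which also evens out I/O-speed differences between languages.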
Idea: Repeat function calls
  • In the past we used smaller input sizes
  • With the increased speed of computers, we currently use huge input sizes
    • Clock resolution is poor: small instances finish in an instant
    • Need to distinguish small asymptotic complexities
    • Historical fact: the smallest time limit used at the IOI:
      • IOI 2007, problem training: 0.3 seconds
  • Future?
    • Always more speed → ever bigger input sizes
Idea: Repeat function calls
  • Problems completely detached from reality:
    • Ex: IOI 2007 Sails, ship with 100,000 masts
Idea: Repeat function calls
  • Real world: How can we measure the thickness of a sheet of paper if we have a standard ruler without enough accuracy?

If a stack of 100 sheets measures 1 cm, then each sheet is ~0.1 mm.

  • We can use the same idea on functions!
    • Running it once with a small instance may be instantaneous

But

    • Running multiple times takes more than 0.00s!
Idea: Repeat function calls
  • Run the same function several times and compute the average time
  • Pros
    • Input size can be smaller and related to the problem
    • We can concentrate on the quality of the test cases and rely less on randomization to produce big test cases that are impossible to verify manually
  • Cons
    • We must be careful with memory persistence between successive function calls
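
A sketch of this measurement loop, assuming a contestant function solve() as above (the 1-second aggregate threshold matches the experiment described later):

    // Repeat the function call until at least one second of total work has
    // accumulated, then report the average; this works around poor clock
    // resolution on small instances. solve() is hypothetical, as before.
    #include <chrono>
    #include <vector>

    int solve(const std::vector<int>& data);   // provided by the contestant

    double average_seconds(const std::vector<int>& input) {
        using clock = std::chrono::steady_clock;
        long long runs = 0;
        const auto start = clock::now();
        std::chrono::duration<double> elapsed{0};
        do {
            volatile int sink = solve(input);  // volatile: keep the call alive
            (void)sink;
            ++runs;
            elapsed = clock::now() - start;
        } while (elapsed.count() < 1.0);
        // The caveat above applies: if solve() keeps state between calls,
        // later runs no longer measure the same computation.
        return elapsed.count() / runs;
    }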
Idea: No hints on complexity
  • When we give limits for the input:
    • we simplify implementation details and avoid the need for dynamic memory allocation.

but

    • We disclose the complexity required for the problem
      • Trained students can identify precisely the complexity needed
  • This has a great impact on the problem-solving aspect:
    • Different mindset: I know which complexity I’m looking for and I settle for a solution that does that

vs

    • Scientific approach with a real-world open problem
    • Ex: is there a polynomial solution for a problem?
Idea: No hints on complexity
  • Give limits for implementation purposes, but make it clear that those are not related to the sought efficiency
  • More scientific and open ended approach
  • Need to think how to really solve the problem (and not how to produce a program that passes the test cases)
  • Do not overemphasize the runtime of a particular language
    • (let me make a test with maximum limits and see if it runs in X seconds on this machine with this language)
Idea: Runtime behaviour as tests increase
  • Typically we measure efficiency by creating a set of tests such that different model solutions achieve different numbers of points

But

  • Not passing a test does not imply that the required complexity was not achieved (other factors are involved)
    • It just means that the test case was not solved within the given constraints
  • A lot of manpower is needed for the model solutions and fine-tuning (compiler version, computer speed, language used, etc.)
Idea: Runtime behaviour as tests increase
  • How can we improve on that?
  • Pen-and-paper analysis is not an option for large-scale evaluation
    • Need for automatic processes
  • We have different tests and different time measurements, so why don't we use all this information?
  • Plot the runtime as the data size increases and do some curve fitting
    • It is impossible to determine the complexity of all programs, but even a trivial (imperfect) curve fit can show more information than just knowing which test cases are passed
Some Preliminary Results
  • As a proof of concept, a simple problem:
    • Input: a sequence of integers
    • Output: the contiguous subsequence with maximum sum
  • Only ask for a function, with the I/O already given
  • Small input limit (only 100)
  • Measure time by running the function multiple times (until the aggregated time reaches 1 s)
  • Use random data for N = 1, 4, 8, 12, …, 64
Some Preliminary Results
  • Implemented 3 model solutions:
    • O(N^3) – Iterate over all possible intervals in O(N^2), and iterate through each interval to compute its sum in O(N)
    • O(N^2) – Iterate over all possible intervals in O(N^2), checking each sum in O(1) with accumulated (prefix) sums
    • O(N) – Iterate through the sequence keeping a partial sum; whenever the partial sum is negative, it cannot contribute to the best interval, so “reset” it to zero and continue (all three are sketched below)
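
The three model solutions, sketched in code (function names are ours; the algorithms are as described above):

    #include <algorithm>
    #include <vector>

    // A: O(N^3) -- try every interval and recompute its sum from scratch.
    long long max_sum_cubic(const std::vector<int>& a) {
        long long best = a[0];
        for (size_t i = 0; i < a.size(); ++i)
            for (size_t j = i; j < a.size(); ++j) {
                long long sum = 0;
                for (size_t k = i; k <= j; ++k) sum += a[k];
                best = std::max(best, sum);
            }
        return best;
    }

    // B: O(N^2) -- precompute accumulated (prefix) sums, so each interval
    // sum costs O(1).
    long long max_sum_quadratic(const std::vector<int>& a) {
        std::vector<long long> prefix(a.size() + 1, 0);
        for (size_t i = 0; i < a.size(); ++i) prefix[i + 1] = prefix[i] + a[i];
        long long best = a[0];
        for (size_t i = 0; i < a.size(); ++i)
            for (size_t j = i; j < a.size(); ++j)
                best = std::max(best, prefix[j + 1] - prefix[i]);
        return best;
    }

    // C: O(N) -- keep a partial sum; once it turns negative it cannot help
    // any later interval, so "reset" it to zero and continue.
    long long max_sum_linear(const std::vector<int>& a) {
        long long best = a[0], partial = 0;
        for (int x : a) {
            partial += x;
            best = std::max(best, partial);
            if (partial < 0) partial = 0;
        }
        return best;
    }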

[Plots A, B and C: measured runtimes of the three model solutions]

Some Preliminary Results
  • Plot Time(N) / Time(1)
  • Use a simple correlation measure against candidate growth functions (sketched below)
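
A minimal sketch of this correlation step; the candidate set, the names and the sample ratios are illustrative, not measurements from the experiment:

    // Correlate the Time(N)/Time(1) ratios against a few candidate growth
    // functions using Pearson correlation. The ratios below are made-up
    // illustrative numbers, not data from the paper.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    double pearson(const std::vector<double>& x, const std::vector<double>& y) {
        const size_t n = x.size();
        double mx = 0, my = 0;
        for (size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double num = 0, dx = 0, dy = 0;
        for (size_t i = 0; i < n; ++i) {
            num += (x[i] - mx) * (y[i] - my);
            dx  += (x[i] - mx) * (x[i] - mx);
            dy  += (y[i] - my) * (y[i] - my);
        }
        return num / std::sqrt(dx * dy);
    }

    int main() {
        std::vector<double> sizes = {1, 4, 8, 16, 32, 64};          // values of N
        std::vector<double> ratio = {1, 15, 62, 250, 1010, 4060};   // Time(N)/Time(1), illustrative
        struct Candidate { const char* name; double (*f)(double); };
        const std::vector<Candidate> candidates = {
            {"N",       [](double n) { return n; }},
            {"N log N", [](double n) { return n * std::log2(n + 1); }},
            {"N^2",     [](double n) { return n * n; }},
            {"N^3",     [](double n) { return n * n * n; }},
        };
        for (const auto& c : candidates) {
            std::vector<double> g;
            for (double n : sizes) g.push_back(c.f(n));
            std::printf("correlation with %-7s = %.4f\n", c.name, pearson(ratio, g));
        }
        return 0;
    }

The best-correlated candidate suggests the growth pattern, without any model solution ever being written in the contestant's language.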
Some Preliminary Results
  • It is out of scope to give a more detailed mathematical analysis
    • We could use other statistical measures
  • We know that it is impossible to automatically compute and prove complexities

but

  • This simple approach gives meaningful results
    • The runtime is consistent and correlated with a certain function, and therefore appears to grow following a pattern that we were able to identify
      • Ex: linear → appears to take twice the time when the data doubles
Some Preliminary Results
  • What could this do?
    • More information from the same test cases
    • Possibility of giving students automatic feedback on runtime behaviour
    • Possibility of identifying runtime behaviours for which no model solutions were created (less manpower!)
    • Independent of language specific details

Ex: Archery Problem, IOI 2009, Day 1

There were solutions with O(N^2R), O(N^3), O(N^2 log N), O(N^2), O(N log N), …

No need to code them all in all languages and then tune!

Conclusion
  • 20 years of IOI: computers are much faster, but the style of evaluation is still the same
  • Setting up test cases is time-consuming and requires manpower
  • Need to think of ways to improve evaluation
  • Our proposal, geared to more informal contests or teaching environments, can offer:
    • No distraction with I/O
    • No large data sets
    • More natural problem statements
    • No hint on complexity (open ended approach)
    • No need for implementing many model solutions
    • New languages can be added without changing tests
  • Still more work is needed to obtain a robust system, but we feel these ideas (or some of them) can be used in practice
  • Future: can evaluation be improved in other ways?
The End
  • And that’s all! :-)

Questions?

Pedro Ribeiro ([email protected])

Pedro Guerreiro ([email protected])
