
Improving the Automatic Evaluation of Problem Solutions in Programming Contests

Pedro Ribeiro and Pedro Guerreiro


Presentation Overview

  • Automatic Evaluation: Past and Present

    • The case of IOI

  • A possible path for improving evaluation

    • Developing only a function (not a complete program)

    • Abstract Input/Output

    • Repeat the same function call (+ clock precision)

    • No hints on expected complexity

    • Examine runtime behaviour as tests increase in size

  • Some preliminary results

  • Conclusions


Programming Contests

  • All programming contests need an efficient and fair way of distinguishing submitted solutions

    → (Automatic) Evaluation

  • What do we evaluate?

    • Correctness: does the program produce correct answers for all instances of the problem?

    • Efficiency: does it do it fast enough? Does it have the necessary time and memory complexity?


Programming Contests

  • Classic way of evaluating

    • Set of pre-defined tests (inputs)

    • Run program with tests and check output

  • IOI has been doing this almost the same way since the beginning with two major advances:

    • Manual evaluation → Automatic evaluation

    • Individual tests → Grouped tests

  • Although the IOI has 3 different types of tasks, the core of the event is still batch tasks



Programming Contests

  • Correctness: almost a “black art”

    • “Program testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence” (Dijkstra)

  • Efficiency:

    • Typically, judges create a set of model solutions of different complexities

    • Tests are designed so that each model solution achieves its planned number of points

    • Considerable amount of tuning (for the environment)

    • Considerable amount of manpower needed

    • More difficult to introduce new languages


Ideas: Single function

  • Solve the problem by writing a specific function (as opposed to a complete program)

  • Motivation:

    • Concentrate on the core algorithm (fewer distractors)

    • Can be used at earlier stages of learning

    • Opportunities for new ways of testing (more control over submitted code)

  • It is already done in other types of contests:

    • TopCoder

    • Teaching environments (Ribeiro and Guerreiro, 2008)


Ideas: I/O Abstraction

  • The Input and Output should be “abstract” and not specific to a language

  • How to do it (a code sketch follows this list):

    • Input already in memory, passed as function arguments (in simple form, no complex data structures)

    • Output as the function return value(s)

  • Motivation:

    • Fewer input/output processing details

    • Less complicated problem statements

    • We can measure the time spent in the solution (not in I/O)

    • More balanced performance across languages
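
  As a concrete illustration, combining the single-function and I/O-abstraction ideas could look like the sketch below, in C. The solve() signature, file layout and grader are our own hypothetical examples, not a fixed specification: the contestant implements only solve(), while the judge's grader owns all I/O and passes the input through memory.

    /* grader.c -- minimal judge-side sketch (hypothetical): read the
       test case, place it in memory, call the contestant's function,
       and print the returned answer. The contestant never does I/O. */
    #include <stdio.h>
    #include <stdlib.h>

    long long solve(int n, const int a[]);    /* contestant's function */

    int main(void) {
        int n;
        if (scanf("%d", &n) != 1 || n <= 0) return 1;
        int *a = malloc(n * sizeof *a);
        if (!a) return 1;
        for (int i = 0; i < n; i++) scanf("%d", &a[i]);
        printf("%lld\n", solve(n, a));        /* output = return value */
        free(a);
        return 0;
    }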


Idea: Repeat function calls

  • In the past we used smaller input sizes

  • With the increased speed of computers, we currently use huge input sizes

    • Clock resolution is poor: small instances → instant

    • Need to distinguish small asymptotic complexities

    • Historical note: the smallest time limit used at an IOI:

      • IOI 2007, problem “training”: 0.3 seconds

  • Future?

    • Ever more speed → ever bigger input sizes


Idea: Repeat function calls

  • Problems completely detached from reality:

    • Ex: IOI 2007 “Sails”: a ship with 100,000 masts



Idea: Repeat function calls

  • Real world: how can we measure the thickness of a sheet of paper if we only have a standard ruler without enough accuracy?

    If a stack of 100 sheets measures 1 cm,

    then each sheet is ~0.1 mm

  • We can use the same idea on functions!

    • Running once on small instances may be instantaneous

      But

    • Running multiple times takes more than 0.00s!


Idea: Repeat function calls

  • Run the same function several times and compute the average time (a harness sketch follows this list)

  • Pros

    • Input sizes can be smaller and related to the problem

    • We can concentrate on the quality of the test cases and rely less on randomization to produce big test cases that are impossible to verify manually

  • Cons

    • We must be careful with memory persistence between successive function calls
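
  A minimal sketch of such a timing harness in C, using the same hypothetical solve() interface as before (the ~1-second aggregation threshold matches the setup used in the preliminary results later):

    #include <time.h>

    long long solve(int n, const int a[]);    /* contestant's function */

    /* Repeat the call until the aggregated time reaches ~1 second,
       then return the average time per call. This assumes solve()
       leaves no state behind between calls (the memory-persistence
       caveat above). */
    double average_time(int n, const int a[]) {
        clock_t start = clock();
        long runs = 0;
        do {
            solve(n, a);                      /* answer checked elsewhere */
            runs++;
        } while (clock() - start < CLOCKS_PER_SEC);
        return (double)(clock() - start) / CLOCKS_PER_SEC / runs;
    }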


Idea: No hints on complexity

  • When we give limits for the input:

    • we simplify implementation details and avoid the need for dynamic memory allocation.

      but

    • We disclose the complexity required for the problem

      • Trained students can identify precisely the complexity needed

  • This has a great impact on the problem-solving aspect:

    • Different mindset: “I know which complexity I’m looking for, and I settle for a solution that achieves it”

      vs

    • A scientific approach, as with a real-world open problem

    • Ex: is there a polynomial solution for the problem?


Idea: No hints on complexity

  • Give limits for implementation purposes, but make it clear that they are not related to the efficiency being sought

  • More scientific and open ended approach

  • Need to think about how to really solve the problem (not how to produce a program that passes the test cases)

  • Does not overemphasize the runtime of a particular language

    • (“let me make a test with maximum limits and see if it runs in X seconds on this machine with this language”)


Idea: Runtime behaviour as tests increase

  • Typically we measure efficiency by creating a set of tests such that different model solutions achieve different numbers of points

    But

  • Not passing a test does not imply that the required complexity was not achieved (other factors may be involved)

    • Passing just means that the test case is solved within the constraints

  • A lot of manpower is needed for model solutions and fine-tuning (compiler version, computer speed, language used, etc.)


Idea: Runtime behaviour as tests increase

  • How can we improve on that?

  • Pen-and-paper analysis is not an option for large-scale evaluation

    • Need for automatic processes

  • We have different tests and different time measures, so why don’t we use all of this information?

  • Plot the runtime as data increases and do some curve fitting

    • It is impossible to determine the complexity of every program, but even a trivial (imperfect) curve fit can reveal more than just knowing which test cases are passed


Some Preliminary Results

  • As a proof of concept, a simple problem:

    • Input: Sequence of integers

    • Output: Subsequence of consecutive integers with maximum sum

  • Only ask for the function, with I/O already handled

  • Small input limit (only 100)

  • Measure time by running the function multiple times (until the aggregated time reached 1 s)

  • Use random data for N = 1, 4, 8, 12, …, 64


Some Preliminary Results

  • Implemented 3 model solutions:

    • A: O(N^3) – iterate over all possible intervals in O(N^2), then traverse each interval to compute its sum in O(N)

    • B: O(N^2) – iterate over all possible intervals in O(N^2), checking each sum in O(1) with accumulated (prefix) sums

    • C: O(N) – iterate through the sequence keeping a partial sum; whenever the partial sum becomes negative it cannot contribute to the best interval, so “reset” it to zero and continue (solutions B and C are sketched below)

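  For illustration, solutions B and C might be sketched in C as below (hypothetical code written for this transcript, assuming n >= 1 and that the answer to report is the maximum interval sum itself):

    #include <stdlib.h>

    /* Solution B, O(N^2): prefix sums make each interval sum O(1). */
    long long solve_B(int n, const int a[]) {
        long long *prefix = malloc((n + 1) * sizeof *prefix);
        prefix[0] = 0;                     /* prefix[i] = a[0]+...+a[i-1] */
        for (int i = 0; i < n; i++) prefix[i + 1] = prefix[i] + a[i];
        long long best = a[0];
        for (int i = 0; i < n; i++)
            for (int j = i; j < n; j++) {  /* interval a[i..j] */
                long long s = prefix[j + 1] - prefix[i];
                if (s > best) best = s;
            }
        free(prefix);
        return best;
    }

    /* Solution C, O(N) (Kadane's algorithm): keep a running partial
       sum and reset it to zero whenever it becomes negative. */
    long long solve_C(int n, const int a[]) {
        long long best = a[0], partial = 0;
        for (int i = 0; i < n; i++) {
            partial += a[i];
            if (partial > best) best = partial;
            if (partial < 0) partial = 0;  /* cannot help later intervals */
        }
        return best;
    }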


Some Preliminary Results

  • Plot Time(N) / Time(1)

  • Simple correlation measure against a candidate function (a sketch follows below)
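
  One way to realize this correlation step, sketched in C. Pearson correlation is our own choice here (the slides do not commit to a specific measure); the measured ratios Time(N)/Time(1) would be correlated against each candidate growth function (N, N log N, N^2, …) evaluated at the same sizes, and the best-correlated candidate reported as the apparent runtime behaviour:

    #include <math.h>

    /* Plain Pearson correlation between k paired samples:
       x[] = candidate function values f(N), y[] = measured ratios.
       Assumes the inputs are not constant (non-zero variance). */
    double pearson(int k, const double x[], const double y[]) {
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < k; i++) {
            sx  += x[i];         sy  += y[i];
            sxx += x[i] * x[i];  syy += y[i] * y[i];
            sxy += x[i] * y[i];
        }
        return (k * sxy - sx * sy) /
               (sqrt(k * sxx - sx * sx) * sqrt(k * syy - sy * sy));
    }

  For the sizes N = 1, 4, 8, …, 64 used in these experiments, k would be the number of sizes measured, with y[i] the measured ratio at the i-th size and x[i] the candidate function evaluated there.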


Some Preliminary Results

  • It is out of scope here to give a more detailed mathematical analysis

    • We could use other statistical measures

  • We know that it is impossible, in general, to automatically compute and prove complexities

    but

  • This simple approach gives meaningful results

    • The runtime is consistent and correlated with a certain function, and therefore appears to grow following a pattern that we were able to identify

      • Ex: linear → appears to take twice the time when the data doubles


Some Preliminary Results

  • What could this do?

    • More information from the same test cases

    • Possibility of giving students automatic feedback on runtime behaviour

    • Possibility of identifying runtime behaviours for which no model solutions were created (less manpower!)

    • Independent of language-specific details

      Ex: Archery Problem, IOI 2009, Day 1

      There were solutions with O(N^2 R), O(N^3), O(N^2 log N), O(N^2), O(N log N), …

      No need to code them all in all languages and then tune!


Conclusion

  • 20 years of IOI: computers are much faster, yet the style of evaluation is still the same

  • Setting up test cases is time consuming and requires manpower

  • Need to think of ways to improve evaluation

  • Our proposal, geared to more informal contests or teaching environments, can offer:

    • No distraction with I/O

    • No large data sets

    • More natural problem statements

    • No hint on complexity (open ended approach)

    • No need for implementing many model solutions

    • New languages can be added without changing tests

  • More work is still needed to obtain a robust system, but we feel these ideas (or some of them) can be used in practice

  • Future: can evaluation be improved in other ways?


The End

  • And that’s all! :-)

    Questions?

    Pedro Ribeiro (pribeiro@dcc.fc.up.pt)

    Pedro Guerreiro (pjguerreiro@ualg.pt)