Improving the Automatic Evaluation of Problem Solutions in Programming Contests


### Improving the Automatic Evaluation of Problem Solutions in Programming Contests

Pedro Ribeiro and Pedro Guerreiro

Presentation Overview

- Automatic Evaluation: Past and Present
- The case of IOI
- A possible path for improving evaluation
- Developing only a function (not a complete program)
- Abstract Input/Output
- Repeat the same function call (+ clock precision)
- No hints on expected complexity
- Examine runtime behaviour as tests increase in size
- Some preliminary results
- Conclusions

Programming Contests

- All programming contests need an efficient and fair way of distinguishing submitted solutions

(Automatic) Evaluation

- What do we evaluate?
- Correctness: does the program produce correct answers for all instances of the problem?
- Efficiency: is it fast enough? Does it meet the required time and memory complexity?

Programming Contests

- Classic way of evaluating
- Set of pre-defined tests (inputs)
- Run program with tests and check output
- IOI has done this in almost the same way since the beginning, with two major advances:
- Manual evaluation -> Automatic evaluation
- Individual tests -> Grouped tests
- Although IOI has 3 different types of tasks, the core of the event is still batch tasks

Programming Contests

- Correctness: almost a “black art”
- “Program testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence” (Dijkstra)
- Efficiency:
- Typically judges create a set of model solutions of different complexities
- Tests are designed so that the model solutions achieve the planned number of points
- Considerable amount of tuning (environment)
- Considerable amount of manpower needed
- Harder to introduce new languages

Ideas: Single function

- Solve the problem by writing a specific function (as opposed to a complete program)
- Motivation:
- Concentrate on the core algorithm (fewer distractions)
- Can be used at earlier stages of learning
- Opportunities for new ways of testing (more control over submitted code)
- It is already done in other types of contests:
- TopCoder
- Teaching environments (Ribeiro and Guerreiro, 2008)

Ideas: I/O Abstraction

- The Input and Output should be “abstract” and not specific to a language
- How to do it:
- Input already in memory, passed as function arguments (simple form, no complex data structure)
- Output as the function return value(s)
- Motivation:
- Fewer input-processing details
- Simpler problem statements
- We can measure time spent in solution (not in I/O)
- More balanced performance between languages
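
As a rough illustration of the I/O abstraction above, the sketch below shows a judge that keeps the input in memory, passes it as plain function arguments, and checks the return value. The names `grade`, `largest` and `TESTS`, and the toy task itself, are hypothetical and not taken from the presentation.

```python
def grade(solution, tests):
    """Return how many tests the submitted function answers correctly.

    Each test is a pair (args, expected): the input is already in memory
    and is passed as arguments; the output is the function's return value.
    """
    passed = 0
    for args, expected in tests:
        # Hand over copies of lists so a submission cannot alter the judge's data.
        call_args = [list(a) if isinstance(a, list) else a for a in args]
        if solution(*call_args) == expected:
            passed += 1
    return passed


def largest(values):
    """Hypothetical submission for a toy task: return the largest element."""
    best = values[0]
    for v in values[1:]:
        if v > best:
            best = v
    return best


# Hypothetical test set: (arguments, expected return value).
TESTS = [
    (([3, 1, 2],), 3),
    (([-5],), -5),
    (([2, 2, 9, 4],), 9),
]

if __name__ == "__main__":
    print(grade(largest, TESTS), "of", len(TESTS), "tests passed")
```

Because the submission never parses text or prints anything, the same test data could in principle be reused for any language that can receive a sequence of integers and return a value.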

Idea: Repeat function calls

- In the past we used smaller input sizes
- Computers became faster, so currently we use huge input sizes
- Clock resolution is poor: small instances appear to run instantly
- Need to distinguish small asymptotic complexities
- Historic fact: smallest time limit used at IOI:
- IOI 2007, problem Training: 0.3 seconds
- Future?
- Will more speed always mean bigger input sizes?

Idea: Repeat function calls

- Problems completely detached from reality:
- Ex: IOI 2007 Sails, ship with 100,000 masts

Idea: Repeat function calls

- Real world: how can we measure the thickness of a sheet of paper with a standard ruler that is not accurate enough?
- If a stack of 100 sheets measures 1 cm, then each sheet is ~0.1 mm
- We can use the same idea on functions!
- A single run on a small instance may appear instantaneous
- But running it multiple times takes more than 0.00s!

Idea: Repeat function calls

- Run the same function several times and compute the average time
- Pros
- Input size can be smaller and related to problem
- We can concentrate on quality of test cases and rely less on randomization to produce big test cases that are impossible to verify manually
- Cons
- We must be careful with memory persistence between successive function calls
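
A minimal sketch of the repeated-call idea, assuming a Python judge: the function is called over and over until the accumulated time reaches a budget, and the average per call is reported. The name `average_time` and its `budget` and `clock` parameters are illustrative assumptions, not part of the presentation.

```python
import time


def average_time(func, data, budget=1.0, clock=time.perf_counter):
    """Call func(data) repeatedly until `budget` seconds have been spent
    inside func; return (average seconds per call, number of calls)."""
    total = 0.0
    calls = 0
    while total < budget:
        # Fresh copy per call, made outside the timed region, so state left
        # behind by one call cannot influence the next one.
        fresh = list(data)
        start = clock()
        func(fresh)
        total += clock() - start
        calls += 1
    return total / calls, calls


if __name__ == "__main__":
    mean, calls = average_time(sorted, [5, 3, 1, 4, 2], budget=0.2)
    print(f"{calls} calls, {mean:.9f} s per call on average")
```

The `clock` parameter exists only so that the measurement can be checked with a fake clock. Copying the input for every call is one possible answer to the memory persistence concern listed under Cons; it does not protect against global state kept by the submission itself.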

Idea: No hints on complexity

- When we give limits for the input:
- We simplify implementation details and avoid the need for dynamic memory allocation
- But we disclose the complexity required for the problem
- Trained students can identify precisely the complexity needed
- This has a great impact on the problem-solving aspect:
- Different mindset: I know which complexity I’m looking for and I settle for a solution that achieves it
- Versus a scientific approach to a real-world open problem
- Ex: is there a polynomial solution for a problem?

Idea: No hints on complexity

- Give limits for implementation purposes, but make it clear that those are not related to sought efficiency
- More scientific and open ended approach
- Need to think how to really solve the problem (and not how to produce a program that passes the test cases)
- Do not overemphasize the runtime of a particular language
- (avoid “let me make a test with maximum limits and see if it runs in X seconds on this machine with this language”)

Idea: Runtime behaviour as tests increase

- Typically we measure efficiency by creating a set of tests such that different model solutions achieve different numbers of points
- But not passing a test does not imply that the required complexity was not achieved (other factors interfere)
- Passing just means that the test case was solved within the constraints
- A lot of manpower is needed for model solutions and fine tuning (compiler version, computer speed, language used, etc.)

Idea: Runtime behaviour as tests increase

- How can we improve on that?
- Pen and paper is not an option for large-scale evaluation
- Need for automatic processes
- We have different tests and different time measurements, so why not use all this information?
- Plot the runtime as the data grows and do some curve fitting
- It is impossible to determine the complexity of every program, but even a trivial (imperfect) curve can show more information than just knowing which test cases are passed

Some Preliminary Results

- As a proof of concept, a simple problem:
- Input: sequence of integers
- Output: subsequence of consecutive integers with maximum sum
- Only ask for a function, with I/O already given
- Small input limit (only 100)
- Measure time by running multiple times (until the aggregated time reaches 1s)
- Use random data of sizes 1, 4, 8, 12, …, 64

Some Preliminary Results

- Implemented 3 model solutions:
- O(N^3) – Iterate over all possible intervals in O(N^2), plus iterate through each interval to compute its sum in O(N)
- O(N^2) – Iterate over all possible intervals in O(N^2), plus O(1) computation of each sum using accumulated sums
- O(N) – Iterate through the sequence and keep a partial sum; whenever the partial sum is negative, it cannot contribute to the best sum, so “reset” it to zero and continue
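
The three model solutions could look roughly like the sketch below. Two simplifications are assumptions made here, not statements from the slides: the functions return only the maximum sum rather than the subsequence itself, and the empty subsequence (sum 0) is allowed, which matches the “reset to zero” step of the linear version. The function names are illustrative.

```python
def max_sum_cubic(seq):
    """O(N^3): try every interval and add up its elements one by one."""
    best = 0
    n = len(seq)
    for i in range(n):
        for j in range(i, n):
            total = 0
            for k in range(i, j + 1):
                total += seq[k]
            best = max(best, total)
    return best


def max_sum_quadratic(seq):
    """O(N^2): try every interval, getting each sum in O(1) from prefix sums."""
    n = len(seq)
    prefix = [0] * (n + 1)  # prefix[i] = sum of the first i elements
    for i, value in enumerate(seq):
        prefix[i + 1] = prefix[i] + value
    best = 0
    for i in range(n):
        for j in range(i, n):
            best = max(best, prefix[j + 1] - prefix[i])
    return best


def max_sum_linear(seq):
    """O(N): keep a running sum and reset it to zero when it goes negative."""
    best = 0
    running = 0
    for value in seq:
        running += value
        if running < 0:
            running = 0  # a negative prefix cannot help any later interval
        best = max(best, running)
    return best


if __name__ == "__main__":
    sample = [-2, 1, -3, 4, -1, 2, 1, -5, 4]
    print(max_sum_cubic(sample), max_sum_quadratic(sample), max_sum_linear(sample))
```

All three return the same value for any input, so they differ only in how their runtime grows with N, which is what the measurements are meant to expose.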

Some Preliminary Results

- Plot Time(N) / Time(1)
- Simple correlation measure with another function
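
One way to realise such a correlation measure is sketched below. The slides do not say which measure was used, so the Pearson coefficient, the list of candidate functions and the names `pearson`, `CANDIDATES` and `best_fit` are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


# Candidate growth functions (an assumed, deliberately short list).
CANDIDATES = {
    "N": lambda n: n,
    "N log N": lambda n: n * math.log2(n),
    "N^2": lambda n: n ** 2,
    "N^3": lambda n: n ** 3,
}


def best_fit(sizes, times):
    """Normalise the times by the first measurement (Time(N) / Time(1) when
    sizes[0] == 1) and return the best-correlated candidate and all scores."""
    ratios = [t / times[0] for t in times]
    scores = {
        name: pearson(ratios, [f(n) for n in sizes])
        for name, f in CANDIDATES.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    sizes = [1, 2, 4, 8, 16, 32, 64]
    synthetic = [3e-6 * n * n for n in sizes]  # made-up timings, not measured data
    print(best_fit(sizes, synthetic)[0])
```

Correlation alone is a crude measure: neighbouring candidates such as N and N log N correlate strongly with each other over a short range of sizes, so the scores are best read as a hint about runtime behaviour rather than a proof of complexity.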

Some Preliminary Results

- A more detailed mathematical analysis is out of scope
- We could use other statistical measures
- We know that it is impossible to automatically compute and prove complexities
- But this simple approach gives meaningful results
- Runtime is fairly consistent and correlated with a certain function, so it appears to grow following a pattern that we were able to identify
- Ex: Linear -> appears to take twice the time when the data doubles

Some Preliminary Results

- What could this do?
- More information from the same test cases
- Possibility of giving students automatic feedback on runtime behavior
- Possibility of identifying runtime behaviors for which no model solutions were created (less man power!)
- Independent of language specific details

Ex: Archery Problem, IOI 2009, Day 1

There were solutions with O(N^2R), O(N^3), O(N^2 log N), O(N^2), O(N log N), …

No need to code them all in all languages and then tune!

Conclusion

- 20 Years of IOI: computers are much faster, style of evaluation is still the same
- Setting up test cases is time consuming and requires man power
- Need to think of ways to improve evaluation
- Our proposal, geared to more informal contests or teaching environments, can offer:
- No distraction with I/O
- No large data sets
- More natural problem statements
- No hint on complexity (open ended approach)
- No need for implementing many model solutions
- New languages can be added without changing tests
- More work is still needed to obtain a robust system, but we feel these ideas (or some of them) can be used in practice
- Future: can evaluation be improved in other ways?

The End

- And that’s all!:-)

Questions?

Pedro Ribeiro ([email protected])

Pedro Guerreiro ([email protected])
