Every Bit Counts

A semantic approach to the binary representation of data and programs

Andrew Kennedy & Dimitrios Vytiniotis

Microsoft Research Cambridge

{akenn,dimitris}@microsoft.com

ICFP 2010, Baltimore, MD

Before the fun starts: “is encoding and decoding relevant?”

Sure:

- How to design easy-to-verify tamper-proof bytecode formats?
  - Semi-formal work for Java [Franz et al.]

- How to incorporate semantic and statistical info for more compact encodings and compression?
  - Java bytecode and .NET CLI are quite bulky formats; there is also work on JavaScript compression schemes etc.
  - Lots of work in the XML world, oracle-based PCC checking [Necula and Rahul], term compression [J. Cheney], …

- How to make it easy to prove the correctness of a codec?
  - Lots of work in the generic programming realm, also PADS [Fisher et al.]…

- … and offer all that in a nice DSL?
  - Easier to use and to verify than pickler combinators [Kennedy]

Let’s play a “guess who” game

I have some PL researcher in mind. Can you guess who?

Do they do research in functional programming? Yes.

Do they care a lot about minimal invariance? No.

Do they work on polytypic programming? Yes.

Are they taller than 1.90m? Yes.

Are they on the ICFP committee? No.

Guess which program

I have some program* in mind. Can you guess which?

Code 0100110

Aha! You thought of λx:Int.λy:Int.x

* A closed program in STLC with Int base type.

The idea

Represent a codec by a strategy for playing a question & answer guessing-game

- Encode
  - ask questions of the data and record the answers as a bitstream

- Decode
  - interpret the bitstream as answers to the same Q&A strategy

Example play, set-theoretically

Is it a function application? No.

Is its argument an Int? Yes.

Is its body a variable? No.

[Figure: the set of possible data values (all well-typed programs) is narrowed by successive binary partitions: function application expressions vs. lambda expressions; Int-argument lambdas vs. non-Int-argument lambdas; Int-argument lambdas with variable body vs. Int-argument lambdas with non-variable body; … until a singleton set remains.]

From sets to types

- Set of possible data values: a type
- Binary partition of set: a type isomorphism with a sum of two types
- Singleton set: a type isomorphism with the unit type
- Strategy: possibly-infinite binary decision tree whose
  - nodes contain type isomorphisms (with a sum)
  - leaves contain type isomorphisms (with unit)

Or, in code:
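The code from this slide is not present in the transcript. A minimal Haskell sketch of the tree described above might look as follows; the names `ISO`, `Game`, `Single`, `Split`, `unitGame`, `boolGame` and `questions` are assumptions made for illustration, not necessarily the authors' definitions.

```haskell
{-# LANGUAGE GADTs #-}

-- An isomorphism between t and s: two maps assumed to be mutually inverse.
data ISO t s = Iso (t -> s) (s -> t)

-- A strategy is a (possibly infinite) binary decision tree.
data Game t where
  -- Leaf: t is a singleton, witnessed by an isomorphism with unit.
  Single :: ISO t () -> Game t
  -- Node: t is partitioned into t1 and t2, each with its own sub-strategy.
  Split  :: ISO t (Either t1 t2) -> Game t1 -> Game t2 -> Game t

-- The trivial game: nothing left to ask.
unitGame :: Game ()
unitGame = Single (Iso id id)

-- One question suffices for a Boolean.
boolGame :: Game Bool
boolGame = Split (Iso ask bld) unitGame unitGame
  where
    ask b = if b then Right () else Left ()
    bld   = either (const False) (const True)

-- Number of questions the strategy asks before a value is pinned down.
questions :: Game t -> t -> Int
questions (Single _) _ = 0
questions (Split (Iso ask _) g1 g2) x =
  case ask x of
    Left  x1 -> 1 + questions g1 x1
    Right x2 -> 1 + questions g2 x2
```

Under these definitions, `questions boolGame True` evaluates to `1`, and `questions unitGame ()` to `0`.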

A silly game: unary naturals

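[Figure: an infinite decision tree. Every node asks isZero and has two outgoing edges labelled 0 and 1; one of them leads to another isZero node, and so on.]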

Infinite tree, crucially relying on laziness (co-induction in Coq)
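A sketch of the unary game in Haskell, under some assumptions: `Int` stands in for the naturals (only non-negative inputs terminate), a `Left` answer is written as `False` and a `Right` answer as `True`, and all names are illustrative. The definition of `unaryNatGame` refers to itself, so the tree is infinite and only the part actually visited is ever built.

```haskell
{-# LANGUAGE GADTs #-}

data ISO t s = Iso (t -> s) (s -> t)

data Game t where
  Single :: ISO t () -> Game t
  Split  :: ISO t (Either t1 t2) -> Game t1 -> Game t2 -> Game t

unitGame :: Game ()
unitGame = Single (Iso id id)

-- A natural is either zero (Left) or the successor of a natural (Right).
-- Assumption: inputs are non-negative; Int is only a stand-in for Nat.
succIso :: ISO Int (Either () Int)
succIso = Iso ask bld
  where
    ask 0 = Left ()
    ask n = Right (n - 1)
    bld (Left ()) = 0
    bld (Right n) = n + 1

-- Infinite tree: the right subtree is the game itself (laziness is essential).
unaryNatGame :: Game Int
unaryNatGame = Split succIso unitGame unaryNatGame

-- Walk the tree, emitting one bit per question answered.
enc :: Game t -> t -> [Bool]
enc (Single _) _ = []
enc (Split (Iso ask _) g1 g2) x =
  case ask x of
    Left  x1 -> False : enc g1 x1
    Right x2 -> True  : enc g2 x2
```

With this bit convention, `enc unaryNatGame 3` yields `[True, True, True, False]`: three "not zero" answers followed by one "zero".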

Generic encoding and decoding

Encode a value of a given type to a bitstream, using a game for that type

If the game is a singleton, there’s no information to encode!

Otherwise, use the map from the type to the sum of its two partitions to “ask” in which partition the value lives

Emit a bit and continue on the left or right subtree with the deconstructed value

Decoding may throw an error if the bitstream is too short

Example:
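The slide's own example is missing from the transcript. As a stand-in, here is a sketch of a generic encoder and decoder over the tree structure described earlier. Two choices here are assumptions: `dec` returns `Nothing` instead of throwing when the bitstream is too short, and `maybeBoolGame` is a made-up example game.

```haskell
{-# LANGUAGE GADTs #-}

data ISO t s = Iso (t -> s) (s -> t)

data Game t where
  Single :: ISO t () -> Game t
  Split  :: ISO t (Either t1 t2) -> Game t1 -> Game t2 -> Game t

-- Encode: ask, emit a bit, continue in the chosen subtree.
enc :: Game t -> t -> [Bool]
enc (Single _) _ = []                       -- singleton: nothing to encode
enc (Split (Iso ask _) g1 g2) x =
  case ask x of
    Left  x1 -> False : enc g1 x1
    Right x2 -> True  : enc g2 x2

-- Decode: read bits as answers; return the value and the unused bits.
dec :: Game t -> [Bool] -> Maybe (t, [Bool])
dec (Single (Iso _ bld)) bs = Just (bld (), bs)
dec (Split _ _ _) []        = Nothing       -- bitstream too short
dec (Split (Iso _ bld) g1 g2) (b : bs)
  | b         = do { (x2, rest) <- dec g2 bs; Just (bld (Right x2), rest) }
  | otherwise = do { (x1, rest) <- dec g1 bs; Just (bld (Left x1), rest) }

unitGame :: Game ()
unitGame = Single (Iso id id)

boolGame :: Game Bool
boolGame = Split (Iso ask bld) unitGame unitGame
  where
    ask b = if b then Right () else Left ()
    bld   = either (const False) (const True)

-- Hypothetical example game: first ask "is it Just?", then play boolGame.
maybeBoolGame :: Game (Maybe Bool)
maybeBoolGame = Split (Iso ask bld) unitGame boolGame
  where
    ask = maybe (Left ()) Right
    bld = either (const Nothing) Just
```

Here `enc maybeBoolGame (Just True)` gives `[True, True]`, `dec maybeBoolGame [True, True]` gives `Just (Just True, [])`, and `dec maybeBoolGame [True]` gives `Nothing` because a second answer is still needed.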

Non-ambiguity and non-redundancy for free*

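[Figure: a set of values on one side and bitstrings such as 01001010 and 01001110 on the other; the mapping between them illustrates unambiguous codes and non-redundant codes.]

* If the ISOs are indeed isomorphisms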

Game combinators

Cast a game from one type to another through an isomorphism

Given games for two types, construct games for the sum or product of those types

Dependent pairs: type of second component depends on value of first
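The three combinators above can be sketched in Haskell as follows. The operator `+>` and the names `sumGame`, `depGame` and `prodGame` are assumptions for illustration. Haskell has no dependent pairs, so `depGame` lets the game for the second component depend on the first value while the second component's type stays fixed. The product plays the first game to completion and then the second, which is only one of several possible orders.

```haskell
{-# LANGUAGE GADTs #-}

data ISO t s = Iso (t -> s) (s -> t)

data Game t where
  Single :: ISO t () -> Game t
  Split  :: ISO t (Either t1 t2) -> Game t1 -> Game t2 -> Game t

-- Compose two isomorphisms.
seqI :: ISO a b -> ISO b c -> ISO a c
seqI (Iso f f') (Iso g g') = Iso (g . f) (f' . g')

-- Cast a game for t into a game for s through an isomorphism.
(+>) :: Game t -> ISO s t -> Game s
Single j      +> i = Single (i `seqI` j)
Split j g1 g2 +> i = Split (i `seqI` j) g1 g2

-- Sum: one question decides which of the two games to play.
sumGame :: Game t -> Game s -> Game (Either t s)
sumGame = Split (Iso id id)

-- Dependent pair: the second game is chosen by the first component's value.
depGame :: Game t -> (t -> Game s) -> Game (t, s)
depGame (Single (Iso _ bld)) f = f (bld ()) +> Iso snd (\y -> (bld (), y))
depGame (Split (Iso ask bld) ga gb) f =
  Split (Iso ask' bld') (depGame ga (f . bld . Left)) (depGame gb (f . bld . Right))
  where
    ask' (x, y) = either (\a -> Left (a, y)) (\b -> Right (b, y)) (ask x)
    bld' (Left (a, y))  = (bld (Left a), y)
    bld' (Right (b, y)) = (bld (Right b), y)

-- Product: a dependent pair whose second game ignores the first value.
prodGame :: Game t -> Game s -> Game (t, s)
prodGame g1 g2 = depGame g1 (const g2)

enc :: Game t -> t -> [Bool]
enc (Single _) _ = []
enc (Split (Iso ask _) g1 g2) x =
  case ask x of
    Left  x1 -> False : enc g1 x1
    Right x2 -> True  : enc g2 x2

unitGame :: Game ()
unitGame = Single (Iso id id)

boolGame :: Game Bool
boolGame = Split (Iso ask bld) unitGame unitGame
  where
    ask b = if b then Right () else Left ()
    bld   = either (const False) (const True)

-- A game whose domain is just {False}; an iso only on that subset of Bool.
onlyFalse :: Game Bool
onlyFalse = Single (Iso (const ()) (const False))

-- Hypothetical example: the second Bool is only asked about if the first is True.
flagged :: Game (Bool, Bool)
flagged = depGame boolGame (\b -> if b then boolGame else onlyFalse)
```

In this sketch `enc flagged (False, False)` is `[False]`, since the second component is forced, while `enc flagged (True, True)` is `[True, True]`.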

Combinators in action!

Combinators = co-fixpoints

No silly questions please, and Every Bit Counts!

- If possible, a strategy should not ask “silly questions” that reveal no new information, e.g.
  Are you a number smaller than 5? Yes. Are you a number smaller than 7? Of course I am!

- This corresponds to proper partitioning: for every iso in the game, both sets of the partition are non-empty

Theorem: Suppose a game has proper partitioning, and there is a leaf for every element of its domain. If decoding a bitstring fails, then there is some extension of that bitstring on which decoding succeeds.
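To make the theorem concrete, here is a sketch of a game with proper partitioning: integers in an inclusive range, found by bisection. Both halves of every split are non-empty, so no question is silly. The name `rangeGame` and the use of `Maybe` for decoding failure are assumptions for illustration.

```haskell
{-# LANGUAGE GADTs #-}

data ISO t s = Iso (t -> s) (s -> t)

data Game t where
  Single :: ISO t () -> Game t
  Split  :: ISO t (Either t1 t2) -> Game t1 -> Game t2 -> Game t

-- Integers in [lo, hi] by bisection. Assumes lo <= hi.
-- When lo < hi, lo <= mid < hi, so both halves are non-empty.
rangeGame :: Int -> Int -> Game Int
rangeGame lo hi
  | lo == hi  = Single (Iso (const ()) (const lo))
  | otherwise = Split (Iso ask bld) (rangeGame lo mid) (rangeGame (mid + 1) hi)
  where
    mid   = (lo + hi) `div` 2
    ask x = if x <= mid then Left x else Right x
    bld   = either id id

enc :: Game t -> t -> [Bool]
enc (Single _) _ = []
enc (Split (Iso ask _) g1 g2) x =
  case ask x of
    Left  x1 -> False : enc g1 x1
    Right x2 -> True  : enc g2 x2

dec :: Game t -> [Bool] -> Maybe (t, [Bool])
dec (Single (Iso _ bld)) bs = Just (bld (), bs)
dec (Split _ _ _) []        = Nothing
dec (Split (Iso _ bld) g1 g2) (b : bs)
  | b         = do { (x2, rest) <- dec g2 bs; Just (bld (Right x2), rest) }
  | otherwise = do { (x1, rest) <- dec g1 bs; Just (bld (Left x1), rest) }
```

For `rangeGame 0 5`, `enc` maps `4` to `[True, False, True]`. Decoding the prefix `[True, False]` fails, but either one-bit extension succeeds: `[True, False, False]` decodes to `3` and `[True, False, True]` to `4`. Failure here only ever means "not enough bits yet".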

But what does that mean?

Theorem: … blablablah …

That feels highly compact! Can we take this domain to be the “set of well-typed programs”?

EVERY bitstring represents a non-empty set of elements in the domain

Simple types

Problem: Devise a game for STLC with no “silly questions”!

- Idea 1: Parameterize the game on an environment (for open terms) and a type:

Not every environment/type combination is inhabited. To avoid asking “silly” questions (at game construction time – not at encoding/decoding time) we have to solve inhabitation problems.

Some ingenuity required

Idea 2:

Parameterize on an environment and a pattern, where a pattern may contain a wildcard

All environment/pattern combinations are inhabited, no need to solve hard problems at game construction time

A provably EVERY BIT COUNTS encoding for STLC

(and the proof did not kill us)

The STLC game

Can we play a game for variables with this pattern in this environment?

Are you a variable?

Are you an application?

Application game:

Play game for argument

*Get* the argument and play game for the function using the argument’s type

Haskell code for STLC, on one slide

See paper for details, games for several statistical compression schemes, and even more game transformations

Future directions

Do it for real! E.g. .NET CIL, GHC Core

Integrate arithmetic coding. Put probabilities on arcs of tree.

Develop “methodology” for codecs for typed programming languages. (=> No ingenuity required?)

What’s left, after the fun?

An elegant characterization of codecs as Q&A strategies, and a DSL to program them

- Q&A strategies can give rise to non-redundant, compact coding schemes
- They offer cheap verification
- And they are fun to program with

Download and play:

http://research.microsoft.com/people/dimitris