Every Bit Counts

A semantic approach to the binary

representation of data and programs

Andrew Kennedy & Dimitrios Vytiniotis

Microsoft Research Cambridge

{akenn,dimitris}@microsoft.com

ICFP 2010, Baltimore, MD

Before the fun starts: “is encoding and decoding relevant?”


  • How to design easy-to-verify tamper-proof bytecode formats?

    • Semi-formal work for Java [Franz et al.]

  • How to incorporate semantic and statistical info for more compact encodings and compression?

    • Java bytecode and .NET CLI are bulky formats; there is also work on JavaScript compression schemes, etc.

    • Lots of work in the XML world, oracle-based PCC checking [Necula and Rahul], term compression [J. Cheney], …

  • How to make it easy to prove the correctness of a codec?

    • Lots of work in the generic programming realm, also PADS [Fisher et al.]…

  • … and offer all that in a nice DSL?

    • Easier to use and to verify than pickler combinators [Kennedy]

Let’s play a “guess who” game

I have some PL researcher in mind. Can you guess who?


Do they do research in functional programming?

Do they care a lot about minimal invariance?


Do they work on polytypic programming?


Are they taller than 1.90m?


Are they in the ICFP committee?


Guess which program

I have some program* in mind. Can you guess which?

Code 0100110

Aha! You thought of λx:Int.λy:Int.x

* A closed program in STLC with Int base type.

The idea

Represent a codec by a strategy for playing a question & answer guessing-game

  • Encode

    • ask questions of the data and record the answers as a bitstream

  • Decode

    • interpret bitstream as answers to the same Q&A strategy

Example play, set-theoretically

Is it a function application? No.

Is its argument an Int? Yes.

Is its body a variable? No.

[Figure: the set of all well-typed programs, repeatedly split in two. Function application expressions vs. lambda expressions; lambdas into Int-argument vs. non-Int-argument lambdas; Int-argument lambdas into those with a variable body vs. a non-variable body; and so on down to a singleton set. Legend: set of possible data values; binary partition of set.]

From sets to types

  • Possible set of data values: a type t

  • Binary partition of set: a type isomorphism ISO t (Either t1 t2)

  • Singleton set: a type isomorphism ISO t ()

  • Strategy: a possibly-infinite binary decision tree whose

    • nodes contain type isomorphisms ISO t (Either t1 t2)

    • leaves contain type isomorphisms ISO t ()

Or, in code:
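Here is a minimal Haskell sketch of these definitions. It assumes each isomorphism is represented by an explicit pair of maps. The names ISO, Iso, wrap, unwrap, Game, Single and Split are our own choices for this illustration and are not necessarily those used in the paper. Nothing here checks that the two maps really are inverse; that stays the programmer's obligation.

```haskell
{-# LANGUAGE GADTs #-}

-- A type isomorphism between t and s, given as two maps that are
-- assumed (not checked) to be mutually inverse.
data ISO t s = Iso { wrap :: t -> s, unwrap :: s -> t }

-- A Q&A strategy for values of type t: a possibly-infinite binary tree.
data Game t where
  -- Leaf: t is a singleton set, witnessed by ISO t ().
  Single :: ISO t () -> Game t
  -- Node: t is partitioned into t1 and t2, with a sub-strategy for each.
  Split  :: ISO t (Either t1 t2) -> Game t1 -> Game t2 -> Game t

-- The trivial strategy for the one-element type ().
unitGame :: Game ()
unitGame = Single (Iso id id)

-- Partition Bool into {False} (Left) and {True} (Right).
boolIso :: ISO Bool (Either () ())
boolIso = Iso toE fromE
  where
    toE False = Left ()
    toE True  = Right ()
    fromE (Left ())  = False
    fromE (Right ()) = True

-- One question ("Are you True?"), then a leaf on each side.
boolGame :: Game Bool
boolGame = Split boolIso unitGame unitGame
```

In this sketch, boolGame is a one-question strategy with a singleton leaf on each side of the partition.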

A silly game: unary naturals







Infinite tree, crucially relying on laziness (co-induction in Coq)
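Here is a Haskell sketch of the unary-naturals game. It assumes naturals are represented as non-negative Ints; a negative input would never reach zero, so encoding it would not terminate. The Game and ISO types are restated so the block stands alone. The names succIso, unaryNat and enc are our own.

Each node asks whether the number is zero (bit False) or a successor (bit True). The right subtree is unaryNat itself, so the tree is infinite, and only the part the encoder actually visits is ever built.

```haskell
{-# LANGUAGE GADTs #-}

data ISO t s = Iso { wrap :: t -> s, unwrap :: s -> t }

data Game t where
  Single :: ISO t () -> Game t
  Split  :: ISO t (Either t1 t2) -> Game t1 -> Game t2 -> Game t

unitGame :: Game ()
unitGame = Single (Iso id id)

-- Split the naturals into {0} (Left) and the successors (Right),
-- peeling one successor off on the way in.
-- Assumes non-negative input: a negative Int would never reach 0.
succIso :: ISO Int (Either () Int)
succIso = Iso ask bld
  where
    ask 0 = Left ()
    ask n = Right (n - 1)
    bld (Left ()) = 0
    bld (Right n) = n + 1

-- An infinite tree: the right subtree is the game itself.
unaryNat :: Game Int
unaryNat = Split succIso unitGame unaryNat

-- A minimal encoder: walk the tree, emitting False for Left, True for Right.
enc :: Game t -> t -> [Bool]
enc (Single _) _ = []
enc (Split iso g1 g2) x = case wrap iso x of
  Left x1  -> False : enc g1 x1
  Right x2 -> True  : enc g2 x2
```

For instance, enc unaryNat 3 gives [True, True, True, False]. A number n is coded as n True bits followed by a False, so the code length grows linearly in n.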

Generic encoding and decoding

Encode a value x of type t to a bitstream

If the game is a leaf (t is a singleton), there’s no information to encode!

Otherwise, use the isomorphism’s map from t to Either t1 t2 to “ask” in which partition x lives

Emit a bit and continue on the left or right subtree with the deconstructed value

Decoding may throw an error if the bitstream is too short
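The encoding and decoding steps above can be sketched in Haskell as follows. This is a minimal sketch under our own naming: enc, dec, ordGame, and the restated Game and ISO types are ours. Bits are represented as Bool, with False for "left" and True for "right".

As an example, the sketch uses a game for Haskell's Ordering type. It first separates LT from the rest, then EQ from GT. This gives LT a one-bit code and EQ and GT two-bit codes.

```haskell
{-# LANGUAGE GADTs #-}

data ISO t s = Iso { wrap :: t -> s, unwrap :: s -> t }

data Game t where
  Single :: ISO t () -> Game t
  Split  :: ISO t (Either t1 t2) -> Game t1 -> Game t2 -> Game t

-- Encode: at a leaf there is nothing to say; at a node, ask the
-- isomorphism which partition x lives in, emit that bit, and recurse.
enc :: Game t -> t -> [Bool]
enc (Single _) _ = []
enc (Split iso g1 g2) x = case wrap iso x of
  Left x1  -> False : enc g1 x1
  Right x2 -> True  : enc g2 x2

-- Decode: read bits as answers to the same questions, rebuilding the
-- value with unwrap; returns the value and the unconsumed bits.
dec :: Game t -> [Bool] -> (t, [Bool])
dec (Single iso) bs = (unwrap iso (), bs)
dec (Split _ _ _) [] = error "dec: bitstream too short"
dec (Split iso g1 _) (False : bs) =
  case dec g1 bs of (x1, rest) -> (unwrap iso (Left x1), rest)
dec (Split iso _ g2) (True : bs) =
  case dec g2 bs of (x2, rest) -> (unwrap iso (Right x2), rest)

unitGame :: Game ()
unitGame = Single (Iso id id)

-- Example: first LT vs. {EQ, GT}, then EQ vs. GT.
ordGame :: Game Ordering
ordGame = Split ordIso unitGame (Split (Iso id id) unitGame unitGame)
  where
    ordIso = Iso ask bld
    ask LT = Left ()
    ask EQ = Right (Left ())
    ask GT = Right (Right ())
    bld (Left ())          = LT
    bld (Right (Left ()))  = EQ
    bld (Right (Right ())) = GT
```

For example, enc ordGame GT is [True, True]. Similarly, dec ordGame [True, False, True] returns (EQ, [True]), leaving the unread bit for whatever is decoded next.

Returning the leftover bits, rather than requiring the input to be consumed exactly, is a design choice of this sketch. It is what lets decoders be chained.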


Correctness for free*



* If the ISOs are indeed isomorphisms
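The laws below are a common way to state this round-trip correctness; they are assumed here rather than quoted from the paper:

- For any value x and trailing bits rest, decoding enc g x ++ rest yields (x, rest).
- Whenever decoding bs yields (x, rest), enc g x ++ rest equals bs.

The Haskell sketch below checks both laws on a finite sample for a unary-naturals game over non-negative Ints. All names are ours. Sampling is far weaker than a proof; it only illustrates what the laws say.

```haskell
{-# LANGUAGE GADTs #-}

data ISO t s = Iso { wrap :: t -> s, unwrap :: s -> t }

data Game t where
  Single :: ISO t () -> Game t
  Split  :: ISO t (Either t1 t2) -> Game t1 -> Game t2 -> Game t

enc :: Game t -> t -> [Bool]
enc (Single _) _ = []
enc (Split iso g1 g2) x = case wrap iso x of
  Left x1  -> False : enc g1 x1
  Right x2 -> True  : enc g2 x2

dec :: Game t -> [Bool] -> (t, [Bool])
dec (Single iso) bs = (unwrap iso (), bs)
dec (Split _ _ _) [] = error "dec: bitstream too short"
dec (Split iso g1 _) (False : bs) =
  case dec g1 bs of (x1, rest) -> (unwrap iso (Left x1), rest)
dec (Split iso _ g2) (True : bs) =
  case dec g2 bs of (x2, rest) -> (unwrap iso (Right x2), rest)

-- Unary naturals (non-negative Ints only): 0 goes left, n+1 goes right as n.
unaryNat :: Game Int
unaryNat = Split (Iso ask bld) (Single (Iso id id)) unaryNat
  where
    ask 0 = Left ()
    ask n = Right (n - 1)
    bld (Left ()) = 0
    bld (Right n) = n + 1

-- Law 1: decoding an encoding (plus any trailing bits) returns the
-- value and leaves the trailing bits untouched.
encThenDec :: Eq t => Game t -> t -> [Bool] -> Bool
encThenDec g x rest = dec g (enc g x ++ rest) == (x, rest)

-- Law 2: if decoding succeeds, re-encoding the result reproduces
-- exactly the bits that were consumed.
decThenEnc :: Game t -> [Bool] -> Bool
decThenEnc g bs = case dec g bs of (x, rest) -> enc g x ++ rest == bs
```

Both laws hold here only because the Iso in unaryNat really is a bijection between non-negative Ints and Either () Int. A wrong bld would break them without any type error.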


Non-ambiguity and non-redundancy for free*


Non-redundant codes

Unambiguous codes



* If the ISOs are indeed isomorphisms
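The sketch below assumes one way to make these two properties concrete:

- A code is unambiguous if no codeword is a prefix of a different codeword (a prefix-free code).
- A code is non-redundant if every bitstring either is a prefix of some codeword or has some codeword as a prefix, so no bit pattern is wasted.

The Haskell sketch checks both properties by brute force on the codewords for 0..10 in a unary-naturals game. It is an illustration on a finite sample, not a proof, and all names (prefixFree, noWastedBits, natCodes) are hypothetical.

```haskell
{-# LANGUAGE GADTs #-}

import Data.List (isPrefixOf)

data ISO t s = Iso { wrap :: t -> s, unwrap :: s -> t }

data Game t where
  Single :: ISO t () -> Game t
  Split  :: ISO t (Either t1 t2) -> Game t1 -> Game t2 -> Game t

enc :: Game t -> t -> [Bool]
enc (Single _) _ = []
enc (Split iso g1 g2) x = case wrap iso x of
  Left x1  -> False : enc g1 x1
  Right x2 -> True  : enc g2 x2

-- Unary naturals (non-negative Ints only).
unaryNat :: Game Int
unaryNat = Split (Iso ask bld) (Single (Iso id id)) unaryNat
  where
    ask 0 = Left ()
    ask n = Right (n - 1)
    bld (Left ()) = 0
    bld (Right n) = n + 1

-- Unambiguous: no codeword is a prefix of a different codeword.
prefixFree :: [[Bool]] -> Bool
prefixFree codes =
  and [ not (c1 `isPrefixOf` c2) | (i, c1) <- indexed, (j, c2) <- indexed, i /= j ]
  where indexed = zip [0 :: Int ..] codes

-- All bitstrings of length 0..n.
bitstrings :: Int -> [[Bool]]
bitstrings n = concatMap (\k -> sequence (replicate k [False, True])) [0 .. n]

-- Non-redundant (on this sample): every short bitstring is comparable
-- with some codeword, i.e. one of the two is a prefix of the other.
noWastedBits :: [[Bool]] -> Int -> Bool
noWastedBits codes n =
  all (\bs -> any (\c -> c `isPrefixOf` bs || bs `isPrefixOf` c) codes) (bitstrings n)

natCodes :: [[Bool]]
natCodes = map (enc unaryNat) [0 .. 10]
```

By contrast, a hand-written code such as [[False], [False, True]] fails prefixFree: after reading False, a decoder cannot tell whether to stop.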


Game combinators

Cast a game from one type to another through an isomorphism

Given games for t and s, construct games for the sum or product of t and s

Dependent pairs: type of second component depends on value of first

Combinators in action!
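A minimal Haskell sketch of these combinators follows, under our own names:

- (+>) casts a game along an isomorphism.
- sumGame puts two games side by side under one new question.
- depGame plays a first game and then, at each leaf, a second game chosen from the value found so far.
- prodGame, the plain product, is the special case where the second game ignores the first component.

These definitions are our reconstruction for illustration, not code quoted from the paper. As before, the isomorphisms are assumed, not checked, to be inverses.

```haskell
{-# LANGUAGE GADTs #-}

data ISO t s = Iso { wrap :: t -> s, unwrap :: s -> t }

data Game t where
  Single :: ISO t () -> Game t
  Split  :: ISO t (Either t1 t2) -> Game t1 -> Game t2 -> Game t

enc :: Game t -> t -> [Bool]
enc (Single _) _ = []
enc (Split iso g1 g2) x = case wrap iso x of
  Left x1  -> False : enc g1 x1
  Right x2 -> True  : enc g2 x2

dec :: Game t -> [Bool] -> (t, [Bool])
dec (Single iso) bs = (unwrap iso (), bs)
dec (Split _ _ _) [] = error "dec: bitstream too short"
dec (Split iso g1 _) (False : bs) =
  case dec g1 bs of (x1, rest) -> (unwrap iso (Left x1), rest)
dec (Split iso _ g2) (True : bs) =
  case dec g2 bs of (x2, rest) -> (unwrap iso (Right x2), rest)

-- Compose isomorphisms: a -> b -> c one way, c -> b -> a the other.
seqI :: ISO a b -> ISO b c -> ISO a c
seqI (Iso f g) (Iso f' g') = Iso (f' . f) (g . g')

-- Cast: reuse a game for t as a game for s, given ISO s t.
-- Only the isomorphism at the root changes; the tree is untouched.
(+>) :: Game t -> ISO s t -> Game s
Single j      +> i = Single (i `seqI` j)
Split j g1 g2 +> i = Split (i `seqI` j) g1 g2

unitGame :: Game ()
unitGame = Single (Iso id id)

-- Sum: one new question ("left or right?"), then the chosen sub-game.
sumGame :: Game t -> Game s -> Game (Either t s)
sumGame = Split (Iso id id)

-- Dependent pair: play g for the first component; at each leaf of g,
-- continue with the game f picks for the value that leaf denotes.
depGame :: Game t -> (t -> Game s) -> Game (t, s)
depGame (Single j) f = f (unwrap j ()) +> Iso snd (\s -> (unwrap j (), s))
depGame (Split j g1 g2) f =
  Split (Iso ask bld) (depGame g1 (f . unwrap j . Left))
                      (depGame g2 (f . unwrap j . Right))
  where
    ask (t, s) = case wrap j t of
      Left t1  -> Left (t1, s)
      Right t2 -> Right (t2, s)
    bld (Left (t1, s))  = (unwrap j (Left t1), s)
    bld (Right (t2, s)) = (unwrap j (Right t2), s)

-- Plain product: the second game does not depend on the first value.
prodGame :: Game t -> Game s -> Game (t, s)
prodGame g1 g2 = depGame g1 (const g2)

-- Combinators in action: Bool as a cast of () + ().
boolGame :: Game Bool
boolGame = sumGame unitGame unitGame +> Iso toE fromE
  where
    toE b = if b then Right () else Left ()
    fromE = either (const False) (const True)
```

With these, prodGame boolGame boolGame codes each pair of Bools in two bits, first component first. For example, enc (prodGame boolGame boolGame) (True, False) is [True, False].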

Combinators = co-fixpoints
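One reading of this slide is that recursive games are just lazy fixpoints of combinator expressions. Under that assumption, here is a Haskell sketch of a game for lists, defined in terms of itself as "either nil, or a head followed by another list". It restates the combinators so the block stands alone; all names are ours.

The definition terminates only because each recursive occurrence of listGame g is unfolded on demand, once per list element visited.

```haskell
{-# LANGUAGE GADTs #-}

data ISO t s = Iso { wrap :: t -> s, unwrap :: s -> t }

data Game t where
  Single :: ISO t () -> Game t
  Split  :: ISO t (Either t1 t2) -> Game t1 -> Game t2 -> Game t

enc :: Game t -> t -> [Bool]
enc (Single _) _ = []
enc (Split iso g1 g2) x = case wrap iso x of
  Left x1  -> False : enc g1 x1
  Right x2 -> True  : enc g2 x2

dec :: Game t -> [Bool] -> (t, [Bool])
dec (Single iso) bs = (unwrap iso (), bs)
dec (Split _ _ _) [] = error "dec: bitstream too short"
dec (Split iso g1 _) (False : bs) =
  case dec g1 bs of (x1, rest) -> (unwrap iso (Left x1), rest)
dec (Split iso _ g2) (True : bs) =
  case dec g2 bs of (x2, rest) -> (unwrap iso (Right x2), rest)

seqI :: ISO a b -> ISO b c -> ISO a c
seqI (Iso f g) (Iso f' g') = Iso (f' . f) (g . g')

(+>) :: Game t -> ISO s t -> Game s
Single j      +> i = Single (i `seqI` j)
Split j g1 g2 +> i = Split (i `seqI` j) g1 g2

unitGame :: Game ()
unitGame = Single (Iso id id)

sumGame :: Game t -> Game s -> Game (Either t s)
sumGame = Split (Iso id id)

depGame :: Game t -> (t -> Game s) -> Game (t, s)
depGame (Single j) f = f (unwrap j ()) +> Iso snd (\s -> (unwrap j (), s))
depGame (Split j g1 g2) f =
  Split (Iso ask bld) (depGame g1 (f . unwrap j . Left))
                      (depGame g2 (f . unwrap j . Right))
  where
    ask (t, s) = case wrap j t of
      Left t1  -> Left (t1, s)
      Right t2 -> Right (t2, s)
    bld (Left (t1, s))  = (unwrap j (Left t1), s)
    bld (Right (t2, s)) = (unwrap j (Right t2), s)

prodGame :: Game t -> Game s -> Game (t, s)
prodGame g1 g2 = depGame g1 (const g2)

boolGame :: Game Bool
boolGame = Split (Iso (\b -> if b then Right () else Left ())
                      (either (const False) (const True)))
                 unitGame unitGame

-- Lists: nil goes left; a cons cell goes right as a (head, tail) pair.
listIso :: ISO [a] (Either () (a, [a]))
listIso = Iso ask bld
  where
    ask []       = Left ()
    ask (x : xs) = Right (x, xs)
    bld (Left ())       = []
    bld (Right (x, xs)) = x : xs

-- A co-fixpoint: listGame g mentions itself. Laziness unfolds one
-- more layer only when the encoder or decoder actually reaches it.
listGame :: Game a -> Game [a]
listGame g = sumGame unitGame (prodGame g (listGame g)) +> listIso
```

For example, enc (listGame boolGame) [True, False] is [True, True, True, False, False]. Each cons cell contributes a True followed by the element's bit, and the empty tail contributes a final False.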

No silly questions please, and Every Bit Counts!

  • If possible, a strategy should not ask “silly questions” that reveal no new information, e.g.

    “Are you a number smaller than 5?” “Yes.” “Are you a number smaller than 7?” “Of course I am!”

  • This corresponds to proper partitioning: for every isomorphism ISO t (Either t1 t2) in the game, both t1 and t2 are non-empty

Theorem: Suppose a game g has proper partitioning, and there is a leaf for every element of its domain. If decoding a bitstring bs with g fails, then there is some extension bs' of bs on which decoding with g succeeds.
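Here is a small Haskell sketch of what goes wrong without proper partitioning, under our own naming. The game sillyBool first asks a question whose right-hand part is the empty type Void (from base's Data.Void), so the answer is always "left".

Every codeword then starts with a wasted False bit. Since the Void side has no leaves, decoding any bitstring that starts with True fails however it is extended. This is exactly the situation the theorem rules out for properly partitioned games.

```haskell
{-# LANGUAGE GADTs #-}

import Data.Void (Void, absurd)

data ISO t s = Iso { wrap :: t -> s, unwrap :: s -> t }

data Game t where
  Single :: ISO t () -> Game t
  Split  :: ISO t (Either t1 t2) -> Game t1 -> Game t2 -> Game t

enc :: Game t -> t -> [Bool]
enc (Single _) _ = []
enc (Split iso g1 g2) x = case wrap iso x of
  Left x1  -> False : enc g1 x1
  Right x2 -> True  : enc g2 x2

dec :: Game t -> [Bool] -> (t, [Bool])
dec (Single iso) bs = (unwrap iso (), bs)
dec (Split _ _ _) [] = error "dec: bitstream too short"
dec (Split iso g1 _) (False : bs) =
  case dec g1 bs of (x1, rest) -> (unwrap iso (Left x1), rest)
dec (Split iso _ g2) (True : bs) =
  case dec g2 bs of (x2, rest) -> (unwrap iso (Right x2), rest)

unitGame :: Game ()
unitGame = Single (Iso id id)

-- A properly partitioned game for Bool: one question, two leaves.
boolGame :: Game Bool
boolGame = Split (Iso toE fromE) unitGame unitGame
  where
    toE b = if b then Right () else Left ()
    fromE = either (const False) (const True)

-- The empty type has no leaves: an endless chain of questions that
-- no value can ever reach.
voidGame :: Game Void
voidGame = Split (Iso absurd (either id id)) voidGame voidGame

-- A "silly question": Bool is split into Bool and the empty type, so
-- the answer is always Left. This is a genuine isomorphism (Bool is
-- isomorphic to Either Bool Void) but not a proper partition.
sillyBool :: Game Bool
sillyBool = Split (Iso Left (either id absurd)) boolGame voidGame
```

In GHCi, enc sillyBool True gives [False, True], one bit longer than enc boolGame True. Meanwhile, dec sillyBool [True, False, True] fails with the "bitstream too short" error, and appending more bits only postpones the failure.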

But what does that mean?

Theorem: … blablablah …

That feels highly compact! Can we take this domain to be the “set of well typed programs”?

EVERY bitstring represents a non-empty set of elements in the domain

Simple types

Problem: Devise a game for STLC with no “silly questions”!

  • Idea 1: Parameterize the game on an environment (for open terms) and a type

    Not every environment/type combination is inhabited. To avoid asking “silly” questions (at game-construction time, not at encoding/decoding time), we have to solve inhabitation problems.

Some ingenuity required

Idea 2:

Parameterize on an environment and a type pattern, in which a wildcard stands for any type

All environment/pattern combinations are inhabited, so there is no need to solve hard problems at game-construction time

A provably EVERY BIT COUNTS encoding for STLC

(and the proof did not kill us)

The STLC game

Can we play a game for variables with this pattern in this environment?

Are you a variable?

Are you an


Application game:

Play the game for the argument

*Get* the argument and play the game for the function, using the argument’s type

Pearly too!

Haskell code for STLC, on one slide

See the paper for details, games for several statistical compression schemes, and even more game transformations

Future directions

Do it for real! E.g. .NET CIL, GHC Core

Integrate arithmetic coding. Put probabilities on the arcs of the tree.

Develop “methodology” for codecs for typed programming languages. (=> No ingenuity required?)

What’s left, after the fun?

An elegant characterization of codecs as Q&A strategies, and a DSL to program them

  • Q&A strategies can give rise to non-redundant, compact coding schemes

  • Offer cheap verification

  • And are fun to program with

    Download and play: