
New Approaches to Mobile Code: Reconciling Execution Efficiency with Provable Security

UC Irvine – project transprose: transporting programs securely

Michael Franz

University of California, Irvine

July 2001

Introduction

  • mobile code is an enabling technology

    • download functionality as needed

    • handheld, untethered devices, “information appliances”

    • platform-independent => identical code can run on PDAs, desktop machines, even supercomputers

  • but, many unresolved issues with respect to

    • performance of the mobile program (on the target)

    • performance of the mobile code distribution mechanism

    • protecting the host against malicious mobile programs

    • [guarding a mobile program’s secrets against a malicious host]


Guiding Overall Objective

  • make mobile code practical, so that

  • eventually, native code will need to exist only transiently, created on-the-fly and consumed on the spot

  • while mobile code will be used as the storage and distribution medium



  • dynamic code-generation technology is approaching maturity and processors are becoming fast enough to sustain it (in real time)

  • this is rapidly diminishing the value of “binary compatibility”

  • moreover, dynamic optimization techniques yield better code than static compilation

    • exploit actual processor parameters (caches, …)

    • “live” profiling data may be available

  • => “mobile code will define future platform(s)”


Mobile Code Security

  • most approaches are based on some type-safe programming language

  • host systems publish their policies in terms of type-safe APIs

  • conformance to that interface is then guaranteed by the mobile code transportation scheme

    • semantically equivalent to transporting source code

    • however, for efficiency and quality of dynamic code generation, usually want to transport a format “closer to the machine” while still preserving source-program type-safety semantics


Existing Practice: Java

  • Java Virtual Machine (JVM) bytecode is the de-facto standard format for distributing mobile programs

  • the JVM has an instruction set that has been designed specifically for representing Java programs

    • interestingly enough, there still are JVM programs for which no legal equivalent Java source program exists

    • there are also legal Java programs that are rejected by all possible JVM bytecode verifiers [Staerk’00]

  • security is obtained by verifying the JVM bytecode, essentially a symbolic execution of the program


Security vs. Efficiency

  • the Java Virtual Machine's instruction format is poorly suited to transporting the results of program analyses and optimizations

  • as a consequence, when Java byte-code is transmitted, each recipient must repeat most of the analyses and optimizations that could have been performed just once at the origin

  • Java byte-code has these deficiencies mainly because it must remain verifiable by the recipient


Security vs. Efficiency

  • for example, a code producer often has information about the redundancy of a type or index check

  • but this fact cannot be communicated safely to the code consumer - not in a manner that the recipient can be sure that this is not a false claim inserted by a malicious third party

  • similar concerns inhibit common compiler optimizations such as common subexpression elimination at the code producer’s side


An Alternative Approach: PCC

  • instead of executing the program symbolically at the receiver’s site (which is time consuming and complex), the code producer attaches a “proof” that the code is correct

  • the “proof” shortcuts the verification: checking a given solution is often much simpler than finding it in the first place

  • the Java KVM for embedded devices uses a kind of PCC (“stack maps”) that may become a standard for Java
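
To illustrate why checking a supplied "proof" can be cheaper than finding it, here is a minimal sketch for a toy stack machine. The instruction set, the `EFFECT` table, and the `check` function are all hypothetical and invented for this illustration; real KVM stack maps record operand and local-variable types, not just a stack depth. The producer claims a stack depth for every branch target, and the consumer confirms the claims in one linear pass instead of iterating a dataflow analysis to a fixed point.

```python
# Toy stack machine (hypothetical): each instruction is a tuple (opcode, ...).
# EFFECT maps an opcode to (values popped, values pushed).
EFFECT = {
    "push": (0, 1), "pop": (1, 0), "add": (2, 1),
    "jz": (1, 0), "jmp": (0, 0), "ret": (1, 0),
}


def check(code, stack_map):
    """Check claimed stack depths in a single pass over the code.

    stack_map maps every branch-target pc to the depth the producer claims.
    Returns True only if all claims are consistent with every incoming edge.
    """
    depth = 0  # depth on entry; None means "no fall-through into this pc"
    for pc, ins in enumerate(code):
        if pc in stack_map:
            claimed = stack_map[pc]
            if claimed < 0 or (depth is not None and depth != claimed):
                return False  # fall-through edge contradicts the claim
            depth = claimed
        if depth is None:
            return False  # reachable only by a jump, but no claim supplied
        op = ins[0]
        if op not in EFFECT:
            return False
        popped, pushed = EFFECT[op]
        if depth < popped:
            return False  # stack underflow
        depth = depth - popped + pushed
        if op in ("jmp", "jz"):
            target = ins[1]
            if not 0 <= target < len(code) or stack_map.get(target) != depth:
                return False  # jump edge contradicts (or lacks) a claim
        if op in ("jmp", "ret"):
            depth = None  # control never falls through these
    return depth is None  # code must not run off the end


CODE = [("push",), ("jz", 4), ("push",), ("jmp", 5), ("push",), ("ret",)]
HONEST_MAP = {4: 0, 5: 1}
FALSE_MAP = {4: 0, 5: 2}
```

With `HONEST_MAP` the check succeeds; with `FALSE_MAP`, or with no map at all, it fails, so a false claim cannot make unsafe code pass.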


A Third Approach

  • instead of verifying or checking, we have been investigating a class of mobile code representations that can provably encode only “legal” programs

    • security is obtained by construction

    • the need for verification disappears

    • our approach can provide the same security guarantees as the Java Virtual Machine, but it can express most of them statically as a well-formedness property of the encoding itself

    • in our solution, an incoming mobile program may not do the intended task, but it will not do anything “bad” - for any definition of “bad” that can be cast into a type system

    • interestingly enough, such “intrinsically secure” mobile code is also denser than virtual machine code, and allows better object code to be generated, and generated faster
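
A minimal sketch of what "security by construction" can mean in practice. This is not the project's actual format: the operator set `SIGNATURES`, the `decode` function, and the use of a modulo to bound each symbol are assumptions made for illustration (a real encoding would instead size each symbol's alphabet to the number of legal options). The point is that an operand is named by its position among the already-defined values of the required type, so a reference to an undefined or ill-typed value cannot be written down at all.

```python
# Hypothetical operator set: op -> (operand types, result type).
SIGNATURES = {
    "const_int": ((), "int"),
    "const_bool": ((), "bool"),
    "add": (("int", "int"), "int"),
    "less": (("int", "int"), "bool"),
    "not": (("bool",), "bool"),
}


def decode(symbols):
    """Decode ANY sequence of non-negative integers into a well-typed program.

    Values live in one list ("plane") per type. An operand symbol selects
    among the values already present in the plane of the required type.
    """
    stream = iter(symbols)
    planes = {"int": [], "bool": []}
    program = []
    for sym in stream:
        # Only operators whose operand planes are non-empty are expressible.
        legal = [op for op in sorted(SIGNATURES)
                 if all(planes[t] for t in SIGNATURES[op][0])]
        op = legal[sym % len(legal)]
        arg_types, result_type = SIGNATURES[op]
        args = []
        for t in arg_types:
            ref = next(stream, 0) % len(planes[t])  # always a defined value
            args.append((t, ref))
        result = (result_type, len(planes[result_type]))
        planes[result_type].append(result)
        program.append((result, op, tuple(args)))
    return program


def well_typed(program):
    """Independent check, used here only to demonstrate the property."""
    defined = set()
    for result, op, args in program:
        arg_types, result_type = SIGNATURES[op]
        if len(args) != len(arg_types) or result[0] != result_type:
            return False
        for arg, wanted in zip(args, arg_types):
            if arg[0] != wanted or arg not in defined:
                return False
        defined.add(result)
    return True
```

Whatever integers are fed to `decode`, the result passes `well_typed`; the separate check exists only to demonstrate that, and a consumer of such a format would not need to run it.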


A Third Approach: Two Variants

  • we have in fact designed not just one, but two alternative mobile-code representations, both of which provide “security by construction”

  • they differ in the semantic level at which they describe the mobile program

    • “high-level”: close to the source language but with supporting compiler-related information

    • “low-level”: as close as possible to what a modern code generator back-end needs without being target-machine specific


Rationale for Multi-Track Approach

  • the relative trade-offs (encoding density vs. decoding/dynamic-compilation speed vs. code quality) are completely unknown and can only be determined by collecting experience with actual prototypes

  • by implementing both the “high-level” and the “low-level” solution, we are exploring the design space rather than designing an ad-hoc solution


Low-Level Encoding [PLDI01]

  • SafeTSA preserves control and dataflow information as well as full typing for each intermediate result

  • it is based on SSA form, a representation that is also used internally by a number of important state-of-the-art research compilers for Java, e.g.,

    • IBM T.J. Watson Lab: Jalapeño

    • Microsoft: Marmot

    • Sun Microsystems: HotSpot Server

  • SafeTSA is far easier to parse into a form useful for code optimization than JVM-code


Current Status and Results

  • based on Martin Odersky’s Pizza front-end

  • can compile all of Java to SafeTSA

  • prototype run-time environment almost finished; will provide full interoperability between SafeTSA and JVM-based class files

    • can mix and match both formats with dynamic loading

    • call-backs from JVM to SafeTSA are ugly

  • SafeTSA representation is surprisingly small


High-Level Encoding [Babel01]

  • ultra-compact representation using grammar-based compression of abstract syntax trees

  • goal is to transport the source program along with as much compiler-related support information as possible


Schematic Overview

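[Figure not preserved: schematic of the compression / decompression pipeline. Surviving labels: “& Arithmetic” (twice) and “Compression / Decompression”.]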


Compression Overview

  • Parsing: get AST from source

  • Serialize: get stream of symbols from AST

  • Modeling: use context and abstract grammar to build predictive statistical model

  • Coding: use arithmetic coding with model


Types of nodes in AST

  • String, Integer, Terminal

  • List : e.g. Block = BlockStatement*

  • Aggregate : e.g. IF = cond thenbranch elsebranch

  • Choice : e.g. BinOp = Plus | Minus | …

  • Information is in choice nodes

    • want to guess which choice is taken


Transmitting an AST

  • any predefined serialization will do

  • we use depth first (pre-order)

  • when serialized, most info in AST is redundant, e.g.

    • order and kind of kids of aggregate nodes known

    • this is because we use knowledge of the grammar

  • must encode index of choice made at choice nodes
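
The serialization described above can be sketched as follows. The `GRAMMAR` table is a hypothetical fragment, not the Java abstract grammar used in the project, and it deliberately avoids nested choice rules to keep the code short. Transmitting a list length in front of each list is also an assumption; the slides do not say how list ends are signalled. Aggregates and terminals emit nothing, because the grammar already fixes their shape.

```python
# Hypothetical abstract grammar: rule -> (node kind, body).
GRAMMAR = {
    "Stmt":   ("choice", ["Assign", "Block"]),
    "Block":  ("list", "Stmt"),
    "Assign": ("aggregate", ["Name", "Expr"]),
    "Expr":   ("choice", ["Name", "IntLit", "Binary"]),
    "Binary": ("aggregate", ["BinOp", "Expr", "Expr"]),
    "BinOp":  ("choice", ["Plus", "Minus"]),
    "Plus":   ("terminal", None),
    "Minus":  ("terminal", None),
    "Name":   ("string", None),
    "IntLit": ("integer", None),
}


def encode(expected, node, out):
    """Pre-order walk; node is (rule_name, payload). Appends symbols to out."""
    kind, body = GRAMMAR[expected]
    if kind == "choice":
        out.append(("choice", body.index(node[0])))  # the only real decision
        encode(node[0], node, out)
    elif kind == "aggregate":
        for child_rule, child in zip(body, node[1]):
            encode(child_rule, child, out)           # order known from grammar
    elif kind == "list":
        out.append(("length", len(node[1])))         # assumed list framing
        for child in node[1]:
            encode(body, child, out)
    elif kind in ("string", "integer"):
        out.append((kind, node[1]))                  # constants sent as-is here
    # terminals carry no information


def decode(expected, symbols):
    """Inverse of encode; symbols is an iterator over the emitted stream."""
    kind, body = GRAMMAR[expected]
    if kind == "choice":
        _, index = next(symbols)
        return decode(body[index], symbols)
    if kind == "aggregate":
        return (expected, [decode(rule, symbols) for rule in body])
    if kind == "list":
        _, count = next(symbols)
        return (expected, [decode(body, symbols) for _ in range(count)])
    if kind in ("string", "integer"):
        _, value = next(symbols)
        return (expected, value)
    return (expected, None)


# AST for: i = i + 1
AST = ("Assign", [("Name", "i"),
                  ("Binary", [("Plus", None), ("Name", "i"), ("IntLit", 1)])])
STREAM = []
encode("Stmt", AST, STREAM)
```

For this statement the stream contains five choice indices and three constants, and `decode("Stmt", iter(STREAM))` rebuilds the same tree.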


Prediction by Partial Match (PPM)

  • dynamically maintain counts of characters seen after various contexts

  • contexts may be of various lengths

    • e.g., for “abcd”, contexts for “d” are:

      • length 1 context : “c”

      • length 2 context : “bc”

      • length 3 context : “abc”

  • predict characters in current context by looking at what occurred previously
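
A minimal sketch of the context counting described above. The function names and the `max_order` parameter are illustrative, and the prediction step is simplified: it uses only the longest context seen so far, whereas a full PPM coder also assigns an escape probability and falls back to shorter contexts.

```python
from collections import Counter, defaultdict


def build_counts(text, max_order=3):
    """Count which character followed each context of length 0..max_order."""
    counts = defaultdict(Counter)
    for pos, ch in enumerate(text):
        for order in range(max_order + 1):
            if pos - order < 0:
                break
            counts[text[pos - order:pos]][ch] += 1
    return counts


def predict(counts, history, max_order=3):
    """Return (context used, {char: probability}) for the longest known context."""
    for order in range(min(max_order, len(history)), -1, -1):
        context = history[len(history) - order:]
        if context in counts:
            total = sum(counts[context].values())
            return context, {c: n / total for c, n in counts[context].items()}
    return "", {}


COUNTS = build_counts("abcd")
```

For “abcd”, `COUNTS` records “d” once after each of the contexts “c”, “bc”, and “abc”, matching the example on the slide.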


Maintaining Contexts







Adapting PPM To Work On Trees

  • each node is a symbol

  • the context is path from root to the current node in the AST

  • problem: in DFS, what happens when we reach a leaf node and go back up to an ancestor?

    • pop context – all active nodes moved up one position to their parents (in context tree)



  • PPM is used to model the choices made at choice nodes, i.e. associate a probability with every choice

  • these probabilities are used to drive an arithmetic coder to output bits
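
A sketch of how tree contexts can drive the coder, under simplifying assumptions: the context is the last `order` ancestors on the path from the root, probabilities use add-one smoothing rather than PPM escapes, and instead of running a real arithmetic coder the code sums the ideal cost of -log2(p) bits per choice, which an arithmetic coder approaches closely. The class and function names are invented for this example. Because the context is just the current root-to-node path, "popping" on the way back up needs no extra work in a recursive traversal.

```python
import math
from collections import defaultdict


class TreeContextModel:
    """Adaptive model of choice indices, keyed by the last `order` ancestors."""

    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(lambda: defaultdict(int))

    def _context(self, path):
        # Suffix of the root-to-node path; empty tuple when order is 0.
        return tuple(path[len(path) - self.order:]) if self.order else ()

    def probability(self, path, choice, n_alternatives):
        seen = self.counts[self._context(path)]
        total = sum(seen.values())
        return (seen.get(choice, 0) + 1) / (total + n_alternatives)

    def update(self, path, choice):
        self.counts[self._context(path)][choice] += 1


def code_length_bits(events, order=2):
    """events: iterable of (path_from_root, choice_index, n_alternatives)."""
    model = TreeContextModel(order)
    bits = 0.0
    for path, choice, n_alternatives in events:
        bits += -math.log2(model.probability(path, choice, n_alternatives))
        model.update(path, choice)
    return bits


# The same 3-way choice taken ten times in the same context.
EVENTS = [(("Stmt", "Assign", "Expr"), 2, 3)] * 10
ADAPTIVE_BITS = code_length_bits(EVENTS)
UNIFORM_BITS = 10 * math.log2(3)
```

As the model adapts, repeating the same choice in the same context gets cheaper, so `ADAPTIVE_BITS` comes out well below `UNIFORM_BITS`.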


Compressing Constants

  • constants (strings, integers, names) are a significant fraction of source

  • to compress: make table of constants, and refer to them by their index in this table

  • further compress: maintain different tables for strings, names etc. – reduces number of bits in index
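
A small sketch of the table scheme, assuming fixed-width indices; the function names and the sample constants are made up for illustration. It shows why separate tables help: each index only has to distinguish entries of its own kind.

```python
import math


def build_tables(constants):
    """constants: iterable of (kind, value) in order of appearance.

    Returns (tables, refs): one value list per kind, and each constant
    rewritten as (kind, index into that kind's table).
    """
    tables = {}
    refs = []
    for kind, value in constants:
        table = tables.setdefault(kind, [])
        if value not in table:
            table.append(value)
        refs.append((kind, table.index(value)))
    return tables, refs


def index_bits(n_entries):
    """Bits needed for a fixed-width index into a table of n_entries."""
    return math.ceil(math.log2(n_entries)) if n_entries > 1 else 0


CONSTANTS = [("name", "i"), ("int", 1), ("name", "i"),
             ("string", "hi"), ("name", "j")]
TABLES, REFS = build_tables(CONSTANTS)
SHARED_BITS = index_bits(sum(len(t) for t in TABLES.values()))
NAME_BITS = index_bits(len(TABLES["name"]))
```

Here a single shared table would hold four entries and need 2-bit indices, while the separate name table needs only 1 bit per reference.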

  • currently exploring more sophisticated context modeling ideas for compressing constants


AST Compression: Example

AST for: i = i + 1

Relevant grammar rules

Stmt = If | While | Assign | …

Assign = Lvalue Expr

Lvalue = Field | VarAccess | …

Expr = Unary | Binary | …

Binary = BinOp Expr Expr

Choice nodes

Preorder traversal

Stmt Assign Lvalue VarAccess i Expr Binary Expr VarAccess i Expr Literal IntLiteral 1 BinOp +

AST Compression: Example

AST for: i = i + 1

Context tree: [figure not preserved]

Prob(j) = 0.3

Prob(k) = 0.5

Prob(i) = 0.2

Send model and choice “i” to arithmetic coder
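
As a worked illustration of this last step, the sketch below shows one interval-narrowing step of an arithmetic coder using the three probabilities from the slide. The function names, and the order in which the symbols are laid out on the unit interval, are assumptions; a production coder would also use integer arithmetic with renormalization rather than floats.

```python
import math

# Model from the slide, in an assumed layout order on the interval [0, 1).
MODEL = [("j", 0.3), ("k", 0.5), ("i", 0.2)]


def narrow(low, high, model, symbol):
    """Shrink [low, high) to the sub-interval that the model gives `symbol`."""
    width = high - low
    cumulative = 0.0
    for candidate, p in model:
        if candidate == symbol:
            return low + width * cumulative, low + width * (cumulative + p)
        cumulative += p
    raise KeyError(symbol)


def ideal_bits(model, symbol):
    """Information content of the symbol under the model, in bits."""
    return -math.log2(dict(model)[symbol])


LOW, HIGH = narrow(0.0, 1.0, MODEL, "i")
```

Coding “i” narrows the interval to roughly [0.8, 1.0) and costs about 2.32 bits, while the more probable “k” would have cost exactly 1 bit.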


Status and Results

  • compressor/decompressor prototype written in Python

  • completely generic – can be used with any abstract grammar

  • have implemented the Java abstract grammar

    • works with single Java source files as well as entire packages.

  • comparison for Java class-file compression with Pugh’s results (best published Java compressor)


Results: Classes

Classes from Sun’s javac package - all sizes in bytes


Results: Archives

compressed collections of classes - all sizes in bytes

  • compressed ASTs are 5-50% smaller than Pugh’s

  • 3-8 times smaller than uncompressed class files or JAR files


Performance-Enhancing Information

  • now raise the semantic level of the grammar

  • e.g. “Escape Analysis”

    • an object that doesn’t “escape” its defining scope can be allocated on the stack rather than on the heap

    • this optimization alone can often double performance

  • the analysis itself is very difficult to do, but the results of the analysis are easy to verify

    • augment the type system by “escaping/non-escaping”

    • make this part of the encoding scheme itself

    • e.g., => a non-escaping object cannot be assigned to a variable from an enclosing scope
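
The claim that the annotation is easy to verify can be sketched with a toy checker. Everything here is hypothetical: scopes are modelled only by nesting depth, and the single rule enforced is the one stated above. A realistic check would also have to cover returns, field stores, and arguments passed to calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str
    scope_depth: int      # 0 = outermost scope


@dataclass(frozen=True)
class Obj:
    label: str
    scope_depth: int      # depth of the scope that allocates the object
    non_escaping: bool    # annotation supplied by the code producer


def check_assignments(assignments):
    """Return one message per assignment that violates the annotation.

    A non-escaping object may only be stored in variables of its own scope
    or of scopes nested inside it, never in an enclosing scope.
    """
    errors = []
    for var, obj in assignments:
        if obj.non_escaping and var.scope_depth < obj.scope_depth:
            errors.append(f"{obj.label} escapes via {var.name}")
    return errors


OUTER = Var("result", 0)
INNER = Var("tmp", 1)
POINT = Obj("Point#1", 1, True)
```

Storing `POINT` in `INNER` is accepted, while storing it in `OUTER` is reported, so a producer cannot smuggle in a false "non-escaping" claim.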


Insights So Far

  • abstract syntax trees viable as a mobile code format

  • can be highly compressed

    • Java archives by factor of 3-8

    • 5-50% better than Java bytecode specific compression by Pugh


Overall Project Achievements

  • lead the way to a genuine improvement over virtual machine transportation formats

    • security without need for validation

    • tamper-proof performance-improving information

  • innovative and generic program compression method as a useful by-product of this effort


Task Schedule

  • Y1 Milestones:

    • source-level representation => Java compression

    • low-level representation

    • core calculus representation

  • Y2 Milestones:

    • system prototypes

    • trade-off analysis

    • encoding format comprehensive definition

  • End of Project:

    • system deliverable

    • comprehensive documentation

  • investigate:

    • multiple source languages

    • graph-based encoding schemes

    • proof-carrying code

  • investigate:

    • requirements of optimizing code generators

    • integration of security vs. compiler-related data

  • investigate:

    • mutual interaction of security, efficiency, and compression density

    • security of system


Mobile Code Security Revisited

  • provided through type-safe programming language and type-safe APIs

    • semantically equivalent to transporting source code (everybody does it this way)

  • but many policies currently cannot be expressed in terms of a type system and hence need to be implemented inside the library

    • “open only files in directory X”

    • “initiate connections only with IP addresses in range […]”

    • “execute no more than N instructions between OS calls”

    • “do not send on network after reading local files”

    • => security automata

  • need to represent these properties directly and support them along the whole pipeline from code producer to code consumer

  • => some other PIs in Oasis are working on these themes and their work can be directly beneficial to this project
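
One of the policies listed above, “do not send on network after reading local files”, can be written as a two-state security automaton. This is a minimal sketch with invented event names and class name; it monitors a trace at run time and does not show how such a property would be carried in the mobile-code format itself.

```python
class NoSendAfterRead:
    """Two-state monitor: 'clean' until a file is read, then 'has_read'."""

    def __init__(self):
        self.state = "clean"

    def step(self, event):
        """Advance on one event; raise PermissionError on a violation."""
        if event == "read_file":
            self.state = "has_read"
        elif event == "send" and self.state == "has_read":
            raise PermissionError("send attempted after reading a local file")
        # every other event leaves the state unchanged


def allowed(trace):
    """Return True if the whole event trace satisfies the policy."""
    monitor = NoSendAfterRead()
    try:
        for event in trace:
            monitor.step(event)
    except PermissionError:
        return False
    return True
```

Sending before any read is accepted, for example `allowed(["send", "read_file"])`, whereas `allowed(["read_file", "send"])` is rejected. The state that makes the difference is exactly what a plain type-safe API cannot express.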


Transition of Technology

  • our prototype implementation(s) will be made available in source form

  • the idea is to create a “turnkey” replacement to current Java compilers and JVM runtime systems

    • you simply take your code and recompile using our compiler

    • it will then run on our runtime

    • our runtime will also run your old JVM class files

    • you can even mix our stuff with JVM class files

  • => we simply provide a new (better!) mobile code transportation layer without changing anything else


Thank You
