New Approaches to Mobile Code: Reconciling Execution Efficiency with Provable Security




UC Irvine – project transprose: transporting programs securely

Michael Franz

University of California, Irvine

July 2001

Introduction
  • mobile code is an enabling technology
    • download functionality as needed
    • handheld, untethered devices, “information appliances”
    • platform-independent => identical code can run on PDAs, desktop machines, even supercomputers
  • but many unresolved issues remain with respect to
    • performance of the mobile program (on the target)
    • performance of the mobile code distribution mechanism
    • protecting the host against malicious mobile programs
    • [guarding a mobile program’s secrets against a malicious host]
Guiding Overall Objective
  • make mobile code practical, so that
  • eventually, native code will need to exist only transiently, created on-the-fly and consumed on the spot
  • while mobile code will be used as the storage and distribution medium
Context
  • dynamic code-generation technology is approaching maturity and processors are becoming fast enough to sustain it (in real time)
  • this is rapidly diminishing the value of “binary compatibility”
  • moreover, dynamic optimization techniques yield better code than static compilation
    • exploit actual processor parameters (caches, …)
    • “live” profiling data may be available
  • => “mobile code will define future platform(s)”
Mobile Code Security
  • most approaches are based on some type-safe programming language
  • host systems publish their policies in terms of type-safe APIs
  • conformance to that interface is then guaranteed by the mobile code transportation scheme
    • semantically equivalent to transporting source code
    • however, for efficiency and quality of dynamic code generation, usually want to transport a format “closer to the machine” while still preserving source-program type-safety semantics
Existing Practice: Java
  • the Java Virtual Machine is the de-facto standard format for distributing mobile programs
  • the JVM has an instruction set that has been designed specifically for representing Java programs
    • interestingly enough, there still are JVM programs for which no legal equivalent Java source program exists
    • there are also legal Java programs that are rejected by all possible JVM bytecode verifiers [Staerk’00]
  • security is obtained by verifying the JVM bytecode, essentially a symbolic execution of the program
Security vs. Efficiency
  • the Java Virtual Machine's instruction format is poorly suited to transporting the results of program analyses and optimizations
  • as a consequence, when Java bytecode is transmitted, each recipient must repeat most of the analyses and optimizations that could have been performed just once at the origin
  • the main reason for these deficiencies is the need to allow verification by the recipient
Security vs. Efficiency
  • for example, a code producer often has information about the redundancy of a type or index check
  • but this fact cannot be communicated safely to the code consumer: the recipient has no way to be sure that it is not a false claim inserted by a malicious third party
  • similar concerns inhibit common compiler optimizations such as common subexpression elimination on the code producer’s side
An Alternative Approach: PCC
  • instead of executing the program symbolically at the receiver’s site (which is time consuming and complex), the code producer attaches a “proof” that the code is correct
  • the “proof” shortcuts the verification: checking a given solution is often much simpler than finding it in the first place
  • the Java KVM for embedded devices uses a kind of PCC (“stack maps”) that may become a standard for Java
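
The "checking is simpler than finding" idea can be sketched with producer-supplied stack maps. The following is a minimal illustration only: the instruction names, the two types, and the map layout are hypothetical and do not reflect the actual JVM or KVM formats. The code producer claims the operand-stack types at every branch target, and the consumer confirms those claims in a single linear pass instead of iterating a dataflow analysis to a fixed point.

```python
# Hypothetical mini stack machine; NOT the JVM/KVM instruction set or map format.
INT, REF = "int", "ref"


def check_with_stack_maps(code, stack_maps):
    """Check type consistency in one linear pass over the code.

    code: list of tuples such as ("push_int",), ("add",), ("if_zero", target)
    stack_maps: {pc: tuple of types}, claimed by the producer for each branch target
    """
    state = ()  # operand-stack types on entry to pc 0
    for pc, instr in enumerate(code):
        if pc in stack_maps:
            # A fall-through state must agree with the producer's claim.
            if state is not None and state != stack_maps[pc]:
                return False
            state = stack_maps[pc]
        if state is None:
            return False  # reached only by jumps, but no map was supplied
        op = instr[0]
        if op == "push_int":
            state = state + (INT,)
        elif op == "push_ref":
            state = state + (REF,)
        elif op == "add":
            if state[-2:] != (INT, INT):
                return False
            state = state[:-2] + (INT,)
        elif op == "if_zero":
            if state[-1:] != (INT,):
                return False
            state = state[:-1]
            if stack_maps.get(instr[1]) != state:
                return False  # the branch target's claim does not match
        elif op == "goto":
            if stack_maps.get(instr[1]) != state:
                return False
            state = None  # no fall-through after an unconditional jump
        elif op == "return":
            state = None
        else:
            return False
    return True


LOOP = [("push_int",), ("push_int",), ("add",), ("push_int",), ("if_zero", 1), ("return",)]
```

With the map `{1: (INT,)}` the loop above is accepted; a false claim such as `{1: (REF,)}` is rejected at the first mismatch, so the claim itself does not need to be trusted.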
A Third Approach
  • instead of verifying or checking, we have been investigating a class of mobile code representations that can provably encode only “legal” programs
    • security is obtained by construction
    • the need for verification disappears
    • our approach can provide the same security guarantees as the Java Virtual Machine, but it can express most of them statically as a well-formedness property of the encoding itself
    • in our solution, an incoming mobile program may not do the intended task, but it will not do anything “bad”, for any definition of “bad” that can be cast into a type system
    • interestingly enough, such “intrinsically secure” mobile code is also denser than virtual machine code, and it allows better object code to be generated, and faster
A Third Approach: Two Variants
  • we have in fact designed not just one, but two alternative mobile-code representations, both of which provide “security by construction”
  • they differ in the semantic level at which they describe the mobile program
    • “high-level”: close to the source language but with supporting compiler-related information
    • “low-level”: as close as possible to what a modern code generator back-end needs without being target-machine specific
Rationale for Multi-Track Approach
  • the relative trade-offs (encoding density vs. decoding/dynamic-compilation speed vs. code quality) are completely unknown and can only be determined by collecting experience with actual prototypes
  • by implementing both the “high-level” and the “low-level” solution, we are exploring the design space rather than designing an ad-hoc solution
Low-Level Encoding [PLDI01]
  • SafeTSA preserves control and dataflow information as well as full typing for each intermediate result
  • it is based on SSA form, a representation that is also used internally by a number of important state-of-the-art research compilers for Java, e.g.,
    • IBM T.J. Watson Lab: Jalapeño
    • Microsoft: Marmot
    • Sun Microsystems: HotSpot Server
  • SafeTSA is far easier to parse into a form useful for code optimization than JVM-code
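
One way "full typing for each intermediate result" can become a structural property rather than something to verify is to keep a separate list ("plane") of SSA values per type and let each operation pick its operands by index from the plane its signature demands. The sketch below assumes this scheme for straight-line code only. The operation table and tuple layout are hypothetical, and it is not the actual SafeTSA format, which also has to handle control flow.

```python
# Hypothetical operation table: name -> (operand types, result type).
OPS = {
    "int_param": ((), "int"),
    "int_add": (("int", "int"), "int"),
    "int_less": (("int", "int"), "bool"),
    "bool_not": (("bool",), "bool"),
}


def decode(instructions):
    """Rebuild typed SSA values from (op, operand_index, ...) tuples.

    Each operand is an index into the plane of earlier values of the type the
    operation requires, so an ill-typed operand cannot be written down at all.
    The only remaining error is an index that is out of range.
    """
    planes = {"int": [], "bool": []}
    for op, *refs in instructions:
        operand_types, result_type = OPS[op]
        if len(refs) != len(operand_types):
            raise ValueError(f"{op}: wrong operand count")
        operands = []
        for ref, ty in zip(refs, operand_types):
            if not 0 <= ref < len(planes[ty]):
                raise ValueError(f"{op}: no {ty} value #{ref} defined yet")
            operands.append(planes[ty][ref])
        planes[result_type].append((op, tuple(operands)))
    return planes


PROGRAM = [("int_param",), ("int_param",), ("int_add", 0, 1), ("int_less", 2, 0), ("bool_not", 0)]
```

In `PROGRAM`, the operand `0` of `bool_not` can only mean "the first bool value"; there is no way to make it refer to an int.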
Current Status and Results
  • based on Martin Odersky’s Pizza front-end
  • can compile all of Java to SafeTSA
  • prototype run-time environment almost finished; will provide full interoperability between SafeTSA and JVM-based class files
    • can mix and match both formats with dynamic loading
    • call-backs from JVM to SafeTSA are ugly
  • SafeTSA representation is surprisingly small
High-Level Encoding [Babel01]
  • ultra-compact representation using grammar-based compression of abstract syntax trees
  • goal is to transport the source program along with as much compiler-related support information as possible
Schematic Overview

[Figure: Source -> Parser (“classic frontend”) -> AST -> Encoder (PPM model & arithmetic encoder) -> 011000101010… -> Decoder (PPM model & arithmetic decoder) -> AST -> Code Generator (“classic backend”). The encoder/decoder stage in the middle is the compression/decompression step.]

Compression Overview
  • Parsing: get AST from source
  • Serialize: get stream of symbols from AST
  • Modeling: use context and abstract grammar to build predictive statistical model
  • Coding: use arithmetic coding with model
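
The first two steps (parsing and serializing) can be sketched as follows. As an assumption made purely for illustration, Python's own `ast` module stands in for the Java front end, so the node names are Python's rather than those of the Java abstract grammar. The modeling and coding steps are not shown here.

```python
import ast


def preorder_symbols(node):
    """Yield node-type names in depth-first pre-order."""
    yield type(node).__name__
    for child in ast.iter_child_nodes(node):
        yield from preorder_symbols(child)


# Parsing: source -> AST; Serialize: AST -> stream of symbols.
symbols = list(preorder_symbols(ast.parse("i = i + 1")))
```

The exact symbol list depends on the Python version's AST definition, but it always starts at the root and visits each node before its children.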
Types of Nodes in AST
  • String, Integer, Terminal
  • List: e.g. Block = BlockStatement*
  • Aggregate: e.g. IF = cond thenbranch elsebranch
  • Choice: e.g. BinOp = Plus | Minus | …
  • Information is in choice nodes
    • want to guess which choice is taken
Transmitting an AST
  • any predefined serialization will do
  • we use depth first (pre-order)
  • when serialized, most info in AST is redundant, e.g.
    • the order and kind of the children of aggregate nodes are known
    • this is because we use knowledge of the grammar
  • only the index of the choice made at each choice node must be encoded
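
A minimal sketch of this grammar-driven serialization, under stated assumptions: the grammar table is a toy one that loosely follows the deck's `i = i + 1` example, the rules for If, While, Field and Unary are left out, the alternatives listed for Expr are guesses, and list nodes are not handled. Aggregate nodes emit nothing of their own because the decoder already knows their children from the grammar; only choice indices and constants go into the stream.

```python
# Toy grammar (hypothetical); each rule is (kind, parts).
GRAMMAR = {
    "Stmt": ("choice", ["If", "While", "Assign"]),
    "Assign": ("aggregate", ["Lvalue", "Expr"]),
    "Lvalue": ("choice", ["Field", "VarAccess"]),
    "Expr": ("choice", ["Unary", "Binary", "VarAccess", "IntLiteral"]),
    "Binary": ("aggregate", ["BinOp", "Expr", "Expr"]),
    "BinOp": ("choice", ["Plus", "Minus"]),
    "Plus": ("aggregate", []),
    "Minus": ("aggregate", []),
    "VarAccess": ("terminal", None),
    "IntLiteral": ("terminal", None),
}


def encode(symbol, tree, out):
    """Append the pre-order stream for `tree`, expected to match `symbol`."""
    kind, parts = GRAMMAR[symbol]
    if kind == "choice":
        index = parts.index(tree[0])      # which alternative was taken
        out.append(("choice", index))
        encode(parts[index], tree, out)
    elif kind == "aggregate":
        for child_symbol, child in zip(parts, tree[1:]):
            encode(child_symbol, child, out)  # order and kind are implied
    else:
        out.append(("const", tree[1]))
    return out


def decode(symbol, stream):
    """Rebuild the tree from an iterator over the stream."""
    kind, parts = GRAMMAR[symbol]
    if kind == "choice":
        _, index = next(stream)
        return decode(parts[index], stream)
    if kind == "aggregate":
        return (symbol, *[decode(child, stream) for child in parts])
    _, value = next(stream)
    return (symbol, value)


TREE = ("Assign", ("VarAccess", "i"),
        ("Binary", ("Plus",), ("VarAccess", "i"), ("IntLiteral", 1)))
stream = encode("Stmt", TREE, [])
```

Calling `decode("Stmt", iter(stream))` reproduces `TREE`, even though the stream never mentions Assign or Binary by name.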
Prediction by Partial Match (PPM)
  • dynamically maintain counts of characters seen after various contexts
  • contexts may be of various lengths
    • e.g., for “abcd”, contexts for “d” are:
      • length 1 context: “c”
      • length 2 context: “bc”
      • length 3 context: “abc”
  • predict characters in current context by looking at what occurred previously
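
The context bookkeeping can be sketched as below. This is a simplification, not full PPM: it falls back to a shorter context only when the longer one has never been seen, and it has no escape probabilities. The function names and the `max_order` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def context_counts(text, max_order=3):
    """Count which symbol follows each context of length 0..max_order."""
    counts = defaultdict(Counter)
    for i, symbol in enumerate(text):
        for order in range(min(i, max_order) + 1):
            counts[text[i - order:i]][symbol] += 1
    return counts


def predict(counts, history, max_order=3):
    """Relative frequencies from the longest context that has been seen."""
    for order in range(min(len(history), max_order), -1, -1):
        context = history[len(history) - order:]
        if context in counts:
            seen = counts[context]
            total = sum(seen.values())
            return {s: n / total for s, n in seen.items()}
    return {}
```

For the input “abcd”, the symbol “d” is counted under the contexts “c”, “bc” and “abc” (plus the empty context), matching the list above.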
Maintaining Contexts

[Figure: context tree rooted at “*”, built over the symbols a, b, c, d]
Adapting PPM To Work On Trees
  • each node is a symbol
  • the context is the path from the root to the current node in the AST
  • problem: in DFS, what happens when we reach a leaf node and go back up to an ancestor?
    • pop context: all active nodes are moved up one position to their parents (in the context tree)
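
A sketch of path-based contexts during a depth-first traversal, under simplifying assumptions: the context is kept as an explicit stack of ancestor labels instead of pointers into a context tree, and trees are plain `(label, children)` pairs. Both choices are hypothetical and made for brevity.

```python
from collections import Counter, defaultdict


def count_tree_contexts(tree, max_depth=2):
    """Count node labels conditioned on up to max_depth nearest ancestors."""
    counts = defaultdict(Counter)
    path = []  # labels from the root down to the current parent

    def visit(node):
        label, children = node
        for depth in range(min(len(path), max_depth) + 1):
            counts[tuple(path[len(path) - depth:])][label] += 1
        path.append(label)   # descend: this node joins the context
        for child in children:
            visit(child)
        path.pop()           # subtree done: pop back to the ancestor's context

    visit(tree)
    return counts


EXAMPLE = ("Assign", [("VarAccess", []),
                      ("Binary", [("VarAccess", []), ("IntLiteral", [])])])
```

The `path.pop()` after visiting the children is the "pop context" step: once a leaf or subtree is finished, the next sibling is predicted from the ancestor's context, not from the leaf's.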
Encoding
  • PPM is used to model the choices made at choice nodes, i.e. associate a probability with every choice
  • these probabilities are used to drive an arithmetic coder to output bits
Compressing Constants
  • constants (strings, integers, names) are a significant fraction of source
  • to compress: make a table of constants, and refer to them by their index in this table
  • to compress further: maintain separate tables for strings, names, etc.; this reduces the number of bits per index
  • currently exploring more sophisticated context modeling ideas for compressing constants
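
The effect of splitting the constant table can be sketched with fixed-width indices. This is an illustration under assumptions: the kinds, the sample constants, and the fixed-width index are hypothetical, and the sketch assumes the grammar position already tells the decoder which table an index refers to.

```python
def build_tables(constants, split_by_kind):
    """constants: list of (kind, value) pairs in order of occurrence.

    Returns (tables, refs), where refs holds one (table_key, index) per constant.
    """
    tables, refs = {}, []
    for kind, value in constants:
        key = kind if split_by_kind else "all"
        table = tables.setdefault(key, [])
        if value not in table:
            table.append(value)
        refs.append((key, table.index(value)))
    return tables, refs


def total_index_bits(tables, refs):
    """Total bits if every reference uses a fixed-width index into its table."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    return sum((len(tables[key]) - 1).bit_length() for key, _ in refs)


CONSTANTS = [("name", "i"), ("name", "j"), ("int", 1), ("int", 2),
             ("string", "hello"), ("name", "i"), ("int", 1), ("string", "bye")]
```

For `CONSTANTS`, one shared table holds six distinct entries and needs 3 bits per reference, while three separate two-entry tables need only 1 bit per reference.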
AST Compression: Example

AST for: i = i + 1

Relevant grammar rules

Stmt = If|While|Assign|…

Assign = Lvalue Expr

Lvalue = Field|VarAccess|…

Expr = Unary|Binary|…

Binary = BinOp Expr Expr

Choice nodes

Preorder traversal:

Stmt Assign Lvalue VarAccess i Expr Binary Expr VarAccess i Expr Literal IntLiteral 1 BinOp +


AST Compression: Example

AST for: i = i + 1

[Figure: context tree]

Model: Prob(j) = 0.3, Prob(k) = 0.5, Prob(i) = 0.2

Send model and choice “i” to arithmetic coder
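
What the arithmetic coder does with this model can be sketched as one interval-narrowing step. The sketch uses exact fractions and assumes the symbols are laid out in the order j, k, i; that ordering is an assumption, and a complete coder would also emit bits and renormalize the interval, which is not shown.

```python
import math
from fractions import Fraction

# Model from the example; the layout order within [0, 1) is an assumption.
MODEL = [("j", Fraction(3, 10)), ("k", Fraction(1, 2)), ("i", Fraction(1, 5))]


def narrow(low, width, model, symbol):
    """Narrow [low, low + width) to the sub-interval assigned to `symbol`."""
    cumulative = Fraction(0)
    for candidate, prob in model:
        if candidate == symbol:
            return low + width * cumulative, width * prob
        cumulative += prob
    raise KeyError(symbol)


low, width = narrow(Fraction(0), Fraction(1), MODEL, "i")
bits = -math.log2(width)  # ideal cost of this one choice, in bits
```

Here the choice “i” narrows the interval to [4/5, 1), which costs about 2.32 bits; had the model predicted “i” with higher probability, the interval would be wider and the cost lower.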

Status and Results
  • compressor/decompressor prototype written in Python
  • completely generic – can be used with any abstract grammar
  • have implemented the Java abstract grammar
    • works with single Java source files as well as entire packages.
  • comparison for Java class-file compression with Pugh’s results (best published Java compressor)
Results: Classes

Classes from Sun’s javac package - all sizes in bytes

Results: Archives

compressed collections of classes - all sizes in bytes

  • compressed ASTs are 5-50% smaller than Pugh’s
  • 3-8 times smaller than uncompressed class files or JAR files
Performance-Enhancing Information
  • now raise the semantic level of the grammar
  • e.g. “Escape Analysis”
    • an object that doesn’t “escape” its defining scope can be allocated on the stack rather than on the heap
    • this optimization alone can often double performance
  • the analysis itself is very difficult to do, but the results of the analysis are easy to verify
    • augment the type system by “escaping/non-escaping”
    • make this part of the encoding scheme itself
    • e.g., => a non-escaping object cannot be assigned to a variable from an enclosing scope
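
The "easy to verify" side can be sketched as a local check on each assignment. This is an illustration of the single rule named above, with hypothetical names throughout. Requiring the target to carry the non-escaping annotation as well is an added assumption, and a real scheme would also have to cover returns, field stores and argument passing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str
    scope_depth: int    # 0 = outermost scope; larger = more deeply nested
    non_escaping: bool  # the code producer's annotation


def check_assignment(target: Var, source: Var) -> bool:
    """Accept `target = source` only if it cannot leak a non-escaping object."""
    if not source.non_escaping:
        return True  # ordinary references may flow anywhere
    # A non-escaping object may only reach variables that are also annotated
    # non-escaping and that do not live in an enclosing (shallower) scope.
    return target.non_escaping and target.scope_depth >= source.scope_depth


tmp = Var("tmp", scope_depth=2, non_escaping=True)
alias = Var("alias", scope_depth=2, non_escaping=True)
result = Var("result", scope_depth=1, non_escaping=False)
```

Each assignment is checked in constant time, with no interprocedural analysis, which is what makes the producer's annotation cheap for the consumer to confirm.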
Insights So Far
  • abstract syntax trees viable as a mobile code format
  • can be highly compressed
    • Java archives by factor of 3-8
    • 5-50% better than Pugh’s Java-bytecode-specific compression
Overall Project Achievements
  • lead the way to a genuine improvement over virtual machine transportation formats
    • security without need for validation
    • tamper-proof performance-improving information
  • innovative and generic program compression method as a useful by-product of this effort
Task Schedule
  • Y1 Milestones:
    • source-level representation => Java compression
    • low-level representation
    • core calculus representation
  • Y2 Milestones:
    • system prototypes
    • trade-off analysis
    • encoding format comprehensive definition
  • End of Project:
    • system deliverable
    • comprehensive documentation

[Timeline axis: 1999, 2000, 2001, 2002]

  • investigate:
    • multiple source languages
    • graph-based encoding schemes
    • proof-carrying code
  • investigate:
    • requirements of optimizing code generators
    • integration of security vs. compiler-related data
  • investigate:
    • mutual interaction of security, efficiency, and compression density
    • security of system
Mobile Code Security Revisited
  • provided through type-safe programming language and type-safe APIs
    • semantically equivalent to transporting source code (everybody does it this way)
  • but many policies currently cannot be expressed in terms of a type system and hence need to be implemented inside the library
    • “open only files in directory X”
    • “initiate connections only with IP addresses in range […]”
    • “execute no more than N instructions between OS calls”
    • “do not send on network after reading local files”
    • => security automata
  • need to represent these properties directly and support them along the whole pipeline from code producer to code consumer
  • => some other PIs in Oasis are working on these themes and their work can be directly beneficial to this project
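
One of the policies listed above, “do not send on network after reading local files”, can be sketched as a two-state security automaton that observes a trace of events. The event names and the class are hypothetical; how events would be reported to such a monitor along the producer-to-consumer pipeline is left open here.

```python
class NoSendAfterRead:
    """Two-state automaton: before and after the first local file read."""

    def __init__(self):
        self.has_read = False

    def step(self, event):
        """Advance on one event; return False if it violates the policy."""
        if event == "read_file":
            self.has_read = True
        elif event == "send" and self.has_read:
            return False
        return True


def allowed(trace):
    """True if no event in the trace violates the policy."""
    monitor = NoSendAfterRead()
    return all(monitor.step(event) for event in trace)
```

The policy depends on the history of the execution rather than on the type of any single call, which is why it does not fit a conventional type-safe API.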
Transition of Technology
  • our prototype implementation(s) will be made available in source form
  • the idea is to create a “turnkey” replacement to current Java compilers and JVM runtime systems
    • you simply take your code and recompile using our compiler
    • it will then run on our runtime
    • our runtime will also run your old JVM class files
    • you can even mix our stuff with JVM class files
  • => we simply provide a new (better!) mobile code transportation layer without changing anything else