
Practical Data Confinement

Andrey Ermolinskiy, Sachin Katti, Scott Shenker, Lisa Fowler, Murphy McCauley


Introduction

  • Controlling the flow of sensitive information is one of the central challenges in managing an organization

    • Preventing exfiltration (theft) by malicious entities

    • Enforcing dissemination policies



Why is it so hard to secure sensitive data?

  • Modern software is rife with security holes that can be exploited for exfiltration

  • Users must be trusted to remember, understand, and obey dissemination restrictions

    • In practice, users are careless and often inadvertently allow data to leak

      • E-mail sensitive documents to the wrong parties

      • Transfer data to insecure machines and portable devices


Our Goal

  • Develop a practical data confinement solution

  • Key requirement: compatibility with existing infrastructure and patterns of use

    • Support current operating systems, applications, and means of communication

      • Office productivity apps: word processing, spreadsheets, …

      • Communication: E-mail, IM, VoIP, FTP, DFS, …

    • Avoid imposing restrictions on user behavior

      • Allow access to untrusted Internet sites

      • Permit users to download and install untrusted applications


Our Assumptions and Threat Model

  • Users

    • Benign, do not intentionally exfiltrate data

    • Make mistakes, inadvertently violate policies

  • Software platform (productivity applications and OS)

    • Non-malicious, does not exfiltrate data in pristine state

    • Vulnerable to attacks if exposed to external threats

  • Attackers

    • Malicious external entities seeking to exfiltrate sensitive data

    • Penetrate security barriers by exploiting vulnerabilities in the software platform


Central Design Decisions

  • Policy enforcement responsibilities

    • Cannot rely on human users

    • The system must track the flow of sensitive information and enforce restrictions when the data is externalized

  • Granularity of information flow tracking (IFT)

    • Need fine-grained byte-level tracking and policy enforcement to prevent accidental partial exfiltrations


Central Design Decisions

  • Placement of functionality

    • PDC inserts a thin software layer (hypervisor) between the OS and hardware

    • The hypervisor implements byte-level IFT and policy enforcement

    • A hypervisor-level solution

      • Retains compatibility with existing OSes and applications

      • Has sufficient control over hardware

  • Resolving tension between safety and user freedom

    • Partition the application environment into two isolated components: a “Safe world” and a “Free world”


Partitioning the User Environment

[Diagram: a Safe Virtual Machine (access to sensitive data) and an Unsafe Virtual Machine (unrestricted communication and execution of untrusted code) run side by side on the hypervisor (IFT, policy enforcement), which sits on the hardware (CPU, Memory, Disk, NIC, USB, Printer, …).]


Partitioning the User Environment

[Diagram legend: sensitive vs. non-sensitive data, trusted vs. untrusted (potentially malicious) code/data, and exposure to the threat of exfiltration.]


PDC Use Cases

  • Logical “air gaps” for high-security environments

    • VM-level isolation obviates the need for multiple physical networks

  • Preventing information leakage via e-mail

    • “Do not disseminate the attached document”

  • Digital rights management

    • Keeping track of copies; document self-destruct

  • Auto-redaction of sensitive content


Talk Outline

  • Introduction

  • Requirements and Assumptions

  • Use Cases

  • PDC Architecture

  • Prototype Implementation

  • Preliminary Performance Evaluation

  • Current Status and Future Work


PDC Architecture: Hypervisor

  • PDC uses an augmented hypervisor to

    • Ensure isolation between safe and unsafe VMs

    • Track the propagation of sensitive data in the safe VM

    • Enforce security policy at exit points

      • Network I/O, removable storage, printer, etc.


PDC Architecture: Tag Tracking in the Safe VM

  • PDC associates an opaque 32-bit sensitivity tag with each byte of virtual hardware state

    • User-accessible CPU registers

    • Volatile memory

    • Files on disk


PDC Architecture: Tag Tracking in the Safe VM

  • These tags are viewed as opaque identifiers

  • The semantics can be tailored to fit the specific needs of administrators/users

  • Tags can be used to specify

    • Security policies

    • Levels of security clearance

    • High-level data objects

    • High-level data types within an object


PDC Architecture: Tag Tracking in the Safe VM

  • An augmented x86 emulator performs fine-grained instruction-level tag tracking (current implementation is based on QEMU)

  • PDC tracks explicit data flows (variable assignments, arithmetic operations)

[Diagram: for add %eax, %ebx, the tag on eax is merged into the tag on ebx.]

PDC Architecture: Tag Tracking in the Safe VM

  • An augmented x86 emulator performs fine-grained instruction-level tag tracking (current implementation is based on QEMU)

  • PDC also tracks flows resulting from pointer dereferencing

[Diagram: for mov %eax, (%ebx), the tag on eax is merged with the tag on ebx (the address register) and applied to the destination bytes in memory.]
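
To make the two propagation rules concrete, here is a minimal sketch in C, assuming per-byte tags on registers and memory. All names (tag_t, reg_tags, mem_tags, merge) are hypothetical; PDC's actual tracker operates on QEMU's intermediate representation rather than on a model like this.

```c
/* Sketch of the two tag propagation rules above; names are hypothetical. */
#include <stdint.h>

typedef uint32_t tag_t;            /* opaque 32-bit sensitivity tag */

static tag_t reg_tags[8][4];       /* one tag per byte of each 32-bit GPR */
static tag_t *mem_tags;            /* one tag per byte of guest memory */

static tag_t merge(tag_t a, tag_t b) {
    return a ? a : b;              /* placeholder merge policy */
}

/* add %eax, %ebx -- explicit flow: eax's tags are merged into ebx's. */
static void track_add(int src, int dst) {
    for (int i = 0; i < 4; i++)
        reg_tags[dst][i] = merge(reg_tags[dst][i], reg_tags[src][i]);
}

/* mov %eax, (%ebx) -- store through a pointer: each written byte gets
 * eax's tag merged with the tag of the address register ebx. */
static void track_store(int src, int addr_reg, uint32_t addr) {
    for (int i = 0; i < 4; i++)
        mem_tags[addr + i] = merge(reg_tags[src][i], reg_tags[addr_reg][i]);
}
```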


Challenges

  • Tag storage overhead in memory and on disk

    • Naïve implementation would incur a 400% overhead

  • Computational overhead of online tag tracking

  • Tag explosion

    • Tag tracking across pointer dereferences exacerbates the problem

  • Tag erosion due to implicit flows

  • Bridging the semantic gap between application data units and low-level machine state

  • Impact of VM-level isolation on user experience


Talk Outline

  • Introduction

  • Requirements and Assumptions

  • Use Cases

  • PDC Architecture

  • Prototype Implementation

    • Storing sensitivity tags in memory and on disk

    • Fine-grained tag tracking in QEMU

    • “On-demand” emulation

    • Policy enforcement

  • Performance Evaluation

  • Current Status and Future Work


PDC Implementation: The Big Picture

[Architecture diagram: the safe VM (App1, App2) executes under PDC-Xen (ring 0) on the hardware (CPU, CR3, NIC); Dom0 hosts the policy daemon, the network daemon, and QEMU with the tag tracker, the PageTag descriptors, and the emulated safe VM. Storage requests travel from the safe VM's VFS through an NFS client over Xen-RPC (event channel plus shared ring buffer) to an NFS server backed by PDC-ext3 in Dom0. PDC-Xen maintains the safe VM page tables, the shadow page tables, and the PageTagMask.]

Storing Tags in Volatile Memory

  • PDC maintains a 64-bit PageTagSummary for each page of machine memory

  • Uses a 4-level tree data structure to keep PageNumber → PageTagSummary mappings

[Diagram: the PageNumber is split at bit positions 31, 29, 19, and 9 into four indices; the final level is an array of 64-bit PageTagSummary structures.]
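
As an illustration, a lookup over such a tree might look like the C sketch below. The index widths are read off the bit positions in the diagram; the node layout and names are assumptions, not PDC's actual implementation.

```c
/* Minimal sketch of the 4-level PageNumber -> PageTagSummary lookup,
 * assuming the bit boundaries shown above (31/29/19/9/0). */
#include <stdint.h>
#include <stddef.h>

typedef uint64_t PageTagSummary;

struct node { void *slot[1024]; };          /* interior level: up to 2^10 slots */

static PageTagSummary *lookup(struct node *root, uint32_t pfn) {
    unsigned i1 = (pfn >> 30) & 0x3;        /* bits 31..30: 4-way top level */
    unsigned i2 = (pfn >> 20) & 0x3ff;      /* bits 29..20 */
    unsigned i3 = (pfn >> 10) & 0x3ff;      /* bits 19..10 */
    unsigned i4 =  pfn        & 0x3ff;      /* bits  9..0: leaf array index */

    struct node *l2 = root->slot[i1];
    if (!l2) return NULL;                   /* absent subtree: page untagged */
    struct node *l3 = l2->slot[i2];
    if (!l3) return NULL;
    PageTagSummary *leaf = l3->slot[i3];    /* leaf: array of summaries */
    return leaf ? &leaf[i4] : NULL;
}
```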


Storing Tags in Volatile Memory

  • A PageTagSummary holds either a page-wide tag for uniformly-tagged pages, or a pointer to a PageTagDescriptor otherwise

  • The PageTagDescriptor stores fine-grained (byte-level) tags within a page in one of two formats

    • Linear array of tags (indexed by page offset)

    • RLE encoding
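
A C rendering of the two cases might look as follows; the discriminator convention and field names are assumptions made for illustration.

```c
/* Sketch of PageTagSummary: a page-wide tag for uniformly-tagged pages,
 * or a pointer to a per-byte descriptor otherwise. */
#include <stdint.h>

#define PAGE_SIZE 4096

typedef uint32_t tag_t;

struct PageTagDescriptor {
    int is_rle;                    /* 0: linear array, 1: RLE runs */
    union {
        tag_t linear[PAGE_SIZE];   /* one tag per byte, indexed by offset */
        struct {                   /* RLE: (run length, tag) pairs */
            uint16_t len;
            tag_t    tag;
        } runs[PAGE_SIZE / 2];
    } u;
};

union PageTagSummary {             /* 64 bits, as on the slide */
    uint64_t uniform_tag;          /* page-wide tag (discriminated by a
                                      reserved bit, say; an assumption) */
    struct PageTagDescriptor *desc;/* byte-level tags otherwise */
};
```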


Storing Tags on Disk

  • PDC-ext3 provides persistent storage for the safe VM

  • New i-node field for file-level tags

  • Leaf indirect blocks store pointers to BlockTagDescriptors

  • BlockTagDescriptors store byte-level tags within a block, as a linear array or with RLE encoding

[Diagram: an i-node carrying a FileTag points through indirect and leaf indirect blocks to data blocks; each leaf indirect block also references the BlockTagDescriptors for its data blocks.]
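
The on-disk structures might be sketched as below. The slides do not give PDC-ext3's actual format, so every field here is illustrative.

```c
/* Illustrative on-disk layout: file-level tag in the i-node, and per-block
 * descriptors referenced from leaf indirect blocks. Not PDC-ext3's format. */
#include <stdint.h>

struct pdc_tag_run {               /* one RLE run of identically-tagged bytes */
    uint16_t length;
    uint32_t tag;
} __attribute__((packed));

struct BlockTagDescriptor {
    uint8_t  format;               /* 0: linear tag array, 1: RLE runs */
    uint16_t nruns;                /* number of runs if format == 1 */
    /* followed on disk by a linear array of tags or `nruns` tag runs */
} __attribute__((packed));

struct pdc_ext3_inode_extra {      /* conceptually, the new i-node field */
    uint32_t file_tag;             /* file-level sensitivity tag */
};
```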


Back to the Big Picture

[The architecture diagram again, highlighting QEMU / the tag tracker in Dom0 and the emulated CPU context it maintains for the safe VM.]


Fine-Grained Tag Tracking

  • A modified version of QEMU emulates the safe VM and tracks movement of sensitive data

  • QEMU relies on runtime binary recompilation to achieve reasonably efficient emulation

  • We augment the QEMU compiler to generate a tag tracking instruction stream from the input stream of x86 instructions

[Diagram: a guest machine code block (x86) is translated (stage 1) into the intermediate representation (TCG), which is then compiled (stage 2) into both a host machine code block (x86) and a tag tracking code block.]


Fine-Grained Tag Tracking

  • Tag tracking instructions manipulate the tag status of emulated CPU registers and memory

  • Basic instruction format: Action {Clear, Set, Merge}, Dest. Operand {Reg, Mem}, Src. Operand {Reg, Mem}

  • The tag tracking instruction stream executes asynchronously in a separate thread
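
A plausible C encoding of this instruction format is sketched below; the field widths and the explicit width byte (matching ops like Clear4 in the recompilation example that follows) are assumptions.

```c
/* Sketch of a compact tag-tracking instruction encoding. */
#include <stdint.h>

enum tt_action  { TT_CLEAR, TT_SET, TT_MERGE };
enum tt_operand { TT_REG, TT_MEM };

struct tt_insn {
    uint8_t  action;      /* Clear / Set / Merge */
    uint8_t  dst_kind;    /* Reg / Mem */
    uint8_t  src_kind;    /* Reg / Mem */
    uint8_t  width;       /* bytes affected, e.g. 4 for "Clear4 eax" */
    uint32_t dst;         /* register index, or memory address / log slot */
    uint32_t src;
};
```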


Fine-Grained Tag Tracking

  • Problem: some of the instruction arguments are not known at compile time

    • Example: mov %eax,(%ebx)

    • Source memory address is not known

  • The main emulation thread writes the values of these arguments to a temporary log (a circular memory buffer) at runtime

  • The tag tracker fetches unknown values from this log
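
The temporary log can be pictured as a single-producer/single-consumer ring: the emulation thread pushes resolved addresses, the tag tracker pops them. This sketch assumes C11 atomics and a power-of-two capacity, neither of which is specified on the slides.

```c
/* Minimal SPSC ring buffer sketch for the runtime argument log. */
#include <stdatomic.h>
#include <stdint.h>
#include <stdbool.h>

#define LOG_CAP 4096                       /* power of two (assumption) */

static uint32_t    log_buf[LOG_CAP];
static atomic_uint log_head, log_tail;     /* head: consumer, tail: producer */

/* Emulation thread: record a memory address not known at compile time. */
static bool log_push(uint32_t addr) {
    unsigned t = atomic_load_explicit(&log_tail, memory_order_relaxed);
    unsigned h = atomic_load_explicit(&log_head, memory_order_acquire);
    if (t - h == LOG_CAP) return false;    /* full: real system must stall */
    log_buf[t & (LOG_CAP - 1)] = addr;
    atomic_store_explicit(&log_tail, t + 1, memory_order_release);
    return true;
}

/* Tag tracker thread: fetch the next unknown argument. */
static bool log_pop(uint32_t *addr) {
    unsigned h = atomic_load_explicit(&log_head, memory_order_relaxed);
    unsigned t = atomic_load_explicit(&log_tail, memory_order_acquire);
    if (h == t) return false;              /* empty */
    *addr = log_buf[h & (LOG_CAP - 1)];
    atomic_store_explicit(&log_head, h + 1, memory_order_release);
    return true;
}
```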


Binary Recompilation (Example)

Input x86 instruction → intermediate representation (TCG) → tag tracking instructions:

  • mov %eax, $123
      movi_i32 tmp0,$123; st_i32 tmp0,env,$0x0
      → Clear4 eax

  • push %ebp
      ld_i32 tmp0,env,$0x14; ld_i32 tmp2,env,$0x10; movi_i32 tmp14,$0xfffffffc; add_i32 tmp2,tmp2,tmp14; qemu_st_logaddr tmp0,tmp2; st_i32 tmp2,env,$0x10
      → Set4 mem,ebp,0; Merge4 mem,esp,0

  • qemu_st_logaddr writes MachineAddr(%esp) to the tag tracking argument log at runtime, where the tag tracker picks it up



Binary Recompilation

  • But things get more complex…

    • Switching between operating modes (protected/real/virtual-8086, 16/32-bit)

    • Recovering from exceptions in the middle of a translation block

    • Multiple memory addressing modes

    • Repeating instructions (rep movs)

    • Complex instructions whose semantics are partially determined by the runtime state

[Diagram: the stack frame consumed by iret: saved EIP, saved CS, saved EFLAGS, and, on a privilege change, saved ESP and saved SS.]


Back to the Big Picture

[The architecture diagram again, as context for the on-demand emulation mechanism described next.]


“On-Demand” Emulation

  • During virtualized execution, PDC-Xen uses the paging hardware to intercept sensitive data access

  • Maintains shadow page tables, in which all memory pages containing tagged data are marked as not present

  • Access to a tagged page from the safe VM causes a page fault and transfer of control to the hypervisor

[Diagram: PDC-Xen (ring 0) derives the shadow page tables from the safe VM page tables using the PageTagMask; QEMU / the tag tracker holds the fine-grained PageTag descriptors.]
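
The interception boils down to clearing the present bit whenever a shadow PTE maps a tagged page, so that the first access faults into the hypervisor. A sketch, with the PTE layout and the PageTagMask query simplified to a single predicate:

```c
/* Sketch: build a shadow PTE that traps accesses to tagged pages. */
#include <stdint.h>
#include <stdbool.h>

#define PTE_PRESENT 0x1ULL

extern bool page_has_tags(uint64_t pfn);   /* consult the PageTagMask */

static uint64_t make_shadow_pte(uint64_t guest_pte, uint64_t pfn) {
    uint64_t spte = guest_pte;
    if (page_has_tags(pfn))
        spte &= ~PTE_PRESENT;   /* force a fault; handler hands off to QEMU */
    return spte;
}
```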


“On-Demand” Emulation

  • If the page fault is due to tagged data, PDC-Xen suspends the guest domain and transfers control to the emulator (QEMU)

  • QEMU initializes the emulated CPU context from the native processor context (saved upon entry to the page fault handler) and resumes the safe VM in emulated mode

[Diagram: an access to a tagged page traps into the page fault handler; the safe VM's VCPU blocks while Dom0's QEMU, holding mappings of the safe VM's memory, drives the emulated safe VM CPU.]


“On-Demand” Emulation

  • Returning from emulated execution

    • QEMU terminates the main emulation loop, waits for the tag tracker to catch up

    • QEMU then makes a hypercall to PDC-Xen and provides

      • Up-to-date processor context for the safe VM VCPU

      • Up-to-date PageTagMask

    • The hypercall awakens the safe VM VCPU (blocked in the page fault handler)

    • The page fault handler

      • Overwrites the call stack with up-to-date values of CS/EIP, SS/ESP, EFLAGS

      • Restores other processor registers

      • Returns control to the safe VM



“On-Demand” Emulation - Challenges

  • Updating PTEs in read-only page table mappings

    • Solution: QEMU maintains local writable “shadow” copies, synchronizes them in background via hypercalls

  • Transferring control to the hypervisor during emulated execution (hypercall and fault handlers)

    • Emulating hypervisor-level code is not an option

    • Solution: Transient switch to native execution

      • Resume native execution at the instruction that causes a jump to the hypervisor (e.g., int 0x82 for hypercalls)


“On-Demand” Emulation - Challenges

  • Delivery of timer interrupts (events) in emulated mode

    • The hardware clock advances faster in the emulated context (i.e., each instruction consumes more clock cycles)

    • Xen needs to scale the delivery of timer events accordingly

  • Use of the clock cycle counter (rdtsc instruction)

    • Linux timer interrupt/event handler uses the clock cycle counter to estimate timer jitter

    • After switching from emulated to native execution, the guest kernel observes a sudden jump forward in time


Policy Enforcement

  • The policy controller module

    • Resides in dom0 and interposes between the front-end and the back-end device driver

    • Fetches policies from a central policy server

    • Looks up the tags associated with the data in shared I/O request buffers and applies policies

[Diagram: the network-interface and block-storage front-ends in the safe VM connect to their back-ends in Dom0; the policy controller interposes on both paths.]
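
An exit-point check might reduce to the sketch below: look up the tags attached to the bytes of an outgoing I/O request and consult the fetched policy. The tag lookup and the policy representation are assumptions for illustration.

```c
/* Sketch of a policy-controller check at an exit point. */
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef uint32_t tag_t;

extern tag_t tag_of_byte(const void *buf, size_t i);    /* hypothetical */
extern bool  policy_allows(tag_t tag, int exit_point);  /* hypothetical */

/* Returns true if the whole buffer may leave through this exit point. */
static bool check_outbound(const void *buf, size_t len, int exit_point) {
    for (size_t i = 0; i < len; i++) {
        tag_t t = tag_of_byte(buf, i);
        if (t && !policy_allows(t, exit_point))
            return false;                 /* block (or redact) the request */
    }
    return true;
}
```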


Network Communication

  • PDC annotates outgoing packets with PacketTagDescriptors, carrying the sensitivity tags

  • Current implementation transfers annotated packets via a TCP/IP tunnel

[Diagram: an outgoing frame (EthHdr | IPHdr | TCPHdr | Payload) is annotated and encapsulated: the tunnel carries an outer EthHdr | IPHdr | TCPHdr, the sensitivity tags, and then the original frame.]
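
A possible wire layout for the annotation is sketched below; the slides do not specify PDC's actual format, so the range encoding here is an assumption.

```c
/* Illustrative PacketTagDescriptor wire format: a list of tagged byte
 * ranges prefixed to the original frame inside the TCP/IP tunnel. */
#include <stdint.h>

struct tag_range {
    uint16_t offset;     /* byte offset into the original frame */
    uint16_t length;     /* number of bytes covered */
    uint32_t tag;        /* 32-bit sensitivity tag */
} __attribute__((packed));

struct PacketTagDescriptor {
    uint16_t nranges;               /* number of tagged ranges */
    struct tag_range ranges[];      /* flexible array of ranges */
} __attribute__((packed));

/* Tunnel frame: [outer Eth/IP/TCP][PacketTagDescriptor][original frame] */
```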


Talk Outline

  • Introduction

  • Requirements and Assumptions

  • Use Cases

  • PDC Architecture

  • Prototype Implementation

  • Preliminary Performance Evaluation

    • Application-level performance overhead

    • Filesystem performance overhead

    • Network bandwidth overhead

  • Current Status and Future Work


Preliminary Performance Evaluation

  • Experimental setup:

    • Quad-core AMD Phenom 9500, 2.33GHz, 3GB of RAM

    • 100Mbps Ethernet

    • PDC Hypervisor based on Xen v.3.3.0

    • Paravirtualized Linux kernel v.2.6.18-8

    • Tag tracker based on QEMU v.0.10.0


Application-Level Overhead

  • Goal: estimate the overall performance penalty (as perceived by users) in realistic usage scenarios

  • First scenario: recursive text search within a directory tree (grep)

    • Input dataset: 1GB sample of the Enron corporate e-mail database (http://www.cs.cmu.edu/~enron)

    • We mark a fraction (F) of the messages as sensitive, assigning them a uniform sensitivity tag

    • We search the dataset for a single-word string and measure the overall running time


Application-Level Overhead

[Chart: overall running time as a function of the sensitive fraction F (%), for three configurations: PDC-Xen with paravirtualized Linux and tag tracking, standard Xen with paravirtualized Linux, and Linux on “bare metal”.]


Filesystem Performance Overhead

  • Configurations:

    • C1 – Linux on “bare metal”; standard ext3

    • C2 – Xen, paravirt. Linux; dom0 exposes a paravirt. block device; Guest domain mounts it as ext3

    • C3 – Xen, paravirt. Linux; dom0 exposes ext3 to the guest domain via NFS/TCP

    • C4 – Xen, paravirt. Linux; dom0 exposes ext3 to the guest domain via NFS/Xen-RPC

    • C5 – Xen, paravirt. Linux; dom0 exposes PDC-ext3 to the guest domain via NFS/Xen-RPC

  • First experiment: sequential file write throughput

    • Create a file → write 1GB of data sequentially → close → sync


Filesystem Performance Overhead

[Chart: results of the sequential file write experiment for configurations C1 through C5, as defined above.]


Filesystem Performance Overhead

  • Second experiment: Metadata operation overhead

    • M1: Create a large directory tree (depth=6, fanout=6)

    • M2: Remove the directory tree created by M1 (rm -rf *)


Network Bandwidth Overhead

  • We used iperf to measure end-to-end bandwidth between a pair of directly-connected hosts

  • Configurations:

    • NC1 – No packet interception

    • NC2 – Interception and encapsulation

    • NC3 – Interception, encapsulation, and annotation with sensitivity tags

      • Sender assigns sensitivity tags to a random sampling of outgoing packets

      • We vary two parameters: Tag Prevalence (P) and Tag Fragmentation (F)



Performance Evaluation - Summary

  • Application performance in the safe VM

    • 10x slowdown in the worst-case scenario

    • We expect to reduce this overhead significantly through a number of optimizations

  • Disk and network I/O overhead

    • Proportional to the amount of sensitive data and the degree of tag fragmentation

    • 4x overhead in the worst-case scenario (assuming 32-bit tag identifiers)


Summary and Future Work

  • PDC seeks a practical solution to the problem of data confinement

    • Defend against exfiltration by outside attackers

    • Prevent accidental policy violations

  • Hypervisor-based architecture provides mechanisms for isolation, information flow tracking, and policy enforcement

  • Currently working on

    • Improving stability and performance of the prototype

    • Studying the issue of taint explosion in Windows and Linux environments and its implications for PDC

