Discover how retrospection technology enables efficient search, debugging, legal compliance, and malware tracking for VM images while minimizing code constraints.
Retrospecting VM Images
Wolfgang Richter, Glenn Ammons*, Jan Harkes, Adam Goode, Vasanth Bala*, Nilton Bila+, Eyal de Lara+, Mahadev Satyanarayanan
* IBM Research, + University of Toronto
The Importance of Content http://www.pdl.cmu.edu/
Retrospection
• VM collections growing
  • 300% year over year, IBM Research RC2
• System, application, and user content
• Searchable history
• Debugging opportunities
• Legal data or code origin
• Malware tracking
• License violations
Roadmap
• What is the retrospection problem?
• What are the main challenges?
• How can we solve them?
Principle 1: The retrospection mechanism should place as few constraints as possible on the code used for search computations.
Principle 2: A search computation should only be performed on demand for a specific query, and its scope should be restricted to the smallest relevant subset of VM images.
Principle 3 (work in progress): Control of policy for retrospection should reside with the owners of VM images and their delegates.
Find The Picture
Rich content-based and application-specific queries
[Figure: finding a picture among 10 graphics (OK), 100 graphics (OK), and 1000+ graphics (?)]
OpenDiamond Platform
Principle 1: The retrospection mechanism should place as few constraints as possible on the code used for search computations.
• Distributed, interactive, unindexed search
• Focuses on the principle of early discard
• Enables arbitrary search queries
  • Arbitrary x86 binary code
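The early-discard idea can be sketched in a few lines of Python. This is only an illustration, not OpenDiamond's API: the real platform runs arbitrary x86 binaries as filters, while here plain Python callables stand in for them. The names `early_discard_search`, `is_jpeg`, and `contains_marker` are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List

# A filter is any function that inspects an object and says keep/drop.
# (Assumption: Python callables stand in for arbitrary filter binaries.)
Filter = Callable[[bytes], bool]


def early_discard_search(objects: Iterable[bytes],
                         filters: List[Filter]) -> Iterator[bytes]:
    """Yield only the objects that pass every filter, in order."""
    for obj in objects:
        # all() short-circuits: the first failing filter discards the
        # object, so later (typically more expensive) filters never run.
        if all(f(obj) for f in filters):
            yield obj


def is_jpeg(obj: bytes) -> bool:
    # Cheap check: JPEG data starts with the bytes FF D8.
    return obj.startswith(b"\xff\xd8")


def contains_marker(obj: bytes) -> bool:
    # Placeholder for an expensive content-based test.
    return b"cat" in obj


objects = [
    b"\xff\xd8...cat...",
    b"plain text with cat",
    b"\xff\xd8 nothing here",
]
matches = list(early_discard_search(objects, [is_jpeg, contains_marker]))
```

Ordering the cheap filter first means most irrelevant objects are dropped before the costly test runs, which is the point of early discard in an unindexed search.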
Available Structured Data
• VMs have attributes and metadata
  • Owners
  • Files
  • File Systems
• Files have attributes and metadata
  • Owners
  • File Type
  • Permissions
  • Modification Timestamp
Scoping Solution
Principle 2: A search computation should only be performed on demand for a specific query, and its scope should be restricted to the smallest relevant subset of VM images.
• Metadata MySQL database
• Scope Server
  • Manages access to data
• Scope Cookie
  • X.509 signed cookie
  • Determines accessible data
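As a rough illustration of scoping, the sketch below narrows a search to a small subset of files with a metadata query before any content is read. It makes several assumptions: Python's built-in `sqlite3` stands in for the MySQL metadata database, the table and column names are invented for the example, and the X.509-signed scope cookie is not modeled at all.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the metadata
# MySQL database; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vms (vm_id INTEGER PRIMARY KEY, owner TEXT);
CREATE TABLE files (
    vm_id     INTEGER REFERENCES vms(vm_id),
    path      TEXT,
    file_type TEXT,
    mtime     INTEGER
);
""")
conn.executemany("INSERT INTO vms VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", [
    (1, "/home/alice/a.jpg", "jpeg", 1300000000),
    (1, "/etc/passwd", "text", 1200000000),
    (2, "/home/bob/b.jpg", "jpeg", 1310000000),
])


def scope(conn: sqlite3.Connection, owner: str, file_type: str) -> list:
    """Return paths of files matching the metadata constraints."""
    rows = conn.execute(
        "SELECT f.path FROM files f JOIN vms v ON v.vm_id = f.vm_id "
        "WHERE v.owner = ? AND f.file_type = ? ORDER BY f.path",
        (owner, file_type),
    )
    return [row[0] for row in rows]


# Only the scoped files would be handed to the search computation.
scoped = scope(conn, "alice", "jpeg")
```

In the design described on the slide, the resulting scope would be bound to a signed cookie so that search servers can check which data a query may touch.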
Problem: VM Sprawl in Files [Figure: data from 78 NCSU VCL VM images based on Windows XP]
Problem: VM Sprawl in Bytes [Figure: data from 78 NCSU VCL VM images based on Windows XP]
Solution: Deduplication in Files reduces search time [Figure: data from 78 NCSU VCL VM images based on Windows XP]
Solution: Deduplication in Bytes reduces storage space [Figure: data from 78 NCSU VCL VM images based on Windows XP]
IBM Research Mirage
• File-level deduplication
  • Files are referenced by SHA-1 tag
• Reads VM image partitions and file systems
• On-disk deduplicated format
• Centralized VM store – a potential bottleneck
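File-level deduplication keyed by SHA-1 can be sketched as a small content-addressed store. This is a toy model under stated assumptions, not Mirage's on-disk format or API: the `DedupStore` class and its methods are hypothetical, and everything lives in memory.

```python
import hashlib
from typing import Dict


class DedupStore:
    """Toy content-addressed store: one blob per distinct file content."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}                # SHA-1 tag -> content
        self.manifests: Dict[str, Dict[str, str]] = {}   # image -> path -> tag

    def add_image(self, image_id: str, files: Dict[str, bytes]) -> None:
        manifest = {}
        for path, content in files.items():
            tag = hashlib.sha1(content).hexdigest()
            # Identical content hashes to the same tag, so it is stored once.
            self.blobs.setdefault(tag, content)
            manifest[path] = tag
        self.manifests[image_id] = manifest

    def read(self, image_id: str, path: str) -> bytes:
        return self.blobs[self.manifests[image_id][path]]


store = DedupStore()
store.add_image("vm-a", {"/bin/sh": b"shell", "/etc/hostname": b"a"})
store.add_image("vm-b", {"/bin/sh": b"shell", "/etc/hostname": b"b"})
```

Four files across the two images collapse to three stored blobs because `/bin/sh` is identical in both. A search then needs to examine each distinct blob only once, which is how deduplication reduces search time as well as storage.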
Network Bottlenecks: Megabytes. Takeaway: The centralized store is limited by network bandwidth, which in turn limits parallelism.
Network Bottlenecks: Objects. Takeaway: The number of objects pushed determines the possible number of search processes.
Dataretriever
• Abstract data source
  • getObject() interface
  • Search process oblivious to where data comes from
• Access deduplicated data
  • Unmodified client and search server
• Solve network bottleneck: Data partitioning
  • Compute on local data rather than central store
  • Layer of indirection enables this without modification
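The layer of indirection described above might look like the following sketch. Only the idea of a `getObject()`-style interface comes from the slide; the class names, the Python method name `get_object`, and the dictionary-backed stores are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class DataRetriever(ABC):
    """Abstract data source: the search code sees only get_object()."""

    @abstractmethod
    def get_object(self, object_id: str) -> bytes:
        ...


class LocalRetriever(DataRetriever):
    """Serves objects from a local partition of the data."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = dict(objects)

    def get_object(self, object_id: str) -> bytes:
        return self._objects[object_id]


class CentralStoreRetriever(DataRetriever):
    """Stand-in for fetching from a central store over the network."""

    def __init__(self, store: Dict[str, bytes]) -> None:
        self._store = store
        self.fetches = 0  # counts simulated network round trips

    def get_object(self, object_id: str) -> bytes:
        self.fetches += 1
        return self._store[object_id]


def search(retriever: DataRetriever, object_ids: Iterable[str],
           needle: bytes) -> List[str]:
    # The search code is identical no matter where the data lives.
    return [oid for oid in object_ids if needle in retriever.get_object(oid)]


data = {"a": b"hello world", "b": b"goodbye"}
local_hits = search(LocalRetriever(data), ["a", "b"], b"hello")
central = CentralStoreRetriever(data)
central_hits = search(central, ["a", "b"], b"hello")
```

Because `search` depends only on the abstract interface, swapping a central store for partitioned local data requires no change to the client or the search code.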
Architecture
[Figure: architecture diagram. Labels recovered from the diagram: Client, Scope Server, Mirage Server, Server, Dataretriever (three instances), Metadata, MySQL, Scope Definition, Scope Cookie, Query, Query+Cookie, Request Objects, Raw Data]
Revisiting Network Bottlenecks: How bad is the bottleneck with content-based queries?
CPU-Bound Search Process. Takeaway: Content search is limited by computation, but it is embarrassingly parallel.
Achievable Efficient Retrospection. Takeaway: Search scales with the number of servers, and the Mirage case closely matches the local case.
Current Research: Principle 3
• Control of policy to owners via encryption
• Proof of concept: convergently encrypt /home
  • Encrypt files using file hash as key
  • Fine-grained, per file
• Future direction: key escrow?
  • Support investigations and warrants
• Support multiple encryption methods?
  • Per VM Image? Groups?
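The core property of convergent encryption, that a file's key is derived from its own content, can be demonstrated with the standard library alone. This is a toy sketch and not the proof of concept from the talk: the cipher is a SHA-256 based XOR keystream chosen only to keep the example dependency-free, and it must not be used for real data. A real implementation would use a vetted cipher such as AES. All function names are hypothetical.

```python
import hashlib
from typing import Tuple


def _keystream(key: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 over key || counter. NOT a vetted cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def convergent_encrypt(plaintext: bytes) -> Tuple[bytes, bytes]:
    """Return (key, ciphertext); the key is the hash of the file content."""
    key = hashlib.sha256(plaintext).digest()
    stream = _keystream(key, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, stream))
    return key, ciphertext


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    stream = _keystream(key, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, stream))


# Two owners encrypting identical files get identical ciphertexts.
key_a, ct_a = convergent_encrypt(b"same file contents")
key_b, ct_b = convergent_encrypt(b"same file contents")
```

Because identical plaintexts produce identical ciphertexts, encrypted files can still be deduplicated, while only holders of the per-file key (someone who already has the content, or a delegate the owner shares the key with) can decrypt.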
Recap
• Retrospection – search VM image content
• Main challenges
  • Get data efficiently
    • Solution: Dataretriever
  • Handle big and growing data
    • Solution: Scoping + Deduplication
  • Privacy and encryption
LEARN MORE
OpenDiamond - http://diamond.cs.cmu.edu
IBM Mirage - http://doi.acm.org/10.1145/1346256.1346272
Convergent Encryption - http://doi.acm.org/10.1145/339331.339345