
Hyperion: High Volume Stream Archival for Retrospective Querying


Presentation Transcript


  1. Hyperion: High Volume Stream Archival for Retrospective Querying
  Peter Desnoyers and Prashant Shenoy, University of Massachusetts

  2. Packet monitoring with history
  • Packet monitor: capture and search packet headers
  • E.g.: Snort, tcpdump, Gigascope
  • … with history:
  • Capture, index, and store packet headers
  • Interactive queries on stored data
  • Provides new capabilities:
  • Network forensics: when was a system compromised? From where? How?
  • Management: after-the-fact debugging
  (Figure: monitor feeding storage)

  3. Challenges
  • Speed: storage rate and capacity, to store data without loss and retain it long enough
  • Queries: must search millions of packet records
  • Indexing in real time, for online queries
  • Commodity hardware
  • For each link monitored: 1 gbit/s × 80% ÷ 400 B/pkt = 250,000 pkts/s (see the sketch below)
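
A quick check of the slide's arithmetic, as a minimal Python sketch (the 80% utilization and 400-byte average packet size are the figures from the slide):

```python
# Back-of-the-envelope packet rate per monitored link, using the slide's figures.
link_bits_per_s = 1e9      # 1 gbit/s link
utilization = 0.80         # 80% utilized
avg_pkt_bytes = 400        # average packet size

pkts_per_s = link_bits_per_s * utilization / 8 / avg_pkt_bytes
print(f"{pkts_per_s:,.0f} packets/s")   # 250,000 packets/s
```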

  4. Existing approaches
  • Packet monitoring with history requires a new system
  *Niksun NetDetector, Sandstorm NetInterceptor

  5. Outline of talk
  • Introduction and Motivation
  • Design
  • Implementation
  • Results
  • Conclusions

  6. Hyperion Design
  • Multiple monitor systems
  • High-speed storage system
  • Local index
  • Distributed index for query routing
  (Figure: each Hyperion node combines monitor/capture, index, and storage, linked by a distributed index)

  7. Storage Requirements
  • Real-time: writes must keep up or data is lost
  • Prioritized: reads shouldn't interfere with writes
  • Aging: old data replaced by new
  • Stream storage: packet monitoring behaves differently from typical applications
  (Figure: behavior of a typical application vs. Hyperion)

  8. Log-structured stream storage
  • Goal: minimize seeks despite interleaved writes on multiple streams
  • A log-structured file system minimizes seeks:
  • Interleave writes at an advancing frontier
  • Free space is collected by a segment cleaner
  • But: a general-purpose segment cleaner performs poorly on streams
  (Figure: segments of streams A, B, C interleaved by disk position, with the write frontier advancing)

  9. Hyperion StreamFS
  • How to improve on a general-purpose file system?
  • Rely on application use patterns
  • Eliminate unneeded features
  • StreamFS: log structure with no segment cleaner
  • No deletes (just overwrite), no fragmentation, no segment-cleaning overhead
  • Operation: write a fixed-size segment, then advance the write frontier to the next segment that is ready for deletion (see the sketch below)
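
A minimal sketch of the write path described above, under assumed simplifications: the disk is modeled as a fixed array of equal-size segment slots, and the write frontier simply advances and overwrites the oldest data. The names (StreamFS, write_segment) and sizes are illustrative, not the paper's actual API or parameters.

```python
SEGMENT_SIZE = 1 << 20      # fixed segment size (assumed: 1 MB)
NUM_SEGMENTS = 1024         # assumed disk capacity, in segments

class StreamFS:
    def __init__(self):
        self.segments = [None] * NUM_SEGMENTS   # on-disk segment slots
        self.frontier = 0                        # next slot to write

    def write_segment(self, stream_id, records):
        """Write one fixed-size segment at the frontier, then advance it.

        No segment cleaner is needed: the slot at the frontier is simply
        overwritten, which implicitly deletes the oldest data stored there.
        """
        self.segments[self.frontier] = (stream_id, records)
        self.frontier = (self.frontier + 1) % NUM_SEGMENTS
```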

  10. StreamFS Design
  • Record: a single write, packed into:
  • Segment: fixed-size, single stream, interleaved into:
  • Region: contains a region map
  • Region map: identifies the segments in the region; used when the write frontier wraps
  • Directory: locates streams on disk
  (Figure: records packed into segments, segments into regions with a region map, plus a stream directory; data-structure sketch below)
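
The on-disk structures named on this slide, sketched as Python dataclasses for illustration only; field names and types are assumptions, and the real StreamFS layout is a binary disk format rather than Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Record:
    stream_id: int
    data: bytes                 # one captured packet header (a single write)

@dataclass
class Segment:
    stream_id: int              # a segment holds records from a single stream
    records: List[Record] = field(default_factory=list)   # packed to a fixed size

@dataclass
class Region:
    # Region map: which stream owns each segment slot in this region;
    # consulted when the write frontier wraps around the disk.
    segment_owners: List[int] = field(default_factory=list)
    segments: List[Segment] = field(default_factory=list)

@dataclass
class Directory:
    # Maps a stream name (e.g. "Stream_A") to the regions holding its segments.
    streams: Dict[str, List[int]] = field(default_factory=dict)
```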

  11. StreamFS optimizations
  • Data retention: control how much history is saved; reservations let the file system make delete decisions as new data replaces old
  • Speed balancing: worst-case speed is set by the slowest tracks; solution: interleave fast and slow sections, so the worst-case speed is now set by the average track (see the sketch below)
  (Figure: new data overwriting old data within a reservation)
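
A minimal sketch of the speed-balancing idea: writing a stream into a single disk zone limits it to that zone's rate (worst case: the slowest zone), while interleaving equal-size segments across fast and slow zones yields roughly the average-track rate. The MB/s figures below are made up for illustration.

```python
zone_rates_mb_s = [70, 62, 55, 48, 40]   # outer (fast) to inner (slow) zones, assumed

# Single-zone placement: worst case is the slowest zone.
worst_case = min(zone_rates_mb_s)

# Round-robin interleaving of equal-size segments: effective rate is the
# harmonic mean of the zone rates (what the slide calls the "average track").
interleaved = len(zone_rates_mb_s) / sum(1.0 / r for r in zone_rates_mb_s)

print(f"worst-case zone rate : {worst_case:.0f} MB/s")
print(f"interleaved rate     : {interleaved:.0f} MB/s")
```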

  12. Local Index
  Index and search mechanisms
  • Requirements:
  • High insertion speed
  • Interactive query response

  13. Signature Index
  • Compress data into a signature
  • Store the signature separately
  • Search the signature, not the data
  • Retrieve the data itself on a match
  • Signature algorithm: Bloom filter (see the sketch below)
  • No false negatives: never misses a result
  • False positives: extra read overhead
  (Figure: signature computed over the keys of the stored records)
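
A minimal Bloom-filter signature sketch illustrating the technique named on the slide; the bit-array size, hash count, and use of salted blake2b digests are assumptions, not the paper's parameters.

```python
import hashlib

class BloomSignature:
    def __init__(self, m_bits=8192, k_hashes=4):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, key: bytes):
        for i in range(self.k):
            h = hashlib.blake2b(key, salt=i.to_bytes(4, "little")).digest()
            yield int.from_bytes(h[:8], "little") % self.m

    def add(self, key: bytes):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False => definitely absent (no false negatives).
        # True  => possibly present (a false positive costs an extra read).
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

# Usage: build one signature per stored segment of packet headers, search the
# signatures instead of the data, and read a segment only when its signature matches.
sig = BloomSignature()
sig.add(b"10.0.0.1:80")
assert sig.might_contain(b"10.0.0.1:80")
```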

  14. Signature index efficiency
  • Overhead = bytes searched = index size + false-positive data scans (toy cost model below)
  • Concise index: index scan cost low, false-positive scans high
  • Verbose index: index scan cost high, false-positive scans low
  (Plot: bytes searched vs. index size)
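
A toy cost model for this trade-off: total bytes searched is the index size plus the data scanned on false positives, so a concise index saves index-scan bytes but triggers more data scans, and a verbose index does the opposite. The segment size and false-positive rates below are made-up illustrations; only the 26 GB figure comes from the results later in the talk.

```python
DATA_BYTES = 26 * 10**9          # archived data covered by the index
SEGMENT_BYTES = 10**6            # data read per false-positive hit (assumed)
SEGMENTS = DATA_BYTES // SEGMENT_BYTES

def bytes_searched(index_bytes, false_positive_rate):
    index_scan = index_bytes
    fp_scans = false_positive_rate * SEGMENTS * SEGMENT_BYTES
    return index_scan + fp_scans

print(bytes_searched(index_bytes=5 * 10**6,  false_positive_rate=0.02))    # concise index
print(bytes_searched(index_bytes=50 * 10**6, false_positive_rate=0.0001))  # verbose index
```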

  15. Multi-level signature index
  • Concise index: low scan overhead
  • Verbose index: low false-positive overhead
  • Use both: scan the concise index, then check its positives in the verbose index (see the sketch below)
  (Figure: concise index → verbose index → data records)
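
A minimal sketch of the two-level lookup, assuming per-segment concise and verbose signatures that expose a might_contain() method (e.g. the Bloom filter sketched earlier); only segments that pass both levels are actually read from storage.

```python
def query(key, concise, verbose, read_segment):
    """concise / verbose: dicts mapping segment id -> signature object
    (concise signatures are small, verbose ones larger and more accurate).
    read_segment: callable that fetches a segment's records (bytes) from storage."""
    matches = []
    for seg_id, small_sig in concise.items():
        if not small_sig.might_contain(key):
            continue                      # cheap first-level rejection
        if not verbose[seg_id].might_contain(key):
            continue                      # second level filters most false positives
        matches.extend(r for r in read_segment(seg_id) if key in r)
    return matches
```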

  16. Distributed Index
  • Query routing: send queries only to nodes holding matches, using the signature index (routing sketch below)
  • Index distribution:
  • Aggregate indexes at a cluster head
  • Route queries through the cluster head
  • Rotate the cluster head for load sharing
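
A minimal sketch of signature-based query routing at a cluster head, which holds each member node's aggregated signature and forwards a query only to nodes whose signature might match; the names and the send_to_node callback are assumptions for illustration.

```python
def route_query(key, node_signatures, send_to_node):
    """node_signatures: dict mapping node id -> aggregated signature object
    (anything with a might_contain() method, e.g. the Bloom filter sketched earlier).
    send_to_node: callable issuing the query to one node and returning its results."""
    results = []
    for node_id, sig in node_signatures.items():
        if sig.might_contain(key):        # skip nodes that cannot hold a match
            results.extend(send_to_node(node_id, key))
    return results
```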

  17. Implementation
  • Components: StreamFS, index, capture, RPC/query/index distribution, query API
  • Linux OS
  • Python framework
  (Figure: Hyperion components layered above the Linux kernel)

  18. Outline of talk
  • Introduction and Motivation
  • Design
  • Implementation
  • Results
  • Conclusions

  19. Experimental Setup
  • Hardware: Linux cluster
  • Dual 2.4 GHz Xeon CPUs, 1 GB memory
  • 4 × 10K RPM SCSI disks
  • SysKonnect SK98xx NIC + U. Cambridge driver
  • Test data: packet traces from the UMass Internet gateway* (400 mbit/s, 100K pkts/s)
  *http://traces.cs.umass.edu

  20. StreamFS – write performance
  • Tested configurations: NetBSD/LFS, Linux/XFS (SGI), StreamFS
  • Workload: multiple streams and rates; logfile rotation used for LFS and XFS
  • Results: 50% boost in worst-case throughput; fast enough to store 1,000,000 packet headers/s

  21. StreamFS – read/write
  • Workload: continuous writes plus random reads
  • StreamFS: sustained write throughput
  • XFS: throughput collapse
  • StreamFS can handle stream read+write traffic without data loss; XFS cannot

  22. Index Performance
  • Calculation benchmark: 250,000 pkts/s
  • Query: 380M packet headers, 26 GB of data, selective query (1 packet returned)
  • Query results: 13 MB of data fetched to query 26 GB of data (1:2000)
  (Plot: data fetched (MB) vs. index size)

  23. System Performance
  • Workload: trace replay with simultaneous queries
  • Speed: 100–200K pkts/s
  • Packet loss measured as #transmitted − #received
  • Result: up to 175K pkts/s with negligible packet loss

  Packets/s    Loss rate
  110,000      0
  130,000      0
  150,000      2·10⁻⁶
  160,000      4·10⁻⁶
  175,000      10·10⁻⁶
  200,000      0.001

  24. Conclusions
  • Hyperion: packet monitoring with retrospective queries
  • Key components:
  • Storage: 50% improvement over general-purpose file systems
  • Index: insert at 250K pkts/s; interactive queries over 100s of millions of packets
  • System: capture, index, and query at 175K pkts/s

  25. Questions?
