summary statistics sumstats framework
Download
Skip this Video
Download Presentation
Summary Statistics (SumStats framework)

Loading in 2 Seconds...

play fullscreen
1 / 11

Summary Statistics (SumStats framework) - PowerPoint PPT Presentation


  • 120 Views
  • Uploaded on

Summary Statistics (SumStats framework). What is SumStats?. Summary Statistics From Wikipedia: Summary statistics are used to summarize a set of observations, in order to communicate the largest amount as simply as possible. 2. Motivations. Load balancing made previous techniques all fail

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Summary Statistics (SumStats framework)' - meryl


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
what is sumstats
What is SumStats?
  • Summary Statistics
  • From Wikipedia:
    • Summary statistics are used to summarize a set of observations, in order to communicate the largest amount as simply as possible

2

motivations
Motivations
  • Load balancing made previous techniques all fail
    • SumStats framework hides cluster abstraction
  • Better and more repeatable interface and approach for measurement and thresholding
  • Give more people the ability to write real world deployable measurement scripts

3

approach
Approach
  • Discrete time slices (epochs)
  • Only streaming algorithms allowed
  • Every measurement must be merge-able for cluster support
  • Probabilistic data structures
    • HyperLogLog & Top-K now

4

why do any of this
Why do any of this?

Measurement is fun!

5

sumstats based notices
SumStats Based Notices

200.29.31.26 had 349 failed logins on 2 FTP servers in 14m47s

92.253.122.14 scanned at least 29 unique hosts on port 445/tcp in 1m4s

88.124.212.10 scanned at least 41 unique hosts on port 445/tcp in 1m13s

212.55.8.177 scanned at least 75 unique hosts on port 5900/tcp in 0m36s

200.30.130.101 scanned at least 66 unique hosts on port 445/tcp in 1m20s

107.22.92.186 scanned at least 64 unique hosts on port 443/tcp in 0m1s

5.254.140.123 scanned at least 29 unique hosts on port 102/tcp in 4m1s

122.211.164.196 scanned 15 unique ports of host 75.89.37.60 in 0m5s

6

slide7

Observation - type a

Observation - type a

Observation - type b

Observation - type a

Observation - type b

Reducer has: predicate, key normalizer

calculates: sum, avg, etc...

Reducer has: predicate, key normalizer

calculates: sum, avg, etc...

SumStat has: epoch, thresholds, epoch-end callbacks,

7

observations
Observations
  • Observations observe a single point of data
    • An HTTP request
    • A DNS lookup
    • An ICMP message

8

reducers
Reducers
  • Reducers collect observations and apply calculations to them
    • Sum of Content-Length headers
    • Unique number of DNS requests

9

sumstat
SumStat
  • A SumStat collates multiple reducers
    • Set thresholds at ratios between reducers
      • e.g. Ratio of unique DNS requests and unique HOST names seen in HTTP traffic
    • Handle results from Reducers and do something

10

slide11

Observation - type a

Observation - type a

Observation - type b

Observation - type a

Observation - type b

Reducer has: predicate, key normalizer

calculates: sum, avg, etc...

Reducer has: predicate, key normalizer

calculates: sum, avg, etc...

SumStat has: epoch, thresholds, epoch-end callbacks,

11

ad