
Real Time Streaming Pattern Detection for eCommerce

Real Time Streaming Pattern Detection for eCommerce. Authors: William Braik, Floréal Morandat, Jean-Rémy Falleri, Xavier Blanc. Presented by Kriti Narsapur (Student id: 1294630). Contents: Introduction, Background, Pattern Detection, Evaluation, Conclusion.



Presentation Transcript


  1. Real Time Streaming Pattern Detection for eCommerce
     Authors: William Braik, Floréal Morandat, Jean-Rémy Falleri, Xavier Blanc
     Presented by Kriti Narsapur (Student id: 1294630)

  2. Contents
     • Introduction
     • Background
     • Pattern Detection
     • Evaluation
     • Conclusion

  3. Introduction
     • Pattern detection over event streams
     • Challenges of real time pattern detection
       • Efficiency
       • Scalability
     • Existing approach: measure web traffic in batch fashion

  4. Introduction contd.
     • Experimented approach:
       • Domain Specific Language (DSL): expresses customers' behaviours
       • DSL semantics: a compilation process transforms patterns into Deterministic Finite Automata (DFAs)
       • Spark, a Big Data streaming platform: runs the pattern detection algorithm in real time
     • Cdiscount requirement:
       • Detect customers' behaviours: 1 million customers send around 400 events each day
       • Latency < 1 sec

  5. Background: Cdiscount architecture
     • Event of the stream: e = (t, d)

  6. Pattern detection: a DSL to express behaviour patterns
     • Patterns: sequences of events
     • An event is matched according to its action type
     • The DSL also supports the complement of an action type
     • Two non-contiguous operators, which ignore all events that do not match the pattern:
       • FollowedBy
       • KleenePlus (+)
     • Time constraints: Interval (on an operator) and Window (on a pattern)
     • Data constraint
     • Negative Acceptation Condition (NAC)
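
The slide does not give the DSL's syntax or exact semantics, so the following is only a minimal Python sketch of the ideas it lists: a sequence of action types, the complement of an action type, non-contiguous matching in the spirit of FollowedBy, and a Window over the whole pattern. The names `Step` and `detect`, the `(timestamp, action_type)` event shape, and the choice to restart the match when the window expires are all assumptions made for illustration, not the authors' design.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    """One pattern step (hypothetical helper, not part of the authors' DSL)."""
    action: str            # action type to match, e.g. "View"
    negated: bool = False  # True means "any action type except `action`"

    def matches(self, action_type: str) -> bool:
        if self.negated:
            return action_type != self.action
        return action_type == self.action


def detect(steps: List[Step],
           events: List[Tuple[float, str]],
           window: Optional[float] = None) -> bool:
    """Return True if the steps occur in order, not necessarily contiguously.

    events: (timestamp_seconds, action_type) pairs in time order (assumed shape).
    window: if set, the whole match must fit within this many seconds.
    """
    pos = 0        # index of the next step to match
    start = None   # timestamp of the first matched event
    for ts, action_type in events:
        # Assumed Window semantics: an expired window restarts the match.
        if window is not None and start is not None and ts - start > window:
            pos, start = 0, None
        if steps[pos].matches(action_type):
            if pos == 0:
                start = ts
            pos += 1
            if pos == len(steps):
                return True
        # Events that do not match the expected step are simply ignored.
    return False


view_then_exit = [Step("View"), Step("Exit")]
stream = [(0.0, "View"), (1.0, "Search"), (2.0, "Exit")]
print(detect(view_then_exit, stream))              # match, "Search" is ignored
print(detect(view_then_exit, stream, window=1.0))  # no match, window too short
```

The sketch scans a finished list of events for one customer; it says nothing about how the real engine handles KleenePlus, Interval, data constraints, or NACs.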

  7. Pattern detection: from patterns to automata
     • DFAs are used to detect patterns (the NFA is only used for representation)
     • Translate each pattern to its corresponding DFA
     • Run a DFA for each customer
     • Memory usage is proportional to number of simultaneous customers × number of patterns to detect
     • Two-step transformation:
       • Generate the NFA
       • Convert the NFA into the corresponding DFA
     • Run the pattern detection using Spark
     • Example pattern shown on the slide: P : View + Exit
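
To make the two-step transformation concrete, here is a small Python sketch that builds an NFA by hand for a "View, later Exit" pattern, converts it to a DFA with the textbook subset construction, and keeps one DFA state per customer. The NFA encoding, the three-symbol alphabet, the `Engine` class, and the reset-on-match behaviour are assumptions for illustration; the slides do not show the authors' compiler, and a plain dictionary stands in here for the state that Spark manages in their engine.

```python
ALPHABET = ("View", "Exit", "Other")

# Hand-written NFA: (state, symbol) -> set of next states.
# The self-loops model "ignore events that do not advance the pattern".
NFA = {
    (0, "View"): {0, 1}, (0, "Exit"): {0}, (0, "Other"): {0},
    (1, "View"): {1}, (1, "Exit"): {1, 2}, (1, "Other"): {1},
}
NFA_START, NFA_ACCEPT = 0, {2}


def to_dfa(nfa, start, accepting, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start_set = frozenset({start})
    table = {}
    todo = [start_set]
    while todo:
        current = todo.pop()
        if current in table:
            continue
        table[current] = {}
        for sym in alphabet:
            nxt = frozenset(s for q in current for s in nfa.get((q, sym), ()))
            table[current][sym] = nxt
            if nxt not in table:
                todo.append(nxt)
    accept = {state for state in table if state & accepting}
    return table, start_set, accept


class Engine:
    """Keeps one DFA state per customer (one pattern only in this sketch)."""

    def __init__(self, table, start, accept):
        self.table, self.start, self.accept = table, start, accept
        self.state = {}  # customer id -> current DFA state

    def on_event(self, customer, action):
        sym = action if action in ("View", "Exit") else "Other"
        current = self.state.get(customer, self.start)
        nxt = self.table[current][sym]
        if nxt in self.accept:
            self.state.pop(customer, None)  # assumed: reset after a match
            return True
        self.state[customer] = nxt
        return False


engine = Engine(*to_dfa(NFA, NFA_START, NFA_ACCEPT, ALPHABET))
for customer, action in [("alice", "View"), ("bob", "Exit"),
                         ("alice", "Search"), ("alice", "Exit")]:
    if engine.on_event(customer, action):
        print("pattern detected for", customer)
```

Because the dictionary holds one entry per active customer for this single pattern, running P patterns would need P such entries per customer, which is the customers × patterns memory growth the slide describes.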

  8. Evaluation
     • Goal: assess whether a given cluster of machines, with a set pool of memory and CPU resources, is capable of detecting patterns efficiently
     • Total number of automata that the engine runs: A = C × P
       • C: number of simultaneous customers handled by the system
       • P: number of patterns to be observed
     • Throughput of events: T = C × E
       • E: number of events
     • Measure the maximum values of A and T that are supported by a given cluster
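
As a quick worked example of the two formulas, the Python lines below plug in the Cdiscount figures quoted in the introduction. Two readings are assumed here and are not stated on the slides: that the "400 events each day" figure is per customer, and that E is a per-customer event rate so that T comes out in events per second. The value P = 2 is purely hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def automata_count(customers: int, patterns: int) -> int:
    """A = C * P: one automaton per (customer, pattern) pair."""
    return customers * patterns


def throughput(customers: int, events_per_customer_per_second: float) -> float:
    """T = C * E, with E read as a per-customer event rate (assumption)."""
    return customers * events_per_customer_per_second


customers = 1_000_000
events_per_customer_per_day = 400  # assumed to be per customer

t = throughput(customers, events_per_customer_per_day / SECONDS_PER_DAY)
print(round(t))                      # 4630 events per second
print(automata_count(customers, 2))  # 2000000 automata for a hypothetical P = 2
```

Under these assumptions the required load is a little under 5000 events per second, which is the throughput the conclusion reports results for.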

  9. Evaluation contd.: protocol
     • Maximum of A: simulate the creation of new automata until the performance criterion is no longer met
     • Maximum of T: generate a stream with a given throughput and check how the system performs under stress
     • Phase I: creates as many automata as needed to reach A
       • Measure memory footprint
     • Phase II: keeps the system working, without increasing the number of automata in memory
       • Measure maximum detection latency
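
The two phases can be mimicked on a single machine with a small Python harness. This is only a sketch of the protocol's shape: `run_protocol` and `toy_step` are hypothetical names, `tracemalloc` reports Python-level allocations rather than cluster memory, and the timing is per-event processing time inside one process, which is a much narrower quantity than the end-to-end detection latency measured in the study.

```python
import time
import tracemalloc


def run_protocol(target_automata, phase2_events, step):
    """step(automata, key) processes one event for the automaton `key`."""
    automata = {}

    # Phase I: every event targets a new key, so each one creates an
    # automaton, until the target A is reached. Then read memory in use.
    tracemalloc.start()
    for key in range(target_automata):
        step(automata, key)
    current_bytes, _peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    # Phase II: events only target existing keys, so the number of
    # automata stays constant while the worst per-event time is recorded.
    max_latency = 0.0
    for i in range(phase2_events):
        t0 = time.perf_counter()
        step(automata, i % target_automata)
        max_latency = max(max_latency, time.perf_counter() - t0)

    return len(automata), current_bytes, max_latency


def toy_step(automata, key):
    """Stand-in for a DFA transition: just counts events per automaton."""
    automata[key] = automata.get(key, 0) + 1


count, memory_bytes, worst = run_protocol(1000, 5000, toy_step)
print(count, memory_bytes, worst)
```

Replacing `toy_step` with a real transition function would keep the same two-phase structure while measuring something closer to the engine's actual work.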

  10. Evaluation contd.: results
     • Ran a total of 30 configurations:
       • T ∈ {1000, 2500, 5000, 10000, 15000, 20000, 30000, 35000, 40000}
       • A ∈ {0.5M, 1M, 2M}
     • Phase I: detection latency increases
     • Phase II: the curve stabilizes
     • Figure: T = 5000 events per second

  11. Conclusion
     • This study provided:
       • A DSL for expressing behaviour patterns
       • A compiler that translates them into DFAs
       • A detection engine
     • Experimental results showed that, for 5000 events per second, it can handle:
       • 1 million customers with sub-second detection latency
       • 2 million customers with latency lower than 2 seconds

  12. THANK YOU
