
Mining Logs for Long-Term Patterns

Boris Novikov, Elena Michailova, Ekaterina Ivannikova, Alice Pigul. Saint Petersburg University.


Presentation Transcript


  1. Mining Logs for Long-Term Patterns Boris Novikov, Elena Michailova, Ekaterina Ivannikova, Alice Pigul Saint Petersburg University

  2. Motivation Goal: • To predict system behavior based on knowledge extracted from data Solution: • Short-term patterns • Long-term patterns Applicability: • Storage system performance • Financial market movements • Geographical and ecological prediction

  3. Some Data are Needed System logs: • Low-level system I/O and other storage system logs • Application logs (e.g. DBMS performance monitoring) • Transaction logs Problems: • Huge volume • Hard to get realistic data

  4. The Data • A production DB for a medium-sized company • The business area: sea transportation • DBMS: Oracle 10g • Database size: approx. 90 GB (operational) • Query execution statistics • Only summary data were available

  5. The Data Structure • The fields: • SQL id • Elapsed time • Executions • CPU • Start interval time • End interval time • The data are aggregated over 1-hour intervals • Only query IDs are available, not the SQL text
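
A minimal sketch of how one row of these summary statistics could be represented and grouped into hourly snapshots. The class name `QueryStat`, its field names, and the helper `snapshots_by_interval` are illustrative assumptions; the slides list only the field labels, not a schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class QueryStat:
    """One aggregated row: statistics for a query over one interval (hypothetical names)."""
    sql_id: str
    elapsed_time: float
    executions: int
    cpu: float
    interval_start: datetime
    interval_end: datetime


def snapshots_by_interval(stats):
    """Group rows into snapshots keyed by interval start, in chronological order."""
    snapshots = defaultdict(dict)
    for row in stats:
        snapshots[row.interval_start][row.sql_id] = row
    return dict(sorted(snapshots.items()))
```

Each snapshot then maps a query ID to its statistics for that hour, which is the shape the later pattern and period steps work on.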

  6. Pattern • Queries • indicate business functions; • the links to data may be found via query parameters. • Group of queries • might indicate business processes. • A pattern is a set of queries with significant resource consumption at the same or close intervals

  7. Why Is It Hard? • Several business processes are interactive and hence chaotic • The processes might be too small or too large

  8. Typical Workload

  9. Algorithm • Preparing data • Looking for Patterns • Finding Periods • Validation

  10. Cleaning • Remove queries which are not helpful but produce significant workload • Chaotic queries with high intensity • Uniformly distributed queries • Trivial periods (working hours) • Anomalies (occasional very high workload)

  11. Some examples - Anomaly

  12. Some examples - Period

  13. Some examples – Trivial periods

  14. How to Clean? Remove queries producing: • Nearly constant workload for (almost all) snapshots • Nearly constant ratio to total workload for the snapshot • Anomalies Approach: • Variance • Frequency of occurrence

  15. Removing Chaotic Queries • Minimum variance is 0.5; drop no more than 9% of queries
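
The variance-based cleaning step could be sketched as follows. This is one reading of the slide, not the authors' implementation: it assumes the variance is taken over each query's per-snapshot workload, that queries below the minimum variance are the ones removed, and that the 9% cap is applied to the lowest-variance queries first. The function name and the input layout are hypothetical.

```python
from statistics import pvariance


def drop_low_variance_queries(workload, min_variance=0.5, max_drop_fraction=0.09):
    """Remove near-constant queries.

    workload: dict mapping query id -> list of per-snapshot load values
              (assumed layout; all lists have the same length).
    Returns (kept, dropped), where dropped is a set of query ids.
    """
    variances = {q: pvariance(series) for q, series in workload.items()}
    # Queries below the variance threshold, most constant first.
    candidates = sorted(
        (q for q, v in variances.items() if v < min_variance),
        key=variances.get,
    )
    # Never drop more than the allowed share of all queries.
    limit = int(len(workload) * max_drop_fraction)
    dropped = set(candidates[:limit])
    kept = {q: s for q, s in workload.items() if q not in dropped}
    return kept, dropped
```

The cap keeps the cleaning step from discarding too much of the log when many queries happen to look flat.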

  16. Algorithm • Preparing data • Looking for Patterns • Finding Periods • Validation

  17. Mining Patterns • Patterns: • related queries are grouped together. • The number of patterns depends on: • number of queries in a group; • correlation measure.

  18. The interconnectedness of the data • Between queries q1 and q2 • Adding a query q to a group G • Delete all new groups when M < threshold

  19. The Impact of the Threshold • M > 0.6 corresponds to more than 4/5 of snapshots
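
The formulas for the measure M are not preserved in this transcript, so the sketch below substitutes an assumed measure: the share of snapshots in which all queries of a group are active, among snapshots in which any of them is active (a Jaccard-style ratio). It only illustrates the grow-and-prune idea of extending groups one query at a time and discarding those below the threshold; the function names and the input layout are hypothetical.

```python
from itertools import combinations


def group_measure(group, active):
    """Assumed measure M: |snapshots with all members| / |snapshots with any member|."""
    sets = [active[q] for q in group]
    union = set().union(*sets)
    common = set.intersection(*sets)
    return len(common) / len(union) if union else 0.0


def grow_groups(active, threshold=0.6):
    """active: dict query id -> set of snapshot indices where the query is significant."""
    queries = sorted(active)
    # Start from pairs that pass the threshold.
    frontier = {
        frozenset(pair)
        for pair in combinations(queries, 2)
        if group_measure(pair, active) >= threshold
    }
    result = set(frontier)
    # Repeatedly try to add one query to each surviving group.
    while frontier:
        new_groups = set()
        for group in frontier:
            for q in queries:
                if q in group:
                    continue
                candidate = group | {q}
                if candidate not in result and group_measure(candidate, active) >= threshold:
                    new_groups.add(candidate)
        result |= new_groups
        frontier = new_groups
    return result
```

With this assumed measure, adding a query can never raise M, so pruning a group below the threshold also rules out all of its extensions.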

  20. Example of correlated group

  21. Algorithm • Preparing data • Looking for Patterns • Finding Periods • Validation

  22. Periods • Transform patterns to a binary sequence: if q1, q2, ..., qn ∈ snapshot_i then Binary[i] = 1 else Binary[i] = 0 • Cycle c = {p, o}, where p is the period and o is the offset. Example: a ship arrives at the port every Friday: p = 7 days, o = 5th day, cycle c = {7, 5}
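
The transformation of a pattern into a binary sequence can be sketched as below, assuming each snapshot is available as a set of query IDs with significant resource consumption in that interval. The function names are illustrative, not taken from the slides.

```python
def pattern_to_binary(pattern, snapshots):
    """Binary[i] = 1 if every query of the pattern occurs in snapshot i, else 0.

    pattern:   iterable of query ids
    snapshots: list of sets of query ids, one set per interval (assumed layout)
    """
    required = set(pattern)
    return [1 if required <= snapshot else 0 for snapshot in snapshots]


def cycle_positions(period, offset, length):
    """Indices covered by cycle (period, offset) in a sequence of the given length."""
    return list(range(offset, length, period))
```

For the ship example, `cycle_positions(7, 5, 21)` lists the daily indices at which a weekly event with offset 5 is expected within three weeks.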

  23. Algorithms • Exact periods • Approximate periods: allow missing or extra entries

  24. Exact periods mining • Detection of cycles with 100% support: 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 has cycle (4, 1) • All possible candidate cycles (p, o) are generated (Cand). • The binary sequence is scanned once and all non-periodic cycles are excluded from Cand; the cycles remaining in Cand are periodic:
      if BinSeq[i] = 0
        for (p = P_min; p <= P_max; p++)
          Cand.delete(p, i mod p)
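
A runnable sketch of the elimination scan described on this slide: every candidate (p, o) starts in the set, and each zero at position i removes the candidate (p, i mod p) for every period p. The function name and the choice of a Python set for Cand are assumptions.

```python
def exact_cycles(bin_seq, p_min, p_max):
    """Return all cycles (p, o) with 100% support in bin_seq."""
    # Generate every candidate cycle (period, offset).
    cand = {(p, o) for p in range(p_min, p_max + 1) for o in range(p)}
    # One pass over the sequence: a zero at position i rules out
    # every cycle that expects a one there.
    for i, bit in enumerate(bin_seq):
        if bit == 0:
            for p in range(p_min, p_max + 1):
                cand.discard((p, i % p))
    return cand


# The sequence from the slide.
slide_sequence = [0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
cycles = exact_cycles(slide_sequence, p_min=2, p_max=4)
```

Note that multiples of a true period, such as period 8 here, also survive the scan when `p_max` is large enough, so a post-filter for the smallest period may be useful.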

  25. Detection of periods with a given minimum support (1) • 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 has cycle (4, 1) with support 80% • Baseline algorithm:
      for (p = P_min; p <= P_max; p++)
        for (o = 0; o < p; o++)
          s = CalculateSup(p, o);
          if (s >= Sup_min) add cycle (p, o) to Result;
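
The baseline algorithm can be sketched as follows. Support is assumed here to be the fraction of positions o, o + p, o + 2p, ... that hold a 1, which matches the 80% figure for cycle (4, 1) in the slide's example (4 of 5 expected positions). The function names mirror the pseudocode but are otherwise hypothetical.

```python
def calculate_support(bin_seq, p, o):
    """Fraction of the positions expected by cycle (p, o) that actually hold a 1."""
    positions = range(o, len(bin_seq), p)
    if len(positions) == 0:
        return 0.0
    return sum(bin_seq[i] for i in positions) / len(positions)


def cycles_with_min_support(bin_seq, p_min, p_max, sup_min):
    """Enumerate every cycle (p, o) and keep those with support >= sup_min."""
    result = []
    for p in range(p_min, p_max + 1):
        for o in range(p):
            s = calculate_support(bin_seq, p, o)
            if s >= sup_min:
                result.append((p, o, s))
    return result
```

This version rescans the sequence for every candidate, so its cost grows with the number of candidate cycles times the sequence length.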

  26. Algorithm • Preparing data • Looking for Patterns • Finding Periods • Validation

  27. Validation • The goal: • Compare found groups with known business processes (departures) • Analysis: • Departures processed in other systems were not found at all • Some departures weren’t found • Some groups weren’t associated with known processes • Different ports correlate with different groups • Possible reason: different cargo types

  28. Validation Summary • $\text{Detected departures} = \dfrac{|\{\text{departures} : |t_{dep} - t_{gr}| < 2\}|}{|\{\text{departures}\}|}$
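
The detection ratio above can be computed with a sketch like the one below. It assumes that a departure counts as detected when at least one group occurrence lies strictly within the tolerance of its time, and that both time lists use the same unit; the slide gives the bound 2 without a unit. The function and parameter names are illustrative.

```python
def detected_departures_ratio(departure_times, group_times, tolerance=2):
    """Share of departures with some group occurrence closer than `tolerance`."""
    if not departure_times:
        return 0.0
    detected = sum(
        1
        for t_dep in departure_times
        if any(abs(t_dep - t_gr) < tolerance for t_gr in group_times)
    )
    return detected / len(departure_times)
```

For example, with departures at times 10, 20, 30, 40 and group occurrences at 11, 25, 41.5, two of the four departures have a group occurrence within the tolerance.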

  29. Departures from Gdynia port

  30. Conclusions • Mining summary data is computationally feasible and can produce reasonably good precision • Topics for future work: • Convert business process patterns into data access patterns • Compare alternative mining approaches • Evaluate the techniques on other classes of data • Define a framework for adaptive self-tuning
