Term Paper DETECTING OUTLIERS


Presentation Transcript


  1. Term Paper: DETECTING OUTLIERS Group – 5 Santhosh Kumar Kotagiri Allam Swetha Reddy Manideep Krishna Bhimavarapu

  2. Outliers • Data that deviates from the normal data is called an outlier. • An outlier can be: • Any data that is inconsistent • Rare data • A deviant object • An exceptional transaction • Outlier detection is a data mining technique that finds outliers in a given set of data

  3. Papers Selected • Outlier Detection for Transaction Databases using Association Rules • Spatio-Temporal Outlier Detection in Large Databases • Detecting Spatio-temporal Outliers in Climate Dataset: A Method Study

  4. Outlier Detection for Transaction Databases using Association Rules

  5. Preliminaries • Support (sup) & min_sup • Frequent Itemset (FI) & maximal FI • Association Rule • Confidence & min_conf • High-confidence rules • Unobserved Rule • Associative Closure • Outlier Degree • Outlier Transaction
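
The first two preliminaries can be illustrated with a minimal sketch, assuming the standard definitions: the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule X → Y is sup(X ∪ Y) / sup(X). The function names and the toy transactions are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """sup(X ∪ Y) / sup(X) for the rule X → Y (0.0 if X never occurs)."""
    x = frozenset(antecedent)
    sup_x = support(x, transactions)
    if sup_x == 0:
        return 0.0
    return support(x | frozenset(consequent), transactions) / sup_x


# Toy data (illustrative only).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

sup_bread = support({"bread"}, transactions)
conf_bread_milk = confidence({"bread"}, {"milk"}, transactions)
```

A rule is kept as a high-confidence rule when its confidence reaches min_conf, and an itemset is frequent when its support reaches min_sup.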

  6. Basic Concept • Existing Model: Brute-Force algorithm • The paper presents two techniques for faster detection of outliers • Removing redundant association rules • Pruning candidate outlier transactions using maximal frequent itemsets

  7. Outlier Candidate Detection • Maximal Associative Closure • Upper Bound of Outlier Degrees • If the upper bound of a transaction t's outlier degree is less than the given minimal outlier degree, t cannot be an outlier and is pruned from the candidates.

  8. Pruning Redundant Rules • Nonredundant Rules • X → Y ∈ R is a nonredundant rule when R has no other association rules Z → W ∈ R and S → V ∈ R such that (i) X ∪ Y = Z ∪ W ∧ X ⊃ Z and (ii) X = S ∧ Y ⊂ V • Minimal rule set for R (Rmin) • |Rmin| ≤ |R|
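
A minimal sketch of the pruning step follows. It assumes one reading of the definition above: a rule X → Y is dropped if some other rule satisfies condition (i), or some other rule satisfies condition (ii). The slide does not confirm this reading, and the names `is_redundant` and `minimal_rule_set` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent X, consequent Y)


def is_redundant(rule: Rule, rules: List[Rule]) -> bool:
    """True if another rule makes `rule` redundant under the assumed reading."""
    x, y = rule
    for other in rules:
        if other == rule:
            continue
        z, w = other
        # (i) same overall itemset, strictly smaller antecedent: X ⊃ Z
        if x | y == z | w and z < x:
            return True
        # (ii) same antecedent, strictly larger consequent: Y ⊂ V
        if x == z and y < w:
            return True
    return False


def minimal_rule_set(rules: List[Rule]) -> List[Rule]:
    """Keep only the nonredundant rules, preserving input order."""
    return [r for r in rules if not is_redundant(r, rules)]


def rule(x: str, y: str) -> Rule:
    """Helper for the example: rule('ab', 'c') means {a, b} → {c}."""
    return frozenset(x), frozenset(y)


rules = [rule("a", "bc"), rule("ab", "c"), rule("a", "b"), rule("d", "e")]
r_min = minimal_rule_set(rules)
```

In this toy set, {a, b} → {c} is removed by condition (i) and {a} → {b} by condition (ii), both because of {a} → {b, c}, so the result is no larger than the input, in line with |Rmin| ≤ |R|.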

  9. Full Algorithm

  10. Experiments and Results • Datasets • Intrusion • Synthetic • Accuracy measures

  11. Spatio-Temporal Outlier Detection in Large Databases

  12. Drawbacks with other detection algorithms. • What are ST-Outliers? • 3 step approach • Clustering • Checking Spatial neighbors • Checking Temporal neighbors

  13. Clustering: • DBSCAN algorithm • Modifications made: • To support temporal aspects • To find outliers from clusters with different densities. • Input Parameters: • Eps1 • Eps2 • MinPts • △E
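
The role of the input parameters can be sketched as follows. This is an illustration only, under the assumption that Eps1 bounds the spatial distance between two objects and Eps2 bounds the difference in their non-spatial values; the slide does not define them. The temporal filtering and the △E check are omitted, and `Obs`, `neighbors` and `is_core` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Obs:
    x: float      # spatial coordinate
    y: float      # spatial coordinate
    value: float  # non-spatial attribute (e.g. a measured value)


def neighbors(p: Obs, data: List[Obs], eps1: float, eps2: float) -> List[Obs]:
    """Objects within eps1 of p in space AND within eps2 of p in value."""
    return [
        q for q in data
        if q is not p
        and math.hypot(p.x - q.x, p.y - q.y) <= eps1
        and abs(p.value - q.value) <= eps2
    ]


def is_core(p: Obs, data: List[Obs], eps1: float, eps2: float, min_pts: int) -> bool:
    """DBSCAN-style core test: enough neighbors under both thresholds.

    Assumption: p itself is not counted toward min_pts.
    """
    return len(neighbors(p, data, eps1, eps2)) >= min_pts


data = [
    Obs(0.0, 0.0, 1.0),
    Obs(0.5, 0.0, 1.1),
    Obs(0.0, 0.5, 0.9),
    Obs(0.4, 0.4, 5.0),    # close in space, very different value
    Obs(10.0, 10.0, 1.0),  # similar value, far away in space
]
```

With eps1 = 1.0, eps2 = 0.5 and min_pts = 2, the first object is a core object, while the fourth has no neighbors at all because its value differs too much from the objects around it.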

  14. Algorithm

  15. Algorithm (cont'd)

  16. Formulae

  17. Checking Spatial and Temporal Outliers • An object is considered an S-outlier if it lies outside the interval [L, U] • L = A − K0σ and U = A + K0σ • σ = SQRT(V) • Dataset: • Wave height values of four seas: the Black Sea, the Marmara Sea, the Aegean Sea, and the east of the Mediterranean Sea.
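
The interval test can be sketched as below, with lower bound A − K0σ and upper bound A + K0σ. The slide does not define A and V, so this sketch assumes A is the mean and V the (population) variance of the values of the object's spatial neighbors; the function name `is_s_outlier` is hypothetical.

```python
import math
from statistics import fmean, pvariance
from typing import Sequence


def is_s_outlier(value: float, neighbor_values: Sequence[float], k0: float) -> bool:
    """True if `value` lies outside [A - k0*sigma, A + k0*sigma].

    Assumption: A is the mean and V the population variance of the
    neighbors' values; sigma = sqrt(V).
    """
    a = fmean(neighbor_values)
    sigma = math.sqrt(pvariance(neighbor_values))
    lower, upper = a - k0 * sigma, a + k0 * sigma
    return not (lower <= value <= upper)


# Toy wave-height-like values (illustrative only).
neighbor_values = [1.0, 1.2, 0.8, 1.1, 0.9]
inside = is_s_outlier(1.1, neighbor_values, k0=2.0)
outside = is_s_outlier(2.5, neighbor_values, k0=2.0)
```

A larger K0 widens the interval and flags fewer objects, so it acts as the sensitivity knob of this check.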

  18. Sensitivity Analysis of Parameters

  19. Detecting Spatio-temporal Outliers in Climate Dataset: A Method Study

  20. To detect useful and meaningful outliers in a climate dataset, this paper introduces a formalized way to define outliers in spatio-temporal data. • The definition of an outlier needs to consider 3 aspects • The basic element • The compare element • The compare function

  21. Location outliers given a time period • The basic element • We focus on the spatial location in this dataset, so the basic element is just a location or grid cell. • < i, Li, Ti > represents the attributes, with all observations of the temperature time series at this location. • i is the ID of the element • Li stands for its location • Ti stands for its temperature time series.

  22. The compare element • Find the difference between the location and its neighbors. • The compare element is defined as an aggregation function over the neighborhood.

  23. The Compare Function • If f (i) ≥θ , we classify location i as a location outlier in the given time period. θ is a parameter that can be adjusted.
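
The three aspects can be put together in a short sketch. The slides do not give the concrete aggregation or the form of f, so this example assumes the compare element is the element-wise mean of the neighbors' time series and f(i) is the mean absolute difference between Ti and that series. All names and the toy data are hypothetical.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def compare_element(neighbor_series: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate the neighborhood: element-wise mean of the neighbors' series."""
    return [fmean(values) for values in zip(*neighbor_series)]


def f(series: Sequence[float], neighbor_series: Sequence[Sequence[float]]) -> float:
    """Assumed compare function: mean absolute difference to the aggregate."""
    reference = compare_element(neighbor_series)
    return fmean([abs(a - b) for a, b in zip(series, reference)])


def location_outliers(series_by_id: Dict[int, List[float]],
                      neighbors_by_id: Dict[int, List[int]],
                      theta: float) -> List[int]:
    """IDs i with f(i) >= theta in the given time period."""
    return [
        i for i, series in series_by_id.items()
        if f(series, [series_by_id[j] for j in neighbors_by_id[i]]) >= theta
    ]


# Toy temperature series for three locations (illustrative only).
series_by_id = {1: [10.0, 11.0, 12.0], 2: [10.0, 12.0, 12.0], 3: [20.0, 21.0, 22.0]}
neighbors_by_id = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
flagged = location_outliers(series_by_id, neighbors_by_id, theta=8.0)
```

Lowering θ flags more locations; with θ = 8 only location 3, whose series sits well above both neighbors, is reported.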

  24. Time period outliers given a region • Location outliers can easily be extended to region outliers by replacing the location in the basic element with a region. • In other cases, we look for an anomalous time period in a given area. • For instance, find the years with too much precipitation. • This problem can be easily solved using simple statistics. • However, in a certain region, a flood can't be detected by considering only the average precipitation of the year.

  25. As illustrated in Figure 2, although the average precipitation in 1994 and in 2002 was higher than in 1998, a flood occurred only in 1998.

  26. Basic Element • This time we consider the time period as the basic element. • The basic element is < i, Ti, STDistri i > • i is the ID number • Ti is the time period • STDistri i is the spatio-temporal distribution of this time period.

  27. Compare element • In general, we compare each time period with every other time period, • since we generally don't just compare a certain year with the years immediately before and after it, but with most of the other years. • It is defined as an aggregation function over all the time periods.

  28. The Compare Function • The dimension of STDistri i is extremely large (687 locations × 12 months in our case), so it is really hard to handle. • A simple way to solve the problem is to divide the area into several regions, such as 8×3, and to divide time into 4 seasons.
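
The reduction from locations × months to regions × seasons can be sketched as follows. The slide does not say how values are aggregated or how seasons are delimited, so this example assumes plain averaging and treats each block of three consecutive months as one season; `reduce_distribution`, `region_of` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List


def reduce_distribution(monthly_by_location: Dict[str, List[float]],
                        region_of: Dict[str, str],
                        months_per_season: int = 3) -> Dict[str, List[float]]:
    """Average 12 monthly values per location into seasonal values per region."""
    rows_by_region: Dict[str, List[List[float]]] = defaultdict(list)
    for location, monthly in monthly_by_location.items():
        rows_by_region[region_of[location]].append(monthly)

    reduced: Dict[str, List[float]] = {}
    for region, rows in rows_by_region.items():
        # Step 1: average over the locations of the region, month by month.
        region_monthly = [fmean(column) for column in zip(*rows)]
        # Step 2: average each block of consecutive months into one season.
        reduced[region] = [
            fmean(region_monthly[k:k + months_per_season])
            for k in range(0, len(region_monthly), months_per_season)
        ]
    return reduced


# Two toy locations in the same region (illustrative only).
monthly_by_location = {
    "a": [float(m) for m in range(1, 13)],      # 1, 2, ..., 12
    "b": [float(m + 2) for m in range(1, 13)],  # 3, 4, ..., 14
}
region_of = {"a": "r1", "b": "r1"}
reduced = reduce_distribution(monthly_by_location, region_of)
```

Applied to the setting on the slide, an 8×3 region grid and 4 seasons would shrink each time period's distribution from 687 × 12 values to 24 × 4.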

  29. Conclusion • Outlier detection in transactional data, unlike in numerical data, is modelled using an outlier degree. • Outliers in spatio-temporal data are detected using a clustering method. • Outliers are defined using a “basic element”, and the definition is extended to spatio-temporal data.

  30. You’re Welcome to ask Questions !