Detecting Distance-Based Outliers in Streams of Data

Fabrizio Angiulli and Fabio Fassetti, DEIS, Università della Calabria.

Presentation Transcript


  1. Detecting Distance-Based Outliers in Streams of Data. Fabrizio Angiulli and Fabio Fassetti, DEIS, Università della Calabria

  2. Problem Definition • Definition 3.1 (Distance-Based Outlier). Let S be a set of objects, obj an object of S, k a positive integer, and R a positive real number. Then obj is a distance-based outlier (or, simply, an outlier) if fewer than k objects in S lie within distance R from obj. • The neighbors of an object obj that precede obj in the stream and belong to the current window are called preceding neighbors of obj. • The neighbors of an object obj that follow obj in the stream and belong to the current window are called succeeding neighbors of obj.
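A minimal Python sketch of Definition 3.1 and of the preceding/succeeding split may help make the definitions concrete. It is an illustration only, not the authors' algorithm: the function names are hypothetical, the window is assumed to be a list ordered by arrival, the distance is assumed to be Euclidean, "within distance R" is read as "at most R", and obj is assumed not to count as its own neighbor.

```python
import math

def is_outlier(index, window, k, R):
    """Return True if window[index] has fewer than k neighbors within distance R.

    Assumptions (not stated on the slide): Euclidean distance, "within R"
    means <= R, and an object is not counted as a neighbor of itself.
    """
    obj = window[index]
    neighbors = sum(
        1
        for j, other in enumerate(window)
        if j != index and math.dist(obj, other) <= R
    )
    return neighbors < k

def split_neighbors(index, window, R):
    """Return (preceding, succeeding) neighbor positions of window[index].

    The window is assumed to be ordered by arrival time, so positions
    before `index` arrived earlier and positions after it arrived later.
    """
    obj = window[index]
    preceding = [j for j in range(index) if math.dist(obj, window[j]) <= R]
    succeeding = [
        j for j in range(index + 1, len(window)) if math.dist(obj, window[j]) <= R
    ]
    return preceding, succeeding

# Hypothetical one-dimensional window, oldest object first.
window = [(0.0,), (0.5,), (1.0,), (10.0,)]
print(is_outlier(0, window, k=2, R=1.0))
print(is_outlier(3, window, k=2, R=1.0))
print(split_neighbors(1, window, R=1.0))
```

This brute-force check scans the whole window for every query, which is only meant to mirror the definition.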

  3. Problem Definition • If the number of succeeding neighbors of obj is less than k, obj could become an outlier depending on the stream evolution. • Conversely, since obj will expire before its succeeding neighbors, inliers having at least k succeeding neighbors will be inliers for any stream evolution. Such inliers are called safe inliers.
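The distinction between outliers, inliers, and safe inliers can be sketched as a three-way classification. This is a hedged illustration under assumptions: the function name `classify` is hypothetical, objects are plain numbers compared by absolute difference, "within distance R" is read as "at most R", and the window is a list ordered by arrival.

```python
def classify(index, window, k, R):
    """Label window[index] as 'safe inlier', 'inlier', or 'outlier'.

    Assumes `window` is ordered by arrival (oldest first) and that the
    distance between two numeric objects is their absolute difference.
    """
    obj = window[index]
    # Preceding neighbors arrived earlier and will expire before obj does.
    before = sum(1 for j in range(index) if abs(obj - window[j]) <= R)
    # Succeeding neighbors arrived later and outlive obj in the window.
    after = sum(
        1 for j in range(index + 1, len(window)) if abs(obj - window[j]) <= R
    )
    if after >= k:
        # k succeeding neighbors remain for as long as obj is in the window.
        return "safe inlier"
    if before + after >= k:
        # Currently an inlier, but preceding neighbors may expire first.
        return "inlier"
    return "outlier"

# Hypothetical window, oldest object first.
window = [1.0, 1.2, 0.9, 1.1, 5.0]
for i in range(len(window)):
    print(window[i], classify(i, window, k=2, R=0.5))
```

In this example the object at position 3 is an inlier only because of its preceding neighbors, so it could become an outlier once they leave the window.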

  4. Example

  5. Information of ISB • n.obj: a data stream object. • n.id: the identifier of n.obj, that is, the arrival time of n.obj. • n.count_after: the number of succeeding neighbors of n.obj. This field is used to recognize safe inliers. • n.nn_before: a list, having size at most k, containing the identifiers of the most recent preceding neighbors of n.obj. At query time, this list is used to determine the number of preceding neighbors of n.obj.
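The node fields listed above could be held in a small record like the following sketch. The field names follow the slide, written as Python identifiers; everything else is an assumption made for illustration: the `Node` class, the `insert` and `is_outlier_at_query` helpers, the linear scan over a plain list in place of whatever index the ISB actually uses, numeric objects compared by absolute difference, and a window described only by the identifier of its oldest live object.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Node:
    obj: float                  # the data stream object
    id: int                     # arrival time of obj
    count_after: int = 0        # number of succeeding neighbors of obj
    nn_before: deque = field(default_factory=deque)  # ids of recent preceding neighbors

def insert(nodes, obj, t, k, R):
    """Append a node for `obj` arriving at time `t` (hypothetical helper).

    `nodes` is assumed to hold the current window, oldest first.
    """
    # maxlen=k keeps only the k most recent preceding neighbor ids.
    new = Node(obj=obj, id=t, nn_before=deque(maxlen=k))
    for n in nodes:
        if abs(n.obj - obj) <= R:
            n.count_after += 1          # new object is a succeeding neighbor of n
            new.nn_before.append(n.id)  # n is a preceding neighbor of the new object
    nodes.append(new)
    return new

def is_outlier_at_query(node, window_start, k):
    """Count preceding neighbors still in the window, then apply the definition."""
    live_before = sum(1 for i in node.nn_before if i >= window_start)
    return live_before + node.count_after < k

def is_safe_inlier(node, k):
    return node.count_after >= k

# Hypothetical stream with arrival times 1..4.
nodes = []
for t, value in enumerate([1.0, 1.1, 5.0, 1.2], start=1):
    insert(nodes, value, t, k=2, R=0.5)
print(is_safe_inlier(nodes[0], k=2))
print(is_outlier_at_query(nodes[3], window_start=1, k=2))
print(is_outlier_at_query(nodes[3], window_start=2, k=2))
```

Keeping at most k identifiers in nn_before suffices for the query, because an object with k live preceding neighbors is already an inlier; the last two calls show the same object turning into an outlier once its oldest preceding neighbor falls out of the window.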

  6. Exact algorithm

  7. Approximate Algorithm
