Visualizing association rules for text mining
Download
1 / 16

Visualizing Association Rules for Text Mining - PowerPoint PPT Presentation


  • 197 Views
  • Updated On :

Visualizing Association Rules for Text Mining - Sangjik Lee Pak Chung Wong, Paul Whitney, Jim Thomas Pacific Northwest National Laboratory Introduction An association rule in data mining is an implication of the form X -> Y where X is a set of antecedent items and Y is the consequent item.

Related searches for Visualizing Association Rules for Text Mining

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Visualizing Association Rules for Text Mining' - omer


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Visualizing association rules for text mining l.jpg

Visualizing Association Rules for Text Mining

- Sangjik Lee

Pak Chung Wong, Paul Whitney, Jim Thomas

Pacific Northwest National Laboratory


Introduction l.jpg
Introduction

  • An association rule in data mining is an implication of the form X -> Y where X is a set of antecedent items and Y is the consequent item.

  • For years researchers have developed many tools to visualize association rules.

  • However, few of these tools can handle more than dozens of rules, and none of them can effectively manage rules with multiple antece-dents.

  • Thus, it is extremely difficult to visualize and understand the association information of a large data set even when all the rules are available.


Association l.jpg
Association

  • Powerful data analysis technique that appears frequently in data mining literature.

  • An example association rule of a supermarket database is 80% of the people who buy diapers and baby power also buy baby oil.


Slide4 l.jpg

  • The system was developed to support text mining and visualization research on large unstructured document corpora.

  • The focus is to study the relationships and implications among topics, or descriptive concepts, that are used to characterize a corpus.

  • The goal is to discover important association rules within a corpus such that the presence of a set of topics in an article implies the presence of another topic.


Slide5 l.jpg


Current technology l.jpg
Current Technology the words “Greenspan” and “inflation” occur, it is highly probably that the stock market is also mentioned.

  • Two-Dimensional Matrix


Current technology7 l.jpg
Current Technology the words “Greenspan” and “inflation” occur, it is highly probably that the stock market is also mentioned.

  • Directed Graph


Current technology8 l.jpg
Current Technology the words “Greenspan” and “inflation” occur, it is highly probably that the stock market is also mentioned.

  • Directed Graph

  • This technique works well when only a few items(nodes) and associations(edges) are involved.

  • An association graph can quickly turn into a tangled display with as few as a dozen rules.


A novel visualization technique l.jpg
A Novel Visualization Technique the words “Greenspan” and “inflation” occur, it is highly probably that the stock market is also mentioned.

  • To visualize many-to-one association rules

  • Instead of using the tiles of a 2D matrix to show the item-to-item association rules, used the matrix to depict the rule-to-item relationship.


A visualization of item associations with support 0 4 and confidence 50 l.jpg
A visualization of item associations with the words “Greenspan” and “inflation” occur, it is highly probably that the stock market is also mentioned.support >= 0.4% and confidence >= 50%


A novel visualization technique continued l.jpg
A Novel Visualization Technique ( Continued ) the words “Greenspan” and “inflation” occur, it is highly probably that the stock market is also mentioned.

  • the rows of the matrix floor represent the items (or topics in the context of text mining)

  • the columns represent the item associations.

  • The blue and red blocks of each column (rule) represent the antecedent and the consequent of the rule. The identities of the items are shown along the right side of the matrix.

  • The confidence and support levels of the rules are given by the corresponding bar charts in different scales at the far end of the matrix.


A novel visualization technique advantage l.jpg
A Novel Visualization Technique the words “Greenspan” and “inflation” occur, it is highly probably that the stock market is also mentioned.- Advantage

  • There is virtually no upper limit on the number of items in an antecedent.

  • We can analyze the distributions of the association rules (horizontal axis) as well as the items within (vertical axis) simultaneously.

  • the identity of individual items within an antecedent group is clearly shown.

  • Because all the metadata are plotted at the far end and the height of the columns are scaled so that the front columns do not block the rear ones, few occlusions occur.


Conclusion and future work l.jpg
Conclusion and future work the words “Greenspan” and “inflation” occur, it is highly probably that the stock market is also mentioned.

  • Applied the new technique to a text mining system to analyze a large text corpus.

  • The results indicate that our design can easily handle hundreds of multiple antecedent association rules in a 3D display.

  • Long-term goal is to integrate many of tools and techniques into a single visualization environment that provides time sequence analysis, hypothesis explanation and document summarization.


References l.jpg
References the words “Greenspan” and “inflation” occur, it is highly probably that the stock market is also mentioned.

  • Pak Chung Wong, Paul Whitney, and Jim Thomas. Visualizing Association Rules for Text Mining. In Graham Wills and Daniel Keim, editors, Proceedings of IEEE Information Visualization '99, Los Alamitos, CA, 1999. IEEE CS Press

  • Pak Chung Wong, Wendy Cowley, Harlan Foote, Elizabeth Jurrus, and Jim Thomas. Visualizing Sequential Patterns for Text Mining. Proceedings IEEE Information Visualization 2000, Salt Lake City, Utah, Oct 8 - Oct 13, 2000.

  • Nancy E. Miller, PakChungWong, Mary Brewster, and Harlan Foote. TOPIC ISLANDS - A Wavelet-Based Text Visualization System. In David Ebert, Hans Hagan, and Holly Rushmeier, editors, Proceedings IEEE Visualization '98, pages 189 -- 196, New York, NY, Oct 18 -- 23, 1998. ACM Press.


ad