An Automatic Text Mining Framework for Knowledge Discovery on the Web
Wingyan Chung
The University of Arizona
March 30, 2004
Acknowledgments: NSF and NIJ Grants; Dr. Hsinchun Chen, Dr. Jay F. Nunamaker, Dr. J. Leon Zhao, Dr. Richard T. Snodgrass, Dr. D. Terence Langendoen, Dr. Olivia Sheng
To effectively and efficiently discover knowledge (business intelligence) from the vast amount of textual information on the Web
Convenient storage has made information exploration difficult
Information is unreliable
Heterogeneity and unmonitored quality of information on the Web
Hard to know all stakeholders
Interconnected nature of the Web complicates understanding of relationships
How can we develop an automatic text mining approach to address the problems of knowledge discovery on the Web?
How effectively and efficiently does such an approach assist human beings in discovering knowledge on the Web?
What lessons can be learned from applying such an approach in the context of human-computer interaction (HCI)?
Knowledge and Knowledge Management
Text Mining for Web Analysis
Research Formulation and Approach
Empirical Studies on Business Intelligence Applications
Using Web Page Classification Techniques to Automate Business Stakeholder Analysis
P = Partners/suppliers, E = Employees/Unions, C = Customers,
S = Shareholders/investors, U = Education/research institutions, M = Media/Portals,
G = Public/government, R = Recruiters, V = Reviewers, O = Competitors,
T = Trade associations, F = Financial institutions, I = Political groups,
N = SIG/Communities
Ordered by their relevance to stakeholder types appearing on the Web
Link to the host company (ClearForest)
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>David Schatsky: Search and Discovery in the Post-Cold War Era</title> ...
<p>I just saw a demo by <a href = "http://www.clearforest.com"> ClearForest, </a> a company that provides tools for analyzing unstructured textual information. It's truly amazing, and truly the search tool for the post-Cold War era. ... </p> ...
HTML hyperlink and extended anchor text
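As a rough illustration of how a hyperlink and its extended anchor text could be pulled from a page like the one above, here is a minimal Python sketch. It is not the framework's actual implementation: the names `AnchorExtractor` and `extended_anchor_text` are hypothetical, and treating "extended anchor text" as the anchor text plus a fixed window of surrounding words is an assumption made for this example.

```python
from html.parser import HTMLParser


class AnchorExtractor(HTMLParser):
    """Collect page words in order and record where each link's anchor text sits."""

    def __init__(self):
        super().__init__()
        self.tokens = []   # all words seen in text nodes, in document order
        self.links = []    # (href, start_index, end_index) for each <a href=...>
        self._open = None  # (href, start_index) of the <a> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._open = (href, len(self.tokens))

    def handle_data(self, data):
        self.tokens.extend(data.split())

    def handle_endtag(self, tag):
        if tag == "a" and self._open is not None:
            href, start = self._open
            self.links.append((href, start, len(self.tokens)))
            self._open = None


def extended_anchor_text(html, window=5):
    """Return href, anchor text, and anchor text with `window` words of context.

    The window size is an assumed parameter, not a value from the slides.
    """
    parser = AnchorExtractor()
    parser.feed(html)
    parser.close()
    results = []
    for href, start, end in parser.links:
        lo = max(0, start - window)
        hi = min(len(parser.tokens), end + window)
        results.append({
            "href": href,
            "anchor": " ".join(parser.tokens[start:end]),
            "extended": " ".join(parser.tokens[lo:hi]),
        })
    return results


if __name__ == "__main__":
    sample = (
        '<p>I just saw a demo by <a href="http://www.clearforest.com"> '
        "ClearForest, </a> a company that provides tools for analyzing "
        "unstructured textual information.</p>"
    )
    for link in extended_anchor_text(sample, window=3):
        print(link)
```

The words surrounding the link ("a demo by ... a company that") carry clues about the linking page's relationship to the host company that the bare anchor text ("ClearForest,") does not, which is presumably why the extended form is listed as a feature.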
Efficiency: time used (in minutes)
User subjective ratings and comments
Business stakeholders of Siebel
Definitions of business stakeholders
Conclusions, Limitations and Future Directions
Using Web Page Classification for Business Stakeholder Analysis
Building a BI Search Portal
Applying Web Page Visualization to Exploring BI
Enhance knowledge discovery on the Web
Better understanding in HCI