Text Mining: an Introduction

By: Alireza Vazifedoost, University of Tehran, Elec. & Computer Eng. Department, a.vazifedoost@ece.ut.ac.ir

Presentation Transcript


  1. Text Mining: an Introduction By: Alireza Vazifedoost, University of Tehran, Elec. & Computer Eng. Department, a.vazifedoost@ece.ut.ac.ir

  2. Agenda • Definitions • Applications • Text Mining Process • Conclusion • References

  3. Introduction • Huge volume of information: it is difficult to find out what we really have. • 80% of our information is in unstructured or semi-structured format. • Three main approaches: • Information Retrieval (or Document Retrieval): vector space model, LSI, … • Information Extraction: for example, filling a database with information taken from emails. • Knowledge Discovery: the process of identifying novel information from a collection of texts.

  4. Introduction (cont.) • Text Data Mining = Text Mining = Knowledge Discovery in Text (KDT) • Some definitions: • Text mining is the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources (Hearst). • A key element is the linking together of the extracted information to form new facts or new hypotheses, to be explored further by more conventional means of experimentation.

  5. Introduction (cont.) • The process of discovering heretofore unknown information from a text source (Hearst) • Looking for patterns in unstructured text (Nahm) • Text mining applies the same analytical functions of data mining to the domain of textual information (Doore)

  6. Introduction (cont.) • Text mining is different from Information Extraction (IE). • IE is like filling a database with already known information taken from unstructured texts. • There is no novelty involved.
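To make the contrast concrete, here is a minimal sketch of Information Extraction in the sense of this slide: copying fields that are already present in email text into database-like records. The sample emails, the field names and the regular expression are all hypothetical choices made for illustration; they are not taken from the presentation.

```python
import re
from typing import Dict, List

# Hypothetical sample emails, invented for this illustration.
EMAILS: List[str] = [
    "From: alice@example.com\nDate: 2004-05-01\nSubject: Meeting\n\nLet's meet at noon.",
    "From: bob@example.com\nDate: 2004-05-03\nSubject: Report\n\nThe report is attached.",
]

# Matches lines such as "From: ..." and captures the field name and its value.
FIELD_PATTERN = re.compile(r"^(From|Date|Subject):\s*(.+)$", re.MULTILINE)


def extract_record(email_text: str) -> Dict[str, str]:
    """Copy the From/Date/Subject fields of one email into a flat record."""
    return {
        name.lower(): value.strip()
        for name, value in FIELD_PATTERN.findall(email_text)
    }


# One record per email: the "database" is just a list of dictionaries here.
records = [extract_record(text) for text in EMAILS]
```

Every value in `records` was already written in the emails; nothing new is discovered, which is the "no novelty" point the slide makes about IE.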

  7. Text Mining, a Conjunction! • Text mining is an inter-disciplinary field • using techniques from the fields of • information retrieval • natural language processing • machine learning • visualization • clustering • summarization

  8. Some Applications • News Mining • Feature Extraction • Search and Retrieval • Categorization (Supervised Classification) • Clustering (Unsupervised Classification) • Summarization • Trends Analysis • Associations • Visualization

  9. Text Mining Process

  10. Text Mining Methodologies • Text mining can be performed by a collection of methods from various technological areas. • These methods can be roughly grouped under two main headings: • performance-based • knowledge-based

  11. Performance Based • Designers are concerned with the effective behavior of the system, and not necessarily with the means used to obtain that behavior. • Statistical methods • Neural networks

  12. Performance Based: Association Rules Extraction • A = {w1, w2, …, wn}: a set of keywords • T = {t1, t2, …, tn}: a collection of documents; each ti is associated with a subset of A, denoted ti(A). • Let W ⊆ A be a set of keywords. The set of all documents t in T such that W ⊆ t(A) is called the covering set for W and is denoted [W]. • Any pair (W, w), where W ⊆ A is a set of keywords and w ∈ A \ W, is called an association rule and is denoted W => w.
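The covering set defined on this slide can be sketched in a few lines of Python. The toy collection `DOCS` and the function name `covering_set` are assumptions introduced for illustration; each document is represented only by the set of keywords that index it, i.e. t(A).

```python
from typing import Dict, Set

# Hypothetical toy collection: document id -> keywords indexing it, i.e. t(A).
DOCS: Dict[str, Set[str]] = {
    "t1": {"text", "mining", "nlp"},
    "t2": {"text", "retrieval"},
    "t3": {"text", "mining"},
    "t4": {"clustering"},
}


def covering_set(keywords: Set[str], docs: Dict[str, Set[str]]) -> Set[str]:
    """Return [W]: the ids of all documents t whose keyword set t(A) contains W."""
    return {
        doc_id
        for doc_id, doc_keywords in docs.items()
        if keywords <= doc_keywords  # subset test: W is contained in t(A)
    }


# Documents indexed with both "text" and "mining".
cover = covering_set({"text", "mining"}, DOCS)
```

With this toy data, `cover` contains `t1` and `t3`. The empty keyword set is contained in every t(A), so its covering set is the whole collection.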

  13. Performance Based: Association Rules Extraction (cont.) • R: W => w • S(R, T) = |[W ∪ {w}]| is called the support of R. • C(R, T) = |[W ∪ {w}]| / |[W]| is called the confidence of R. • Confidence is the conditional probability that a text is indexed with keyword w, given that it is already indexed with the keyword set W. • Rules are required to satisfy S(R, T) > σ and C(R, T) > γ for given thresholds σ and γ.
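As a worked illustration of the support and confidence formulas above, the sketch below evaluates one rule on a small hypothetical collection. The data, the function names, the threshold values and the choice to return a confidence of 0.0 when [W] is empty are all assumptions made for this example, not part of the presentation.

```python
from typing import Dict, Set

# Hypothetical toy collection: document id -> keywords indexing it, i.e. t(A).
DOCS: Dict[str, Set[str]] = {
    "t1": {"text", "mining", "nlp"},
    "t2": {"text", "retrieval"},
    "t3": {"text", "mining"},
    "t4": {"clustering"},
}


def covering_set(keywords: Set[str], docs: Dict[str, Set[str]]) -> Set[str]:
    """[W]: ids of all documents whose keyword set contains every keyword in W."""
    return {doc_id for doc_id, kws in docs.items() if keywords <= kws}


def support(W: Set[str], w: str, docs: Dict[str, Set[str]]) -> int:
    """S(R, T) = |[W ∪ {w}]| for the rule R: W => w."""
    return len(covering_set(W | {w}, docs))


def confidence(W: Set[str], w: str, docs: Dict[str, Set[str]]) -> float:
    """C(R, T) = |[W ∪ {w}]| / |[W]|; 0.0 if no document covers W (assumption)."""
    covered = len(covering_set(W, docs))
    return support(W, w, docs) / covered if covered else 0.0


def is_accepted(W: Set[str], w: str, docs: Dict[str, Set[str]],
                sigma: float, gamma: float) -> bool:
    """Keep the rule only if S(R, T) > sigma and C(R, T) > gamma."""
    return support(W, w, docs) > sigma and confidence(W, w, docs) > gamma


# Rule R: {"text"} => "mining"
s = support({"text"}, "mining", DOCS)      # documents t1 and t3
c = confidence({"text"}, "mining", DOCS)   # 2 of the 3 documents indexed with "text"
keep = is_accepted({"text"}, "mining", DOCS, sigma=1, gamma=0.5)
```

Here [W] = {t1, t2, t3} and [W ∪ {w}] = {t1, t3}, so the support is 2 and the confidence is 2/3. With the illustrative thresholds σ = 1 and γ = 0.5 the rule is kept.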

  14. Performance Based: Association Rules Extraction (cont.)

  15. Knowledge-based systems • Knowledge-based systems, on the other hand, use explicit representations of knowledge: • meaning of words, relationships between facts, and rules • NLP-based • Using patterns • GATE: • POS tagging, geographical tagging, … • Ontology-based

  16. Some Text Mining tools

  17. Conclusions • There is a great need for transforming information into knowledge. • Text mining is a relatively young field. • NLP will play a major role in this field.

  18. References [1] M. Hearst, Untangling Text Data Mining. [2] Ah-Hwee Tan, Text Mining: The State of the Art and the Challenges. [3] Text Analysis and Understanding. [4] Martin Rajman, Text Mining: Knowledge Extraction from Unstructured Textual Data. [5] Aditya Kumar Sehgal, Text Mining: The Search for Novelty in Text.

  19. THANK YOU Questions
