
Updating Methods and Relations among Concepts in DOE

This presentation discusses updating methods and relations among concepts in DOE, a system that builds an ontology from text documents using statistical methods, a small amount of user feedback, and automation.


Presentation Transcript


  1. Updating methods and relations among concepts in DOE • Research Students: Chakravarthi S Velvadapu, Govind R Maddi, Ratnakar R Krishnama • Faculty Advisors: Dr. James Gil De Lamadrid, Dr. Sadanand Srivastava • CADIP'02 Conference, sponsored by US Department Of Defense

  2. OVERVIEW • The system takes text documents as its input • Performs semantic analysis on these documents • Generates useful ontology • Represents it graphically

  3. GOAL OF THE PROJECT To build an Ontology utilizing • Statistical methods • A small amount of user feedback • Automation

  4. Architecture of DOE • Text Document • Pre-processing • Normalization • Latent Semantic Indexing (SVD) • Document Ontology • Graph Construction • GUI • Updating Methods

  5. Pre-processing • Read-in text file • Extract meaningful terms • Count their frequencies
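
The pre-processing steps above can be sketched in Python. This is only an illustration: the slides do not say how "meaningful" terms are selected, so the stop-word list `STOP_WORDS` and the function name `term_frequencies` are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the slides do not specify how terms are filtered.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "its", "as"}

def term_frequencies(text):
    """Lower-case the text, split it into alphabetic tokens, drop stop words, count the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

# Read-in step replaced by an inline string so the example is self-contained.
freqs = term_frequencies("The system takes text documents as its input and analyses the text.")
print(freqs.most_common(3))
```

A `Counter` keeps the term-to-frequency mapping that the normalization step needs as input.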

  6. Normalization • Calculate the weight of each term using $W_{i,k} = \dfrac{\mathrm{frequency}_{i,k}}{\sum_{j=1}^{n_k} \mathrm{frequency}_{j,k}}$

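  7. Normalization (contd) • Calculate the normalized weight using $W_{i,k} = \dfrac{w(i,k)}{\sqrt{\sum_{j=1}^{n_k} w^2(j,k)}}$, where $w(i,k)$ is the unnormalized weight of term $i$ in document $k$ from the previous step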
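
A minimal sketch of the two normalization formulas, assuming each document is held as a plain dictionary of term frequencies. The function names and the sample frequencies are made up for illustration.

```python
import math

def term_weights(freqs):
    """Weight of each term: its frequency divided by the total frequency in the document."""
    total = sum(freqs.values())
    return {term: f / total for term, f in freqs.items()}

def normalize(weights):
    """Divide each weight by the Euclidean norm of the document's weight vector."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()}

doc = {"ontology": 3, "graph": 1, "concept": 2}  # made-up term frequencies
w = term_weights(doc)   # weights sum to 1
nw = normalize(w)       # weight vector has unit length
print(w, nw)
```

After the second step every document vector has length 1, so long and short documents become comparable when they are placed in the term-document matrix.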

  8. Build Term-Doc Matrix • Rows of the Term-Doc matrix contain the weights of each term in different concepts • Columns of the Term-Doc matrix contain the weights of different terms in each concept

  9. Latent Semantic Indexing (LSI) • Statistical method representing documents by statistically independent concepts • Based on Singular Value Decomposition (SVD), a technique that decomposes a given matrix A into three components: U, S and V.

  10. SVD • A is formed from LSI as follows: $A = U_s S_s V_s^T$ • $U_s$: derived from $U$ by removing all but the $s$ corresponding columns • $S_s$: derived from $S$ by removing all but the largest $s$ singular values • $V_s^T$: derived from $V^T$ by removing all but the $s$ corresponding rows

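  11. SVD (contd) • [Figure: $A$ ($m \times n$) decomposed into $U$ ($m \times n$), $S$ ($n \times n$) and $V^T$ ($n \times n$), with the truncated factors $U_s$, $S_s$ and $V_s^T$ marked]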
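
The truncation described on the SVD slides can be sketched with NumPy. This is an illustration under assumptions: the matrix values are invented, and `truncated_svd` is a hypothetical helper name, not part of DOE.

```python
import numpy as np

def truncated_svd(A, s):
    """Return U_s, S_s, V_s^T: the factors restricted to the s largest singular values."""
    # np.linalg.svd returns singular values sorted from largest to smallest.
    U, sing, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :s], np.diag(sing[:s]), Vt[:s, :]

# Toy 4-term x 3-document matrix (made-up weights).
A = np.array([[0.8, 0.0, 0.6],
              [0.6, 0.1, 0.8],
              [0.0, 0.9, 0.0],
              [0.0, 0.4, 0.1]])

Us, Ss, Vst = truncated_svd(A, s=2)
A_s = Us @ Ss @ Vst  # rank-2 approximation of A
print(A_s.round(3))
```

With `full_matrices=False` the untruncated factors have the shapes given on the slide ($U$ is $m \times n$, $S$ and $V^T$ are $n \times n$); keeping all three singular values reproduces `A` up to rounding.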

  12. Document Ontology • Build Concept Nodes and Term Nodes using columns and rows of the term matrix (U).

  13. Graph Construction • A bipartite graph is constructed with concept nodes and term nodes • A concept node is connected to all term nodes that belong to it. • A term node is connected to all concept nodes to which it belongs.

  14. Graph Construction (contd) • [Figure: example bipartite graph with term nodes Term 1 to Term 5 and concept nodes Concept 1 and Concept 2]
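
One way the bipartite graph could be built from the term matrix is sketched below. The membership rule (a term belongs to a concept when the absolute value of its entry in $U$ reaches a threshold) is an assumption made for this example; the slides do not state the actual rule. The matrix values and the function name are also invented.

```python
def build_bipartite_graph(U, terms, threshold):
    """Connect term i to concept j when |U[i][j]| >= threshold (assumed membership rule)."""
    n_concepts = len(U[0])
    concept_to_terms = {f"Concept {j + 1}": [] for j in range(n_concepts)}
    term_to_concepts = {t: [] for t in terms}
    for i, term in enumerate(terms):
        for j in range(n_concepts):
            if abs(U[i][j]) >= threshold:
                concept_to_terms[f"Concept {j + 1}"].append(term)
                term_to_concepts[term].append(f"Concept {j + 1}")
    return concept_to_terms, term_to_concepts

terms = ["Term 1", "Term 2", "Term 3", "Term 4", "Term 5"]
U = [[0.7, 0.1],   # made-up term-by-concept values
     [0.6, 0.0],
     [0.5, 0.4],
     [0.1, 0.8],
     [0.0, 0.6]]

concept_to_terms, term_to_concepts = build_bipartite_graph(U, terms, threshold=0.3)
print(concept_to_terms)
print(term_to_concepts["Term 3"])
```

Keeping both adjacency maps mirrors the two bullet points on the slide: one lookup goes from a concept to its terms, the other from a term to its concepts.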

  15. Graphical User Interface (GUI)

  16. GUI (contd) • GUI consists of • Concepts list • Terms list • Display for bipartite graph • Display for relations among concepts • Display for list of files in ontology

  17. GUI • To view terms related to a concept, user selects that concept from concepts list • To view concepts related to a term, user selects that term from terms list

  18. GUI – File Operations • New • Open • Save • saveAs • Close • Exit

  19. GUI – Ontology Updates • Add • Delete • ChangeSVDThreshold • changeConcThreshold • ChangeDuplicateThreshold • foldInDoc • SVDUpdate • defaultBuild

  20. GUI – Ontology Modifications • Rename: renames a selected concept • DelTerm: deletes a selected term • Undo: discards the last modification and returns to the previous state

  21. Updating Ontology

  22. Adding new documents • Investigated less expensive methods for adding new documents: • Fold-In • SVD update

  23. Fold-In • A method to add new document(s) to an existing ontology • Uses the existing data in document addition process • Less expensive process than the regular build method

  24. Fold-In (contd) • Two step method • Step 1 • Fold-In document vector • Compute the new document vector (V) using $\hat{d} = d^T U_k S_k^{-1}$, where $d$ is the document vector to be added • Append $\hat{d}$ to the columns of $V_k^T$ (equivalently, as a new row of $V_k$)

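  25. Folding-In document vector • [Figure: after folding in $p$ documents, $A_k$ is $m \times (n+p)$, $U_k$ is $m \times k$, $S_k$ is $k \times k$ and $V_k^T$ is $k \times (n+p)$]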

  26. Fold-In (contd) • Step 2 • Fold-In term vector • Compute the new term vector (U) using $\hat{t} = t\, V_k S_k^{-1}$, where $t$ is the term vector to be added • Append $\hat{t}$ to the rows of $U_k$

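  27. Folding-In term vector • [Figure: after folding in $q$ terms, $A_k$ is $(m+q) \times n$, $U_k$ is $(m+q) \times k$, $S_k$ is $k \times k$ and $V_k^T$ is $k \times n$]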
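
Both fold-in steps can be sketched with NumPy as follows. The helper names and the toy matrix are assumptions for illustration. Here `Vk` is stored with one row per document, so appending a row to `Vk` corresponds to appending a column to $V_k^T$.

```python
import numpy as np

def fold_in_document(d, Uk, Sk, Vk):
    """Append d_hat = d^T U_k S_k^{-1} as a new row of V_k (a new column of V_k^T)."""
    d_hat = d @ Uk @ np.linalg.inv(Sk)
    return np.vstack([Vk, d_hat])

def fold_in_term(t, Uk, Sk, Vk):
    """Append t_hat = t V_k S_k^{-1} as a new row of U_k."""
    t_hat = t @ Vk @ np.linalg.inv(Sk)
    return np.vstack([Uk, t_hat])

# Toy 4-term x 3-document matrix (made-up weights) and its rank-2 factors.
A = np.array([[0.8, 0.0, 0.6],
              [0.6, 0.1, 0.8],
              [0.0, 0.9, 0.0],
              [0.0, 0.4, 0.1]])
k = 2
U, sing, Vt = np.linalg.svd(A, full_matrices=False)
Uk, Sk, Vk = U[:, :k], np.diag(sing[:k]), Vt[:k, :].T

Vk_new = fold_in_document(A[:, 0], Uk, Sk, Vk)  # fold in a copy of the first document
Uk_new = fold_in_term(A[0, :], Uk, Sk, Vk)      # fold in a copy of the first term
print(Vk_new.shape, Uk_new.shape)
```

Folding in a copy of an existing document reproduces that document's existing row of `Vk`, which is a convenient sanity check. Fold-in never changes the existing rows of `Uk` and `Vk` or the singular values, which is why it is cheap.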

  28. Fold-In (contd) • Using the updated document vectors ($V_k$) and the updated term vectors ($U_k$): • Rebuild concept nodes and term nodes • Reconstruct bipartite graph • Update GUI

  29. SVD Update • A method to add new document(s) to an existing ontology • Uses the existing data in document addition process • Less expensive process than the regular build method

  30. SVD Update (contd) • Three step method. • Step 1: • SVD Updating Documents • Let $D = [\,A_k \mid D_p\,]$, where $A_k$ is the original term-document matrix and $D_p$ is the new document vector to be added. • $\mathrm{SVD}(D) = U_D S_D V_D^T$

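  31. SVD Update (contd) • The SVD of $D$ can also be computed as $U_D = U_k U_{UD}$ and $V_D = \begin{bmatrix} V_k & 0 \\ 0 & I_p \end{bmatrix} V_{UD}$, where $U_{UD}$ and $V_{UD}$ are the left and right singular vectors of the intermediate matrix $UD = [\,S_k \mid U_k^T D_p\,]$.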
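
The document-updating step can be sketched with NumPy. This is a hedged illustration, not the DOE implementation: the function name, the variable `F` (used here for the small intermediate matrix built from $S_k$ and $U_k^T D_p$) and the toy data are assumptions.

```python
import numpy as np

def svd_update_documents(Uk, Sk, Vk, Dp):
    """Update rank-k factors with p new document columns Dp (shape m x p)."""
    k, p, n = Sk.shape[0], Dp.shape[1], Vk.shape[0]
    F = np.hstack([Sk, Uk.T @ Dp])                       # k x (k+p) intermediate matrix
    Uf, sf, Vft = np.linalg.svd(F, full_matrices=False)  # small SVD instead of a full rebuild
    block = np.block([[Vk, np.zeros((n, p))],
                      [np.zeros((p, k)), np.eye(p)]])    # [[V_k, 0], [0, I_p]]
    return Uk @ Uf, np.diag(sf), block @ Vft.T

# Toy 4-term x 3-document matrix (made-up weights) and its rank-2 factors.
A = np.array([[0.8, 0.0, 0.6],
              [0.6, 0.1, 0.8],
              [0.0, 0.9, 0.0],
              [0.0, 0.4, 0.1]])
k = 2
U, sing, Vt = np.linalg.svd(A, full_matrices=False)
Uk, Sk, Vk = U[:, :k], np.diag(sing[:k]), Vt[:k, :].T
Ak = Uk @ Sk @ Vk.T

Dp = np.array([[0.5], [0.7], [0.0], [0.1]])  # one new document (made-up weights)
Ud, Sd, Vd = svd_update_documents(Uk, Sk, Vk, Dp)
print(Ud.shape, Sd.shape, Vd.shape)
```

Multiplying the returned factors back together gives $[\,A_k \mid U_k U_k^T D_p\,]$, that is, the old matrix with the new document projected onto the existing term space. Only a $k \times (k+p)$ matrix is decomposed, which is where the saving over a full rebuild comes from.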

  32. SVD Update (contd) • Step 2: • SVD Updating Terms • Let $T = \begin{bmatrix} A_k \\ T_q \end{bmatrix}$, where $A_k$ is the original term-document matrix and $T_q$ is the new term vector to be added. • $\mathrm{SVD}(T) = U_T S_T V_T^T$

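  33. SVD Update (contd) • $\mathrm{SVD}(T)$ can also be computed as $U_T = \begin{bmatrix} U_k & 0 \\ 0 & I_q \end{bmatrix} U_{UT}$ and $V_T = V_k V_{UT}$, where $U_{UT}$ and $V_{UT}$ are the left and right singular vectors of the intermediate matrix $UT = \begin{bmatrix} S_k \\ T_q V_k \end{bmatrix}$.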

  34. SVD Update (contd) • Step 3: • Correction of term weights • Let $W = A_k + X_i Y_i^T$, where $X_i$ is an $m \times i$ matrix whose rows are either rows of zeros or rows of the $i$-th order identity matrix $I_i$, and $Y_i$ is an $n \times i$ matrix representing the differences between old and new weights for each of the $i$ terms. • $\mathrm{SVD}(W) = U_W S_W V_W^T$

  35. SVD Update (contd) • $\mathrm{SVD}(W)$ can also be computed as $U_W = U_k U_Q$ and $V_W = V_k V_Q$, where $U_Q$ and $V_Q$ are the left and right singular vectors of $Q = S_k + U_k^T X_i Y_i^T V_k$.

  36. SVD Update (contd) • Using the updated document vectors ($V_W$) and the updated term vectors ($U_W$): • Rebuild concept nodes and term nodes • Reconstruct bipartite graph • Update GUI

  37. Time Complexity • Update methods in descending order of time complexity: • Recomputing the SVD (default build) • SVD Update • Fold-In

  38. Relations among concepts

  39. Relations among concepts • Significance of the concept vector matrix V: • Rows of V represent documents • Columns of V represent concepts

  40. Types of relations • Sub concepts • Sub-super concepts • Disjoint concepts • Overlapping concepts • Parallel concepts • Antagonistic concepts

  41. Sub concepts • If the % of overlap is: • < threshold value: Disjoint • > (100 − threshold value): Sub-super • otherwise: Overlapping
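
The threshold rule above can be sketched in Python. The slide does not say how the percentage of overlap is measured, so `overlap_percent` below uses an assumed definition (shared documents as a percentage of the smaller concept); only `classify_relation` follows the slide directly. All names are hypothetical.

```python
def overlap_percent(docs_a, docs_b):
    """Assumed measure: shared documents as a percentage of the smaller concept."""
    smaller = min(len(docs_a), len(docs_b))
    if smaller == 0:
        return 0.0
    return 100.0 * len(docs_a & docs_b) / smaller

def classify_relation(percent, threshold):
    """Apply the slide's rule: disjoint, sub-super, or overlapping."""
    if percent < threshold:
        return "disjoint"
    if percent > 100 - threshold:
        return "sub-super"
    return "overlapping"

concept_a = {1, 2, 3}  # made-up document ids attached to each concept
concept_b = {2, 3}
pct = overlap_percent(concept_a, concept_b)
print(pct, classify_relation(pct, threshold=10))
```

With a threshold of 10, two concepts sharing fewer than 10% of their documents are treated as disjoint, and those sharing more than 90% as a sub-super pair.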
