An Architecture for Emergent Semantics

An Architecture for Emergent Semantics. Sven Herschel, Ralf Heese, and Jens Bleiholder, Humboldt-Universität zu Berlin/Hasso-Plattner-Institut. Ideas of Emergent Semantics: improve document representation by aggregating many users’ opinions.


Presentation Transcript


  1. An Architecture for Emergent Semantics • Sven Herschel, Ralf Heese, and Jens Bleiholder • Humboldt-Universität zu Berlin/Hasso-Plattner-Institut

  2. Ideas of Emergent Semantics • Improve document representation by aggregating many users’ opinions • Adding keywords implicitly while querying the corpus • Living document representation instead of query reformulation • Entirely new keywords • Immediate change of the document representation and of the corpus index [Diagram: information retrieval today: user, query, IR query engine, corpus/document representation] S. Herschel, R. Heese, and J. Bleiholder: Emergent Semantics

  3. Outline • Basement (Background) • Construction (Architecture of Emergent Semantics) • Assessment (Evaluation) • Roof and Windows (Conclusion and Future Work)

  4. Basement (Background)

  5. Information Retrieval • Information Retrieval: content-oriented search on a set of documents; find a document representation to retrieve documents effectively and efficiently according to the user’s query • Today's approaches: capture the semantics of a document by analyzing syntactic information; no new words in document representation; synonyms cannot be added; query refinement

  6. Semiotic • Syntax (signs ↔ signs): t r e e • Semantics (signs ↔ represented object): A tall perennial woody plant … • Pragmatics (signs ↔ user interpretation): A figure that branches from a single root … (definitions from http://www.wordreference.com/definition/tree) [Diagram labels: current IR approaches, emergent semantics]

  7. Construction (Architecture of Emergent Semantics)

  8. Components of Emergent Semantics [Diagram: user query (?) and result (!); corpus/document representation with terms t1 … tn; knowledge; numbered steps 1 to 4 over the components Interpreter, Query Engine (Retrieval Engine, Ranking Function), Annotation Filter, and Quality Measure]

  9. Bootstrapping • Index the document corpus, e.g., TF/IDF, Latent Semantic Indexing [Diagram: corpus/document representation with terms t1 … tn; knowledge]

  10. Receiving a Query • Interpreter (1): reformulate the query, e.g., query expansion, replacing terms
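
A minimal sketch of the query reformulation step, assuming a simple synonym lookup as the interpreter's knowledge source. The slides do not say how the interpreter is implemented; the `SYNONYMS` table and the `reformulate` function are hypothetical names used only for illustration.

```python
# Hypothetical synonym table; the slides do not specify the knowledge source.
SYNONYMS = {
    "rdbms": ["database"],
    "sql": ["query language"],
}


def reformulate(query_terms, synonyms=SYNONYMS):
    """Return the lower-cased query terms plus known synonyms, without duplicates."""
    expanded = []
    for term in query_terms:
        key = term.lower()
        for candidate in [key] + synonyms.get(key, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


expanded = reformulate(["RDBMS", "SQL", "language"])
print(expanded)
```

The expansion only touches the query. The document representation stays unchanged at this step.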

  11. Query Evaluation • Query Engine (2) • Retrieval Engine: select documents according to the query, e.g., inverted index of all terms • Ranking Function: rank the list of matching documents, e.g., vector space model
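
The two parts of the query engine can be sketched as follows: an inverted index selects candidate documents, and a vector space model ranks them by cosine similarity. This is an illustrative sketch, not the authors' implementation. The toy corpus, the function names, and the use of raw term counts instead of TF/IDF weights are all assumptions.

```python
import math
from collections import Counter, defaultdict

# Toy corpus; document ids and texts are made up for illustration.
DOCS = {
    "d1": "sql is a query language for relational databases",
    "d2": "a relational database stores tables",
    "d3": "trees are tall woody plants",
}


def build_index(docs):
    """Inverted index: map each term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def retrieve(index, query_terms):
    """Retrieval engine: every document containing at least one query term."""
    hits = set()
    for term in query_terms:
        hits |= index.get(term, set())
    return hits


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(docs, candidates, query_terms):
    """Ranking function: order candidates by similarity to the query."""
    q = Counter(query_terms)
    scored = [(cosine(q, Counter(docs[d].split())), d) for d in candidates]
    return [d for _, d in sorted(scored, key=lambda p: (-p[0], p[1]))]


index = build_index(DOCS)
query = ["relational", "sql"]
candidates = retrieve(index, query)
ranking = rank(DOCS, candidates, query)
print(ranking)
```

Here `d1` matches both query terms and ranks ahead of `d2`, which matches only one; `d3` is never retrieved.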

  12. Query Result • (3) The user determines the set of relevant documents by evaluating the document surrogates.

  13. Feedback • (4) The user retrieves the relevant documents. • Add the original query to the document representation • Idea: the document is found by the query terms and the document is marked as relevant ⇒ all query terms are related to the document [Diagram adds: Annotation Filter, Quality Measure]

  14. Emergent Semantics Architecture • Syntax, Pragmatics, Semantics • What do I mean by my query? • How do most users formulate this query? • How is the corpus queried? [Diagram: the full architecture with Interpreter, Query Engine (Retrieval Engine, Ranking Function), Annotation Filter, and Quality Measure]

  15. Example – Querying the document corpus • TF/IDF matrix of the document corpus • RDBMS does not occur in the document corpus • Query Q = {RDBMS, SQL, language} • Ranked result D_Query = (d1, d5, d2, d10); D_relevant = {d1, d2} • TF/IDF: weight = (term freq ∙ #doc) / doc freq
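
The weighting formula on this slide, weight = (term freq ∙ #doc) / doc freq, can be computed as in the sketch below. The four-document corpus is invented for illustration (the slide's actual matrix is not part of the transcript), and `tfidf_matrix` is a hypothetical helper name.

```python
from collections import Counter


def tfidf_matrix(docs):
    """docs: dict doc_id -> list of terms. Returns doc_id -> {term: weight}.

    weight = (term freq * #doc) / doc freq, following the slide's formula.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))  # count each term once per document
    matrix = {}
    for doc_id, terms in docs.items():
        term_freq = Counter(terms)
        matrix[doc_id] = {
            t: tf * n_docs / doc_freq[t] for t, tf in term_freq.items()
        }
    return matrix


# Invented corpus in which "rdbms" does not occur, as in the slide's example.
docs = {
    "d1": ["sql", "language", "sql"],
    "d2": ["sql", "table"],
    "d3": ["language", "grammar"],
    "d4": ["tree", "plant"],
}
weights = tfidf_matrix(docs)
print(weights["d1"])
```

Because "rdbms" is absent from every document, it has no entry in the matrix, so a purely syntactic index cannot match that query term.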

  16. Example – Adding the query terms • Adding {RDBMS, SQL, language} to document representation • Recalculation of the TF/IDF matrix necessary [Tables: recalculation for keyword language; recalculation for keyword SQL; recalculation for keyword RDBMS]
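
A sketch of the feedback step described here: the query terms are appended to the representations of the relevant documents and the weight matrix is recomputed. The document contents are invented; only the query, the document ids, and the relevant set {d1, d2} follow the slides. `weight_matrix` and `feed_back` are hypothetical names.

```python
from collections import Counter


def weight_matrix(docs):
    """weight = (term freq * #doc) / doc freq for every term of every document."""
    n = len(docs)
    df = Counter(t for terms in docs.values() for t in set(terms))
    return {
        d: {t: tf * n / df[t] for t, tf in Counter(terms).items()}
        for d, terms in docs.items()
    }


def feed_back(docs, query_terms, relevant_ids):
    """Return a copy of docs with the query terms appended to relevant documents."""
    updated = {d: list(terms) for d, terms in docs.items()}
    for d in relevant_ids:
        updated[d].extend(query_terms)
    return updated


# Invented document contents; "rdbms" occurs nowhere before feedback.
docs = {
    "d1": ["sql", "language"],
    "d2": ["sql", "table"],
    "d5": ["language", "grammar"],
    "d10": ["sql"],
}
query = ["rdbms", "sql", "language"]

before = weight_matrix(docs)
after = weight_matrix(feed_back(docs, query, ["d1", "d2"]))
print(after["d1"])
```

After feedback, "rdbms" has a weight in `d1` and `d2`. The weight of "language" in the untouched document `d5` also changes, because its document frequency rose, which is why the whole matrix has to be recalculated.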

  17. Living Document Representation • Document representations change over time (living document representation) • Many similar queries → weights of the query terms increase • Unrelated query terms → document representation changes only slightly • New keywords / semantic concepts in document representation [Diagram: query, document representations]
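
The effect of many similar queries versus a single unrelated one can be illustrated with plain term counts. This is a toy simulation under assumed numbers: the initial representation, the 20 repetitions, and the `share` helper are not from the slides.

```python
from collections import Counter

# Hypothetical initial term frequencies of one document representation.
representation = Counter({"sql": 3, "language": 2, "table": 1})


def apply_feedback(rep, query_terms):
    """Add one occurrence of each query term to the representation."""
    rep.update(query_terms)


def share(rep, term):
    """Fraction of all term occurrences in the representation taken by `term`."""
    return rep[term] / sum(rep.values())


# Many users issue a similar query and mark the document as relevant.
for _ in range(20):
    apply_feedback(representation, ["rdbms", "sql"])

# One user issues an unrelated query that happens to hit the document.
apply_feedback(representation, ["holiday"])

print(round(share(representation, "rdbms"), 3))
print(round(share(representation, "holiday"), 3))
```

The repeated term "rdbms" ends up dominating the representation, while the single unrelated term "holiday" stays marginal.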

  18. Assessment (Evaluation)

  19. Experiment I - Setup • CACM corpus: 3200 documents + 32 queries + gold standard; title and abstract tokenized and indexed using Apache Lucene • Retrieval and Ranking: vector space model with TF/IDF weights • Feedback: attach the tokenized query to all relevant document representations

  20. Exploit corpus correlations • Split the set of queries into halves • Run first half and feed back all query terms • Run second half [Flow diagram, in extraction order: run query set 1; identical to TF/IDF without EmSem; small overlap between queries; small overlap between result sets; add query terms to relevant document representation; run query set 2; run query set 1; measure again (1st EmSem run); run query set 2; add query terms again; …]
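
The split-half protocol can be sketched on a toy scale as below. This is a strong simplification and not the experiment itself: documents are plain term sets without weights, retrieval is "shares at least one term", and recall against a made-up gold standard stands in for whatever measure the authors used. All names and data are assumptions.

```python
def retrieve(docs, query):
    """Return ids of documents sharing at least one term with the query."""
    return {d for d, terms in docs.items() if terms & query}


def mean_recall(docs, queries):
    """Average fraction of gold-standard relevant documents that are retrieved."""
    scores = [len(retrieve(docs, q) & rel) / len(rel) for q, rel in queries]
    return sum(scores) / len(scores)


def feed_back(docs, queries):
    """Attach each query's terms to all of its relevant document representations."""
    updated = {d: set(terms) for d, terms in docs.items()}
    for q, rel in queries:
        for d in rel:
            updated[d] |= q
    return updated


# Toy corpus and gold standard: (query terms, relevant document ids).
docs = {
    "d1": {"relational", "database"},
    "d2": {"sql", "language"},
    "d3": {"tree", "plant"},
}
queries = [
    ({"rdbms", "database"}, {"d1", "d2"}),
    ({"rdbms"}, {"d1", "d2"}),
]

half = len(queries) // 2
set_1, set_2 = queries[:half], queries[half:]

baseline = mean_recall(docs, set_2)        # second half without feedback
docs_after = feed_back(docs, set_1)        # run first half, feed back its terms
with_feedback = mean_recall(docs_after, set_2)
print(baseline, with_feedback)
```

The second half only benefits when its queries share terms with the first half, which is the overlap between queries that the slide refers to.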

  21. Feeding back all query terms • Run all queries and feed back all query terms

  22. Experiment II - Setup • First phase: presented a wide variety of images to users; "Which keywords would you use to find the image with a search engine?" • Second phase: rate the adequacy of the annotations

  23. Results (term: Phase 1, % users' terms / Phase 2, % users) • Weihnachtsmann (Father Christmas): 26.5% / 100.0% • Brille (glasses): 7.8% / 51.8% • Nikolaus (St. Nicholas): 7.8% / 91.6% • Weihnachten (Christmas): 6.5% / 61.5% • Santa Claus: 6.0% / 75.0%

  24. Conclusions from our Experiments • Document representations become more precise over time. • A small number of terms describe an image sufficiently. • A large number of user queries can be satisfied by indexing a small number of terms.

  25. Roof and Windows (Conclusion)

  26. Roof and Windows • Architecture for emergent semantics • Users’ individual pragmatics aggregated into representation of documents • Living document representation • Outlook: applying EmSem to distributed IR; reducing the size of document representations; less network traffic

  27. Thank you!
