
PTAG: Large Scale Automatic Generation of Personalized Annotation TAGs for the Web

PTAG: Large Scale Automatic Generation of Personalized Annotation TAGs for the Web. PAUL ALEXANDRU CHIRITA STEFANIA COSTACHE SIEGFRIED HANDSCHUH WOLFGANG NEJDL 1* L3S RESEARCH CENTER 2* NATIONAL UNIVERSITY OF IRELAND


Presentation Transcript


  1. PTAG: Large Scale Automatic Generation of Personalized Annotation TAGs for the Web • Paul Alexandru Chirita, Stefania Costache, Siegfried Handschuh, Wolfgang Nejdl • 1* L3S Research Center, 2* National University of Ireland • Proceedings of the 16th International Conference on World Wide Web, 2007 • Session: Semantic Web and Web 2.0

  2. Outline • Abstract • Introduction • Previous Work • Automatic Personalized Web Annotations • Experimental Results • Conclusions • Future Work • Comments

  3. Abstract • The success of the Semantic Web depends on the availability of Web pages annotated with metadata • In this paper the authors propose P-TAG, a method which automatically generates personalized tags for Web pages • It produces keywords relevant to the page's textual content • and also to the data residing on the surfer’s Desktop • Empirical evaluations with several algorithms pursuing this approach showed very promising results

  4. Introduction (1/3) • The Semantic Web: a vision of a future Web of machine-understandable documents and data • Annotations are the main instrument; they enrich content with metadata in order to ease its automatic processing • The problem: traditional annotation is manual or semi-automatic • Alternative method: tagging

  5. Introduction (2/3) • Why automatic tagging? • The number of Web pages is growing very fast • Recommendation • Why personalization? • Automatically generated tags have the drawback of presenting only a generic view

  6. Introduction (3/3) • Problems with user profiles • These profiles are laborious to create and need constant maintenance in order to reflect the changing interests of the user • The personal Desktop usually contains a very rich document corpus of personal information • It can and should be exploited for user personalization

  7. Previous work (1/2) - Generating annotations for the web • Brooks and Montanez [4] • Analyzed the effectiveness of tags for classifying blog entries • Found that manual tags are less effective content descriptors than automated ones • Cimiano et al. [10, 11] • Proposed PANKOW (Pattern-based Annotation through Knowledge on the Web) • Employs an unsupervised, pattern-oriented approach to categorize an instance with respect to a given ontology • C-PANKOW: enhanced version of PANKOW • It requires an input ontology and outputs instances of the ontological concepts • Annotation is always directly rooted in the text of the web page

  8. Previous work (2/2) - Generating annotations for the web (cont’d) • Dill et al. [14] • Present a platform for large-scale text analytics and automatic semantic tagging • The system spots known terms in a web page and relates them to existing instances of a given ontology - Text Mining for Keyword Extraction - Text Mining for Keyword Association

  9. Automatic personalized web annotations (1/4) • Three approaches to generate personalized web page annotations • Document Oriented Extraction • Keyword Oriented Extraction • Hybrid Extraction

  10. Automatic personalized web annotations (2/4) • Document Oriented Extraction

  11. Automatic personalized web annotations (3/4) • Keyword Oriented Extraction

  12. Automatic personalized web annotations (4/4) • Hybrid Extraction

  13. Experimental • Experimental Setup • Document set of the personal desktop • E-mails, Web cache documents, all files (user selected paths) • For the annotation, the input web pages were categorized by size • Small (below 4KB) • Medium (between 4KB and 32KB) • Large (more than 32KB) • A total of 96 web pages were used as input to be annotated • Over 2000 resulting annotations • Each proposed keyword was rated 0 (not relevant) or 1 (relevant) • The quality of the produced annotations was measured using precision • The precision at level K (P@K)
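The P@K measure named on this slide can be sketched as follows. This is a minimal illustration, assuming the 0/1 ratings are given in the order in which the keywords were proposed; the function name `precision_at_k` is hypothetical and not taken from the paper.

```python
def precision_at_k(ratings, k):
    """Fraction of relevant keywords among the top k proposed ones.

    ratings: list of 0/1 relevance judgments, best-ranked keyword first.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Keywords missing from a short list count as not relevant.
    return sum(ratings[:k]) / k


# Example: relevance ratings for the four keywords proposed for one page.
ratings = [1, 0, 1, 1]
print(precision_at_k(ratings, 1))
print(precision_at_k(ratings, 3))
```

Dividing by k rather than by the number of returned keywords penalizes a method that proposes fewer than k keywords, which is one common convention.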

  14. Experimental Results (1/5) • Document Oriented Extraction • [Figures: results for small, medium, and large web pages]

  15. Experimental Results (2/5) • Keyword Oriented Extraction • [Figures: results for small, medium, and large web pages]

  16. Experimental Results (3/5) • Hybrid Extraction • [Figures: results for small, medium, and large web pages]

  17. Experimental Results (4/5) • Precision at the first three output annotations for the best methods of each category

  18. Experimental Results (5/5) • Examples of annotations

  19. Applications • Personalized Web Search • Web Recommendations for Desktop Tasks • Ontology Learning

  20. Conclusions • The technique overcomes the burden of manual tagging • The system does not require any manual definition of interest profiles • The system proposes a more diverse range of tags which are closer to the personal viewpoint of the user • The results provide high user satisfaction

  21. Future Work • A shared server approach that supports social tagging • Diversity • Keywords are generated from millions of sources • Scalability • High utility for web search, analytics and advertising • Instant update

  22. Comments • For automatic tag generation, the existing tools are good enough to implement the system • Tag recommendation is a good incentive for users to assign tags • Automatic tags are aids; in social networks on the web, a user’s tags represent an understanding of “who that person is”

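  23. Finding Similar Documents • Cosine Similarity • Based on TFxIDF term weights • $\mathrm{sim}(d_1, d_2) = \frac{\sum_{t} w_{t,1} \, w_{t,2}}{\sqrt{\sum_{t} w_{t,1}^2} \, \sqrt{\sum_{t} w_{t,2}^2}}$ • $d_1, d_2$: vectors of the two documents • $w_{t,1}, w_{t,2}$: weights of term $t$ for the two documents • The sums run over all terms of the two documents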
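A minimal sketch of cosine similarity over TFxIDF vectors is given below. The exact TFxIDF variant used in the paper is not stated on the slide, so the weighting here (raw term frequency times log(1 + N/DF)) is an assumption, and the names `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TFxIDF vector (dict term -> weight) per document.

    docs: list of token lists. The weighting tf * log(1 + N/df) is an
    assumed variant, chosen so that no weight collapses to zero.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(1 + n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    ["semantic", "web", "tags"],
    ["semantic", "web", "annotation"],
    ["desktop", "email"],
]
v0, v1, v2 = tfidf_vectors(docs)
print(cosine(v0, v1), cosine(v0, v2))
```

Documents that share no terms get similarity 0, and a document compared with itself gets 1.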

  24. Extracting Keywords from Documents • Keyword extraction algorithms usually take a text document as input and then return a list of keywords • Each keyword has an associated value representing the confidence

  25. Extracting Keywords from Documents • For keyword extraction, they use the following methods • Term Frequency • Document Frequency • Lexical Compounds • Sentence Selection

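  26. Term Frequency • The score takes into account where a term first appears, not only how often it occurs • This is necessary especially for longer documents, because more informative terms tend to appear towards the beginning • Quantities labeled in the slide's formula (the formula itself is not preserved here): position of the first appearance of the term; number of terms in the document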
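A position-aware term score could look like the sketch below. The slide's own formula did not survive in this transcript, so the form used here, (1/2 + 1/2 * (n - pos) / n) * log(1 + TF), is an assumption built from the two quantities the slide names (position of first appearance `pos` and number of terms `n`); the function name `term_scores` is hypothetical.

```python
import math


def term_scores(tokens):
    """Score each term by frequency, boosted when it first appears early.

    Assumed form: (0.5 + 0.5 * (n - pos) / n) * log(1 + tf), where pos is
    the 0-based index of the first appearance and n the number of terms.
    """
    n = len(tokens)
    first_pos = {}
    tf = {}
    for i, term in enumerate(tokens):
        first_pos.setdefault(term, i)
        tf[term] = tf.get(term, 0) + 1
    return {
        t: (0.5 + 0.5 * (n - first_pos[t]) / n) * math.log(1 + tf[t])
        for t in tf
    }


scores = term_scores(["tag", "web", "page", "web", "tag", "note"])
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(term, round(score, 3))
```

With equal frequencies, the term that appears first ("tag") scores above the later one ("web"), which is the effect the slide describes.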

  27. Lexical Compounds • Noun analysis is the simplest approach to finding lexical compounds • Step 1: part-of-speech tagging of the document • Step 2: finding the pattern { adjective? , noun+ } (? = zero or one, + = one or more) • Step 3: ordering the patterns by frequency
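Steps 2 and 3 above can be sketched as follows, assuming step 1 has already been done by some part-of-speech tagger. The input format (a list of `(word, tag)` pairs with the tags `"ADJ"` and `"NOUN"`) and the function name `lexical_compounds` are assumptions made for this illustration.

```python
from collections import Counter


def lexical_compounds(tagged):
    """Find { adjective? , noun+ } sequences and order them by frequency.

    tagged: list of (word, tag) pairs from a part-of-speech tagger.
    Returns a list of (compound, count), most frequent first.
    """
    compounds = Counter()
    i, n = 0, len(tagged)
    while i < n:
        start = i
        words = []
        # Optional single adjective (zero or one).
        if tagged[i][1] == "ADJ":
            words.append(tagged[i][0])
            i += 1
        # One or more nouns.
        j = i
        while j < n and tagged[j][1] == "NOUN":
            words.append(tagged[j][0])
            j += 1
        if j > i:
            compounds[" ".join(words)] += 1
            i = j
        else:
            # No noun followed: discard and move on by one token.
            i = start + 1
    return compounds.most_common()


tagged = [
    ("semantic", "ADJ"), ("web", "NOUN"), ("uses", "VERB"),
    ("personal", "ADJ"), ("desktop", "NOUN"), ("search", "NOUN"),
    ("and", "CONJ"), ("semantic", "ADJ"), ("web", "NOUN"),
]
print(lexical_compounds(tagged))
```

The scan is greedy, so a run of nouns is always kept together as one compound rather than split into shorter ones.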

  28. Sentence Selection • This technique builds upon sentence oriented document summarization • Ranking the document sentences according to their salience score [26] • Quantities labeled in the slide's formula (the formula itself is not preserved here): number of significant words* in the sentence; number of query terms present in a sentence; optional parameter; number of terms in a query; position score; total number of words in the sentence • * Significant word: see the next slide

  29. Sentence Selection • Significant word • Quantity labeled in the slide's formula (the formula itself is not preserved here): number of sentences in the document
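One way the quantities named on the two sentence-selection slides could combine into a salience score is sketched below. The formula itself is missing from this transcript, so the form SW^2/TW + PS + TQ^2/NQ used here is an assumption, with the query part treated as the optional one. The set of significant words and the position score are passed in as inputs because their definitions are not preserved either; `sentence_score` is a hypothetical name.

```python
def sentence_score(words, significant_words, position_score=0.0, query_terms=None):
    """Salience of one sentence under the assumed form SW^2/TW + PS + TQ^2/NQ.

    words: tokens of the sentence (TW = len(words)).
    significant_words: set of terms considered significant (gives SW).
    position_score: PS, supplied by the caller.
    query_terms: optional query; TQ = query terms present, NQ = query size.
    """
    tw = len(words)
    if tw == 0:
        return 0.0
    sw = sum(1 for w in words if w in significant_words)
    score = sw ** 2 / tw + position_score
    if query_terms:
        query = set(query_terms)
        tq = len(query & set(words))
        score += tq ** 2 / len(query)
    return score


sentence = ["personal", "tags", "for", "web", "pages"]
print(sentence_score(sentence, {"tags", "web"}))
print(sentence_score(sentence, {"tags", "web"}, query_terms=["web", "search"]))
```

Sentences would then be ranked by this score, and keywords taken from the top-ranked ones.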

  30. Finding Similar Keywords • To find related keywords, they use the following methods • Term Co-occurrence Statistics • Thesaurus Based Extraction

  31. Term Co-occurrence Statistics • [Figure label: extracted keywords from web page]

  32. Similarity Coefficients • Cosine similarity • Mutual Information • Likelihood Ratio
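Two of the coefficients listed above can be sketched from document frequencies. The slide gives no formulas, so the forms used here are assumptions: cosine as DF(x,y) / sqrt(DF(x) * DF(y)) and mutual information as log(N * DF(x,y) / (DF(x) * DF(y))). The likelihood ratio is left out, and all function names are hypothetical.

```python
import math


def doc_frequencies(docs, x, y):
    """Count documents containing x, y, and both. docs: list of term sets."""
    df_x = sum(1 for d in docs if x in d)
    df_y = sum(1 for d in docs if y in d)
    df_xy = sum(1 for d in docs if x in d and y in d)
    return df_x, df_y, df_xy


def cosine_coefficient(df_x, df_y, df_xy):
    """Assumed form: DF(x,y) / sqrt(DF(x) * DF(y))."""
    if df_x == 0 or df_y == 0:
        return 0.0
    return df_xy / math.sqrt(df_x * df_y)


def mutual_information(n, df_x, df_y, df_xy):
    """Assumed form: log(N * DF(x,y) / (DF(x) * DF(y)))."""
    if df_xy == 0:
        return float("-inf")  # the terms never co-occur
    return math.log(n * df_xy / (df_x * df_y))


docs = [{"web", "tag"}, {"web", "tag", "desktop"}, {"desktop", "email"}, {"web"}]
df_x, df_y, df_xy = doc_frequencies(docs, "web", "tag")
print(cosine_coefficient(df_x, df_y, df_xy))
print(mutual_information(len(docs), df_x, df_y, df_xy))
```

For each keyword extracted from the web page, candidate terms from the Desktop corpus could be ranked by one of these coefficients and the top ones proposed as related keywords.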

  33. Thesaurus Based Extraction
