Semantics-Based News Recommendation

International Conference on Web Intelligence, Mining, and Semantics (WIMS 2012)



Introduction (1)

  • Recommender systems help users to plough through a massive and increasing amount of information

  • Types of recommender systems:

    • Content-based

    • Collaborative filtering

    • Hybrid

  • Content-based systems are often term-based

  • A common measure is Term Frequency – Inverse Document Frequency (TF-IDF), as described by Salton and Buckley [1988]


Introduction (2)

  • Semantics can be taken into account in several ways:

    • Semantic Similarity (SS) recommenders:

      • Jiang & Conrath [1997]

      • Leacock & Chodorow [1998]

      • Lin [1998]

      • Resnik [1995]

      • Wu & Palmer [1994]

    • Concepts instead of terms → Concept Frequency – Inverse Document Frequency (CF-IDF):

      • Reduces noise caused by non-meaningful terms

      • Yields fewer terms to evaluate

      • Allows for semantic features, e.g., synonyms

      • Relies on a domain ontology

      • Published at WIMS 2011


Introduction (3)

  • Another way to take semantics into account:

    • Synsets instead of concepts → Synset Frequency – Inverse Document Frequency (SF-IDF):

      • Similar to CF-IDF

      • Does not rely on a domain ontology

  • Implemented in Ceryx (a plug-in for Hermes [Frasincar et al., 2009], a news processing framework)

  • How well do semantic recommenders perform compared to TF-IDF?

    • TF-IDF vs. SF-IDF

    • TF-IDF vs. SS


Framework: User Profile

  • The user profile consists of all news items the user has read

  • These read items implicitly express a preference for specific topics


Framework: Preprocessing

  • Before recommendations can be made, each news item is parsed:

    • Tokenizer

    • Sentence splitter

    • Lemmatizer

    • Part-of-Speech tagger


Framework: Synsets

  • We use the WordNet dictionary and word sense disambiguation (WSD)

  • Each word has a set of senses, and each sense is represented by a set of semantically equivalent synonyms (a synset):

    • Turkey:

      • turkey, Meleagris gallopavo (animal)

      • Turkey, Republic of Turkey (country)

      • joker, turkey (annoying person)

      • turkey, bomb, dud (failure)

    • Fly:

      • fly, aviate, pilot (operate airplane)

      • flee, fly, take flight (run away)

  • Synsets are linked using semantic pointers

    • Hypernym, hyponym, …
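To illustrate how hypernym pointers support path-based similarity (such as Wu & Palmer [1994], listed in the introduction), the Python sketch below stores a tiny, invented hierarchy as a child-to-hypernym map and computes the lowest common subsumer and a common formulation of the Wu & Palmer measure. The synset names and the tree shape are assumptions for illustration only; real WordNet synsets can have several hypernyms, and the slides do not describe Ceryx's internal data structures.

```python
# Hypothetical toy hierarchy: each synset maps to its hypernym (None for the root).
# Real WordNet synsets can have several hypernyms; this sketch assumes a tree.
HYPERNYM = {
    "entity": None,
    "organism": "entity",
    "animal": "organism",
    "bird": "animal",
    "turkey#animal": "bird",
    "mammal": "animal",
    "dog": "mammal",
}


def path_to_root(synset):
    """Return [synset, its hypernym, the hypernym's hypernym, ..., root]."""
    path = []
    while synset is not None:
        path.append(synset)
        synset = HYPERNYM[synset]
    return path


def depth(synset):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(synset))


def lowest_common_subsumer(a, b):
    """Deepest synset that subsumes both a and b (a synset subsumes itself)."""
    ancestors_of_a = set(path_to_root(a))
    # Walking upward from b, the first shared synset is the deepest one.
    for synset in path_to_root(b):
        if synset in ancestors_of_a:
            return synset
    return None  # unreachable in a single-rooted tree


def wu_palmer(a, b):
    """Wu & Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wu_palmer("turkey#animal", "dog"))  # 2 * 3 / (5 + 5) = 0.6
```

Counting depth in nodes keeps the root at depth 1, so the measure never divides by zero and two identical synsets score exactly 1.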


Framework: TF-IDF

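  • Term Frequency: the relative frequency of a term $t_i$ in a document $d_j$, i.e.,
    $$\mathrm{tf}(t_i, d_j) = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
    where $n_{i,j}$ is the number of occurrences of $t_i$ in $d_j$

  • Inverse Document Frequency: the rarity of a term $t_i$ across a set of documents $D$, i.e.,
    $$\mathrm{idf}(t_i, D) = \log \frac{|D|}{|\{d \in D : t_i \in d\}|}$$

  • And hence
    $$\text{tf-idf}(t_i, d_j, D) = \mathrm{tf}(t_i, d_j) \cdot \mathrm{idf}(t_i, D)$$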
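A minimal Python sketch of the TF-IDF weighting, assuming documents are already tokenized and lemmatized lists of terms and using the natural logarithm (the slides do not state the base; changing it rescales every IDF value by the same factor). The function names and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf(term, doc):
    """Relative frequency of `term` in `doc` (a list of preprocessed tokens)."""
    return Counter(doc)[term] / len(doc)


def idf(term, docs):
    """log(|D| / number of documents in `docs` that contain `term`)."""
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing)  # assumes `term` occurs in at least one document


def tf_idf(doc, docs):
    """Map each distinct term of `doc` to its TF-IDF weight with respect to `docs`."""
    return {term: tf(term, doc) * idf(term, docs) for term in set(doc)}


# Hypothetical lemmatized news items.
docs = [
    ["microsoft", "release", "windows"],
    ["apple", "release", "iphone"],
    ["microsoft", "buy", "skype"],
]
weights = tf_idf(docs[0], docs)
print(weights["windows"])  # rarest term, highest weight: (1/3) * log(3) ≈ 0.366
```

Terms that occur in every document get an IDF of log(1) = 0, so they contribute nothing to the item's vector.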


Framework: SF-IDF

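  • Synset Frequency: the relative frequency of a synset $s_i$ in a document $d_j$, i.e.,
    $$\mathrm{sf}(s_i, d_j) = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
    where $n_{i,j}$ is the number of occurrences of $s_i$ in $d_j$

  • Inverse Document Frequency: the rarity of a synset $s_i$ across a set of documents $D$, i.e.,
    $$\mathrm{idf}(s_i, D) = \log \frac{|D|}{|\{d \in D : s_i \in d\}|}$$

  • And hence
    $$\text{sf-idf}(s_i, d_j, D) = \mathrm{sf}(s_i, d_j) \cdot \mathrm{idf}(s_i, D)$$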
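SF-IDF applies the same weighting to synsets, so the Python sketch below only differs from a TF-IDF computation in its input: each word has already been replaced by the synset chosen by WSD. The synset identifiers and the documents are invented for illustration, and the natural logarithm is assumed.

```python
import math
from collections import Counter

# Hypothetical news items after WSD: every word has been replaced by the identifier
# of the synset chosen for it. The identifiers are invented for illustration.
docs = [
    ["turkey#country", "election#vote", "turkey#country"],  # "Turkey ... election ... Republic of Turkey"
    ["turkey#animal", "farm#land"],                          # "turkey raised on a farm"
    ["flee#run_away", "turkey#country"],                     # "fly (take flight) from Turkey"
]


def sf_idf(doc, docs):
    """Map each distinct synset of `doc` to its SF-IDF weight with respect to `docs`."""
    counts = Counter(doc)
    weights = {}
    for synset, n in counts.items():
        sf = n / len(doc)                          # synset frequency
        df = sum(1 for d in docs if synset in d)   # documents containing the synset
        weights[synset] = sf * math.log(len(docs) / df)
    return weights


weights = sf_idf(docs[0], docs)
# "Turkey" and "Republic of Turkey" collapse into one feature, whereas the animal
# sense in docs[1] is a different synset and does not affect its document frequency.
print(sorted(weights))  # ['election#vote', 'turkey#country']
```

A term-based TF-IDF would count "turkey" in all three items; after disambiguation the country sense appears in only two, which raises its IDF.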


Framework: SS (1)

  • TF-IDF and SF-IDF use cosine similarity:

    • Two vectors:

      • Scores of the user profile's features (terms or synsets)

      • Scores of the news message's features (terms or synsets)

    • Measures the cosine of the angle between the vectors

  • Semantic Similarity (SS):

    • Two vectors:

      • User profile synsets

      • News message synsets

    • Jiang & Conrath [1997], Resnik [1995], and Lin [1998]: based on the information content of synsets

    • Leacock & Chodorow [1998] and Wu & Palmer [1994]: based on the path length between synsets
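A small Python sketch of the cosine similarity used by TF-IDF and SF-IDF, with vectors stored as sparse {feature: weight} dictionaries. How Ceryx builds the profile vector from the read items is not detailed on the slides, so here it is simply given; the `recommend` helper and its `>=` comparison against the 0.5 cut-off mentioned in the evaluation are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors given as {feature: weight} dicts."""
    dot = sum(w * b[f] for f, w in a.items() if f in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def recommend(profile, item, cutoff=0.5):
    """Recommend `item` when its similarity to `profile` reaches the cut-off (>= assumed)."""
    return cosine_similarity(profile, item) >= cutoff


# Hypothetical TF-IDF (or SF-IDF) weights.
profile = {"microsoft": 0.4, "windows": 0.3}
item = {"microsoft": 0.2, "skype": 0.5}
print(round(cosine_similarity(profile, item), 3))  # 0.297
print(recommend(profile, item))                    # False
```

Only features present in both vectors contribute to the dot product, so sparse dictionaries avoid materializing the full vocabulary.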


Framework: SS (2)

  • The SS score is calculated by computing the pairwise similarities between the synsets in the unread document u and those in the user profile r:
    $$\mathrm{sim}_{SS} = \frac{\sum_{(u,r) \in W} \mathrm{sim}(u, r)}{|W|}$$
    where W is a vector with all combinations of synsets from r and u that have a common Part-of-Speech, and where sim(u, r) is any of the SS measures mentioned above.
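The averaged pairwise score can be sketched in Python as follows. Synsets are modeled as hypothetical `(synset_id, pos)` tuples, and the pairwise measure is passed in as a function so that any of the five SS measures could be plugged in; the toy similarity table stands in for a real measure and its values are invented. Returning 0 when no pair shares a part of speech is an assumption.

```python
from itertools import product


def ss_score(unread_synsets, profile_synsets, sim):
    """Average of sim(u, r) over all synset pairs that share a part of speech.

    Synsets are (synset_id, pos) tuples; `sim` is any pairwise measure supplied by the caller.
    """
    pairs = [(u, r) for u, r in product(unread_synsets, profile_synsets) if u[1] == r[1]]
    if not pairs:
        return 0.0  # no comparable pairs (assumption: treat as no similarity)
    return sum(sim(u, r) for u, r in pairs) / len(pairs)


# Hypothetical similarity values standing in for a real measure such as Lin or Wu & Palmer.
TOY_SIM = {
    ("turkey#country", "greece#country"): 0.9,
    ("turkey#country", "election#vote"): 0.2,
    ("flee#run_away", "travel#journey"): 0.4,
}

unread = [("turkey#country", "n"), ("flee#run_away", "v")]
profile = [("greece#country", "n"), ("election#vote", "n"), ("travel#journey", "v")]
score = ss_score(unread, profile, lambda u, r: TOY_SIM[(u[0], r[0])])
print(round(score, 2))  # (0.9 + 0.2 + 0.4) / 3 = 0.5
```

Restricting W to pairs with a common Part-of-Speech matters because WordNet's noun and verb hierarchies are separate, so path- and information-content-based measures are generally not defined across them.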


Implementation: Hermes

  • The Hermes framework is used to build a news personalization service for RSS feeds

  • Its implementation is the Hermes News Portal (HNP):

    • Programmed in Java

    • Uses OWL / SPARQL / Jena / GATE / WordNet


Implementation: Ceryx

  • Ceryx is a plug-in for HNP

  • Uses WordNet / Stanford POS Tagger / JAWS lemmatizer / Lesk WSD

  • Its main focus is on recommendation support

  • Constructs user profiles

  • Computes TF-IDF, SF-IDF, and SS
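The slides do not say which Lesk variant Ceryx uses. As an illustration, the simplified Lesk algorithm picks the sense whose dictionary gloss shares the most words with the surrounding context. The Python sketch below assumes glosses are plain strings; the glosses are invented rather than the real WordNet definitions, and practical implementations usually also remove stop words and lemmatize.

```python
def simplified_lesk(context_words, senses):
    """Return the sense whose gloss shares the most words with the context.

    `senses` maps a sense identifier to its gloss; ties go to the sense listed first.
    """
    context = {w.lower() for w in context_words}
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Invented glosses for three senses of "turkey" (not the actual WordNet definitions).
TURKEY_SENSES = {
    "turkey#animal": "large bird raised on farms for its meat",
    "turkey#country": "republic in southwestern asia and southeastern europe",
    "turkey#failure": "an event that fails badly",
}

sentence = "the republic held an election in asia"
print(simplified_lesk(sentence.split(), TURKEY_SENSES))  # turkey#country
```

Here "republic", "in", and "asia" overlap with the country gloss (3 words) against only "an" for the failure sense; without stop-word removal, function words such as "in" and "an" still count, which is why real implementations filter them.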


Evaluation (1)

  • Experiment:

    • We let 19 participants evaluate 100 news items

    • User profile: all articles that are related to Microsoft, its products, and its competitors

    • Ceryx computes TF-IDF, SF-IDF, and SS scores, using a cut-off value of 0.5 for recommendations

    • Measurements:

      • Accuracy

      • Precision

      • Recall

      • Specificity

      • F1-measure

      • t-tests to determine statistical significance
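The listed measures follow from the confusion matrix of recommended versus relevant items. The Python sketch below, with hypothetical function names and data, treats an item as recommended when its score reaches the 0.5 cut-off (whether the comparison is `>=` or `>` is an assumption) and leaves out the t-tests and the aggregation over the 19 participants, which the slides do not detail.

```python
def confusion_counts(scores, relevant, cutoff=0.5):
    """Count TP, FP, TN, FN; an item counts as recommended when score >= cutoff (assumed)."""
    tp = fp = tn = fn = 0
    for score, is_relevant in zip(scores, relevant):
        recommended = score >= cutoff
        if recommended and is_relevant:
            tp += 1
        elif recommended:
            fp += 1
        elif is_relevant:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_div(num, den):
    """Division that returns 0.0 instead of raising when the denominator is zero."""
    return num / den if den else 0.0


def evaluate(scores, relevant, cutoff=0.5):
    tp, fp, tn, fn = confusion_counts(scores, relevant, cutoff)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical recommender scores and user judgments for five news items.
result = evaluate([0.9, 0.7, 0.4, 0.2, 0.6], [True, False, True, False, True])
print(result["accuracy"], result["specificity"])  # 0.6 0.5
```

Specificity complements recall: recall measures how many relevant items are recommended, specificity how many irrelevant items are correctly withheld.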


Evaluation (2)

  • Results:

    • SF-IDF significantly outperforms TF-IDF

    • Almost all SS methods significantly outperform TF-IDF


Conclusions

  • Recommendation is commonly performed using TF-IDF

  • Semantics can be taken into account by using synsets:

    • SF-IDF

    • SS

  • Semantics-based recommendation outperforms classic term-based recommendation

  • Future work:

    • Also employ the similarity of words missing from WordNet (e.g., named entities), for instance based on the Google Distance

    • Compare CF-IDF, SF-IDF, and SS with Latent Dirichlet Allocation (LDA) and Explicit Semantic Analysis (ESA)


Questions
