Statistische Methoden in der Computerlinguistik / Statistical Methods in Computational Linguistics

2a. Course projects

Jonas Kuhn

Universität Potsdam, 2007

Course requirements
  • Exercises (not graded)
  • 2-3 larger programming assignments (submitted; graded)
  • Participation in a "project team" (2-5 members each)
    • Related to an overall course project (see below)
    • Research on a subtopic (literature and/or available tools/resources)
    • (Short) presentation of results / possibly a small tutorial on techniques of general interest
    • Experiments with tools or programming
    • Documentation of the project work (broken down by participant)
The Spock Challenge
  • The Entity Resolution Problem
    • A common problem that we face is that there are many people with the same name. Given that, how do we distinguish a document about Michael Jackson the singer from Michael Jackson the football player?
    • Worldwide contest for a software solution
    • http://challenge.spock.com/
      • Winning team receives a $50,000 prize
      • (Note the rules! "Upon acceptance of the prize, the winning Software Submissions and all source code and algorithms related thereto becomes the sole and exclusive property of Spock.")
The Spock Challenge
  • With billions of documents and people on the web, we need to identify web documents accurately and cluster them by the people they refer to.
  • Mapping the named entities in these documents to the correct person is the essence of the Spock Challenge.
The Spock Challenge
  • Data set
    • The complete data set is divided into training and test sets containing roughly 25,000 and 75,000 documents, respectively.
    • Along with a set of documents we've included a set of target names. You can assume that each document contains only one of the target names (even though most documents contain many names).
    • The challenge is to partition all the documents relevant to a target name by their referent. Consider the following two documents with the target name "Michael Jackson": "Michael Jackson - The King of Pop or Wacko Jacko?" and "Michael Jackson statistics - pro-football-reference.com". The referents of these articles are the pop star and the football player, respectively. We've included the ground truth for the training set so you have something to compare against.
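As a rough illustration of what partitioning the documents for one target name by referent involves, here is a minimal Python sketch under simplifying assumptions: documents are plain strings, features are lowercased bag-of-words counts with the target name's own tokens removed (they occur in every document for that name and so say nothing about the referent), and documents are grouped by single-link clustering with a cosine-similarity threshold. The function names and the threshold of 0.2 are hypothetical choices for illustration, not part of the challenge specification.

```python
import math
import re
from collections import Counter


def bag_of_words(text, ignore=frozenset()):
    """Lowercased token counts, skipping tokens in `ignore` (e.g. the target name)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in ignore)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[t] for t, count in a.items() if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_by_referent(docs, target_name, threshold=0.2):
    """Hypothetical helper: single-link clustering of documents sharing one target name.

    Two documents end up in the same cluster if a chain of document pairs with
    cosine similarity >= threshold connects them (tracked with union-find).
    """
    ignore = frozenset(target_name.lower().split())
    vecs = [bag_of_words(d, ignore) for d in docs]
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


docs = [
    "Michael Jackson pop singer album Thriller music",
    "Michael Jackson football player statistics season yards",
    "Michael Jackson music album pop star tour",
    "Michael Jackson football yards receiving season",
]
print(cluster_by_referent(docs, "Michael Jackson"))  # [[0, 2], [1, 3]]
```

On this toy input the pop-star documents and the football documents fall into separate clusters. On real web pages a single fixed threshold is fragile, which is one reason better features and clustering methods are worth studying.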
The Spock Challenge
  • Test/Application
    • Once you're done training, you can run your algorithm on the test set and submit your results on this site. (http://challenge.spock.com/)
    • We will provide instant feedback in the form of a percentage rank score (using the F-measure). This way you can see how you stack up against the other teams. What good is a problem without a little competition?
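The slide does not say which variant of the F-measure the Spock site computes. As one common way to score a clustering against ground truth, the sketch below assumes pairwise precision and recall: precision P is the fraction of document pairs the system places together that are also together in the ground truth, recall R is the fraction of ground-truth pairs the system also places together, and F_β = (1 + β²)·P·R / (β²·P + R), with β = 1 giving the harmonic mean. The function names are hypothetical.

```python
from itertools import combinations


def same_cluster_pairs(clusters):
    """Set of unordered document-id pairs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        pairs.update(frozenset(p) for p in combinations(cluster, 2))
    return pairs


def pairwise_f_measure(predicted, gold, beta=1.0):
    """Pairwise precision, recall and F_beta of a predicted clustering against gold clusters.

    Simplifying convention: if either side has no same-cluster pairs,
    all three scores come out as 0.
    """
    pred_pairs = same_cluster_pairs(predicted)
    gold_pairs = same_cluster_pairs(gold)
    correct = len(pred_pairs & gold_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(gold_pairs) if gold_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    return precision, recall, (1 + b2) * precision * recall / (b2 * precision + recall)


gold = [[0, 2], [1, 3]]
predicted = [[0, 1, 2], [3]]
p, r, f = pairwise_f_measure(predicted, gold)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.333 0.5 0.4
```

Here the system merges document 1 into the cluster of 0 and 2, so only one of its three pairs is correct (P = 1/3) and it recovers one of the two gold pairs (R = 1/2), giving F = 0.4.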
Course projects inspired by Spock challenge
  • Experiment with various (mostly statistical) NLP techniques on the data set
  • Any ideas?
Sub-tasks (we need a team for each)
  • State of the Art in Entity Resolution (a.k.a. deduplication, or merge-purge)
  • Clustering
    • Starting point: Manning/Schütze 1999, ch. 14
  • Information/Document Retrieval (?)
    • Starting point: Manning/Schütze 1999, ch. 15
    • Term weighting techniques
    • Possibly build additional data sets
  • Named Entity Detection
  • Coreference Resolution
  • Parsing, Semantic Role Labelling
  • Using WordNet (and other ontological resources)
  • Using Wikipedia (and other encyclopaedic resources)
  • Word Sense Disambiguation (possibly similar techniques)
  • Software Integration, Testing
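For the term weighting item (Manning/Schütze 1999, ch. 15), here is a minimal sketch of one standard tf-idf scheme, assuming whitespace tokenization and the weight w(t, d) = tf(t, d) · log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. Other variants (sublinear tf, smoothed idf, length normalization) are also common; this one is chosen only for brevity, and the function name is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Hypothetical helper: raw tf times log(N / df) for each term of each document."""
    n = len(docs)
    counts = [Counter(doc.lower().split()) for doc in docs]
    df = Counter()
    for c in counts:
        df.update(c.keys())  # each document counts once per term
    return [{t: tf * math.log(n / df[t]) for t, tf in c.items()} for c in counts]


weights = tf_idf([
    "michael jackson pop music",
    "michael jackson football",
    "michael jackson pop tour",
])
print(weights[1])  # "michael" and "jackson" get 0.0; "football" gets log(3)
```

A term that occurs in every document gets weight log(1) = 0, so a target name shared by all documents for that name contributes nothing to document similarity, while rarer context words such as "football" dominate.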