
REACTION Workshop 2011.01.06 Task 1 – Progress Report & Plans Lisbon, PT and Austin, TX



Mário J. Silva

University of Lisbon, Portugal


Grants (paid by Reaction)

  • Sílvio Moreira (BI: Oct 1, 2010 – March 31, 2011)

  • João Ramalho (BIC: Jan 1, 2011 – April 31, 2011)


Mining resources

  • Development of robust linguistic resources to process different types and genres of texts

    • knowledge resources about media personalities: recognizing and resolving references to named-entities;

    • sentiment lexicons and grammars: detecting the polarity of opinions about media personalities

    • annotated corpora: training different text classifiers and evaluating classification procedures


Mining resources

  • POWER - Political Ontology for Web Entity Retrieval

  • SentiLex-PT01 – Sentiment Lexicon for Portuguese

  • SentiCorpus-PT09 – Sentiment annotated corpus of user comments to political debates


POWER

POWER is an ontology that formalizes the domain knowledge defining a political landscape, i.e., the political actors and their roles in the political scene, their relationships and interactions.

The ontology focuses on describing:

  • Politicians

  • Political Institutions with different levels of authority (International, National, Regional, ...)

  • Political Associations

  • Political Affiliations and Endorsements

  • Elections

  • Mandates
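
The entities listed above can be pictured as a small data model. The following is only an illustrative sketch in Python: the class names, field names and sample values are hypothetical and are not taken from the actual POWER vocabulary, and elections and endorsements are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Institution:
    name: str
    level: str  # e.g. "International", "National", "Regional"


@dataclass
class Association:
    name: str


@dataclass
class Mandate:
    role: str
    institution: Institution
    start_year: Optional[int] = None
    end_year: Optional[int] = None


@dataclass
class Politician:
    name: str
    affiliations: List[Association] = field(default_factory=list)
    mandates: List[Mandate] = field(default_factory=list)

    def roles_at(self, level: str) -> List[str]:
        """Return the roles this politician holds at a given level of authority."""
        return [m.role for m in self.mandates if m.institution.level == level]


# Placeholder values only; none of these names come from the ontology.
parliament = Institution(name="Example Parliament", level="National")
party = Association(name="Example Party")
actor = Politician(
    name="Example Politician",
    affiliations=[party],
    mandates=[Mandate(role="Member", institution=parliament)],
)
national_roles = actor.roles_at("National")
```

Keeping mandates as separate records that link a politician to an institution makes it straightforward to represent one person holding several roles over time.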


POWER

Currently, the ontology describes the following, all from the Portuguese political scene:

  • 587 Political actors

  • 17 editions of Political Institutions

  • 16 Political Associations

  • 900 Mandates

  • 1 Election

  • 6 Candidate Lists


SentiLex-PT01

SentiLex-PT01 is a sentiment lexicon for Portuguese made up of 6,321 adjective lemmas and 25,406 inflected forms.

  • The entries are adjectives that predicate over human subjects

  • The sentiment attributes described in SentiLex-PT01 concern:

    • the predicate polarity,

    • the target of sentiment, and

    • the polarity assignment (which was performed manually or automatically, by JALC)


SentiLex-lem-PT01

6,321 lemmas

abatido.PoS=Adj;TG=HUM;POL=-1;ANOT=MAN

abelhudo.PoS=Adj;TG=HUM;POL=-1;ANOT=MAN

abençoado.PoS=Adj;TG=HUM;POL=1;ANOT=JALC

atrevido.PoS=Adj;TG=HUM;POL=0;ANOT=MAN

bem-educado.PoS=Adj;TG=HUM;POL=1;ANOT=MAN

brega.PoS=Adj;TG=HUM;POL=-1;ANOT=JALC

violento.PoS=Adj;TG=HUM;POL=-1;ANOT=JALC

Recently made publicly available on: http://xldb.fc.ul.pt/wiki/SentiLex-PT01
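
Entries in this format can be read with a few lines of code. The following is a minimal sketch in Python, assuming each entry has the shape `lemma.KEY=VALUE;KEY=VALUE;...` as in the sample above; the function name `parse_lemma_entry` is hypothetical and not part of any released tool.

```python
def parse_lemma_entry(line: str) -> dict:
    """Split 'lemma.KEY=VALUE;KEY=VALUE;...' into a dictionary."""
    # The first '.' separates the lemma from its attribute list.
    lemma, _, attrs = line.strip().partition(".")
    entry = {"lemma": lemma}
    for pair in attrs.split(";"):
        key, _, value = pair.partition("=")
        entry[key.strip()] = value.strip()
    # POL is the polarity; the sample entries use -1, 0 and 1.
    entry["POL"] = int(entry["POL"])
    return entry


sample = "abatido.PoS=Adj;TG=HUM;POL=-1;ANOT=MAN"
entry = parse_lemma_entry(sample)
```

For the sample line, `entry` holds the lemma `abatido`, the attributes `PoS`, `TG` and `ANOT` as strings, and `POL` as the integer -1.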


SentiLex-flex-PT01

25,406 inflected forms

abatida,abatido.PoS=Adj;GN=fs;TG=HUM;POL=-1;ANOT=MAN

abatidas,abatido.PoS=Adj;GN=fp;TG=HUM;POL=-1;ANOT=MAN

abatido,abatido.PoS=Adj;GN=ms;TG=HUM;POL=-1;ANOT=MAN

abatidos,abatido.PoS=Adj;GN=mp;TG=HUM;POL=-1;ANOT=MAN

bem-educada,bem-educado.PoS=Adj;GN=fs;TG=HUM;POL=1;ANOT=MAN

bem-educadas,bem-educado.PoS=Adj;GN=fp;TG=HUM;POL=1;ANOT=MAN

bem-educado,bem-educado.PoS=Adj;GN=ms;TG=HUM;POL=1;ANOT=MAN

bem-educados,bem-educado.PoS=Adj;GN=mp;TG=HUM;POL=1;ANOT=MAN

brega,brega.PoS=Adj;GN=fs;TG=HUM;POL=-1;ANOT=JALC

brega,brega.PoS=Adj;GN=ms;TG=HUM;POL=-1;ANOT=JALC

bregas,brega.PoS=Adj;GN=mp;TG=HUM;POL=-1;ANOT=JALC

bregas,brega.PoS=Adj;GN=fp;TG=HUM;POL=-1;ANOT=JALC

Recently made publicly available on: http://xldb.fc.ul.pt/wiki/SentiLex-PT01
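
One way to use the inflected-form list is to index it by surface form, so that a word found in running text can be mapped to its lemma and polarity. The sketch below assumes the `form,lemma.KEY=VALUE;...` layout shown above and that `GN` encodes gender and number (an interpretation inferred from the sample forms). The function names are hypothetical.

```python
from collections import defaultdict

# A few lines copied from the sample above.
FLEX_SAMPLE = """\
abatida,abatido.PoS=Adj;GN=fs;TG=HUM;POL=-1;ANOT=MAN
bem-educadas,bem-educado.PoS=Adj;GN=fp;TG=HUM;POL=1;ANOT=MAN
brega,brega.PoS=Adj;GN=fs;TG=HUM;POL=-1;ANOT=JALC
brega,brega.PoS=Adj;GN=ms;TG=HUM;POL=-1;ANOT=JALC
"""


def build_form_index(text: str) -> dict:
    """Map each inflected form to a list of (lemma, GN, polarity) tuples."""
    index = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        head, _, attrs = line.partition(".")
        form, _, lemma = head.partition(",")
        fields = dict(pair.split("=", 1) for pair in attrs.split(";"))
        index[form].append((lemma, fields["GN"], int(fields["POL"])))
    return index


def polarity_of(index: dict, form: str):
    """Return the polarity of a form, or None if unknown or ambiguous."""
    polarities = {pol for _, _, pol in index.get(form, [])}
    return polarities.pop() if len(polarities) == 1 else None


index = build_form_index(FLEX_SAMPLE)
brega_polarity = polarity_of(index, "brega")
```

A form such as `brega` appears once per gender and number combination, so the index keeps a list per form and reports a polarity only when all readings agree.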


SentiCorpus-PT09

SentiCorpus-PT09 is a collection of comments posted by readers of the Público newspaper on a series of 10 news articles, each covering a televised face-to-face debate between the main candidates in the 2009 parliamentary elections.

  • The collection is composed of 2,795 comments (~8,000 sentences).

  • 3,537 sentences, from 736 comments (27% of the corpus), were manually labeled with sentiment information.

  • Sentiment annotation involves different relevant dimensions, such as polarity, opinion target, target mention and verbal irony.
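
The annotation dimensions above suggest one record per labeled sentence. The following Python sketch is illustrative only: the field names, the -1/0/1 polarity convention (borrowed from SentiLex-PT01) and the placeholder sentences are assumptions, not the corpus's actual schema or data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AnnotatedSentence:
    text: str
    polarity: int       # assumed convention: -1 negative, 0 neutral, 1 positive
    target: str         # the entity the opinion is about
    mention_type: str   # e.g. "name", "pronoun", "definite description", "nickname"
    ironic: bool = False


def polarity_counts(sentences: Iterable[AnnotatedSentence]) -> Counter:
    """Count how many sentences carry each polarity value."""
    return Counter(s.polarity for s in sentences)


# Invented placeholder records, not taken from the corpus.
examples = [
    AnnotatedSentence("placeholder sentence 1", -1, "Candidate A", "name"),
    AnnotatedSentence("placeholder sentence 2", -1, "Candidate A", "pronoun"),
    AnnotatedSentence("placeholder sentence 3", 1, "Candidate B", "nickname", ironic=True),
]
counts = polarity_counts(examples)
ironic_positive = [s for s in examples if s.polarity == 1 and s.ironic]
```

Storing irony as a separate flag, rather than folding it into polarity, keeps the literal polarity of a sentence available alongside the ironic reading.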


Main findings

  • The real challenge in opinion mining of user-generated content is correctly identifying positive opinions

    • Positive opinions are less frequent than negative opinions (20%)

    • Positive opinions are particularly exposed to verbal irony (11%)

  • Other opinion mining challenges relate to the entity recognition and co-reference resolution sub-tasks

    • Mentions of human targets are frequently made through pronouns, definite descriptions and nicknames.

    • The most frequent type of mention is the person name, but it covers only 36% of the analyzed cases.


Next steps

April 2011:

  • POWER

    • Populating the ontology, using text-mining approaches

    • Internal release

  • SentiLex-PT01

    • Exploring other methods and algorithms (SVM, Active Learning) for automatic polarity classification

    • Enlarging the sentiment lexicon (verbs, predicate nouns, idiomatic expressions)


Next steps

August 2011:

  • POWER

    • First release to the general public via SPARQL endpoint and web user interface

  • SentiCorpus-PT09

    • Publicly available

  • Analysis and (semi-automated) annotation of a collection of documents from industrial and social media, over a period of 6 months

