Sentence Classification and Clause Detection for Croatian. Kristina Vučković, Željko Agić, Marko Tadić. Department of Information Sciences, Department of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb. {kvuckovi, zagic, [email protected] FASSBL 7 Conference.


Sentence Classification and Clause Detection for Croatian

Kristina Vučković, Željko Agić, Marko Tadić

Department of Information Sciences, Department of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb

{kvuckovi, zagic, [email protected]

FASSBL 7 Conference

Dubrovnik, Croatia, 2010-10-05



Overview

  • What?

    • classifying Croatian sentences by structure

    • detecting independent and dependent clauses

  • How?

    • implemented a prototype system in NooJ

    • linked it with a morphosyntactic tagger

    • evaluated on a sample from Croatian corpora

  • Why?

    • rule-based chunking and shallow parsing



Classification and detection

  • sentence segmentation is easy when considering sentence boundaries only

  • here, we:

    • detect boundaries of clauses in complex sentences

    • assign type to sentences

  • sentence classification

    • purpose: declarative, interrogative, etc.

    • structure: simple and complex

  • complex sentences

    • independent complex, i.e. compound sentences

    • dependent complex sentences



Classification and detection

  • independent complex sentences

    • independent clause connected to the main clause by using a conjunction

    • type defined by the choice of conjunction

      • e.g. constituent clause, conjunctions {i, pa, te, ni, niti}

      • disjunctive, opposite, exclusive, conclusive and explanatory clause

      • Svi su spavali, jedino sam ja bio budan. (exclusive)

  • dependent complex sentences

    • main clause is independent, all the others depend on it and cannot stand alone in a sentence

      • predicative, subjective, objective, attributive, appositional and adverbial clause

      • Ispričat ću ti što mi se dogodilo. (objective)
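
The idea that the conjunction determines the type of an independent clause can be sketched as a simple lookup. This is only an illustration in Python, not the NooJ grammar used in the prototype. The "constituent" conjunctions and the word "jedino" (exclusive) are taken from the slide; "ili" and "ali" are assumptions based on standard Croatian grammar, and the table is deliberately incomplete. All function names are hypothetical.

```python
import re

# Conjunction -> clause type. Only the "constituent" set and "jedino"
# come from the slides; the entries marked "assumption" do not.
CONJUNCTION_TYPES = {
    "i": "constituent", "pa": "constituent", "te": "constituent",
    "ni": "constituent", "niti": "constituent",
    "jedino": "exclusive",
    "ili": "disjunctive",  # assumption, not from the slides
    "ali": "opposite",     # assumption, not from the slides
}


def classify_independent_clause(clause):
    """Return the type implied by the clause-initial word, or 'unknown'."""
    words = re.findall(r"\w+", clause.lower())
    if not words:
        return "unknown"
    return CONJUNCTION_TYPES.get(words[0], "unknown")


def classify_sentence(sentence):
    """Split a sentence at commas and type every clause after the first.

    Splitting at commas is a simplification made for this sketch.
    """
    clauses = [part.strip() for part in sentence.split(",")]
    return [classify_independent_clause(part) for part in clauses[1:]]


print(classify_sentence("Svi su spavali, jedino sam ja bio budan."))
```

Clauses that are not preceded by a comma would be missed by this sketch, so a realistic implementation needs clause boundaries from a grammar rather than from punctuation alone.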



The system

  • prototype implemented in NooJ

    • finite state transducer cascades (local grammars)

    • Croatian lexical resources

    • each cascade detects and annotates a different type of clause

    • built on top of a chunker for Croatian

  • the top-level grammar

    • two types of subgraphs: main clauses and independent clauses
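
The cascade idea, where each stage detects and annotates one clause type and later stages see the earlier annotations, might be pictured roughly as follows. This is a Python analogy under simplifying assumptions, not NooJ code: the `Sentence` class, the `make_stage` helper and the rule "a clause runs from its trigger word to the next comma" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    tokens: list                                     # word forms, in order
    annotations: list = field(default_factory=list)  # (start, end, label), end exclusive


def covered(sentence, index):
    """True if an earlier stage already annotated the token at `index`."""
    return any(start <= index < end for start, end, _ in sentence.annotations)


def make_stage(triggers, label):
    """Build a stage that annotates from a trigger word to the next comma or the sentence end."""
    def stage(sentence):
        i = 0
        while i < len(sentence.tokens):
            if sentence.tokens[i].lower() in triggers and not covered(sentence, i):
                end = i + 1
                while end < len(sentence.tokens) and sentence.tokens[end] != ",":
                    end += 1
                sentence.annotations.append((i, end, label))
                i = end
            else:
                i += 1
        return sentence
    return stage


def run_cascade(sentence, stages):
    """Apply the stages in order; each one adds its own annotations."""
    for stage in stages:
        sentence = stage(sentence)
    return sentence


# One stage per clause type, as in the cascade described above.
CASCADE = [
    make_stage({"jedino"}, "exclusive"),
    make_stage({"i", "pa", "te", "ni", "niti"}, "constituent"),
]
```

Calling `run_cascade(Sentence(tokens), CASCADE)` on a tokenized sentence returns the same object with one `(start, end, label)` tuple per detected clause.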



The system

  • Main clause grammar

    • presence of a VP and possibly any other phrase

    • independent clauses recognized just by using the conjunctions

    • implementation of dependent clause detection varies across clause types
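
The two simplest rules above (a main clause needs a VP, and independent clauses are recognized by their conjunctions) can be sketched over chunker output. The chunk labels `NP`, `VP` and `CONJ`, the `(label, text)` tuple format and the example sentence are assumptions made for this illustration; the actual NooJ chunker output is not shown in the slides.

```python
CONSTITUENT_CONJUNCTIONS = frozenset({"i", "pa", "te", "ni", "niti"})


def is_main_clause(chunks):
    """A chunk sequence qualifies if it holds at least one VP; other phrases are optional."""
    return any(label == "VP" for label, _ in chunks)


def split_clauses(chunks, conjunctions=CONSTITUENT_CONJUNCTIONS):
    """Start a new clause at every listed conjunction, keeping it with the clause it opens."""
    clauses, current = [], []
    for label, text in chunks:
        if label == "CONJ" and text.lower() in conjunctions and current:
            clauses.append(current)
            current = []
        current.append((label, text))
    if current:
        clauses.append(current)
    return clauses


# Hypothetical chunked input for "Marko čita i Ana piše."
chunks = [("NP", "Marko"), ("VP", "čita"), ("CONJ", "i"), ("NP", "Ana"), ("VP", "piše")]
for clause in split_clauses(chunks):
    print(is_main_clause(clause), clause)
```

A segment without a VP fails the `is_main_clause` check, which is one way to notice that a conjunction was coordinating phrases rather than clauses.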



Experiment setup

  • used the CW100 corpus

    • XCES-encoded to word level

    • sentence delimited, tokenized, manually lemmatized and MSD-annotated

    • 200 randomly selected sentences

      • 100 for the development and 100 for testing

  • utilized the CroTag tagger

    • NooJ input format allows external annotation

    • created three systems

      • no preprocessing

      • tagging input sentences with CroTag (~85% accuracy)

      • using the manually assigned tags from CW100

  • recall, precision, F1-measure
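
For reference, the three measures can be computed as in the Python sketch below. It assumes that a predicted clause counts as correct only when its boundaries and its type both match a gold clause exactly; the slides do not state the matching criterion, so this is an assumption, as is the `(start, end, label)` representation.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted clause annotations against gold ones.

    Both arguments are iterables of (start, end, label) tuples.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: one clause is typed correctly, one is given the wrong type.
gold = {(0, 3, "main"), (4, 9, "exclusive")}
predicted = {(0, 3, "main"), (4, 9, "opposite")}
print(precision_recall_f1(gold, predicted))
```

Under exact matching, a clause with the right boundaries but the wrong type lowers both precision and recall, which fits the separate reporting of misclassifications in the results.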



Results

  • scores for the three systems

    • “perfect” tagging system is the top-performer

    • benefits of automatic tagging?

  • distribution of assigned types

    • main, objective, opposite, adverbial, attributive, ...

  • misclassifications

    • attributive and objective most commonly misclassified

    • data sparseness



Conclusions and future work

  • the system scores well in terms of F1-measure

    • open issues

      • verb coordination

      • dislocated nominal predicates

      • attributive clauses starting with a PP

      • complex insertion of dependent clauses

    • no real benefit from automatic MSD-tagging

  • future work

    • resolving the issues

    • re-evaluation on a larger test set?

    • integration with a rule-based shallow parser



Thank you for your attention.

The research within the project ACCURAT leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013), grant agreement no 248347.

www.accurat-project.eu

