
The Ontea Method

The Ontea method. NAZOU workshop, 21-23 September 2007, Poľana. Pattern-based annotation. Similar methods: C-PANKOW, SemTag. Languages other than English: Slovak. Faster and more accurate than C-PANKOW. Also supports instance creation, which SemTag does not. Contribution to the state of the art.


Presentation Transcript


  1. The Ontea Method. NAZOU workshop, 21-23 September 2007, Poľana

  2. Contribution to the state of the art
     • Pattern-based annotation
     • Similar methods: C-PANKOW, SemTag
     • Languages other than English: Slovak
     • Faster and more accurate than C-PANKOW
     • Also supports instance creation, which SemTag does not

  3. Contribution to the state of the art: the Ontea tool
     • Pattern
     • PatternRegExp: annotate(), returns a set of results
     • Result: e.g. (Bratislava, region:Settlement)
     • ResultRegExp
     • ResultOnto
     • ResultTransformer
     • LuceneRelevance
     • SesameIndividualSearch
     • SesameIndividualSearchAndCreate
     • TvaroslovnikLemmatizer
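The slide only names the classes, so the following is a minimal Python sketch of the idea behind `PatternRegExp.annotate()` returning a set of results such as (Bratislava, region:Settlement). It is not Ontea's actual API: the constructor arguments, the `Result` fields, and the example regular expression are assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Hypothetical result pair: matched text plus the ontology class it is assigned to."""
    individual: str   # e.g. "Bratislava"
    concept: str      # e.g. "region:Settlement"


class PatternRegExp:
    """Hypothetical analogue of the slide's PatternRegExp: one regex mapped to one concept."""

    def __init__(self, regex: str, concept: str):
        self.regex = re.compile(regex)
        self.concept = concept

    def annotate(self, text: str) -> set:
        # The first capture group is taken as the candidate individual.
        return {Result(m.group(1), self.concept) for m in self.regex.finditer(text)}


# Assumed example pattern: a capitalised word following "in" or "at" is a settlement.
pattern = PatternRegExp(r"\b(?:in|at)\s+([A-Z][a-z]+)", "region:Settlement")
results = pattern.annotate("The company has offices in Bratislava and in Kosice.")

for result in sorted(results, key=lambda r: r.individual):
    print(result.individual, result.concept)
```

Returning a set means that repeated mentions of the same individual collapse into one result, which matches the slide's wording of "a set of results".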

  4. Evaluation: success rate (1)
     • New experiment for Ontea Creation
     • Ontea Creation + indexing: experiment with RFTS and Lucene indexing
     • Lemmatization

  5. Evaluation: speed (2)
     • Ontea Creation: instances of ontological concepts are created from the input text collection by matching regular expression patterns.
     • This step produces OWL ontology files, which need to be integrated on a central machine.
     • Created instances are evaluated by computing their relevance with the RFTS or Lucene indexing tool. Instances whose relevance exceeds a given threshold are identified as relevant and written to the resulting domain ontology OWL file (this stage relates to the RFTS tool).
     • Ontea Search: searches for annotation tags within the annotated text, as in step one but using general keyword-matching patterns. This executes more ontology queries and therefore takes more time.
     • The last stage integrates the produced semantic metadata into one knowledge base, represented by an OWL file.
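The relevance step above can be sketched as a simple threshold filter. The slide does not say how the indexing tool computes relevance, so the scoring function here is a toy stand-in (the share of documents mentioning an instance that also mention a concept keyword); the function names, the sample documents, and the threshold value are all assumptions.

```python
def relevance(instance: str, keyword: str, documents: list) -> float:
    """Toy stand-in score, NOT the indexing tool's formula:
    share of documents mentioning `instance` that also mention `keyword`."""
    with_instance = [d for d in documents if instance.lower() in d.lower()]
    if not with_instance:
        return 0.0
    with_both = [d for d in with_instance if keyword.lower() in d.lower()]
    return len(with_both) / len(with_instance)


def filter_relevant(candidates: list, documents: list, threshold: float) -> list:
    """Keep only candidate (instance, keyword) pairs whose relevance exceeds the threshold."""
    return [
        (instance, keyword)
        for instance, keyword in candidates
        if relevance(instance, keyword, documents) > threshold
    ]


# Hypothetical mini-collection; a real run would query a full-text index instead.
documents = [
    "Job offer in the city of Bratislava",
    "Bratislava city centre office",
    "Apply to Martin by email",
    "Martin is a city in Slovakia",
]
candidates = [("Bratislava", "city"), ("Martin", "city")]

relevant = filter_relevant(candidates, documents, threshold=0.6)
print(relevant)
```

In this toy collection "Martin" appears once as a first name and once as a town, so it scores 0.5 and is dropped at the assumed threshold of 0.6, while "Bratislava" scores 1.0 and is kept.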

  6. Evaluation: speed (3)
     • Tests ran on 500 job-offer documents, which took ~67 minutes on an Intel(R) Pentium(R) 4 CPU at 2.40 GHz
     • There are about 35,000 Slovak job offers on the web, and many more in English
     • At that rate, periodic annotation of these job offers takes ~78 hours, i.e. more than 3 days
     • Steps 1 and 3 can run distributed
     • Submitting jobs of e.g. 1000 job-offer documents each (a 1000-document set is ~3M) takes ~134 minutes on one node; 35 grid nodes with 1000 documents each cover all 35,000 documents
     • Add ~10 minutes of grid middleware overhead and ~60 minutes of data integration
     • On the grid: ~204 minutes = 3 hours 24 minutes
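The timing estimates on this slide follow from straightforward arithmetic, reproduced below as a short check. All input figures come from the slide; the variable names are illustrative, and the calculation assumes that processing time scales linearly with the number of documents.

```python
# Figures taken from the slide.
MINUTES_PER_BATCH = 67        # measured time for one batch
DOCS_PER_BATCH = 500          # documents in the measured batch
TOTAL_DOCS = 35_000           # Slovak job offers on the web
DOCS_PER_NODE = 1_000         # documents submitted to each grid node
MIDDLEWARE_OVERHEAD_MIN = 10  # grid middleware overhead
INTEGRATION_MIN = 60          # data integration on the central machine

# Single machine: scale the measured batch time linearly to the whole collection.
single_machine_min = TOTAL_DOCS // DOCS_PER_BATCH * MINUTES_PER_BATCH
single_machine_hours = single_machine_min / 60

# Grid: every node processes its own 1000 documents in parallel.
nodes = TOTAL_DOCS // DOCS_PER_NODE
node_min = DOCS_PER_NODE // DOCS_PER_BATCH * MINUTES_PER_BATCH
grid_min = node_min + MIDDLEWARE_OVERHEAD_MIN + INTEGRATION_MIN
grid_hours, grid_rest_min = divmod(grid_min, 60)

print(f"single machine: {single_machine_min} min = {single_machine_hours:.1f} h")
print(f"grid: {nodes} nodes, {grid_min} min = {grid_hours} h {grid_rest_min} min")
```

The check gives 4690 minutes (about 78.2 hours) on one machine and 204 minutes on 35 nodes, in line with the slide's figures.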

  7. Publications
     • Michal Laclavík, Marek Ciglan, Martin Šeleng, Ladislav Hluchý: Empowering Automatic Semantic Annotation in Grid. PPAM 2007, Springer, LNCS.
     • Michal Laclavík, Marek Ciglan, Martin Šeleng, Stanislav Krajčí, Peter Vojtek, Ladislav Hluchý: Semi-automatic Semantic Annotation of Slovak Texts. SLOVKO 2007.
     • Michal Laclavík, Marek Ciglan, Martin Šeleng: Ontea: Semi-automatic Pattern based Text Annotation empowered with Information Retrieval Methods. NAZOU-ITAT, 2007.
     • Michal Laclavík, Martin Šeleng, Emil Gatial, Zoltan Balogh, Ladislav Hluchý: Ontology based Text Annotation – OnTeA. Information Modelling and Knowledge Bases XVIII, Marie Duzi, Hannu Jaakkola, Yasushi Kiyoki, Hannu Kangassalo (Eds.), Frontiers in Artificial Intelligence and Applications, Vol. 154, IOS Press, Amsterdam, February 2007, pp. 311-315. ISBN 978-1-58603-710-9, ISSN 0922-6389.
