LOD2 KOREA: Towards Publishing Korean Linked Data on the Web

Key-Sun Choi. Joint work with Martin Rezk, Jungyeul Park, Yoon Yongun, Kyungtae Lim, and YoungGyun Hahm.

Presentation Transcript


  1. LOD2 KOREA: Towards Publishing Korean Linked Data on the Web • Key-Sun Choi • Joint work with Martin Rezk, Jungyeul Park, Yoon Yongun, Kyungtae Lim, YoungGyun Hahm

  2. Key-Sun Choi - Personal History • NEC C&C Lab. – PIVOT Japanese-Korean Machine Translation • Korean Part-of-Speech Tagset, Corpus, Dictionary • CoreNet (Korean-Chinese-Japanese) Semantic Wordnet (2004) • KORTERM: Korea Terminology Research Center for Language and Knowledge Engineering (1998-2007), Research Center of Ministry of Culture • KAIST Research Grand Award (1998) • ISO/TC37/SC4 Founding member (Language Resource Management Standards) • ISWC 2007 PC Co-Chair (International Semantic Web Conference) • AFNLP President (2009-2010) • DBpedia Korea http://ko.dbpedia.org/ • http://lod2.eu/ partner (EU FP7)

  3. NLP2RDF • Triple in Natural Language • Subject • Object • Predicate • Extract from Sentences • 野生種의 장미는 主로 北半球의 溫帶와 寒帶 地方에 分布한다. • Wild roses are distributed mainly in the temperate and frigid zones of the Northern Hemisphere. • Subject: 장미 (rose) • Object: 북반구의 온대 지방, 한대 지방 (temperate and frigid zones of the Northern Hemisphere) • Predicate: 分布 (isDistributedAt) Key-Sun Choi - LOD2 Korea
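
The subject/predicate/object breakdown on this slide can be sketched as a small data structure. This is only an illustration in Python: the `Triple` class, its field names, and the English labels are assumptions made for the example, and the triples are written by hand rather than produced by any parser.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement taken from a sentence."""
    subject: str
    predicate: str
    obj: str


# Hand-written triples for the slide's example sentence (not parser output).
# The sentence names two zones, so it yields one triple per object.
rose_triples = [
    Triple("rose", "isDistributedAt", "temperate zone of the Northern Hemisphere"),
    Triple("rose", "isDistributedAt", "frigid zone of the Northern Hemisphere"),
]

for t in rose_triples:
    print(f"({t.subject}, {t.predicate}, {t.obj})")
```

Splitting the coordinated object into two triples is one possible design choice; keeping a single triple with a compound object would also match the slide.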

  4. 4

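  5. [Diagram: concept network for embedded software. Node labels: Microsoft, Wind River, real-time embedded operating system, non-real-time embedded operating system, communication middleware, middleware, media player, browser, application program, VxWorks, WinCE, pSOS, VRTX, embedded software, embedded system, embedded operating system, operating system, RTOS, software, development environment, platform, system, manufacturer, home appliance, DVD player, set-top box, digital camera, MP3 player. Relation labels: consists_of, 제조사 (manufacturer), reside_on.]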

  6. NLP2RDF • <Conceptual Layer> <DBpedia> (based on DBpedia Ontology): Barack Obama, URI = dbpedia12415 (conceptually unique), <Career> President, <Nationality> United States, <Party> Democrats, ... • LOD algorithm • The output of NLP tools, "KNIF" Wrapper: Barack Obama, URI = sen1word1 (unique within the document), <POStag> NNG </POStag>, ... • Sentence: 'Barack Obama is the President of the United States'

  7. For this work • For RDF Mapping • Triples and URI • Ontology • String Ontology • Structured Sentence Ontology • NIF and Korean language • For LOD Mapping • URI for DBpedia entity • Mapping words in text → DBpedia

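  8. Parse tree to Summary • 물체의 낙하 거리는 시간의 제곱에 비례한다 (The distance an object falls is proportional to the square of the time.) • <Triple> • Subject • 물체의 낙하거리 (the distance an object falls) • Predicate • 비례한다 (is proportional to) • Contents • 시간의 제곱 (the square of the time)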

  9. Why NLP? Why Syntax and Semantics? • Advanced technology on the higher-level layers

  10. NLP Layer Cake

  11. Semantic Web vs. NLP layer cake

  12. How to develop parser and semantic classifier creatively? • Open Source NLP tools • Rich English, Japanese open tools/resources • A few Korean tools • How to adapt Korean tools to the already developed tools • Already developed Korean language resources • KAIST tools/resources • KAIST open source on SourceForge and the web • Cambridge University Press: NLP Textbook (in progress) • Linked Data – http://lod2.eu/ partner

  13. Background • The idea of linking data from different sources is not new: • Network Database Model: 1970s • Linked Data: Today • The goal is to facilitate sharing and reusing information. • Linked Data aims to extend the Web with a data commons by creating typed links between data from different sources.

  14. Background • These links are usually modeled using the Resource Description Framework (RDF) • Each piece of data is identified with a URI • The first task towards linking data is to identify which resources and which properties we want to describe
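
To make the idea of URI-identified resources and properties concrete, the following sketch serializes one statement as an N-Triples line. It is a minimal illustration, not the project's code: the `example.org` URIs and the `to_ntriple` helper are hypothetical, and only basic literal escaping is handled.

```python
def to_ntriple(subject_uri: str, predicate_uri: str, obj: str,
               obj_is_uri: bool = True) -> str:
    """Serialize one statement as a single N-Triples line."""
    if obj_is_uri:
        obj_term = f"<{obj}>"
    else:
        # Escape backslashes, quotes and line breaks inside a plain literal.
        escaped = (obj.replace("\\", "\\\\").replace('"', '\\"')
                      .replace("\n", "\\n").replace("\r", "\\r"))
        obj_term = f'"{escaped}"'
    return f"<{subject_uri}> <{predicate_uri}> {obj_term} ."


# Hypothetical URIs, chosen only for this example.
BASE = "http://example.org/"
link_line = to_ntriple(BASE + "resource/Rose",
                       BASE + "property/isDistributedAt",
                       BASE + "resource/TemperateZone")
label_line = to_ntriple(BASE + "resource/Rose",
                        BASE + "property/label",
                        'wild "rose"', obj_is_uri=False)
print(link_line)
print(label_line)
```

The first line is a typed link between two resources; the second attaches a literal value to a resource.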

  15. Introduction • NLP2RDF is a LOD2 Community project that is developing the NLP Interchange Format (NIF) • NIF aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations • The output of NLP tools can be converted into RDF and used in the LOD2 Stack • http://nlp2rdf.org • NIF… • Is based on RDF/OWL • Enables users to annotate text in several languages in a uniform way • Enables users to query text documents with SPARQL (e.g., http://semanticweb.kaist.ac.kr/nlp2rdf/) • Sentence: 다크나이트는 미국의 영화이다. • The Dark Knight is an American film.

  16. Key-Sun Choi - LOD2 Korea

  17. NIF Wrapping • NLP Interchange Format (NIF) is an RDF/OWL-based format that allows several Natural Language Processing (NLP) tools to be combined and chained in a flexible, lightweight way. (Source: Sebastian Hellmann, AKSW, Universität Leipzig, NLP Interchange Format (NIF))

  18. Structure of NLP2RDF • NLP Layer • Interchange Layer • Data Layer

  19. Example of NLP Layer (English NLP) • Input sentence • Tokenization • CFG Parser • Dependency Parser

  20. How to create RDF from NLP output • Process: Raw texts → NLP tools output → NIF Wrapper (StanfordWrapper.Java) → RDF • Example: My dog also likes eating sausage.

  21. Example of NLP2RDF in English • http://nlp2rdf.lod2.eu/demo.php • Sentence: Obama is the president of USA. <http://prefix.given.by/theClient#offset_0_5> sso:oliaLink <http://purl.org/olia/penn.owl#NNP> ; sso:posTag "NNP" ; sso:lemma "Obama" ; str:referenceContext <http://prefix.given.by/theClient#offset_0_30> ; str:anchorOf "Obama" ; rdf:type sso:Word , str:String .
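
The URIs in the snippet above encode begin and end character offsets (`offset_0_5` for "Obama", `offset_0_30` for the whole sentence). A rough sketch of how such URIs could be derived is given below. The regex tokenizer and the `offset_uris` helper are assumptions for illustration only; the demo itself relies on its own NLP tools for tokenization.

```python
import re


def offset_uris(prefix: str, sentence: str):
    """Yield (uri, token) pairs built from begin/end character offsets."""
    # Simplistic tokenizer: runs of word characters, or single punctuation marks.
    for m in re.finditer(r"\w+|[^\w\s]", sentence):
        yield f"{prefix}offset_{m.start()}_{m.end()}", m.group()


sentence = "Obama is the president of USA."
prefix = "http://prefix.given.by/theClient#"

# The reference context covers the whole sentence.
context_uri = f"{prefix}offset_0_{len(sentence)}"
pairs = list(offset_uris(prefix, sentence))

print(context_uri)
for uri, token in pairs:
    print(uri, token)
```

With this input, the first pair and the context URI match the two URIs shown on the slide.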

  22. Korean NLP2RDF • Resources: morphemes, words (eojeols) and sentences in Korean • Properties: POS, grammatical roles, etc. • Problems to solve: • Linguistic Modeling (OLiA) • Processing Korean Text (NLP) • How to Produce and Query RDF

  23. Linguistic Modeling (1) • We use OLiA (Ontologies of Linguistic Annotation) to link the Sejong tagset with language-independent reference concepts. • The Sejong tagset is the de facto standard for Korean • OLiA consists of three different ontologies: • the OLiA reference model (language-independent), • the OLiA annotation model (depends on the tagset), • the OLiA linking model (depends on the tagset). • We developed a fragment of these last two ontologies for Korean, that is, for the Sejong tagset.
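
The role of the linking model can be pictured as a lookup from tagset-specific tags to language-independent concepts. The sketch below is purely illustrative: the four tag-to-concept pairs, the `link_tag` helper, and the reference-model namespace (assumed by analogy with the `penn.owl` URI on slide 21) are assumptions, not the fragment developed in this work.

```python
# Assumed namespace of the OLiA reference model (not taken from the slides).
OLIA = "http://purl.org/olia/olia.owl#"

# Illustrative subset only: Sejong POS tag -> reference concept name.
SEJONG_TO_OLIA = {
    "NNG": "CommonNoun",   # common noun
    "NNP": "ProperNoun",   # proper noun
    "VV": "Verb",          # verb
    "VA": "Adjective",     # adjective
}


def link_tag(sejong_tag: str):
    """Return the reference-concept URI for a tag, or None if unmapped."""
    concept = SEJONG_TO_OLIA.get(sejong_tag)
    return OLIA + concept if concept else None


print(link_tag("NNP"))
print(link_tag("XYZ"))
```

In the real ontologies this relation is expressed in OWL rather than in a dictionary; the lookup only conveys the direction of the mapping.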

  24. Linguistic Modeling (2) • We use the NIF (NLP Interchange Format) to • standardize the input/output of the different tools to ease the connection among them, and to • uniquely identify (parts of) text, entities and relationships. • NIF provides two URI schemes to identify resources • Offset-based • Hash-based • In our application we opt for the hash-based scheme
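
The difference between the two schemes can be illustrated with a small sketch. The offset-based form follows the pattern seen on slide 21; the hash-based function is a simplified stand-in assumed for this example (an MD5 digest over the string plus a few characters of surrounding context) and does not reproduce the exact NIF recipe.

```python
import hashlib


def offset_uri(prefix: str, begin: int, end: int) -> str:
    """Offset-based: the URI depends on the position in the text."""
    return f"{prefix}offset_{begin}_{end}"


def hash_uri(prefix: str, text: str, begin: int, end: int,
             context_len: int = 10) -> str:
    """Simplified hash-based URI: depends on the string and nearby context."""
    anchor = text[begin:end]
    left = text[max(0, begin - context_len):begin]
    right = text[end:end + context_len]
    digest = hashlib.md5(f"{left}({anchor}){right}".encode("utf-8")).hexdigest()
    return f"{prefix}hash_{context_len}_{len(anchor)}_{digest}"


prefix = "http://prefix.given.by/theClient#"
text = "Obama is the president of USA."
shifted = "Today, " + text  # same sentence, moved 7 characters to the right

# "president" sits at 13..22 in text and at 20..29 in shifted.
print(offset_uri(prefix, 13, 22), offset_uri(prefix, 20, 29))
print(hash_uri(prefix, text, 13, 22) == hash_uri(prefix, shifted, 20, 29))
```

In this sketch the offset-based URI changes when text is inserted earlier in the document, while the hash-based one stays the same as long as the string and its immediate context are unchanged.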

  25. Korean NLP2RDF Platform • RAW Text • Morpheme Analyzer: HanNanum • Korean Open Source Morpheme Analyzer • Developed by SWRC, KAIST • Parser: Korean Berkeley Parser • Training set: Modified Sejong Treebank (DongHyun Choi, Jungyeul Park, Key-Sun Choi, Korean Treebank Transformation for Parser Training, ACL - SPMRL 2012) • F1-score: 82.12% • Wrapper • Produce triples • Use OLiA (Ontologies of Linguistic Annotation) to link the Korean tagsets with language-independent reference concepts • The OLiA annotation model and the OLiA linking model produce triples using the Sejong tagset • NIF output

  26. [Architecture diagram. Component labels: Korean language information; Korean NLP (Korean Grammar Framework): input Korean sentence, Morph. Analyzer, CFG Parser, Dependency Parser, parsed result; URI, Tag DataBase; Mappings; Ontologies; RDF generator (OnTop Framework); RDF triples; SPARQL Query; SPARQL Query Handler; RDF triples]

  27. NIF Output • Each piece of data is identified with a URI (hash-based) • Resources: Morphemes, Words (eojeols), Sentences in Korean • Properties: POS-tag, Grammatical roles, etc. • DEMO site: http://semanticweb.kaist.ac.kr/nlp2rdf • [Screenshots: parsing results; some produced triples]

  28. NIF Output • 이탈리아에서 공부하고 온 마틴은 한국을 사랑합니다. • Martin, who came after studying in Italy, loves Korea.

  29. Specific Issues of Korean • Korean Tagset • Linking with OLiA Ontology • [Diagram labels: String Ontology (String); Structured Sentence Ontology (SSO) (Word, Sentence, Phrase, ...); OLiA (Tag, ...); Penn; Sejong Tag Set; Parser Output; NLP2RDF: Produce Triples; RDF output]

  30. Key-Sun Choi - LOD2 Korea

  31. Conclusions: • We presented a framework that allows • processing Korean text, • efficiently producing RDF triples, and • querying the output of the NLP tools • The RDF output of our framework is compliant with the NIF (NLP Interchange Format) and the OLiA ontologies to facilitate its combination with other NLP tools • Future: • complete the development of the language-dependent part of the OLiA ontologies, • include the missing features required by NIF, • allow richer SPARQL queries, and • disambiguate the different entities in the text and link them with Wikipedia articles.

  32. Issues • DBpedia • How to link between produced triples and DBpedia triples • Josa (postposition case marker) • Korean-specific grammatical feature • Sentence: 다크나이트는 미국의 영화이다. • Sentence: The Dark Knight is an American movie.
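
The josa issue can be seen on the example sentence: the eojeol 다크나이트는 carries the topic marker 는, so the surface word does not match an entity label such as 다크나이트 directly. The sketch below strips a trailing marker with a naive suffix list. It is an assumption-laden illustration only: the `JOSA` list is a small hand-picked subset, `split_josa` is hypothetical, and blind suffix stripping will mis-segment many real words, which is why a morpheme analyzer is used in practice.

```python
# Small illustrative subset of josa; a real analyzer uses a lexicon and context.
JOSA = ["에서", "는", "은", "이", "가", "을", "를", "의", "에"]


def split_josa(eojeol: str):
    """Return (stem, josa); josa is "" when no listed marker matches."""
    # Try longer markers first so that "에서" wins over shorter suffixes.
    for josa in sorted(JOSA, key=len, reverse=True):
        if eojeol.endswith(josa) and len(eojeol) > len(josa):
            return eojeol[: -len(josa)], josa
    return eojeol, ""


for word in "다크나이트는 미국의 영화이다".split():
    print(split_josa(word))
```

The stems obtained this way (다크나이트, 미국) are the strings one would then try to match against DBpedia labels.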

  33. Source • OnTop • https://babbage.inf.unibz.it/trac/obdapublic/wiki/ObdalibPluginIntro • Demo site for Korean • http://semanticweb.kaist.ac.kr/nlp2rdf • Demo site for English • http://nlp2rdf.lod2.eu/demo.php • NLP2RDF • http://nlp2rdf.org

  34. Key-Sun Choi, Mun-Yong Yi, In-Young Koh, Younghee Lee (CS/WebST, Knowledge Service Eng., CS/WebST, CS) • Tony Veale (Invited Professor, Computational Creativity) • Yoon, Yong-Un (research professor, NLP+DB) • Martin Rezk (postdoctoral researcher, Logic) • Park, Jung-Yeol (researcher, parser) • Lee, Jae-Sung (Professor, morphology and word) • Graduate Students: Soon-Gil Hong, Young-Gyun Hahm, Kyungtae Lim, Se-Mi Jang, Youngho Jeong, … • http://ko.dbpedia.org/ • http://semanticweb.kaist.ac.kr • kschoi@kaist.ac.kr
