
A Semantic Text Classification Based on DBpedia

AST2009. A Semantic Text Classification Based on DBpedia. Rujiang Bai, Junhua Liao Shandong University of Technology Library Zibo 255049, China { brj, ljhbrj}@sdut.edu.cn. OUTLINE. 1.BACKGROUND 2. DBpedia 3.OUR PROPOSED METHODS 4.EXPERIMENT 5.CONCLUSION.


Presentation Transcript


  1. AST2009 A Semantic Text Classification Based on DBpedia Rujiang Bai, Junhua Liao Shandong University of Technology Library Zibo 255049, China { brj, ljhbrj}@sdut.edu.cn

  2. OUTLINE • 1.BACKGROUND • 2. DBpedia • 3.OUR PROPOSED METHODS • 4.EXPERIMENT • 5.CONCLUSION

  3. 1.BACKGROUND • “Bag of Words” (BOW) vs. “Bag of Concepts” (BOC) • Semantic Features Representation

  4. 2. DBpedia DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web.

  5. 3.OUR PROPOSED METHODS • Definition 1 (Core Ontology). A core ontology is a structure O := (C, <c) consisting of a set C, whose elements are called concept identifiers, and a partial order <c on C, called concept hierarchy or taxonomy. • Definition 2 (Subconcepts and Superconcepts). If c1 <c c2 for any c1, c2 ∈ C, then c1 is a subconcept (specialization) of c2 and c2 is a superconcept (generalization) of c1. If c1 <c c2 and there exists no c3 ∈ C with c1 <c c3 <c c2, then c1 is a direct subconcept of c2, and c2 is a direct superconcept of c1, denoted by c1 ﹤ c2.
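
Definitions 1 and 2 can be illustrated with a small sketch. It is not taken from the presentation: the function names `transitive_closure` and `direct_pairs` and the three example concepts are assumptions chosen for illustration. The order <c is stored as a set of (subconcept, superconcept) pairs, and the direct relation is derived by dropping every pair that has an intermediate concept c3.

```python
from itertools import product
from typing import Set, Tuple

Pair = Tuple[str, str]  # (subconcept, superconcept), i.e. c1 <c c2


def transitive_closure(pairs: Set[Pair]) -> Set[Pair]:
    """Extend the given pairs until the relation is transitive."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def direct_pairs(order: Set[Pair]) -> Set[Pair]:
    """Keep only pairs c1 <c c2 with no c3 such that c1 <c c3 <c c2."""
    concepts = {c for pair in order for c in pair}
    return {
        (c1, c2)
        for (c1, c2) in order
        if not any((c1, c3) in order and (c3, c2) in order for c3 in concepts)
    }


# Hypothetical three-concept taxonomy, for illustration only.
edges = {("Genetics", "Biology"), ("Biology", "Science")}
order = transitive_closure(edges)
direct = direct_pairs(order)
```

Here `order` also contains ("Genetics", "Science"), so Genetics is a subconcept of Science, but only a direct subconcept of Biology.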

  6. 3.OUR PROPOSED METHODS The candidate expression detection algorithm
     Input: document d = {w1, w2, …, wn}, Lex = (SC; RefC) and window size k ≥ 1.
     i ← 1
     list Ls
     index-term s
     while i ≤ n do
       for j = min(k, n − i + 1) to 1 do
         s ← {wi … wi+j−1}
         if s ∈ SC then
           save s in Ls
           i ← i + j
           break
         else if j = 1 then
           i ← i + j
         end if
       end for
     end while
     return Ls
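
The candidate expression detection algorithm on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `detect_candidate_expressions`, the representation of SC as a set of word tuples, and the sample sentence are all hypothetical. Indices are 0-based here, whereas the slide counts from 1.

```python
from typing import List, Set, Tuple

Expression = Tuple[str, ...]


def detect_candidate_expressions(
    words: List[str], lexicon: Set[Expression], k: int
) -> List[Expression]:
    """Greedy longest-match scan of `words` against `lexicon` (the set SC)."""
    if k < 1:
        raise ValueError("window size k must be >= 1")
    found: List[Expression] = []
    n = len(words)
    i = 0
    while i < n:
        # Try the longest window first, shrinking down to a single word.
        for j in range(min(k, n - i), 0, -1):
            s = tuple(words[i:i + j])
            if s in lexicon:
                found.append(s)
                i += j  # skip past the matched expression
                break
            elif j == 1:
                i += 1  # no match at this position, move one word on
    return found


# Hypothetical lexicon and document, for illustration only.
lexicon = {
    ("new", "york"),
    ("stock", "exchange"),
    ("new", "york", "stock", "exchange"),
}
doc = "the new york stock exchange opened".split()
long_match = detect_candidate_expressions(doc, lexicon, k=4)
short_match = detect_candidate_expressions(doc, lexicon, k=2)
```

With k = 4 the scan prefers the single four-word expression; with k = 2 the window is too short to see it, so the two shorter expressions are returned instead.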

  7. 4.EXPERIMENT

  8. 4.EXPERIMENT • Datasets • Our goal is to obtain high performance for closely related categories. Therefore, to test our approach, we built a crawler to collect a data set from the Yahoo! website. The data set contains the closely related (ambiguous) categories under Science->Biology. The categories under Science->Biology used for training and testing are: Bio-Archaeology, Bio-Informatics, Genetics, Food Science and Microbiology.

  9. 4.EXPERIMENT Table 1. Confusion Matrix before Applying Semantic Processing

  10. 4.EXPERIMENT Table 2. Confusion Matrix after Applying Semantic Processing

  11. 4.EXPERIMENT Fig.3 Accuracy from Semantic Representation Terms vs. Bag of Words

  12. 5.CONCLUSION • In this paper, we have discussed a novel approach that applies DBpedia’s background knowledge to represent documents in order to boost text categorization performance. • Our approach and experiments show that applying semantic-level processing and normalization helps achieve higher accuracy when classifying documents whose words have cross-category references.

  13. END THANKS!
