Enhancing Text Classification using Semantic Features from DBpedia

AST2009 A Semantic Text Classification Based on DBpedia Rujiang Bai, Junhua Liao Shandong University of Technology Library Zibo 255049, China { brj, ljhbrj}@sdut.edu.cn

OUTLINE • 1.BACKGROUND • 2. DBpedia • 3.OUR PROPOSED METHODS • 4.EXPERIMENT • 5.CONCLUSION

1.BACKGROUND • “Bag of Words” (BOW) vs. “Bag of Concepts” (BOC) • Semantic Features Representation

2. DBpedia DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web.

3.OUR PROPOSED METHODS
• Definition 1 (Core Ontology). A core ontology is a structure O := (C, <c) consisting of a set C, whose elements are called concept identifiers, and a partial order <c on C, called the concept hierarchy or taxonomy.
• Definition 2 (Subconcepts and Superconcepts). If c1 <c c2 for any c1, c2 ∈ C, then c1 is a subconcept (specialization) of c2 and c2 is a superconcept (generalization) of c1. If c1 <c c2 and there exists no c3 ∈ C with c1 <c c3 <c c2, then c1 is a direct subconcept of c2, and c2 is a direct superconcept of c1, denoted by c1 ≺ c2.
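Definitions 1 and 2 can be illustrated with a small sketch. It is not the authors' implementation: the class name CoreOntology, its methods, and the example concept names are hypothetical, and the hierarchy is assumed to be acyclic and given as (subconcept, superconcept) pairs.

```python
class CoreOntology:
    """Sketch of O := (C, <c): concept identifiers plus a strict partial order."""

    def __init__(self, concepts, edges):
        # edges: iterable of (sub, super) pairs, assumed to be acyclic (assumption)
        self.concepts = set(concepts)
        self.supers = {c: set() for c in self.concepts}
        for sub, sup in edges:
            self.supers[sub].add(sup)

    def superconcepts(self, c):
        """All c2 with c <c c2, i.e. the transitive closure of the given edges."""
        seen, stack = set(), list(self.supers[c])
        while stack:
            cur = stack.pop()
            if cur not in seen:
                seen.add(cur)
                stack.extend(self.supers[cur])
        return seen

    def is_subconcept(self, c1, c2):
        """True if c1 <c c2 (Definition 2)."""
        return c2 in self.superconcepts(c1)

    def is_direct_subconcept(self, c1, c2):
        """True if c1 <c c2 and no c3 lies strictly between them."""
        if not self.is_subconcept(c1, c2):
            return False
        return not any(
            self.is_subconcept(c1, c3) and self.is_subconcept(c3, c2)
            for c3 in self.concepts
        )


# Hypothetical toy hierarchy; the (Genetics, Science) edge is deliberately redundant.
onto = CoreOntology(
    ["Science", "Biology", "Genetics", "Microbiology"],
    [("Biology", "Science"), ("Genetics", "Biology"),
     ("Microbiology", "Biology"), ("Genetics", "Science")],
)
print(onto.is_subconcept("Genetics", "Science"))         # subconcept via Biology
print(onto.is_direct_subconcept("Genetics", "Science"))  # Biology lies in between
```

Checking directness against the definition, rather than trusting the input pairs, keeps the result correct even when the supplied edges contain shortcuts such as the redundant pair above.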

3.OUR PROPOSED METHODS
The candidate expression detection algorithm
Input: document d = {w1, w2, …, wn}, Lex = (SC;RefC) and window size k ≥ 1.
i ← 1
list Ls
index-term s
while i ≤ n do
    for j = min(k, n - i + 1) down to 1 do
        s ← {wi … wi+j-1}
        if s ∈ SC then
            save s in Ls
            i ← i + j
            break
        else if j = 1 then
            i ← i + j
        end if
    end for
end while
return Ls
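The window-based matching on this slide can be sketched in runnable form. This is only an illustration under stated assumptions: the function name detect_candidate_expressions and the toy lexicon are hypothetical, SC is modeled as a set of word tuples (the membership test s ∈ SC suggests it holds the lexicon's entries), and indices are 0-based where the slide uses 1-based ones.

```python
def detect_candidate_expressions(words, lexicon_entries, k):
    """Greedy longest-match scan: at each position, try the longest window first.

    words           -- the document as a list of tokens (w1 ... wn on the slide)
    lexicon_entries -- set of word tuples standing in for SC (assumption)
    k               -- maximum window size, k >= 1
    """
    if k < 1:
        raise ValueError("window size k must be >= 1")
    n = len(words)
    found = []  # Ls on the slide
    i = 0       # 0-based position; the slide starts at i = 1
    while i < n:
        # Shrink the window from min(k, remaining words) down to a single word.
        for j in range(min(k, n - i), 0, -1):
            s = tuple(words[i:i + j])
            if s in lexicon_entries:
                found.append(" ".join(s))
                i += j  # skip past the matched expression
                break
            if j == 1:
                i += 1  # no entry starts here; move on by one word
    return found


# Hypothetical toy lexicon and document, for illustration only.
toy_lexicon = {("gene", "expression"), ("gene",), ("escherichia", "coli"), ("data",)}
tokens = "the gene expression data of escherichia coli".split()
matches = detect_candidate_expressions(tokens, toy_lexicon, k=3)
print(matches)  # ['gene expression', 'data', 'escherichia coli']
```

Trying the longest window first means a multi-word entry such as "gene expression" is preferred over its single-word prefix "gene", which is the behavior the descending loop over j on the slide implies.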

4.EXPERIMENT

4.EXPERIMENT • Datasets • Our goal is to achieve high performance on closely related categories. To test our approach, we therefore built a crawler to collect a data set from the Yahoo! website. The data set contains the closely related (ambiguous) categories under Science->Biology. The categories under Science->Biology used for training and testing are: Bio-Archaeology, Bio-Informatics, Genetics, Food Science and Microbiology.

4.EXPERIMENT Table 1. Confusion Matrix before Applying Semantic Processing

4.EXPERIMENT Table 2. Confusion Matrix after Applying Semantic Processing

4.EXPERIMENT Fig.3 Accuracy from Semantic Representation Terms vs. Bag of Words

5.CONCLUSION • In this paper, we have discussed a novel approach that applies DBpedia’s background knowledge to represent documents in order to boost text categorization performance. • Our approach and experiments show that applying semantic-level processing and normalization helps achieve higher accuracy when classifying documents whose words carry cross-category references.

END THANKS！
