
Domain-Specific Iterative Readability Computation


Presentation Transcript


  1. Domain-Specific Iterative Readability Computation Jin Zhao 13/05/2011

  2. Domain-Specific Resources

  3. Domain-Specific Resources Domain-specific resources target varying audiences: compare the modular arithmetic page from Wikipedia with the one from Interactivate.com.

  4. Challenge for a Domain-Specific Search Engine How to measure readability for domain-specific resources?

  5. Literature Review
  • Heuristic-based readability measures: a weighted sum of text feature values
  • Examples: Flesch-Kincaid Reading Ease (FKRE) [Flesch48]; the Dale-Chall readability formula [Dale&Chall48]
  Quick and indicative, but such formulas often oversimplify.

  6. Literature Review
  • Natural language processing and machine learning approaches: extract deep text features and use supervised learning methods to build readability models
  • Text features: unigrams [Collins-Thompson04], parse tree height [Schwarm05], discourse relations [Pitler08]
  • Supervised learning techniques: Support Vector Machines (SVM) [Schwarm05], k-Nearest Neighbors (KNN) [Heilman07]
  More accurate, but an annotated corpus is required, and these methods remain ignorant of domain-specific concepts.

  7. Literature Review
  • Domain-specific readability measures: derive information about domain-specific concepts from expert knowledge sources
  • Examples: the Open Access and Collaborative Consumer Health Vocabulary [Kim07]; the Medical Subject Headings ontology [Yan06]
  These handle domain-specific concepts, but expert knowledge sources are expensive and not always available.
  Key qualities of a good readability measure: effective, portable, and domain-aware.

  8. Intuitions
  • A domain-specific resource is less readable if it contains more difficult concepts.
  • A domain-specific concept is more difficult if it appears in less readable resources.
  • Use an iterative computation algorithm to estimate these two scores from each other.
  • Example: the Pythagorean theorem appears in introductory material, while ring theory appears mostly in advanced texts, so ring theory should come out as more difficult.
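  Stated as equations, the two intuitions are mutually recursive. The following is a hedged formalization; the symbols are assumptions for illustration, not notation from the slides:

```latex
% Sketch formalization of the two intuitions (symbols assumed here):
% D(c) = difficulty of concept c, R(r) = readability of resource r,
% res(c) = resources containing c, con(r) = concepts occurring in r.
% Both f and g are decreasing: concepts found in less readable resources
% are more difficult, and resources containing more difficult concepts
% are less readable. The IC algorithm seeks this fixed point iteratively.
D(c) = f\!\left(\frac{1}{|res(c)|}\sum_{r \in res(c)} R(r)\right),
\qquad
R(r) = g\!\left(\frac{1}{|con(r)|}\sum_{c \in con(r)} D(c)\right)
```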

  9. Iterative Computation (IC) Algorithm
  • Graph construction: build a graph representing resources, concepts, and occurrence information
  • Score computation: initialize, then iteratively compute the readability scores of the domain-specific resources and the difficulty scores of the domain-specific concepts; two versions, heuristic and probabilistic
  • Required input: a collection of domain-specific resources and a list of domain-specific concepts

  10. Graph Construction
  Concept list: right triangle, Pythagorean theorem, hypotenuse, sine function, cosine function
  Resource 1: "...Pythagorean theorem can be written as a² + b² = c², where c represents the length of the hypotenuse..."
  Resource 2: "...The sine function (sin) can be defined as the ratio of the side opposite the angle to the hypotenuse..."
  [Figure: a bipartite graph linking Resource 1 and Resource 2 to the concept nodes that occur in their text.]
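  A minimal sketch of this construction step, assuming simple case-insensitive substring matching; the function name `build_graph` and the matching rule are illustrative, not from the paper:

```python
from collections import defaultdict

def build_graph(resources, concepts):
    """Build a bipartite occurrence graph: resource id -> set of concepts.

    resources: dict mapping resource id to its text.
    concepts:  list of domain-specific concept phrases.
    Matching is naive case-insensitive substring search; the original
    system may use a more careful matcher.
    """
    edges = defaultdict(set)
    for rid, text in resources.items():
        lowered = text.lower()
        for concept in concepts:
            if concept.lower() in lowered:
                edges[rid].add(concept)
    return edges

# The slide's example: two math resources and a five-concept list.
resources = {
    "Resource 1": "...Pythagorean theorem can be written as a2 + b2 = c2, "
                  "where c represents the length of the hypotenuse...",
    "Resource 2": "...The sine function (sin) can be defined as the ratio "
                  "of the side opposite the angle to the hypotenuse...",
}
concepts = ["right triangle", "Pythagorean theorem", "hypotenuse",
            "sine function", "cosine function"]

print(build_graph(resources, concepts))
# Resource 1 links to {Pythagorean theorem, hypotenuse};
# Resource 2 links to {sine function, hypotenuse}.
```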

  11. Score Computation (Heuristic)
  • Initialization: each resource node gets its FKRE score; each concept node gets the average score of its adjacent nodes.
  • Iterative computation: each node's new score is its original score plus the average of the original scores of its adjacent nodes.
  [Figure: a worked example with resource nodes w, x, y, z and concept nodes a, b, c. At initialization the scores shown are 2.00, 4.00, 3.00, 1.00 for the resources and 2.00, 2.50, 3.00 for the concepts; after iteration 1 they are 3.00, 5.25, 4.75, 7.00 and 4.00, 6.00, 5.00.]

  12. Score Computation (Heuristic)
  [Figure: the example continued. After iteration 2 the resource scores shown are 9.75, 10.25, 13.00, 7.00 and the concept scores 8.13, 10.00, 11.88; after iteration 3 the resource scores are 15.13, 18.82, 21.19, 24.88 and the concept scores 23.51, 20.00, 16.51.]
  • Termination condition: stop once the rank order of the resource nodes stabilizes (see the sketch below).
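  A runnable sketch of the heuristic version, under two assumptions: "original score" is read as the score from the previous iteration, and every node has at least one neighbor in the occurrence graph:

```python
def heuristic_ic(edges, fkre, max_iters=100):
    """Heuristic iterative computation over the occurrence graph.

    edges: dict resource -> set of concepts it contains (from build_graph).
    fkre:  dict resource -> initial FKRE-based score.
    Assumes every resource contains at least one listed concept and
    every concept appears in at least one resource.
    """
    # Invert the graph: concept -> resources containing it.
    containing = {}
    for r, cs in edges.items():
        for c in cs:
            containing.setdefault(c, set()).add(r)

    # Initialization: resources start at their FKRE scores; each concept
    # starts at the average score of its adjacent resource nodes.
    res = dict(fkre)
    con = {c: sum(res[r] for r in rs) / len(rs)
           for c, rs in containing.items()}

    prev_rank = None
    for _ in range(max_iters):
        # Update: previous score plus the average of the neighbors'
        # previous scores, for resource and concept nodes alike.
        new_res = {r: res[r] + sum(con[c] for c in cs) / len(cs)
                   for r, cs in edges.items()}
        new_con = {c: con[c] + sum(res[r] for r in rs) / len(rs)
                   for c, rs in containing.items()}
        res, con = new_res, new_con
        # Terminate once the rank order of the resources stabilizes.
        rank = tuple(sorted(res, key=res.get))
        if rank == prev_rank:
            break
        prev_rank = rank
    return res, con
```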

  13. Score Computation (Heuristic)
  • Single-valued score for each node: unable to handle concepts of varying difficulties
  • Simple averaging in score computation: difficult to incorporate sophisticated computational mechanisms

  14. Score Computation (Probabilistic)
  • Initialization: resource nodes via sentence sampling; concept nodes via resource sampling
  [Figure: resource nodes w, x, y, z and concept nodes a, b, c at initialization.]
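  The transcript does not spell out the sampling. One plausible reading, sketched below, is that a node starts with a distribution over readability levels rather than a single value; `level_of` and the sampling scheme are assumptions, not the paper's procedure:

```python
import random
import re
from collections import Counter

def init_resource_node(text, level_of, n_samples=50, seed=0):
    """Hypothetical 'sentence sampling' initialization: estimate a
    distribution over readability levels for a resource by scoring
    randomly sampled sentences.

    level_of: hypothetical callable mapping a sentence to a discrete
    readability level (e.g., a bucketed FKRE score).
    """
    rng = random.Random(seed)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    counts = Counter(level_of(rng.choice(sentences))
                     for _ in range(n_samples))
    total = sum(counts.values())
    # Return a normalized probability distribution over levels.
    return {level: n / total for level, n in counts.items()}
```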

  15. Score Computation (Probabilistic)
  • Iterative computation: modified Naïve Bayes classification, applied between the resource nodes and the concept nodes
  [Figure: the original Naïve Bayes formula, its direct adaptation, and the modified form; the formulas are not preserved in the transcript.]
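  For reference, the "Original" formula is the standard Naïve Bayes decision rule; only this standard form is reproduced here, since the direct adaptation and the modified variant are specific to this work and cannot be reconstructed from the transcript:

```latex
% Standard Naive Bayes: pick the readability level y maximizing the prior
% times the product of per-feature likelihoods; in this setting the
% features c_1, ..., c_n would be the concepts occurring in a resource.
\hat{y} = \operatorname*{arg\,max}_{y}\; P(y) \prod_{i=1}^{n} P(c_i \mid y)
```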

  16. Evaluation
  Key qualities of a good readability measure:
  • Effectiveness
  • Portability
  • Domain-awareness

  17. Effectiveness
  • Corpus: math webpages
  • Metrics: pairwise accuracy; Spearman's rho (both sketched below)
  • Baselines: heuristic (FKRE) and supervised learning (NB, SVM, MaxEnt) with binary concept features only
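  A small sketch of the two metrics, comparing predicted scores against gold readability levels; the tie-skipping convention in pairwise accuracy is an assumption (one common convention among several):

```python
from itertools import combinations
from scipy.stats import spearmanr

def pairwise_accuracy(gold, pred):
    """Fraction of resource pairs that the gold and predicted scores
    order the same way; tied pairs are skipped."""
    agree = total = 0
    for a, b in combinations(gold, 2):
        g, p = gold[a] - gold[b], pred[a] - pred[b]
        if g == 0 or p == 0:
            continue  # skip ties
        total += 1
        agree += (g > 0) == (p > 0)
    return agree / total if total else 0.0

# Toy usage with hypothetical gold levels and predicted scores.
gold = {"r1": 1, "r2": 2, "r3": 3, "r4": 4}
pred = {"r1": 1.2, "r2": 3.1, "r3": 2.9, "r4": 4.4}
print(pairwise_accuracy(gold, pred))  # 5 of 6 pairs agree: 0.833...
rho, _ = spearmanr([gold[r] for r in gold], [pred[r] for r in gold])
print(rho)  # rank correlation: 0.8
```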

  18. Portability
  • Selection strategies: resource selection at random; concept selection at random; resource selection by quality; concept selection by TF.IDF (one plausible TF.IDF selection is sketched below)
  • Performance measured at five levels: 20%, 40%, 60%, 80%, and 100% of the original resource collection / concept list
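  A sketch of concept selection by TF.IDF over the occurrence graph; the exact weighting used in the evaluation is not given, so this is one plausible variant:

```python
import math
from collections import Counter

def select_concepts_by_tfidf(edges, k):
    """Rank concepts by a TF.IDF-style weight and keep the top k.

    edges: dict resource -> set of concepts it contains.
    Because the graph stores occurrence sets rather than raw counts,
    term frequency collapses to presence, so the collection-level
    weight reduces to document frequency times IDF.
    """
    n = len(edges)
    df = Counter(c for cs in edges.values() for c in cs)
    weight = {c: df[c] * math.log(n / df[c]) for c in df}
    return sorted(weight, key=weight.get, reverse=True)[:k]
```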

  19. Portability
  [Figures: results for the concept selection strategies and the resource selection strategies.]

  20. Portability

  21. Domain-awareness
  • Handles domain-specific concepts in a simple yet effective way
  • Open questions: what about concepts with multiple difficulty levels? Scores converge to a single value even in the probabilistic IC; possible remedies include splitting (k-means, GMM, etc.) or other computational mechanisms

  22. Conclusion
  • Iterative computation: estimates the readability of domain-specific resources and the difficulty of domain-specific concepts in an iterative manner; effective, portable, and domain-aware
  • Future work: handling concepts with multiple difficulty levels
