Classifying Reading Levels Using Statistical Language Models: A Study by Johnson Hsieh and Sameer Shariff

Classifying Reading Levels with Statistical Language Models
Johnson Hsieh, Sameer Shariff

Problem
• Given a passage of English text, can we classify the text at the appropriate reading level?
• Dataset
  – English novels from various reading lists
    • 5th-7th grade - 15 books
      • Tom Sawyer, Black Beauty, etc.
    • 8th-10th grade - 16 books
      • A Tale of Two Cities, The Call of the Wild, etc.
    • 11th-12th grade - 17 books
      • Pride and Prejudice, The Awakening, etc.

Approach
• Build a language model for each class
• Classify new text by the model most likely to have generated it (generative model)
• Model 1
  • Classify text based purely on these language models, with some interesting smoothing techniques
• Model 2
  • Build a discriminative multinomial logistic regression model that uses the language models as just one of many features
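The generative approach in Model 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes unigram models with add-one smoothing (the slides do not say which model order or smoothing techniques were used), and the names `tokenize`, `UnigramModel`, `train`, and `classify` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the slides do not specify one.
    return text.lower().split()


class UnigramModel:
    """Unigram language model with add-one smoothing (an assumed, simple choice)."""

    def __init__(self, texts, vocab):
        self.counts = Counter()
        for text in texts:
            self.counts.update(tokenize(text))
        self.total = sum(self.counts.values())
        # One extra slot shared by all words never seen in training.
        self.vocab_size = len(vocab) + 1

    def log_prob(self, text):
        # Sum of smoothed log-probabilities of each token under this model.
        return sum(
            math.log((self.counts[word] + 1) / (self.total + self.vocab_size))
            for word in tokenize(text)
        )


def train(labeled_texts):
    # labeled_texts maps a reading level to a list of training texts.
    vocab = {w for texts in labeled_texts.values() for t in texts for w in tokenize(t)}
    return {level: UnigramModel(texts, vocab) for level, texts in labeled_texts.items()}


def classify(models, text):
    # Pick the level whose model assigns the text the highest log-likelihood.
    return max(models, key=lambda level: models[level].log_prob(text))


# Toy training data, invented purely for illustration.
models = train({
    "5th-7th": ["the cat sat on the mat", "the dog ran"],
    "11th-12th": ["the ineffable melancholy of existence",
                  "melancholy pervaded the estate"],
})
prediction = classify(models, "the cat ran")
```

Working in log space avoids numerical underflow on book-length inputs, where the product of per-word probabilities would otherwise round to zero.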

Data Separability
• A hard problem

Language Model Results
• Accuracy = (# of books predicted correctly) / (total # of books)
• Weighted Accuracy = ((# predicted correctly) + 0.5 * (# off by one)) / (total # of books)
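The two metrics above can be computed as in the sketch below. It assumes "off by one" means the predicted level is adjacent to the true level in the ordering 5th-7th, 8th-10th, 11th-12th. The label strings and function names are hypothetical, and the example predictions are invented to exercise the formulas; they are not results from the study.

```python
# Reading levels in increasing order; index distance measures how far off a prediction is.
LEVELS = ["5th-7th", "8th-10th", "11th-12th"]


def accuracy(true_levels, predicted_levels):
    # Fraction of books whose predicted level matches the true level exactly.
    correct = sum(t == p for t, p in zip(true_levels, predicted_levels))
    return correct / len(true_levels)


def weighted_accuracy(true_levels, predicted_levels):
    # Full credit for an exact match, half credit for an adjacent level.
    score = 0.0
    for t, p in zip(true_levels, predicted_levels):
        distance = abs(LEVELS.index(t) - LEVELS.index(p))
        if distance == 0:
            score += 1.0
        elif distance == 1:
            score += 0.5
    return score / len(true_levels)


# Invented example: two exact matches, one off by one, one off by two.
true_levels = ["5th-7th", "8th-10th", "11th-12th", "5th-7th"]
predicted = ["5th-7th", "11th-12th", "11th-12th", "11th-12th"]
acc = accuracy(true_levels, predicted)            # (2) / 4
w_acc = weighted_accuracy(true_levels, predicted)  # (2 + 0.5 * 1) / 4
```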

Multinomial Logistic Regression Results
• Without language model:
• With language model:
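One way language model scores could enter a multinomial logistic regression as features is sketched below. This is an illustration under stated assumptions, not the authors' setup: the feature layout (three per-class language model scores plus one extra feature), the toy values, and the names `softmax`, `train_softmax`, and `predict` are all hypothetical, and the slides do not list which additional features were actually used.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of class scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def train_softmax(X, y, num_classes, lr=0.1, epochs=500):
    # Multinomial logistic regression trained by plain stochastic gradient descent.
    dim = len(X[0])
    W = [[0.0] * (dim + 1) for _ in range(num_classes)]  # last column is the bias
    for _ in range(epochs):
        for x, label in zip(X, y):
            xb = x + [1.0]
            probs = softmax([sum(w * v for w, v in zip(row, xb)) for row in W])
            for k in range(num_classes):
                # Gradient of the cross-entropy loss with respect to class k's score.
                grad = probs[k] - (1.0 if k == label else 0.0)
                for j in range(dim + 1):
                    W[k][j] -= lr * grad * xb[j]
    return W


def predict(W, x):
    xb = x + [1.0]
    scores = [sum(w * v for w, v in zip(row, xb)) for row in W]
    return max(range(len(W)), key=lambda k: scores[k])


# Hypothetical feature vectors: [LM score for class 0, class 1, class 2, extra feature].
# The LM scores stand in for normalized per-class log-likelihoods; values are invented.
X = [
    [0.0, -0.5, -1.0, 0.2], [0.0, -0.6, -0.9, 0.3],
    [-0.5, 0.0, -0.5, 0.5], [-0.6, 0.0, -0.4, 0.6],
    [-1.0, -0.5, 0.0, 0.8], [-0.9, -0.6, 0.0, 0.9],
]
y = [0, 0, 1, 1, 2, 2]  # 0 = 5th-7th, 1 = 8th-10th, 2 = 11th-12th

W = train_softmax(X, y, num_classes=3)
train_predictions = [predict(W, x) for x in X]
```

Dropping the first three columns of each feature vector would give the "without language model" configuration, so the same training code covers both rows of the comparison.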

Conclusions and Future Work
• Statistical language models capture information that helps differentiate reading levels, and do so better than traditional measures such as Flesch-Kincaid
• Multinomial logistic regression models with additional features outperform the pure language model approach, and adding the language model as a feature greatly improves their performance
• Future Work
  • Explore higher-order language models
  • Investigate language model overfitting
