
On Learning Parsimonious Models for Extracting Consumer Opinions


Presentation Transcript


  1. On Learning Parsimonious Models for Extracting Consumer Opinions • Edoardo Airoldi, Data Privacy Laboratory, School of Computer Science, Carnegie Mellon University, eairoldi@cs.cmu.edu • Xue Bai and Rema Padman, The H. John Heinz III School of Public Policy and Management and Center for Automated Learning and Discovery, Carnegie Mellon University, {xbai,rpadman}@andrew.cmu.edu • Hawaii International Conference on System Sciences (HICSS) 2005

  2. Introduction • Past text categorization solutions yield cross-validated accuracies and areas under the curve (AUC) only in the low 80%s when ported to sentiment extraction. • A key reason for this shortfall: the features used in the classification (e.g., the words) are assumed to be pairwise independent. • We propose an algorithm that • captures dependencies among words, and • finds a minimal vocabulary sufficient for categorization purposes. • Our two-stage Markov Blanket Classifier (MBC) • 1st Stage: learns conditional dependencies among the words and encodes them into a Markov Blanket Directed Acyclic Graph (MB DAG) • 2nd Stage: uses Tabu Search (TS) to fine-tune the MB DAG

  3. Related Work (1/2) • Hatzivassiloglou and McKeown built a log-linear model to predict the semantic orientation of conjoined adjectives. • Das and Chen • manually constructed a lexicon and grammar rules, • categorized stock news as buy, sell, or neutral, and • used five classifiers with various voting schemes to achieve an accuracy of 62%. • Turney and Littman proposed a semi-supervised method that learns the polarity of adjectives, starting from a small set of adjectives of known polarity. • Turney used this method to predict consumers' opinions about various objects (movies, cars, banks) and achieved accuracies between 66% and 84%.

  4. Related Work (2/2) • Pang et al. used off-the-shelf classification methods on frequent, non-contextual words, in combination with various heuristics and annotators, and achieved a maximum cross-validated accuracy of 82.9% on data from IMDb. • Dave et al. categorized positive versus negative movie reviews using SVMs and achieved an accuracy of at most 88.9% on data from Amazon and CNET. • Liu et al. proposed a framework that categorizes emotions based on a large dictionary of common-sense knowledge and on linguistic models.

  5. Bayesian Networks (1/2) • A Bayesian network for a set of variables X = {X1, ..., Xn} consists of • a network structure S that encodes a set of conditional independence assertions among the variables in X, and • a set P = {p1, ..., pn} of local conditional probability distributions, one for each node given its parents.
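As a concrete illustration, a Bayesian network can be represented directly as the pair (S, P). The Python sketch below is ours, not the paper's code; all variable names and probabilities are purely illustrative.

    # Minimal sketch (illustrative): a Bayesian network as a
    # structure S (parent tuples) plus local conditional tables P.
    structure = {
        "X1": (),
        "Y":  ("X1",),
        "X2": ("X1",),
    }

    # One conditional probability table per node, keyed by parent values:
    # cpts[node][parent_values][value] = p(node = value | parents).
    cpts = {
        "X1": {(): {0: 0.6, 1: 0.4}},
        "Y":  {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.2, 1: 0.8}},
        "X2": {(0,): {0: 0.5, 1: 0.5}, (1,): {0: 0.9, 1: 0.1}},
    }

    def local_prob(node, value, assignment):
        """p(node = value | pa(node)) under a full variable assignment."""
        parent_values = tuple(assignment[p] for p in structure[node])
        return cpts[node][parent_values][value]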

  6. Bayesian Networks (2/2) • P satisfies the Markov condition for S if every node Xi in S is independent of its non-descendants, conditional on its parents. • pa_i: the set of parents of Xi • Example: p(Y, X1, ..., X6) = C · p(Y | X1) · p(X4 | X2, Y) · p(X5 | X3, X4, Y) · p(X2 | X1) · p(X3 | X1) · p(X6 | X4)
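Under the Markov condition, evaluating the joint is just a product of the local terms. A minimal sketch, reusing structure, cpts, and local_prob from the previous snippet:

    def joint_prob(assignment):
        """Markov condition: the joint is the product of local terms,
        p(x1, ..., xn) = prod_i p(xi | pa_i)."""
        prob = 1.0
        for node, value in assignment.items():   # full assignment required
            prob *= local_prob(node, value, assignment)
        return prob

    # p(X1=1, Y=0, X2=1) = p(X1=1) * p(Y=0 | X1=1) * p(X2=1 | X1=1)
    print(joint_prob({"X1": 1, "Y": 0, "X2": 1}))   # 0.4 * 0.2 * 0.1 = 0.008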

  7. Markov Blanket (1/2) • Given the set of variables X, a target variable Y, and a Bayesian network (S, P), the Markov Blanket for Y consists of • paY, the set of parents of Y; • chY, the set of children of Y; and • pa(chY), the set of parents of the children of Y. • p(Y | X1, ..., X6) = C′ · p(Y | X1) · p(X4 | X2, Y) · p(X5 | X3, X4, Y)
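Reading the blanket off a DAG is mechanical. A sketch using networkx (ours, not the paper's code); the edge list reconstructs the slide's example graph:

    import networkx as nx

    def markov_blanket(dag, y):
        """Parents, children, and parents-of-children (spouses) of y."""
        parents = set(dag.predecessors(y))
        children = set(dag.successors(y))
        spouses = {p for c in children for p in dag.predecessors(c)} - {y}
        return parents | children | spouses

    # The slide's example graph, read off the factorization above.
    dag = nx.DiGraph([("X1", "Y"), ("X1", "X2"), ("X1", "X3"),
                      ("X2", "X4"), ("Y", "X4"),
                      ("X3", "X5"), ("X4", "X5"), ("Y", "X5"),
                      ("X4", "X6")])
    print(markov_blanket(dag, "Y"))   # {X1, X2, X3, X4, X5}; X6 is excluded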

  8. Markov Blanket (2/2) • Different MB DAGs that entail the same factorization of p(Y | X1, ..., X6) belong to the same Markov equivalence class. • Our algorithm searches the space of Markov equivalence classes, rather than the space of DAGs, which boosts its efficiency.
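By the Verma-Pearl characterization, two DAGs are Markov equivalent exactly when they share the same skeleton and the same v-structures (colliders x → z ← y with x and y nonadjacent). A sketch of that check, ours and again assuming networkx:

    import networkx as nx

    def skeleton(dag):
        """Undirected edge set of a DAG."""
        return {frozenset(e) for e in dag.edges()}

    def v_structures(dag):
        """Colliders x -> z <- y with x and y nonadjacent."""
        skel = skeleton(dag)
        vs = set()
        for z in dag.nodes():
            parents = sorted(dag.predecessors(z))
            for i, x in enumerate(parents):
                for y in parents[i + 1:]:
                    if frozenset((x, y)) not in skel:
                        vs.add((frozenset((x, y)), z))
        return vs

    def markov_equivalent(d1, d2):
        """Same skeleton and same v-structures iff same equivalence class."""
        return (skeleton(d1) == skeleton(d2)
                and v_structures(d1) == v_structures(d2))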

  9. Tabu Search • Tabu Search (TS) is a powerful meta-heuristic that helps local search heuristics explore the space of solutions by guiding them out of local optima. • The basic Tabu Search starts with a feasible solution and iteratively chooses the best move, according to a specified evaluation function. • It keeps a tabu list of restrictions on possible moves, updated at each step, which discourages the repetition of recently selected moves.
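A generic rendering of this basic loop; an illustrative sketch, not the paper's implementation, where `neighbors` (yielding (move, state) pairs) and `score` are assumed callables:

    from collections import deque

    def tabu_search(initial, neighbors, score, tabu_size, n_iter):
        """Basic Tabu Search: always take the best non-tabu move and
        keep a fixed-length tabu list to escape local optima."""
        current = best = initial
        tabu = deque(maxlen=tabu_size)     # recent moves are forbidden
        for _ in range(n_iter):
            candidates = [(move, state) for move, state in neighbors(current)
                          if move not in tabu]
            if not candidates:
                break
            move, current = max(candidates, key=lambda ms: score(ms[1]))
            tabu.append(move)              # discourage repeating this move
            if score(current) > score(best):
                best = current
        return best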

  10. MB Classifier 1st Stage: Learning Initial Dependencies (1/2) • 1. Search for the potential parents and children (LY) of Y, and for the potential parents and children (∪i LXi) of the nodes Xi ∈ LY • This search is performed by two recursive calls to the function findAdjacencies(Y) • 2. Orient the edges using six edge-orientation rules • 3. Prune the remaining undirected and bi-directed edges to avoid cycles

  11. MB Classifier 1st Stage: Learning Initial Dependencies (2/2) • findAdjacencies (Node Y, Node List L, Depth d, Significance α)
      1. AY := {Xi ∈ L : Xi is dependent on Y at level α}
      2. for Xi ∈ AY and for all distinct subsets S ⊂ {AY \ Xi} of size ≤ d
         2.1. if Xi is independent of Y given S at level α
         2.2. then remove Xi from AY
      3. for Xi ∈ AY
         3.1. AXi := {Xj ∈ L : Xj is dependent on Xi at level α, j ≠ i}
         3.2. for all distinct subsets S ⊂ {AXi} of size ≤ d
              3.2.1. if Xi is independent of Y given S at level α
              3.2.2. then remove Xi from AY
      4. return AY
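In Python, the routine might look as follows. This is our sketch of the pseudocode above, where ci_test(x, y, S, data, alpha) is an assumed helper returning True when x is independent of y given S at level α (e.g., a chi-square or G² test):

    from itertools import combinations

    def find_adjacencies(y, candidates, d, alpha, ci_test, data):
        """Keep only the Xi that stay dependent on y under every
        conditioning set of size <= d (sketch of findAdjacencies)."""
        # Step 1: unconditional screen.
        adj = [x for x in candidates if not ci_test(x, y, (), data, alpha)]
        # Step 2: condition on subsets of the other adjacents of y.
        for x in list(adj):
            others = [z for z in adj if z != x]
            if any(ci_test(x, y, s, data, alpha)
                   for k in range(1, d + 1)
                   for s in combinations(others, k)):
                adj.remove(x)
        # Step 3: condition on subsets of each survivor's own adjacents.
        for x in list(adj):
            ax = [z for z in candidates
                  if z != x and not ci_test(z, x, (), data, alpha)]
            if any(ci_test(x, y, s, data, alpha)
                   for k in range(1, d + 1)
                   for s in combinations(ax, k)):
                adj.remove(x)
        return adj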

  12. MB Classifier 2nd Stage: Tabu Search Optimization • Four kinds of moves are allowed: edge addition, edge deletion, edge reversal, and edge reversal with node pruning. • The best move is always selected. • The tabu list keeps a record of the m previous moves.
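The neighborhood for the tabu_search sketch above could enumerate these move types. An illustrative generator (ours, assuming networkx); reversal with node pruning is indicated in a comment because it additionally needs the markov_blanket helper from the earlier sketch:

    import itertools
    import networkx as nx

    def neighbors(dag):
        """Enumerate candidate moves as (move descriptor, resulting DAG)."""
        nodes = list(dag.nodes())
        # Edge addition (only if the result stays acyclic).
        for u, v in itertools.permutations(nodes, 2):
            if not dag.has_edge(u, v) and not dag.has_edge(v, u):
                g = dag.copy()
                g.add_edge(u, v)
                if nx.is_directed_acyclic_graph(g):
                    yield ("add", u, v), g
        for u, v in list(dag.edges()):
            # Edge deletion.
            g = dag.copy()
            g.remove_edge(u, v)
            yield ("del", u, v), g
            # Edge reversal (only if the result stays acyclic).
            g = g.copy()
            g.add_edge(v, u)
            if nx.is_directed_acyclic_graph(g):
                yield ("rev", u, v), g
                # Edge reversal with node pruning would additionally drop
                # nodes that leave the Markov blanket of the target,
                # e.g. via markov_blanket() from the earlier sketch.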

  13. Experiment 1 (1/3) • Data set: 700 positive and 700 negative movie reviews from the rec.arts.movies.reviews newsgroup, archived at the Internet Movie Database (IMDb). • All words that appeared in more than 8 documents were taken as input features (7716 words). • 5-fold cross-validation
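The document-frequency cutoff and the cross-validation split can be reproduced with standard tooling. A sketch using scikit-learn, where `reviews` and `labels` are assumed to hold the 1400 review texts and their pos/neg labels, and binary presence/absence features are our assumption, not stated on the slide:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import StratifiedKFold

    # Keep words that appear in more than 8 documents (min_df=9).
    vectorizer = CountVectorizer(min_df=9, binary=True)
    X = vectorizer.fit_transform(reviews)        # reviews: assumed preloaded
    print(len(vectorizer.vocabulary_))           # ~7716 words on this corpus

    # 5-fold cross-validation, stratified by the pos/neg label.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for train_idx, test_idx in cv.split(X, labels):
        pass  # fit the classifier on X[train_idx], score on X[test_idx]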

  14. Experiment 1 (2/3)

  15. Experiment 1 (3/3)

  16. Experiment 2 (1/2) • Data sets: • mergers and acquisitions (M&A, 1000 documents) • finance (1000 documents) • mixed news (3 topics × 1000 documents) • We consider three sentiments: positive, neutral, and negative. • 3-fold cross-validation

  17. Experiment 2 (2/2)

  18. Conclusions • We believe that, in order to capture sentiment, we have to go beyond the search for richer feature sets and beyond the independence assumption. • We believe a robust model is obtained by encoding dependencies among words, and by actively searching for a better dependency structure using heuristic and optimal strategies. • The MBC achieves both of these goals.
