
Hands on Classification with Learning Based Java

Hands on Classification with Learning Based Java. Gourab Kundu. Adapted from a talk by Vivek Srikumar. Goals of this tutorial: at the end of these lectures, you will be able to • Get started with Learning Based Java • Use a generic, black box text classifier for different applications


Presentation Transcript


  1. Hands on Classification with Learning Based Java Gourab Kundu Adapted from a talk by Vivek Srikumar

  2. Goals of this tutorial At the end of these lectures, you will be able to • Get started with Learning Based Java • Use a generic, black box text classifier for different applications …and write your own text classifier, if needed • Understand how features can impact the classifier performance … and add features to improve your application • Build a badge classifier based on character features

  3. A Quick Recap • Given: Examples (x, f(x)) of some unknown function f • Find: A good approximation of f • x provides some representation of the input • The process of mapping a domain element into a representation is called Feature Extraction. (Hard; ill-understood; important) • $x \in \{0,1\}^n$ or $x \in \mathbb{R}^n$ • The target function (label) • $f(x) \in \{-1,+1\}$: Binary classification • $f(x) \in \{1,2,3,\ldots,k-1\}$: Multi-class classification
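To make the representation x ∈ {0,1}^n concrete, here is a minimal sketch of feature extraction for text. It assumes a fixed vocabulary chosen by hand; the class name `BinaryBagOfWords` and its methods are illustrative and are not part of LBJ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Maps a document to a binary vector x in {0,1}^n over a fixed vocabulary (hypothetical helper). */
final class BinaryBagOfWords {
    private final List<String> vocabulary;

    BinaryBagOfWords(List<String> vocabulary) {
        this.vocabulary = new ArrayList<>(vocabulary);
    }

    /** x[i] is 1 if vocabulary word i occurs in the text, and 0 otherwise. */
    int[] extract(String text) {
        // Lowercase and split on runs of non-word characters; a crude tokenizer.
        List<String> tokens = Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+"));
        int[] x = new int[vocabulary.size()];
        for (int i = 0; i < x.length; i++) {
            x[i] = tokens.contains(vocabulary.get(i)) ? 1 : 0;
        }
        return x;
    }
}
```

With the vocabulary (save, software, meeting), the text "Save over 70 % on name brand software" maps to the vector (1, 1, 0).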

  4. What is text classification? A document → A classifier (black box) → Some labels (in the diagram, one label is marked ✓ and three are marked ✗)

  5. Several applications fit this framework Sentiment classification Spam detection What else can you do, if you had such a black box system that can classify text? Try to spend 30 seconds brainstorming

  6. Outline of this session Getting started with LBJ Writing our first classifier: Spam/Ham Playing with features Looking inside the black box classifier for feature weights

  7. Writing classifiers Learning Based Java

  8. What is Learning Based Java? • A modeling language for learning and inference • Supports • Programming using learned models • High level specification of features and constraints between classifiers • Inference with constraints • Different learning algorithms • The learning operator • Classifiers are functions defined in terms of data • Learning happens at compile time

  9. What does LBJ do for you? Abstracts away the feature representation, learning and inference Allows you to write learning based programs Application developers can reason about the application at hand

  10. Demo A learning based program First, we will write an application that assumes the existence of a black box classifier
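As a rough illustration of an application that assumes a black box classifier exists, the sketch below is written against any function from text to a label. The class `SpamFilterApp`, the method `inbox`, and the use of `java.util.function.Function` are assumptions made for this example; the actual demo uses a classifier generated by LBJ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Application code written against a black box: any function from text to a label. */
final class SpamFilterApp {
    /** Keeps only the messages that the supplied classifier does not label "spam". */
    static List<String> inbox(List<String> messages, Function<String, String> classifier) {
        List<String> kept = new ArrayList<>();
        for (String message : messages) {
            if (!"spam".equals(classifier.apply(message))) {
                kept.add(message);
            }
        }
        return kept;
    }
}
```

The application never looks inside the classifier, so a hand-written rule can stand in for it until a learned model is available.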

  11. Spam detection

  12. Spam detection Which of these (if any) are email spam? How do you know?
      Email 1: Subject: save over 70 % on name brand software ppharmacy devote fink tungstate brown lexicon pawnshop crescent railroad distaff cytosine barium cain application elegy donnelly hydrochloride common embargo shakespearean bassett trustee nucleolus chicano narbonne telltale tagging swirly lank delphinus bragging bravery cornea asiatic susanne
      Email 2: Subject: please keep in touch just like to say that it has been great meeting and working with you all . i will be leaving enron effective july 5 th to do investment banking in hong kong . i will initially be based in new york and will be moving to hongkong after a few months . do contact me when you are in the vicinity .

  13. What do we need to build a classifier? Annotated documents* A feature representation of the documents A learning algorithm * Here we are dealing with supervised learning

  14. Our first LBJ program
      /** A learned text classifier; its definition comes from data. */
      discrete TextClassifier(Document d) <-
        learn TextLabel
        using WordFeatures
        from new DocumentReader("data/spam/train")
        with SparseAveragedPerceptron { learningRate = 0.1; thickness = 3.5; }
        5 rounds
        testFrom new DocumentReader("data/spam/test")
      end
      Callouts on the slide: "discrete TextClassifier" defines a classifier; "Document d" is the object being classified; "TextLabel" is the function being learned; "WordFeatures" is the feature representation; "new DocumentReader("data/spam/train")" is the source of the training data; "SparseAveragedPerceptron" is the learning algorithm.
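To give a feel for what a learner such as an averaged perceptron does with a learning rate, a thickness, and a number of rounds, here is a simplified plain-Java sketch for binary labels (+1 / -1). It is not LBJ's SparseAveragedPerceptron: the class name, the exact update rule, and the naive averaging are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified averaged perceptron over sparse binary features (illustrative, not LBJ's code). */
final class AveragedPerceptronSketch {
    private final Map<String, Double> weights = new HashMap<>();
    private final Map<String, Double> sums = new HashMap<>(); // sum of weights over all steps
    private final double learningRate;
    private final double thickness;
    private long steps = 0;

    AveragedPerceptronSketch(double learningRate, double thickness) {
        this.learningRate = learningRate;
        this.thickness = thickness;
    }

    private static double score(Map<String, Double> w, Set<String> features) {
        double s = 0.0;
        for (String f : features) {
            s += w.getOrDefault(f, 0.0);
        }
        return s;
    }

    /** labels[i] must be +1 or -1; each document is the set of its active features. */
    void train(List<Set<String>> docs, int[] labels, int rounds) {
        if (docs.size() != labels.length) {
            throw new IllegalArgumentException("docs and labels differ in length");
        }
        for (int r = 0; r < rounds; r++) {
            for (int i = 0; i < docs.size(); i++) {
                int y = labels[i];
                // Update on mistakes and on correct predictions inside the margin.
                if (y * score(weights, docs.get(i)) <= thickness) {
                    for (String f : docs.get(i)) {
                        weights.merge(f, learningRate * y, Double::sum);
                    }
                }
                // Naive averaging: add the current weights to the running sum at every step.
                for (Map.Entry<String, Double> e : weights.entrySet()) {
                    sums.merge(e.getKey(), e.getValue(), Double::sum);
                }
                steps++;
            }
        }
    }

    /** Predicts with the summed weights; the sign is the same as with the averaged weights. */
    int predict(Set<String> features) {
        return score(sums, features) >= 0 ? 1 : -1;
    }

    /** Exposes one averaged feature weight, for looking inside the black box. */
    double averagedWeight(String feature) {
        return steps == 0 ? 0.0 : sums.getOrDefault(feature, 0.0) / steps;
    }
}
```

Calling `averagedWeight` on individual words after training is one simple way to see which features push a document toward each label.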

  15. Demo • Let’s build a spam detector • How to train? • How do different learning algorithms perform? Does this choice matter much?

  16. Features Our current spam detector uses words as features Can we do better? Let’s try it out
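One common way to go beyond single words is to add pairs of adjacent words (bigrams) as features. The sketch below shows that idea only; it is an assumption about what "doing better" might look like, not necessarily the feature set used in the demo, and the class `WordNGramFeatures` and the feature-name prefixes are invented for this example.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Extracts word and adjacent-word-pair features from whitespace-separated text. */
final class WordNGramFeatures {
    static Set<String> extract(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        Set<String> features = new LinkedHashSet<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].isEmpty()) {
                continue; // only happens for empty input
            }
            features.add("w=" + tokens[i]);
            if (i + 1 < tokens.length) {
                features.add("bi=" + tokens[i] + "_" + tokens[i + 1]);
            }
        }
        return features;
    }
}
```

For "please keep in touch" this yields four word features and three bigram features, such as `w=please` and `bi=keep_in`.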

  17. More text classification

  18. Sentiment classification Which of these product reviews is positive? How do you know?
      Review 1: I recently made the switch from PC to Mac, and I can say that I'm not sure why I waited so long. Considering that I have only had my computer a few weeks I can't say much about the durability and longevity of the hardware, but I can say that the operating system (mine shipped with Lion) and software is top notch.
      Review 2: I've been an Apple user for a long time, but my most recent MacBook Pro purchase has convinced me to reconsider. I've had several hardware issues, including a failed keyboard, battery failure, and a bad DVD drive. Now, the backlight on the display fails to turn on when waking from sleep.

  19. Classifying news groups Which mailing list should this message be posted to? How do you know?
      Message: I am looking for Quick C or Microsoft C code for image decoding from file for VGA viewing and saving images from/to GIF, TIFF, PCX, or JPEG format. I have scoured the Internet, but its like trying to find a Dr. Seuss spell checker TSR. It must be out there, and there's no need to reinvent the wheel.
      Candidate groups: alt.atheism, comp.graphics, comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware, comp.windows.x, misc.forsale, rec.autos, rec.motorcycles, rec.sport.baseball, rec.sport.hockey, sci.crypt, sci.electronics, sci.med, sci.space, soc.religion.christian, talk.politics.guns, talk.politics.mideast, talk.politics.misc, talk.religion.misc

  20. Demo • Converting our spam classifier into a • Sentiment classifier • A newsgroup classifier • Note: How different are these at the implementation level?

  21. Most of the engineering lies in the features A document → A classifier (black box) → Some labels (in the diagram, one label is marked ✓ and three are marked ✗)

  22. Summary What is LBJ? How do we use it? Writing a simple spam detector Playing with features How much do we need to change to move to a different application?

  23. Assignment before Next Class (Not Graded) • Download the code & data (http://l2r.cs.uiuc.edu/~danr/Teaching/CS446-12/handsonclassification.html) for this class and play with it • Try to solve the Badges game puzzle with LBJ • Think about what features are needed • Write a parser for reading the data • Write a classifier for solving the puzzle

  24. Next Class Questions We will solve the Badges Game puzzle by Machine Learning We will look at more text classification examples We will think about a famous people classifier

  25. Badge Classifier • Brainstorm the possible Features • Characters in entire name • Two consecutive Characters • Character as Vowel, Character as Consonant • …. • … • Feature Engineering is Important (especially if labeled data is small) • What is the baseline? 70 +, 24 -
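The brainstormed character features can be sketched as below. The class `BadgeFeatures`, the feature-name strings, and the choice to record vowel/consonant per position are assumptions for illustration. The baseline is interpreted here as always predicting the majority label, which is one common reading of the question on the slide; with 70 positive and 24 negative examples that gives 70/94, about 74.5% accuracy.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Character-level features for a name, plus a majority-class baseline (illustrative). */
final class BadgeFeatures {
    static Set<String> extract(String name) {
        String s = name.toLowerCase(Locale.ROOT);
        Set<String> features = new LinkedHashSet<>();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isLetter(c)) {
                continue; // skip spaces and punctuation
            }
            // Character anywhere in the name.
            features.add("char=" + c);
            // Vowel or consonant at this position.
            features.add("pos" + i + "=" + ("aeiou".indexOf(c) >= 0 ? "vowel" : "consonant"));
            // Two consecutive characters.
            if (i + 1 < s.length() && Character.isLetter(s.charAt(i + 1))) {
                features.add("bigram=" + c + s.charAt(i + 1));
            }
        }
        return features;
    }

    /** Accuracy obtained by always predicting the more frequent label. */
    static double majorityBaseline(int positives, int negatives) {
        return (double) Math.max(positives, negatives) / (positives + negatives);
    }
}
```

Any learned badge classifier should beat `majorityBaseline(70, 24)` to be considered useful.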

  26. The famous people classifier

  27. The Famous People Classifier f([photo]) = Politician; f([photo]) = Athlete; f([photo]) = Corporate Mogul (on the slide, each argument of f is a picture of a person)

  28. The NLP version of the fame classifier Each person is represented by the text that mentions them: • All sentences in the news in which the string "Barack Obama" occurs • All sentences in the news in which the string "Roger Federer" occurs • All sentences in the news in which the string "Bill Gates" occurs

  29. Our goal • Find famous athletes, corporate moguls and politicians

  30. Let’s brainstorm • How do we build a fame classifier? Remember, we start off with just raw text from a news website

  31. One solution Example input: all sentences in the news in which the string "Barack Obama" occurs • Let us label entities using features defined on mentions • Identify mentions using the named entity recognizer • Define features based on the words, parts of speech and dependency trees • Train a classifier
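The first step, gathering the sentences that mention each person, can be sketched as follows. This uses plain substring matching as a stand-in for the named entity recognizer mentioned on the slide, and the class `MentionCollector` is hypothetical; features from words, parts of speech, and dependency trees would then be computed over the collected sentences.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Groups sentences by the person names that occur in them (string matching, not real NER). */
final class MentionCollector {
    static Map<String, List<String>> collect(List<String> sentences, List<String> names) {
        Map<String, List<String>> byName = new LinkedHashMap<>();
        for (String name : names) {
            List<String> hits = new ArrayList<>();
            for (String sentence : sentences) {
                if (sentence.contains(name)) {
                    hits.add(sentence);
                }
            }
            byName.put(name, hits);
        }
        return byName;
    }
}
```

Each name then maps to one bag of sentences, which plays the role of the "document" in the earlier text classifiers.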

  32. Summary Questions • Get started with Learning Based Java • Use a generic, black box text classifier for different applications …and write your own text classifier, if needed • Understand how features can impact the classifier performance … and add features to improve your application • Build a badge classifier based on character features
