Classification

Lecture 11

Topics
  • Tutorial Review
  • Classification Frame
  • Terminology and measures
  • Using Classifications
    • In system use
    • In system development
  • Creating Classifications
    • Card sorting
Fuzzy Matching in the Telephone Directory
  • UWE telephone directory
    • The only fuzzy matching offered is a partial match on the initial string (a prefix match)
      • ‘wall’ finds ‘wallace’, ‘wallis’, ‘walls’, …
    • Easy to do in SQL
        • .. WHERE surname LIKE 'reqsurname%'
    • Substring matching anywhere in the surname is slower
        • .. WHERE surname LIKE '%reqsurname%'
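
The two LIKE patterns above can be tried out with a small sketch. It assumes an in-memory SQLite table named `person` with made-up rows; the table name, column names and data are illustrative and are not the real UWE directory. SQLite's LIKE is case-insensitive for ASCII text by default, which is why ‘wall’ finds ‘Wallace’.

```python
import sqlite3

# Hypothetical in-memory table standing in for the telephone directory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (surname TEXT, firstname TEXT, extno TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [
        ("Wallace", "Ann", "1001"),
        ("Wallis", "Bob", "1002"),
        ("Walls", "Cat", "1003"),
        ("Cornwall", "Dev", "1004"),
        ("Smith", "Eve", "1005"),
    ],
)


def find_by_prefix(req_surname):
    """Partial match on the initial string: surname LIKE 'req%'."""
    rows = conn.execute(
        "SELECT surname FROM person WHERE surname LIKE ? ORDER BY surname",
        (req_surname + "%",),
    )
    return [row[0] for row in rows]


def find_by_substring(req_surname):
    """Match anywhere in the surname: surname LIKE '%req%'."""
    rows = conn.execute(
        "SELECT surname FROM person WHERE surname LIKE ? ORDER BY surname",
        ("%" + req_surname + "%",),
    )
    return [row[0] for row in rows]


prefix_hits = find_by_prefix("wall")
substring_hits = find_by_substring("wall")
print(prefix_hits)
print(substring_hits)
```

The substring search also returns ‘Cornwall’. A leading wildcard generally stops the database from using an ordinary index on surname, which is the usual reason the second form is slower.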
Telephone Schema

Person (Surname : str, Firstname : str, ExtNo : str)

  • Facilities (‘help desk’, ‘reception’ etc.) are forced to fit the Person schema
  • Lack of inclusion in the schema creates searching problems, because the same facility appears under several names:
    • Helpdesk
    • Help desk
    • CSM help desk
  • No support for categories of facility to control vocabulary
    • A Naming and Classification problem
  • Need for generalisation: Contact as a supertype of Person and Facility
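
One way to picture the generalisation is the sketch below. It assumes that Contact carries the extension number shared by both subtypes and that facility categories come from a small controlled vocabulary; the `FacilityCategory` values, the `owner` field and the sample rows are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from enum import Enum


class FacilityCategory(Enum):
    # Hypothetical controlled vocabulary for kinds of facility.
    HELP_DESK = "help desk"
    RECEPTION = "reception"


@dataclass
class Contact:
    # Supertype: anything that can be phoned has an extension number.
    ext_no: str


@dataclass
class Person(Contact):
    surname: str
    firstname: str


@dataclass
class Facility(Contact):
    category: FacilityCategory
    owner: str  # assumed field, e.g. the department running the facility


def find_facilities(contacts, category):
    """Search by category code instead of by free-text name."""
    return [
        c for c in contacts
        if isinstance(c, Facility) and c.category is category
    ]


directory = [
    Person("1001", "Wallace", "Ann"),
    Facility("2000", FacilityCategory.HELP_DESK, "CSM"),
    Facility("2001", FacilityCategory.RECEPTION, "Main"),
]
help_desks = find_facilities(directory, FacilityCategory.HELP_DESK)
print(help_desks)
```

Because the category is a code rather than a typed-in name, ‘Helpdesk’, ‘Help desk’ and ‘CSM help desk’ would all be found by the same search.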

Distance (fitness) function
  • Distance (P1, P2) =
    • Distance(P1, P2-Pref) + Distance(P2,P1-Pref)
  • Individual differences:
    • agediff = (P1.age < P2-Pref.min or P1.age > P2-Pref.max) ? 1000 : abs(1 – P1.age / ((P2-Pref.min + P2-Pref.max)/2))
    • gendiff = (P1.gen == P2-Pref.gen) ? 0 : 1000
    • s1diff = abs(P1.s1 – P2-Pref.s1)
    • s2diff = abs(P1.s2 – P2-Pref.s2)
  • Combined weighted differences
    • Euclidean distance
    • sqrt (wtage*agediff^2 + wtgen*gendiff^2 + wts1*s1diff^2 + wts2*s2diff^2…..)
  • Problems
    • Age is a ratio scale (40 is twice as old as 20)
    • Preference scales are not – rating a scenario a 6 does not imply it is twice as good as a rating of 3 – Preference scales are Ordinal
    • Age and Gen are go/no-go criteria – simulated by a very high value (1000) for a mismatch
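
The fitness function above can be written out as a short sketch. It is one reading of the slide rather than a definitive implementation: the class names, field names and default weights of 1.0 are assumptions, the age term is taken as the relative distance from the midpoint of the preferred range, and a gender match is scored 0 with a mismatch scored 1000, in line with the go/no-go remark.

```python
from dataclasses import dataclass
from math import sqrt

MISMATCH = 1000.0  # very high value simulating a go/no-go failure


@dataclass
class Preference:
    min_age: int
    max_age: int
    gen: str
    s1: int  # preferred rating for scenario 1 (ordinal scale)
    s2: int  # preferred rating for scenario 2 (ordinal scale)


@dataclass
class Profile:
    age: int
    gen: str
    s1: int
    s2: int
    pref: Preference


def one_way_distance(p, pref, weights=(1.0, 1.0, 1.0, 1.0)):
    """Distance(P, Other-Pref): how far P is from the other party's preference."""
    wt_age, wt_gen, wt_s1, wt_s2 = weights
    if p.age < pref.min_age or p.age > pref.max_age:
        agediff = MISMATCH
    else:
        midpoint = (pref.min_age + pref.max_age) / 2
        agediff = abs(1 - p.age / midpoint)
    gendiff = 0.0 if p.gen == pref.gen else MISMATCH
    s1diff = abs(p.s1 - pref.s1)
    s2diff = abs(p.s2 - pref.s2)
    # Weighted Euclidean combination of the individual differences.
    return sqrt(
        wt_age * agediff ** 2
        + wt_gen * gendiff ** 2
        + wt_s1 * s1diff ** 2
        + wt_s2 * s2diff ** 2
    )


def distance(p1, p2, weights=(1.0, 1.0, 1.0, 1.0)):
    """Distance(P1, P2) = Distance(P1, P2-Pref) + Distance(P2, P1-Pref)."""
    return (one_way_distance(p1, p2.pref, weights)
            + one_way_distance(p2, p1.pref, weights))


# Hypothetical profiles: p1 and p2 match each other's preferences exactly,
# while p3 has the wrong gender for p1's preference.
p1 = Profile(30, "F", 4, 2, Preference(25, 35, "M", 3, 2))
p2 = Profile(30, "M", 3, 2, Preference(20, 40, "F", 4, 2))
p3 = Profile(30, "F", 3, 2, Preference(20, 40, "F", 4, 2))
print(distance(p1, p2), distance(p1, p3))
```

The sketch still treats the ordinal ratings s1 and s2 as if their differences were meaningful numbers, which is exactly the problem the slide raises.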
Classification Frame
  • Classification separates candidates into two or more classes
    • classifying students by grade of degree
  • We will look at the simple case of two classes first:
    • filtering Email : Good or Spam
    • retrieving documents : Relevant or Irrelevant
    • classifying credit card transactions : Valid or fraudulent
    • detecting spelling mistakes : ok or mistake (red line)
    • medical testing : normal or abnormal
    • Systems Requirement : ambiguous or not ambiguous
  • METAPHOR : SYSTEM IS A SIEVE
Classification Errors (Information Retrieval)

                  Relevant                          Irrelevant
  Retrieved       true positive                     false positive (Type I error)
  Not retrieved   false negative (Type II error)    true negative

Precision = TP/ (TP + FP) = TP/ Retrieved

Recall = TP / (TP + FN) = TP / Relevant

Efficiency (more commonly called accuracy) = (TP + TN) / (TP + TN + FP + FN) = (TP+TN) / Full Collection

Example Calculation : email filtering

            Good Email   Spam
  accept        3          5
  reject
  total         7         11

  • Precision = TP/ (TP + FP) =
  • Recall = TP / (TP + FN) =
  • Efficiency = (TP + TN) / (TP+TN+FP+FN) =

Example Calculation : email filtering

            Good Email   Spam
  accept      3 (TP)     5 (FP)
  reject      4 (FN)     6 (TN)
  total         7         11

  • Precision = TP/ (TP + FP) = 3/8 = 37.5%
  • Recall = TP / (TP + FN) = 3/7 ≈ 42.9%
  • Efficiency = (TP + TN) / (TP+TN+FP+FN) = 9/18 = 50%
  • Recall > Precision => not quite balanced
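
The worked email example (TP = 3, FP = 5, FN = 4, TN = 6) can be checked with a few lines of code. This is a minimal sketch: the function name is made up, and it assumes none of the denominators is zero.

```python
from fractions import Fraction


def classification_measures(tp, fp, fn, tn):
    """Return precision, recall and efficiency as exact fractions."""
    return {
        "precision": Fraction(tp, tp + fp),    # TP / Retrieved
        "recall": Fraction(tp, tp + fn),       # TP / Relevant
        "efficiency": Fraction(tp + tn, tp + fp + fn + tn),  # correct / all
    }


measures = classification_measures(tp=3, fp=5, fn=4, tn=6)
for name, value in measures.items():
    print(f"{name}: {value} = {float(value):.1%}")
# precision: 3/8 = 37.5%
# recall: 3/7 = 42.9%
# efficiency: 1/2 = 50.0%
```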

Trade-off
  • The two errors are usually in conflict
    • we can decrease the risk of a False Positive (by rejecting more Spam)
    • but then we increase the risk of False Negatives (rejecting good email)
  • a TRADE-OFF
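
The trade-off can be made concrete with a sketch that moves a single acceptance threshold. It assumes each email has already been given a spam score between 0 and 1 by some filter; the scores and labels below are invented for illustration, and "positive" means "accepted as good email", as in the worked example.

```python
# Hypothetical (spam_score, is_good) pairs; a higher score looks more like spam.
scored_mail = [
    (0.05, True), (0.10, True), (0.30, True), (0.45, False),
    (0.55, True), (0.60, False), (0.80, False), (0.95, False),
]


def confusion_counts(threshold):
    """Accept mail scoring below the threshold and count the four outcomes."""
    tp = sum(1 for s, good in scored_mail if s < threshold and good)
    fp = sum(1 for s, good in scored_mail if s < threshold and not good)
    fn = sum(1 for s, good in scored_mail if s >= threshold and good)
    tn = sum(1 for s, good in scored_mail if s >= threshold and not good)
    return tp, fp, fn, tn


for threshold in (0.2, 0.5, 0.7):
    tp, fp, fn, tn = confusion_counts(threshold)
    print(f"threshold={threshold}: FP={fp} (spam accepted), FN={fn} (good mail rejected)")
```

With this made-up data a strict threshold of 0.2 lets no spam through but rejects two good emails, while a lenient threshold of 0.7 rejects no good email but accepts two spam messages.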
Classification Errors

  • Write in the terms – relevant, retrieved, true positive, false positive etc

          Good student   Poor student
  Fail
  Pass

Improved Precision
  • Precision = TP/ (TP + FP) = TP/ Retrieved
  • Recall = TP / (TP + FN) = TP / Relevant

[Venn diagram of the ‘relevant’ and ‘retrieved’ sets, with regions labelled FN (False Negatives), TP (True Positives), FP (False Positives) and TN (True Negatives)]

Precision and Recall

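  • Precision = TP/ (TP + FP) = TP/ Retrieved
  • Recall = TP / (TP + FN) = TP / Relevant
  • Efficiency = (TP + TN) / (TP + TN + FP + FN) = (TP+TN) / Full Collection

[Venn diagram: within the Full collection, the ‘relevant’ and ‘retrieved’ sets overlap. Overlap = TP (True Positives); relevant but not retrieved = FN (False Negatives); retrieved but not relevant = FP (False Positives); in neither set = TN (True Negatives)]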

Improved Recall
  • Precision = TP/ (TP + FP) = TP/ Retrieved
  • Recall = TP / (TP + FN) = TP / Relevant

[Venn diagram of the ‘relevant’ and ‘retrieved’ sets, with regions labelled FN (False Negatives), TP (True Positives), FP (False Positives) and TN (True Negatives)]

Exercise: Precision and Recall in Assessment
  • Precision means ……
  • Recall means ….
  • Ideal values (as %)
    • Precision=
    • Recall=
    • Efficiency=
  • Estimated values
    • Precision=
    • Recall=
    • Efficiency=
Classification in the News
  • Criminal Justice as a Classifier
    • Murder, Manslaughter or Innocent
  • Is ‘Munchausen by Proxy’ a real psychological condition?
  • Prisoners of war – US invents a new category for the Guantanamo Bay prisoners
  • Blood groups:
    • A,B,AB,O
    • RH+ , RH-
  • Classification of Cloud types (Cumulus, Cirrus…) by Luke Howard 1802
  • Hip evaluation to determine priority for replacement
  • Text classification to bring sense to the Internet
Categories are Information Structures
  • Many systems require the user to classify things in the real world into categories in order to process them:
    • Files and documents on disk
    • Chapters in a dissertation
    • Facilities in the University (helpdesk, reception, ...)
    • Skills in a Placements system
    • Budget headings, Nominal Ledger headings
  • In the computer system, categories can be clearly distinguished:
    • Codes for each category
  • In the real world:
    • categories don’t exist: to assume they do is the fallacy of misplaced concreteness
    • multiple taxonomies are valid – classifying the same things in different ways for different purposes
  • Users typically have the tasks of
    • mapping the real, complex things into the appropriate categories
    • interpreting categorical information
  • Implications
    • IS designers have to devise support for these tasks as well.
    • Users will not be consistent in their classification (e.g. IS books in Library)
Categories in IS theory
  • Much of IS theory is based on a taxonomy:
    • Problem /solution
    • Method/methodology/technique..
    • ER model
    • Data Flow Diagram
    • Soft Systems Analysis - CATWOE
    • Logical /Physical
    • SWOT analysis
      • Strengths/Weaknesses/Opportunities/Threats
    • Objective, Goal, Requirement, Constraint
Classification and Systems Design

“An early step towards understanding any set of Phenomena is to learn what kinds of things there are in the set – to develop a taxonomy”

Herbert Simon

  • Steps in Classification
    • defining the domain (what kinds of things are to be classified)
    • creating the taxonomy (the set of categories), its purpose and force
    • defining the representation of individuals
    • defining the mapping between individuals and categories
    • coding the categories
    • creating automatic classifiers
    • assisting human classifiers
    • assisting users to interpret categorical information
    • evaluating classification performance
    • supporting evolution of taxonomy and classifiers
A Poor Classification?
  • The Argentinean writer Jorge Luis Borges (‘Imaginary Beasts’, ‘Labyrinths’, ...) quotes a ‘certain Chinese encyclopaedia’ in which animals are divided into:

    A) belonging to the Emperor
    B) embalmed
    C) tame
    D) suckling pigs
    E) sirens
    F) fabulous
    G) stray dogs
    H) included in the present classification
    I) frenzied
    J) innumerable
    K) drawn with a very fine camel hair brush
    L) et cetera
    M) having just broken the water pitcher
    N) that from a long way off look like flies

[Diagram: a Classifier (Machine or Human) assigns objects to the Categories/Classes A, B and C, which together make up the Taxonomy]

Categories not Mutually Exclusive
  • An object can be put in any of several categories

Categories not Complete
  • Some objects don’t belong anywhere

Categories not Balanced
  • Some categories are much larger than others

Categories Inconsistent
  • Categories lack a single organising principle

Characteristics of a good Taxonomy
  • Categories must be:
    • Mutually exclusive
      • Every object in at most one category
    • Complete (exhaustive)
      • Every object in at least one category
    • Balanced
      • Categories divide objects evenly
    • Consistent
      • Same characteristics used throughout
    • Hierarchical integrity
      • Categories at one level not confused with categories at another level
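
The first three characteristics can be checked mechanically once objects have been assigned to categories. The sketch below assumes the assignments are available as a mapping from each object to the set of categories it was placed in; the function name, the `balance_ratio` cut-off of 3 and the sample facilities are all hypothetical. Consistency and hierarchical integrity depend on what the categories mean, so they are left to human judgement.

```python
def check_taxonomy(assignments, categories, balance_ratio=3.0):
    """Check exclusivity, completeness and balance of a set of assignments.

    assignments maps each object to the set of categories it was put in.
    balance_ratio is an assumed limit on largest / smallest category size.
    """
    overlapping = sorted(o for o, cats in assignments.items() if len(cats) > 1)
    unplaced = sorted(o for o, cats in assignments.items() if len(cats) == 0)

    sizes = {c: 0 for c in categories}
    for cats in assignments.values():
        for c in cats:
            sizes[c] = sizes.get(c, 0) + 1
    smallest, largest = min(sizes.values()), max(sizes.values())
    balanced = smallest > 0 and largest / smallest <= balance_ratio

    return {
        "mutually_exclusive": not overlapping,  # every object in at most one
        "complete": not unplaced,               # every object in at least one
        "balanced": balanced,                   # categories of similar size
        "overlapping": overlapping,
        "unplaced": unplaced,
    }


# Hypothetical classification of university facilities.
report = check_taxonomy(
    {
        "help desk": {"support"},
        "reception": {"support", "front of house"},
        "car park": set(),
        "library": {"study"},
    },
    categories=["support", "front of house", "study"],
)
print(report)
```

Here ‘reception’ breaks mutual exclusivity and ‘car park’ breaks completeness, so a designer would either redefine the categories or add a new one.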
Kinds of classification
  • Classical
    • Classes defined by presence of features
      • Square : 4 sides, equal length, equal angles
      • Equilateral triangle : 3 sides, equal length, equal angles
  • Probabilistic
    • Classes defined by weighted sum of features
      • ‘bird’ moves, winged, feathered, sings, lays eggs
      • Is a robin a bird? Is an emu a bird?
  • Exemplar (prototype)
    • Classes defined by one or more key examples
      • Robin is a central example of ‘bird’
      • Chicken is more remote example
  • Which kind is used in IS Theory?
  • Which kind is used in IS Use?
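
The three kinds of classification can be contrasted in a small sketch. Everything numeric here is an assumption made for illustration: the feature weights, the 0.6 threshold, the feature sets given to the robin and the emu, and the use of set overlap (Jaccard similarity) as the measure of closeness to an exemplar.

```python
# Classical: a class is defined by features that must all be present.
SQUARE = {"4 sides", "equal length", "equal angles"}


def is_classical_member(features, definition):
    return definition <= features  # every defining feature is present


# Probabilistic: a weighted sum of features compared with a threshold.
BIRD_WEIGHTS = {"moves": 1, "winged": 2, "feathered": 3, "sings": 1, "lays eggs": 2}
BIRD_THRESHOLD = 0.6  # hypothetical cut-off


def membership_score(features, weights):
    total = sum(weights.values())
    return sum(w for f, w in weights.items() if f in features) / total


# Exemplar: closeness to a key example, here measured by feature overlap.
def exemplar_similarity(features, exemplar):
    return len(features & exemplar) / len(features | exemplar)


robin = {"moves", "winged", "feathered", "sings", "lays eggs"}
emu = {"moves", "winged", "feathered", "lays eggs"}

print(is_classical_member({"4 sides", "equal length"}, SQUARE))
print(membership_score(robin, BIRD_WEIGHTS), membership_score(emu, BIRD_WEIGHTS))
print(exemplar_similarity(emu, robin))
```

Under these assumed weights both the robin and the emu pass the threshold, but the emu scores lower and is less similar to the robin exemplar, which mirrors the idea of a more remote example.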
Automated Clustering
  • Clustering techniques find groups of similar objects
  • Used in data mining to identify customer groups with similar buying behaviour…
  • Mathematical Techniques
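    • k-nearest neighbour (strictly a supervised classifier: it needs labelled examples)
    • ID3 to create a decision tree (also learned from labelled examples)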
  • Human Techniques
    • Card sorting
Classifying
  • Learning Classifiers
    • Based on sample of population
    • Classified by hand
    • Split into two parts
      • The training set used to compute the classifier
      • The test set used to test the ability of the classifier
    • Many kinds of classifier are available (e.g. Naïve Bayes, Decision Tree, SVM); all need a good understanding of statistics
    • Threshold set to balance recall and precision
  • Human classifiers are rule- and example-based, but their performance varies with experience and skill
    • E.g. book classification, Yahoo directory classification, medical diagnosis
    • Human classifiers need to be trained too
    • If classification done by end-users, classification is likely to be inconsistent
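
The learning procedure above (hand-classified sample, split into training and test sets, classifier evaluated on the test set) can be sketched as follows. The sample data, the two features and the 25% test fraction are invented, and a 1-nearest-neighbour classifier is used only because it is short to write, not because the slides prescribe it.

```python
import random

# Hypothetical hand-classified sample: (links, exclamation marks) -> label.
sample = [
    ((0, 0), "good"), ((1, 0), "good"), ((0, 1), "good"),
    ((1, 1), "good"), ((2, 0), "good"), ((0, 2), "good"),
    ((8, 6), "spam"), ((9, 7), "spam"), ((7, 9), "spam"),
    ((10, 5), "spam"), ((6, 8), "spam"), ((9, 9), "spam"),
]


def split(items, test_fraction=0.25, seed=0):
    """Shuffle reproducibly, then split into (training set, test set)."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def nearest_neighbour(train, features):
    """Label a new item with the label of the closest training example."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    closest = min(train, key=lambda item: squared_distance(item[0], features))
    return closest[1]


train, test = split(sample)
correct = sum(
    1 for features, label in test
    if nearest_neighbour(train, features) == label
)
accuracy = correct / len(test)
print(f"{correct} of {len(test)} test items classified correctly")
```

The two made-up groups are far apart, so every test item is classified correctly here; real data is rarely this clean, which is why the test set is kept separate from the training set.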
Review
  • 3-tier web architecture – describe, explain, terminology, typical interactions
  • SQL & PHP
    • No exam questions to write SQL or PHP but reading knowledge required – up to outer joins and example scripts
  • Extended ER models
  • Interaction in human and computer systems – sequence diagrams
  • SMS and its applications
  • Web services
  • Agile Development and Extreme Programming – description, application, comparison with life-cycle
  • Frames – rationale, role in IS development, basic recognition of simple frames in a problem description, and the following two in detail:
  • Matching Frame – typical applications, fitness function, recognising nominal, ordinal, interval and ratio scales, use of weights
  • Classification Frame – typical applications, terminology, calculation of recall and precision, guidelines for constructing a taxonomy
Preview
  • XML and XSLT
  • Business Processes
  • Scenarios and Use cases
  • Data Quality
  • Learning Frame