Matching in Information Systems


ISD3 Lecture 11

Contents
  • Matching exercises
  • Integrity and Fidelity
    • Fidelity as a matching problem – between the world and its representation in the system
  • Stateful-stateless interaction
    • Co-evolution of user-machine fitness
Fuzzy Matching in the Telephone Directory
  • UWE telephone directory
    • The only fuzzy matching offered is partial matching on the initial string
      • ‘wall’ finds ‘wallace’, ‘wallis’, ‘walls’, …
    • Easy to do in SQL
        • ... WHERE surname LIKE 'reqsurname%'
    • Substring matching anywhere is slower, since a leading wildcard typically prevents use of an index
        • ... WHERE surname LIKE '%reqsurname%'
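The two LIKE patterns above can be tried out with a small sketch. It assumes an in-memory SQLite table named directory and some made-up rows; neither comes from the UWE directory itself. SQLite's LIKE is case-insensitive for ASCII letters by default, which is why 'wall' finds 'Wallace'.

```python
import sqlite3

# Hypothetical sample data; the table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE directory (surname TEXT, firstname TEXT, extno TEXT)")
conn.executemany(
    "INSERT INTO directory VALUES (?, ?, ?)",
    [
        ("Wallace", "Chris", "1001"),
        ("Wallis", "Pat", "1002"),
        ("Walls", "Sam", "1003"),
        ("Cornwall", "Jo", "1004"),
    ],
)


def find_by_prefix(conn, reqsurname):
    """Partial match on the initial string: surname LIKE 'req%'."""
    rows = conn.execute(
        "SELECT surname FROM directory WHERE surname LIKE ? ORDER BY surname",
        (reqsurname + "%",),
    )
    return [row[0] for row in rows]


def find_by_substring(conn, reqsurname):
    """Match anywhere in the surname: surname LIKE '%req%'."""
    rows = conn.execute(
        "SELECT surname FROM directory WHERE surname LIKE ? ORDER BY surname",
        ("%" + reqsurname + "%",),
    )
    return [row[0] for row in rows]


print(find_by_prefix(conn, "wall"))
print(find_by_substring(conn, "wall"))
```

The sketch passes the pattern as a bound parameter rather than pasting it into the SQL string. It does not escape '%' or '_' typed by the user, which a real directory search would probably need to handle.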
Telephone Schema


  • Facilities(‘help desk’, ‘reception’ etc) forced to fit Person schema
  • Lack of inclusion in schema creates searching problems:
    • Helpdesk
    • Help desk
    • CSM help desk
  • No support for categories of facility to control vocabulary
    • A Naming and Classification problem
  • Need for generalisation:

Person schema attributes (from the slide diagram):
Surname : str
Firstname : str
ExtNo : str
Distance (fitness) function
  • Distance (P1, P2) =
    • Distance(P1, P2-Pref) + Distance(P2,P1-Pref)
  • Individual differences:
    • agediff = (P1.age < P2-Pref.min or P1.age > P2-Pref.max) ? 1000 : abs(1 – P1.age / ((P2-Pref.min + P2-Pref.max) / 2))
    • gendiff = (P1.gen == P2-Pref.gen) ? 0 : 1000
    • s1diff = abs(P1.s1 – P2-Pref.s1)
    • s2diff = abs(P1.s2 – P2-Pref.s2)
  • Combined weighted differences
    • Euclidean distance
    • sqrt(wtage*agediff^2 + wtgen*gendiff^2 + wts1*s1diff^2 + wts2*s2diff^2 + …)
  • Problems
    • Age is a ratio scale (40 is twice as old as 20)
    • Preference scales are not – rating a scenario a 6 does not imply it is twice as good as a rating of 3 – Preference scales are Ordinal
    • Age and Gen are go/no-go criteria – simulated by a very high value for a mismatch
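A minimal Python sketch of the distance function may make the pieces easier to follow. The class and field names (Person, Preference, min_age, max_age) and the default weights of 1 are assumptions made for illustration. The age term is written as abs(1 - age/midpoint) so that it is zero when the age sits at the middle of the preferred range, and a gender mismatch is given the 1000 penalty described under Problems.

```python
import math
from dataclasses import dataclass

MISMATCH = 1000.0  # very high value standing in for a go/no-go failure


@dataclass
class Preference:
    min_age: int
    max_age: int
    gen: str
    s1: int  # preferred rating for scenario 1
    s2: int  # preferred rating for scenario 2


@dataclass
class Person:
    age: int
    gen: str
    s1: int
    s2: int
    pref: Preference


def one_way_distance(p, pref, weights=(1.0, 1.0, 1.0, 1.0)):
    """Distance from person p to another person's stated preferences."""
    if p.age < pref.min_age or p.age > pref.max_age:
        agediff = MISMATCH
    else:
        midpoint = (pref.min_age + pref.max_age) / 2
        agediff = abs(1 - p.age / midpoint)
    gendiff = 0.0 if p.gen == pref.gen else MISMATCH
    s1diff = abs(p.s1 - pref.s1)
    s2diff = abs(p.s2 - pref.s2)
    wtage, wtgen, wts1, wts2 = weights
    return math.sqrt(
        wtage * agediff**2 + wtgen * gendiff**2 + wts1 * s1diff**2 + wts2 * s2diff**2
    )


def distance(p1, p2):
    """Symmetric distance: each person is measured against the other's preferences."""
    return one_way_distance(p1, p2.pref) + one_way_distance(p2, p1.pref)


alice = Person(30, "F", 4, 2, Preference(25, 35, "M", 4, 2))
bob = Person(30, "M", 4, 2, Preference(25, 35, "F", 4, 2))
carol = Person(50, "F", 4, 2, Preference(25, 35, "M", 4, 2))
print(distance(alice, bob))  # a perfect mutual match
print(distance(carol, bob))  # carol is outside bob's preferred age range
```

The sketch inherits the problems listed above: it treats the ordinal ratings s1 and s2 as if differences between them were meaningful quantities.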
Integrity
  • Data in a database should agree with the rules in the schema
    • Checks on values
    • Referential integrity
    • Primary key
  • A weak schema allows erroneous data
    • E.g. Invalid manager relationships in the Emp-Dept example
    • Need for extended Business rules in middle tier of application
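The three kinds of schema rule, and the gap they leave, can be sketched with SQLite. The emp and dept tables here are a cut-down guess at the Emp-Dept example, and the business rule "a manager must belong to the same department" is a hypothetical one chosen only to show a check that lives outside the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.executescript(
    """
    CREATE TABLE dept (
        deptno INTEGER PRIMARY KEY,
        dname  TEXT NOT NULL
    );
    CREATE TABLE emp (
        empno  INTEGER PRIMARY KEY,                        -- primary key
        ename  TEXT NOT NULL,
        sal    REAL CHECK (sal >= 0),                      -- check on values
        mgr    INTEGER REFERENCES emp(empno),              -- referential integrity
        deptno INTEGER NOT NULL REFERENCES dept(deptno)    -- referential integrity
    );
    """
)


def try_insert(sql, params):
    """Return True if the database accepts the row, False if a schema rule rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


def manager_in_same_dept(empno):
    """Hypothetical middle-tier business rule that the schema cannot express directly."""
    row = conn.execute(
        "SELECT e.deptno, m.deptno FROM emp e "
        "LEFT JOIN emp m ON e.mgr = m.empno WHERE e.empno = ?",
        (empno,),
    ).fetchone()
    return row[1] is None or row[0] == row[1]


EMP = "INSERT INTO emp VALUES (?, ?, ?, ?, ?)"
try_insert("INSERT INTO dept VALUES (?, ?)", (10, "Accounts"))
try_insert("INSERT INTO dept VALUES (?, ?)", (20, "Research"))
results = {
    "valid row": try_insert(EMP, (1, "King", 5000, None, 10)),
    "unknown dept": try_insert(EMP, (2, "Ghost", 1000, 1, 99)),
    "negative salary": try_insert(EMP, (3, "Debt", -5, 1, 10)),
    "duplicate key": try_insert(EMP, (1, "Twin", 100, None, 10)),
    "manager in other dept": try_insert(EMP, (4, "Ford", 3000, 1, 20)),
}
print(results)
print(manager_in_same_dept(4))
```

The last row is accepted by the database even though it breaks the assumed business rule, which is the sense in which a weak schema allows erroneous data.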
Fidelity
  • Hi-fi (high fidelity): “exactitude in reproduction”
  • A database as an image of its Domain of Discourse (Real World)
  • Loss of fidelity when:
    • Two records in database but only one person in the RW
    • Address data does not correspond to an existing address in the RW
    • Address in database does not correspond to the current address of its owner
  • But fidelity only has to be ‘good enough’ for its purpose
  • Veracity means roughly the same – ‘truthful’
Data Quality
  • Poor data quality results from loss of integrity and lack of fidelity.
  • “Current data quality problems cost US businesses more than $600 billion per year” (report by the Data Warehousing Institute, 2002)
  • Gartner Research estimates that through 2005 more than 50% of business intelligence and CRM deployments will suffer limited acceptance if not outright failure due to lack of attention to data quality issues.
  • Direct costs of poor quality information estimated at between 10% and 20% of revenue
Information systems / computer systems
  • Computer system quality depends only on ensuring the system doesn’t fall over when presented with bad data
  • Information Systems quality depends on ensuring the system delivers information of high quality
  • An Information System includes procedures and guidance for users to meet this need.
Problem analysis
  • Analyse chain of cause and effect of poor quality
  • Systems approach:
    • Information system:
      • Data flow model analysed for points where errors can be injected
    • Organisation:
      • Attitudes and ethos
Data Flow in the Information System
  • Information source
  • Information gathering
  • Information collation
  • Information storage
  • Information retrieval
Data source problems
  • Data has only a limited lifetime of fidelity since the world is in constant flux
  • Length of lifetime depends on
    • Volatility of the data source – e.g. the address of a young out-of-work person is likely to change sooner than the address of a retired person
  • Need to re-validate data on a cycle dependent on the lifetime
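As a rough sketch of a re-validation cycle, each record could carry the date it was last verified and a volatility class that sets its lifetime. The lifetimes below (90 days and two years) are invented placeholders; real values would have to be estimated for the population concerned.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical fidelity lifetimes per volatility class.
LIFETIME = {
    "volatile": timedelta(days=90),  # e.g. address of a young out-of-work person
    "stable": timedelta(days=730),   # e.g. address of a retired person
}


@dataclass
class AddressRecord:
    person: str
    volatility: str
    last_verified: date


def due_for_revalidation(records, today):
    """Return the people whose address has outlived its assumed lifetime."""
    return [
        r.person
        for r in records
        if today - r.last_verified > LIFETIME[r.volatility]
    ]


records = [
    AddressRecord("A", "volatile", date(2024, 1, 1)),
    AddressRecord("B", "stable", date(2024, 1, 1)),
]
print(due_for_revalidation(records, date(2024, 6, 1)))
```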
Data capture
  • Data gathering procedures are a major source of error.
  • Integrity and Fidelity can be in conflict
    • If telephone number is mandatory, an operator in a hurry will enter any old number to get the record accepted
  • Data quality depends on training and guidance given to operators
  • Matching of new applicants with existing applicants is poor, so duplicates are generated.
  • Postcodes accepted even if not matching Post Office database
  • Database integrity failures or loss of backup data, or reload with duplicates (auto number primary key)
Improvement Process
  • Based on learning cycle
    • Shewhart cycle – Plan – Do – Check – Act
    • Deming cycle
    • Six Sigma – Define-measure-analyse-improve-control
    • Kolb learning cycle – act – reflect – theorise – plan
Improvement/ Learning Cycle
  • Measure and observe the current process
  • Analyse / develop theory of causes of problem
  • Plan changes based on the theory
  • Put plan into effect
  • Measure /observe the resultant improvement ….
Stateless/ Stateful Interaction
  • Stateless
    • Person interacts with machine
    • Machine response depends only on the request (and the state of data sources..)
    • Each interaction is independent of previous interactions with the same person
    • Machine has no memory of previous interactions
    • Person presumably does have memory of previous interactions!
  • Stateful
    • Machine has memory of previous interactions
    • Response to a request depends not only on the current request but also on previous interactions
    • Support for ‘long-running transactions’ such as placing an order, booking a holiday, buying the best house insurance
Example stateless/stateful interactions
  • Person – organisation
    • I enter my local supermarket
    • I enter my local pub
  • Person – organisation
    • I make a purchase from my local supermarket with a loyalty card
    • I go to my local pub for a drink
  • Person – website
    • I click on a link to the UWE website
    • I click on a link to a site and I’m prompted to accept a cookie
Stateful interaction
  • Advantages
    • Interaction is not one sided – I remember how the system has behaved, it remembers something about me and how I’ve behaved
    • Interaction is more like talking to another person
    • Machine can make better decisions about a suitable response
  • State can be a problem too
    • Stateful behaviour can be hard to understand.
    • Bad memories - ‘let’s just start all over again’
    • Modal dialogue problem
      • Application puts up a modal dialogue box which must be responded to before anything else happens.
      • Dialogue box gets hidden behind other windows.
Machine-side state mechanisms
  • A state mechanism has to deal with
    • What to store about the interaction
      • How much information about the user to retain
      • Issues : explicit / implicit, transaction log, Data Protection Act
    • How to store the state for the duration of the interaction
      • Length of interaction ranges from a site visit to ‘forever’
      • Issues : what to store, security, reliability, access by other applications
    • Matching a user to a stored state – the ‘identity’ problem
      • How is a user identified
      • Issues : can id be spoofed, is id secure, can identity be mistaken..
Storing the state
  • Hidden fields in form
    • Server can send data to the user in a hidden field, which will then be returned when the user resubmits
  • Session variable
    • Server can store data keyed by a session variable – session id can be sent back in a hidden field
  • Cookies
    • Server sends the user a cookie to store data, which is sent back when the user next visits the site
  • Database
    • State is stored in a database keyed by some user characteristic
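The session-variable mechanism can be sketched without a web framework: the server keeps the state in a dictionary keyed by session id, and only the id travels to the user and back in a hidden field. The names SessionStore and handle_request, and the visit counter used as the stored state, are assumptions made for this illustration.

```python
import secrets


class SessionStore:
    """Server-side state keyed by session id (held in memory for this sketch)."""

    def __init__(self):
        self._data = {}

    def open(self, session_id=None):
        # Unknown or missing ids get a fresh session rather than being trusted.
        if session_id not in self._data:
            session_id = secrets.token_hex(16)
            self._data[session_id] = {"visits": 0}
        return session_id, self._data[session_id]


def handle_request(store, form):
    """Process one submitted form (a dict) and return the id, visit count and hidden field."""
    session_id, state = store.open(form.get("session_id"))
    state["visits"] += 1
    hidden = f'<input type="hidden" name="session_id" value="{session_id}">'
    return session_id, state["visits"], hidden


store = SessionStore()
sid, visits, hidden = handle_request(store, {})
print(visits, hidden)
_, visits_again, _ = handle_request(store, {"session_id": sid})
print(visits_again)
```

A cookie would carry the same id in an HTTP header instead of the form, and the database option replaces the in-memory dictionary with a table keyed by some user characteristic.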
Identifying the user
  • IP address of client machine
  • Session id
  • User id – login id, National Insurance number, passport number …
    • Cahoot internet bank problem last week
  • Address
  • Mobile phone number
  • Biometric data – finger print, iris pattern..
What to keep
  • State must grow and change as the system learns more about you.
  • State of interaction includes:
    • Current attributes of user : name, company..
    • History of every interaction allows unanticipated questions to be asked – cf data mining
    • Derived / deduced attributes – total expenditure, most recent address
  • For data protection reasons, must not retain any more information than necessary??
  • State can be defined using an ER model even if not stored in a database
Explicit / Implicit distinction
  • Explicit
    • Facts held as data in the database
      • the person’s name and address
  • Implicit
    • The implicit assumptions about the user which are built into the system:
      • The user’s language, ethnicity, location, capabilities
  • Implicit -> Explicit
    • Surfacing assumptions
    • Representing assumptions explicitly
      • multi-language responses
User’s model of the machine
  • Users need to develop their model of the machine to be able to use it effectively
  • Part of the machine’s task is to help the user develop an appropriate model of itself.
  • Users have an implicit model of the machine – preconceptions about how to use it.
  • What does a person’s model of the machine look like and how does it develop?
Strategies to help the user
  • Reduce the need for the user to have an extensive machine model
  • Provide guidance
  • Design the interaction to work in the way a user would naturally expect:
    • Donald Norman’s idea of affordance
      • The door handle example
  • Use natural language
  • Follow / establish standards
SMS Currency converter
  • Exercise last year to design an SMS currency converter.
  • More difficult interaction design than a web page converter:
    • No list of currencies to select from
    • Message length limits explanations
  • More interesting
    • Input is limited natural language
    • User is mobile
Currency converter – stateful interaction
  • Stateful interaction
    • Request: Cur 100 GBP USD
      • Machine stores from and to codes as state, identified by originating mobile number
    • Request: Cur 200
      • Machine identifies the request as originating from the same user, no from or to code supplied, so default to stored values
    • Request: Cur 100 GBP EUR
      • From and to codes set, so update state
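The three requests above can be walked through with a small sketch. It assumes the strict "Cur amount from to" format, a made-up rate table, and the sender's mobile number as the key for the stored state; the class name StatefulConverter is hypothetical.

```python
# Made-up exchange rates, purely for illustration.
RATES = {("GBP", "USD"): 1.25, ("GBP", "EUR"): 1.2}
HINT = "Not understood. Try: Cur 100 GBP USD"


class StatefulConverter:
    def __init__(self, rates):
        self.rates = rates
        self.state = {}  # originating mobile number -> (from code, to code)

    def handle(self, sender, message):
        parts = message.split()
        if not parts or parts[0].lower() != "cur":
            return HINT
        try:
            amount = float(parts[1])
        except (IndexError, ValueError):
            return HINT
        if len(parts) == 4:
            # From and to codes supplied, so update the stored state.
            pair = (parts[2].upper(), parts[3].upper())
            if pair not in self.rates:
                return f"No rate for {pair[0]} to {pair[1]}"
            self.state[sender] = pair
        elif len(parts) == 2:
            # No codes supplied, so default to the stored values.
            pair = self.state.get(sender)
            if pair is None:
                return "Please give currency codes, e.g. Cur 100 GBP USD"
        else:
            return HINT
        return f"{amount:g} {pair[0]} = {amount * self.rates[pair]:.2f} {pair[1]}"


converter = StatefulConverter(RATES)
print(converter.handle("07700 900001", "Cur 100 GBP USD"))
print(converter.handle("07700 900001", "Cur 200"))
print(converter.handle("07700 900001", "Cur 100 GBP EUR"))
```

A second sender who starts with "Cur 200" has no stored codes, so the sketch asks for them instead of guessing.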
Currency Converter – message format
  • Natural interaction
    • Allow multiple and surrounding spaces
    • Allow all sensible ordering of codes
      • 100 gbp usd
      • Gbp 100 usd
      • Gbp usd (assume 1 unit)
    • Allow noise words
      • Convert 100 usd into eur
    • Allow synonyms
      • Convert 100 pounds into euros (assume GBP)
    • Allow mistypes?
      • 100 GPB ERU
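One way to accept these variations is to classify each token as a noise word, an amount, or a currency, and ignore the order in which they arrive. In the sketch below the code list, the synonym table, the noise words and the 0.6 similarity cutoff are all assumptions. Mistypes are handled with difflib.get_close_matches from the Python standard library.

```python
import difflib

# Hypothetical vocabularies; a real service would load a fuller list.
CODES = ["EUR", "GBP", "USD"]
SYNONYMS = {
    "pound": "GBP", "pounds": "GBP",
    "euro": "EUR", "euros": "EUR",
    "dollar": "USD", "dollars": "USD",
}
NOISE = {"cur", "convert", "into", "to"}


def to_code(token):
    """Map one token to a currency code, or None if nothing is close enough."""
    if token in SYNONYMS:
        return SYNONYMS[token]
    if token.upper() in CODES:
        return token.upper()
    # Allow mistypes such as GPB by taking the closest known code.
    close = difflib.get_close_matches(token.upper(), CODES, n=1, cutoff=0.6)
    return close[0] if close else None


def parse(message):
    """Return (amount, from_code, to_code), or None if the request is not understood."""
    amount = None
    codes = []
    for token in message.lower().split():  # split() absorbs extra spaces
        if token in NOISE:
            continue
        if token.replace(".", "", 1).isdigit():
            if amount is not None:
                return None  # two amounts: ambiguous
            amount = float(token)
            continue
        code = to_code(token)
        if code is None:
            return None
        codes.append(code)
    if len(codes) != 2:
        return None
    return (1.0 if amount is None else amount, codes[0], codes[1])


for text in ["100 gbp usd", "Gbp 100 usd", "Gbp usd",
             "Convert 100 pounds into euros", "100 GPB ERU"]:
    print(text, "->", parse(text))
```

Guessing at mistypes is risky with real money involved, which may be why the slide puts a question mark after it. A cautious service might echo the interpreted codes back and ask for confirmation.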
Currency converter - help
  • Helpful feedback
    • If request not understood, give helpful response
      • Format of request
      • Codes for common currencies
      • Reference to source of codes
    • Support country to currency code query (perhaps by another service to get basic country data?)
    • Should help be stateful – not the same response each time, but one which depends on what has already been sent (but how long ago?)
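One possible answer to the stateful-help question is sketched below: the service remembers which help message each sender last received and moves on to the next one, unless so much time has passed that it starts again. The three messages follow the bullet list above. The class name HelpState, the ten-minute forgetting period, and the mention of ISO 4217 as the source of codes are assumptions.

```python
HELP_MESSAGES = [
    "Format: <amount> <from> <to>, e.g. 100 GBP USD",
    "Common codes: GBP (pound), USD (US dollar), EUR (euro)",
    "For other codes see the ISO 4217 currency code list",
]


class HelpState:
    def __init__(self, forget_after=600):
        self.forget_after = forget_after  # seconds before help starts over
        self.sent = {}  # sender -> (index of last message sent, time sent)

    def help_for(self, sender, now):
        """Return the next help message for this sender; now is a time in seconds."""
        level, when = self.sent.get(sender, (-1, None))
        if when is None or now - when > self.forget_after:
            level = -1  # nothing sent recently, so begin with the format
        level = min(level + 1, len(HELP_MESSAGES) - 1)
        self.sent[sender] = (level, now)
        return HELP_MESSAGES[level]


helper = HelpState()
print(helper.help_for("07700 900001", now=0))
print(helper.help_for("07700 900001", now=30))
print(helper.help_for("07700 900001", now=5000))
```

Passing the time in as a parameter keeps the sketch easy to test; a live service would more likely read the clock itself.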