Schemas, Patterns, Frames and Knowledge in Systems Development. Chris Wallace Nov 2006. Topics. Frames in computing and psychology Schemas in Database applications Recurrent problems in Information Systems Matching problems. Minsky’s Frames.
Schemas, Patterns, Frames and Knowledge in Systems Development • Chris Wallace • Nov 2006
Topics • Frames in computing and psychology • Schemas in Database applications • Recurrent problems in Information Systems • Matching problems
Minsky’s Frames • Marvin Minsky is one of the fathers of artificial intelligence • “When one encounters a new situation (or makes a substantial change in one's view of the present problem) one selects from memory a structure called a Frame. This is a remembered framework to be adapted to fit reality by changing details as necessary” • “A frame is a data-structure for representing a stereotyped situation, like being in a certain kind of living room, or going to a child's birthday party. Attached to each frame are several kinds of information. Some of this information is about how to use the frame. Some is about what one can expect to happen next. Some is about what to do if these expectations are not confirmed.” • “Thinking always begins with suggestive but imperfect plans and images; these are progressively replaced by better–but usually still imperfect–ideas.” • Marvin Minsky, A Framework for Representing Knowledge, MIT-AI Laboratory Memo 306, June, 1974
Gestalt psychology • ‘Gestalt’ – form; pattern; shape; organised whole or unit • notion of pre-existing schemata, or organisational frameworks for structuring information, as opposed to perception built up from visual stimulus alone. • In the case of visual perception, a schema provides a framework within which external stimuli are sensible. [A frog’s visual and motor system is pre-programmed for fly-recognition]
Database Schema • e.g. The Bus timetables service • [Diagram: the schema sits between the Domain of Discourse and the service. During Development the schema is descriptive: ‘What is a route?’ In Use it is prescriptive: ‘A route is a …’]
Database Schema • Schema has a fixed structure with a set of place holders to allow variation: • A slot for the single destination of all buses on the route • So a route can’t have multiple destinations • But the last 70 goes to Muller Road Depot, not Centre • So this will have to be a different route • Or alter schema to allow an override destination in departure • A Schema structures our perception of the world • When the fit is poor: • We can force or misuse the schema • We can change the Schema
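The second option on the slide, altering the schema so that a departure can override the route's destination, could be sketched as below. This is only an illustration: the class names, field names and departure times are assumptions, not taken from the actual timetable service.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    number: str
    destination: str  # the single destination slot shared by every bus on the route


@dataclass
class Departure:
    route: Route
    time: str  # hypothetical departure time, for illustration only
    override_destination: Optional[str] = None  # the altered schema: a per-departure exception

    def destination(self) -> str:
        # Use the route's destination unless this departure overrides it.
        return self.override_destination or self.route.destination


route_70 = Route(number="70", destination="Centre")
daytime_bus = Departure(route_70, "14:05")
last_bus = Departure(route_70, "23:40", override_destination="Muller Road Depot")

print(daytime_bus.destination())  # Centre
print(last_bus.destination())     # Muller Road Depot
```

The alternative on the slide, forcing the fit, would mean creating a second Route record for the last bus and leaving the schema unchanged.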
Design Studies • Cognitive psychologists observe that for an experienced analyst designing a familiar object in a familiar domain, the overall task could be characterised as being dominated by the retrieval of previously-stored knowledge (Adelson and Soloway). • Research by Curtis et al has shown the importance of domain knowledge in design, where domain knowledge entails understanding of the problems which occur in a specific application field and the tried and tested (and the failed) solutions. • Brown and Chandrasekaran identify the strategy of 'Design by Critiquing and Modifying Almost Correct Designs' as one of four main design processes (decomposition, design plans, and constraint solving are the others).
Pre-structuring • Hillier, Musgrove and O'Sullivan suggest that [building] design is essentially a matter of prestructuring problems based on the designer's knowledge : • of solution types • of the 'latencies of the instrumental set' (the raw materials + time) in relation to solution types • of informal 'codes' which relate problems to solution types • “it is not a matter of whether the problem is pre-structured but how it is pre-structured, and whether the designer is prepared to make this prestructuring the object of his critical attention” • Hillier Musgrove and O’Sullivan Knowledge and Design 1972
Patterns • Christopher Alexander and colleagues wrote an influential collection of 253 ‘patterns’ for living – good solutions to recurring problems in architecture. • Language intended to provide a basis for community architecture • Gamma et al picked up this work and applied it to the emerging area of Object-oriented Programming. • Provides a language for developers to use within a project • ‘We need the Observer pattern here’ • Pattern conferences develop a rich array of patterns encapsulating the experience of developers in many software and organisational contexts.
What Kind of System is it? • Kinds of system: • a repository of data • a model of a Domain of Discourse • a decision support system • For police officers selecting volunteers • For the witness identifying persons at the scene of crime • a co-ordinator of human activity • a tool for the construction of an artefact • a learning system • ..... • Real systems have aspects of several kinds but it helps to focus on each viewpoint in turn, then integrate them – divide and rule
System as Repository • Common to view an information system as a simple repository of data. • Data is collected, stored, kept safe from falling into the wrong hands, from being lost. • Data can be extracted, reorganised, removed if no longer relevant • Data should only be accepted if it conforms to rules for good data -‘integrity rules’ • Also called CRUD (Create, Retrieve, Update, Destroy) • Repository may be computer based or in some other form – paper, brain, organization
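A minimal in-memory sketch of the repository view, with the four CRUD operations and one integrity rule checked on the way in. The class, the `has_name` rule and the sample record are hypothetical; a real repository would also need persistence and access control.

```python
class Repository:
    """In-memory store offering create, retrieve, update and destroy."""

    def __init__(self, integrity_rule):
        self._records = {}
        self._next_id = 1
        self._integrity_rule = integrity_rule  # function: record -> bool

    def create(self, record):
        self._check(record)
        record_id = self._next_id
        self._records[record_id] = dict(record)
        self._next_id += 1
        return record_id

    def retrieve(self, record_id):
        return dict(self._records[record_id])  # return a copy, keep the stored data safe

    def update(self, record_id, changes):
        candidate = {**self._records[record_id], **changes}
        self._check(candidate)  # the updated record must still be good data
        self._records[record_id] = candidate

    def destroy(self, record_id):
        del self._records[record_id]

    def _check(self, record):
        if not self._integrity_rule(record):
            raise ValueError(f"record violates integrity rule: {record!r}")


def has_name(record):
    # Assumed integrity rule: every record needs a non-blank name.
    return bool(record.get("name", "").strip())


volunteers = Repository(integrity_rule=has_name)
vid = volunteers.create({"name": "A. Volunteer", "beard": True})
volunteers.update(vid, {"beard": False})
print(volunteers.retrieve(vid))
```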
Repository in PIP • In the problem, we need to be able to • store data about suspects and volunteers • readily retrieve data when required • keep the data secure from loss, or access by the wrong people • Metaphor : SYSTEM IS A BANK
System as Model • System contains a model of the ‘domain of discourse’ . This model is used to present re-organised data about the domain of discourse. • Issues: • How much of the real world to model, in what detail to support needs of users, now and in the future? • How to ensure that the model and the real world stay closely in synchrony - that changes to the real world (a volunteer moves house) are reflected in the model (address field updated) quickly and accurately • How best to represent the model for specific purposes and audiences • Metaphor: SYSTEM IS A MAP
Model in PIP • How is a suspect to be represented in the system for the purposes of the PIP? • How is a volunteer to be represented? • How do we know when a volunteer changes in some relevant way? • Removes beard • Moves • Who should be able to view the suspect’s details?
System as Resource Manager • Allocation of scarce resources, such as • seats on flights • rooms for lectures • beds for patients • volunteers to identity parades • Problem is to maximise utilisation of resources whilst minimising delay and inconvenience to parties • Common Issues • Granularity of resource units • Handling time • Modelling and visualising allocations • Cancellation policy • Over-booking policy • Allocation policy - priorities, up-grading … • A common sub-problem is to match a client’s needs to available resources
System as Matchmaker • System matches • Volunteers to suspects • Needs to means • Requirement to resources • Matching requires • Models of both parties • A definition of what a good match is • A process to choose the pair which has the best match • Metaphor: System is a Dating agency
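The three requirements on the slide (models of both parties, a definition of a good match, a process to choose the best pair) can be sketched as follows. The attributes and the counting rule are assumptions chosen for illustration; they are not the PIP system's actual model.

```python
def fitness(suspect, volunteer):
    """Assumed definition of a good match: the number of suspect attributes the volunteer shares."""
    return sum(1 for key, value in suspect.items() if volunteer.get(key) == value)


def best_match(suspect, volunteers):
    """Process: choose the volunteer with the highest fitness."""
    return max(volunteers, key=lambda volunteer: fitness(suspect, volunteer))


# Models of both parties, as plain attribute dictionaries (hypothetical data).
suspect = {"height": "tall", "hair": "brown", "beard": True}
volunteers = [
    {"name": "V1", "height": "short", "hair": "brown", "beard": False},
    {"name": "V2", "height": "tall", "hair": "brown", "beard": True},
    {"name": "V3", "height": "tall", "hair": "fair", "beard": True},
]

print(best_match(suspect, volunteers)["name"])  # V2
```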
Common Matching systems • From the easy to the very hard: • A word to a dictionary of valid words • A poorly typed word to a dictionary of valid words • T9 predictive texting • A customer on the phone to bank accounts • De-duping mailing lists • CD DB - CD recognition • A search request to documents on the web • Shazam – sound sample matching • COTS selection • IS problem solving
String Matching • How close are two strings – words, DNA sequences? • Levenshtein distance • is the number of single character edits required to change one to the other using the operations of: • inserting a letter • deleting a letter • replacing a letter • E.g. • Distance(receipt,tecept) = 2 • Distance(receipt,reciept) = 2 • Need a theory of why the strings are different • Better theory for typing would be to count transposition as 1 edit instead of 2 • Better theory for texting would be to count a replace by a letter on the same key less than a letter on a different key. • mutations in DNA matching
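A sketch of the two theories of typing error mentioned above: plain Levenshtein distance, and a variant that counts swapping two adjacent letters as a single edit (usually called optimal string alignment, a restricted form of Damerau-Levenshtein distance). The function names are my own.

```python
def levenshtein(a: str, b: str) -> int:
    """Edits (insert, delete, replace) needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # replace ca with cb, or keep it
            ))
        previous = current
    return previous[-1]


def osa_distance(a: str, b: str) -> int:
    """As levenshtein, but an adjacent transposition costs 1 edit instead of 2."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            swapped = i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]
            if swapped:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


print(levenshtein("receipt", "tecept"))    # 2
print(levenshtein("receipt", "reciept"))   # 2
print(osa_distance("receipt", "reciept"))  # 1
```

Under the transposition theory "reciept" is closer to "receipt" than "tecept" is, which is arguably a better ranking for typing errors.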
Soundex and Metaphone • Surnames in English have multiple spellings for similar sounds • Wallace and Wallis, Smith and Smythe • Errors caused by similar phonetics having different spelling • Useful where sound-text transliteration occurs in data capture • e.g. Smith and Smythe • Soundex (Odell and Russell 1922) reduces every word to a letter and 3 digits – S530 for both • Metaphone (Philips 1990) smarter about English phonetics – SM0 for both • Double Metaphone – improved and two codes – one English, one ‘foreign’ • Comparison of algorithms
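A sketch of Soundex following the commonly described American Soundex rules (keep the first letter, code the remaining consonants, collapse repeated codes, ignore H and W, pad to a letter and three digits). Variants of the rules exist, so treat this as one plausible reading and not the definitive algorithm.

```python
# Consonant groups and their Soundex digits; vowels, H, W and Y have no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = CODES.get(first, "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(c, "")
        if code and code != previous:
            digits.append(code)
        previous = code  # a vowel resets this, so a repeated code after it counts again
    return (first + "".join(digits) + "000")[:4]


print(soundex("Smith"), soundex("Smythe"))    # S530 S530
print(soundex("Wallace"), soundex("Wallis"))  # W420 W420
```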
Anagrams • Find an anagram for an English word • ORCHESTRA • Matching function: • Break word into letters and sort the letters • ACEHORRST • Match with a dictionary in which all words have a sorted letters field • CARTHORSE • HORSECART
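The matching function above can be sketched as an index keyed on each word's sorted letters. The four-word dictionary is a stand-in for a real word list.

```python
from collections import defaultdict


def signature(word: str) -> str:
    """Break the word into letters and sort them."""
    return "".join(sorted(word.upper()))


def build_index(words):
    """Dictionary in which every word is filed under its sorted-letters field."""
    index = defaultdict(list)
    for word in words:
        index[signature(word)].append(word.upper())
    return index


def anagrams(word: str, index):
    return [w for w in index.get(signature(word), []) if w != word.upper()]


dictionary = ["carthorse", "horsecart", "orchestra", "timetable"]  # illustrative word list
index = build_index(dictionary)

print(signature("ORCHESTRA"))        # ACEHORRST
print(anagrams("ORCHESTRA", index))  # ['CARTHORSE', 'HORSECART']
```

Sorting turns a search over letter arrangements into a single exact lookup; the cost is paid once, when the index is built.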
CD DB • Database of 2.5 million CDs, track details and supporting matter run by Gracenote (www.gracenote.com) • Used by media players to obtain track info • Player sends signature of CD [sequence of track lengths in 1/4 sec] to match against the database (via HTTP) • Application searches DB for best match and returns track info to media player. • Matching algorithm described in US Patent 6,061,680
Shazam - 2580 • Shazam is a mobile phone application • It can recognise 3.2 million tracks from a 20 sec sample – new tracks added at 5,000 a week • The track details are texted back within about 30 secs • It costs 50p + 9p call charge (surcharge only if successful) • Your personal page shows the tracks you have tagged • Track1 • Track2 • www.shazam.com • digitalmusicnews • “The Shazam Music Recognition Service” • ISMIR 2003 presentation
Chatbots • Chatbots like ALICE simulate a human response to typed input • Most are for fun or annoyance • Increasingly being used for customer service, helpdesks, marketing • Based on matching patterns in text • The patterns are in an XML application called AIML
De-duping • A catalogue from O’Reilly • Address 1: C Wallace, West England University, Coldharbour Lane, Frenchay, Bristol BS16 1QY • Address 2: Ms C Wallace, Univ. of the West of England, Frenchay Campus, Coldharbour Lane, Bristol BS16 1QY • One person or two? • Mailing lists are reported with 25 – 40% duplicates.
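One simple way to approach de-duping is to reduce each record to a normalised match key and treat records with equal keys as candidate duplicates. The key used here (postcode, surname, first initial, with titles dropped) is an assumption for illustration; whether it is the right definition of "same person" is exactly the question the slide poses.

```python
TITLES = {"MR", "MRS", "MS", "MISS", "DR", "PROF"}  # assumed list of titles to ignore


def match_key(name: str, postcode: str):
    """Normalised key: (postcode without spaces, surname, first initial)."""
    tokens = [t.strip(".").upper() for t in name.split()]
    tokens = [t for t in tokens if t and t not in TITLES]
    compact_postcode = postcode.replace(" ", "").upper()
    if not tokens:
        return (compact_postcode, "", "")
    return (compact_postcode, tokens[-1], tokens[0][0])


record_1 = ("C Wallace", "BS16 1QY")
record_2 = ("Ms C Wallace", "BS16 1QY")

print(match_key(*record_1))                          # ('BS161QY', 'WALLACE', 'C')
print(match_key(*record_1) == match_key(*record_2))  # True
```

This key ignores the organisation and street lines, which differ between the two addresses; comparing those would need a fuzzy measure such as edit distance.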
Commercial Off-the-Shelf Software (COTS) • Software exists for most business needs: • payroll • order processing • general ledger • human resources • e-commerce • e.g. SAP, SAGE .. • Analysts need to match business needs to COTS capability, and customise generic software for local business rules.
Matching in general • Matching tasks typically involve: • two sets of individuals : e.g. • the suspect / sampled track / DNA sample - The Requirement • the volunteers / 1.7 million stored tracks / DNA on file – The Resource • ‘adequate’ representations of both • a ‘fitness’ function which calculates how well matched a Resource is to the Requirement • a process to achieve the matching goal • Matching processes: • Single or Batch? • Single: One Req to many Resources • Batch: Many Reqs to many Resources (e.g. cutting) • Automatic, Interactive, Assistive • Automatic: Matching fully automated • Interactive: User makes final selection, adjusts weights • Assistive : Computer produces analyses which aid human selection
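The interactive style of matching, where the user adjusts weights and makes the final selection from a ranked shortlist, might look like the sketch below for the single case (one Requirement, many Resources). The attribute names, weights and scoring rule are all assumptions.

```python
def weighted_fitness(requirement, resource, weights):
    """Sum the weights of the attributes on which the resource meets the requirement."""
    return sum(weight for attr, weight in weights.items()
               if attr in requirement and requirement[attr] == resource.get(attr))


def rank(requirement, resources, weights, top=3):
    """Return a shortlist, best first, for a human to choose from."""
    ordered = sorted(resources,
                     key=lambda r: weighted_fitness(requirement, r, weights),
                     reverse=True)
    return ordered[:top]


requirement = {"height": "tall", "hair": "brown"}  # hypothetical Requirement
resources = [                                      # hypothetical Resources
    {"id": "A", "height": "tall", "hair": "fair"},
    {"id": "B", "height": "short", "hair": "brown"},
    {"id": "C", "height": "tall", "hair": "brown"},
]

# The same data ranked under two different user-chosen weightings.
hair_first = rank(requirement, resources, {"height": 1, "hair": 3})
height_first = rank(requirement, resources, {"height": 3, "hair": 1})
print([r["id"] for r in hair_first])    # ['C', 'B', 'A']
print([r["id"] for r in height_first])  # ['C', 'A', 'B']
```

Changing the weights changes the order of the shortlist without touching the representations, which is what lets the user steer the match.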
Design is matching • We look at problems and recognise the type of system required • The type of system identified determines the questions we ask • Pros • Fast, draws on experience • Cons • Easy to jump to the wrong conclusion • Real systems can be seen from several different viewpoints at the same time.
Tutorial • Individually • Suggest 3 other applications where Matching is a core function • Group • Select from these N/2 systems of particular interest (where N is the class size) • In Pairs • Take one of the systems and research these questions • What is being matched to what? • How are the subjects being modelled? • How is the best match determined?