Download
graphs and functions recurring themes in databases n.
Skip this Video
Loading SlideShow in 5 Seconds..
Graphs and Functions: Recurring Themes in Databases PowerPoint Presentation
Download Presentation
Graphs and Functions: Recurring Themes in Databases

Graphs and Functions: Recurring Themes in Databases

93 Views Download Presentation
Download Presentation

Graphs and Functions: Recurring Themes in Databases

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Graphs and Functions:Recurring Themes in Databases Alex Poulovassilis 29th November 2001

  2. Databases • Databases store information of relevance to a group of users e.g. • employees’ personal details, for a Personnel department • employees’ income details, for a Payroll department • details of molecular structure and interaction, for a Drug company • details of TV broadcasts and ratings, for a TV company 29th November 2001

  3. Data models • The information stored in a database is expressed using a data model • The binary relational data model is a very simple data model • In this model, information is represented using entities and binary relationships between them • These can be represented as the nodes and edges of a graph e.g. here is the schema of a ViewingFigures database: 29th November 2001

  4. 29th November 2001

  5. Data and Schema • The schema of a database defines the type and format of the actual data – it is part of the database’s meta data • The data in the database conforms to the schema. • So a fragment of the ViewingFigures data might be: 29th November 2001

  6. 29th November 2001

  7. 29th November 2001

  8. 29th November 2001

  9. The TriStarp Project • The TriStarp research project, led by Prof Peter King from the mid 1980s, aimed to (1) develop repository technology for binary relational information (2) develop languages for computing with this kind of information • Mir Derakhshan worked on (1). Carol Small and I worked on (2). We were supported by CASE studentships from IBM UK Labs, Prof Geoff Sharman and Norman Winterbottom being our industrial supervisors 29th November 2001

  10. Computing with Binary Relational Data • There are two natural candidates for this: • logic languages - explored by Carol • functional languages - the topic of my PhD research, resulting in the FDL language (1990) 29th November 2001

  11. stars stars P205 Programme x? Actor The Logic Approach • Find all actors who star in programme P205 stars(P205,x?) 29th November 2001

  12. stars stars p? Programme Kevin Bacon Actor The Logic Approach • Find all programmes in which Kevin Bacon stars stars(p?,’Kevin Bacon’) 29th November 2001

  13. stars stars stars Programme p? Actor Kevin Bacon x? The Logic Approach • Find all actors who have starred with Kevin Bacon stars(p?,’Kevin Bacon’),stars(p?,x?) 29th November 2001

  14. The Functional Approach The functional approach interprets binary relationships as functions, leading to the so-called functional data model Programme Actor stars inv_stars 29th November 2001

  15. The Functional Approach • Find all actors who star in programme P205 stars P205 Programme Actor stars inv_stars 29th November 2001

  16. The Functional Approach • Find all programmes in which Kevin Bacon stars inv_stars ’Kevin Bacon’ Programme Actor stars inv_stars 29th November 2001

  17. The Functional Approach Find all actors who have starred with Kevin Bacon [x | pinv_stars ’Kevin Bacon’; xstars p] Programme Actor stars inv_stars 29th November 2001

  18. Find the most popular programme showing at 10pm on 1st November, 2001: let maxViewers = max [viewers s | s  inv_date (1,11,2001); (start s) <= 2200; (end s) > 2200] in [of s | s  inv_viewers maxViewers] More complex queries 29th November 2001

  19. Find the most popular programme showing at time t on date d: mostPopular t d = let maxViewers = max [viewers s | s  inv_date d; (start s) <= t; (end s) > t] in [of s | s  inv_viewers maxViewers] Derived Functions 29th November 2001

  20. Find actors linked to Kevin Bacon via any number of edges labelled stars: linkedTo [‘Kevin Bacon’] where: linkedTo result = let new = [x | y  result; p  inv_stars y; x  stars p] in if (subset new result) then result else linkedTo (new U result) stars Programme Actor Recursive functions linkedTo 29th November 2001

  21. Oracle of Bacon at Virginia www.cs.virginia.edu/oracle 29th November 2001

  22. More generally: linkedTo s = complete (stars,inv_stars) s where: complete (f,inv_f) result = let new = [x | b  result; a  inv_f b; x  f a] in if (subset new result) then result else complete (f,inv_f) (new U result) f A B Higher-order functions linkedTo 29th November 2001

  23. Collaboration Networks Find all people linked to a person P via the author relationship: complete (author,inv_author) [P] Paper Person author inv_author 29th November 2001

  24. Acknowledgements… If we ask the simpler query [x | pinv_author ’Alexandra Poulovassilis’; xauthor p] we obtain the people with whom I have co-authored research papers: J.Bailey K.Benkerimi S.Courtenage P.Demetriades M.Derakhshan B.Heydecker S.Hild P.J.H.King M.Levene N.Lorentzos P.J.McBrien P.Newson E.Nonas R.Offen S.Reddi S.Schwarz C.Small E.Tuv P.T.Wood L.Xu Paper Person author inv_author 29th November 2001

  25. Drawbacks of the Binary Relational Model Despite its elegance, the binary relational model has some drawbacks: (a) large binary relational schemas can be hard to understand (b) it is not so natural for representing higher-dimensional relationships 29th November 2001

  26. The Hypernode Model (a) led to research into nested-graph data models with Mark Levene 29th November 2001

  27. Higher-dimensional relationships An example of problem (b) is the 3-way relationship between Distribution companies, Programmes and TV companies which has to be represented by an entity and 3 binary relationships: Supply DistrCo TVCo Programme 29th November 2001

  28. The PFL Language • This led to the development of a new functional language PFL, with Carol Small, which directly supports higher-dimensional relationships • e.g. the supply relationship is accessed by a single selector function |supply : (DistrCo,Programme,TVCo)  [(DistrCo,Programme,TVCo)] • Some examples: |supply (Any,P205,BBC) |supply (Any,Any,BBC) |supply (Any,P205,Any) 29th November 2001

  29. Active Databases • Up to now, I have been looking at schema, data and derived database information • In the 1990s a new kind of database information was being explored, namely event-condition-action rules of the form: onevent ifcondition do action • ECA rules make a database active in that it can automatically execute actions if events occur and conditions hold 29th November 2001

  30. viewers Showing Number Active PFL • In a project during mid 1990s, we extended PFL with ECA rules (with Swarup Reddi and Carol Small) • For example: oninsert viewers if [s | (s,n)|viewersInc (Any,Any); n < 500000] doinsert [s | (s,n)|viewersInc (Any,Any); n < 500000] lowRated 29th November 2001

  31. PFL’s ECA rule execution semantics We specified these in PFL itself, to experiment before implementing: execSched (db,s) = if s = [] then (db,[]) else execSched (schedRules (exec (head s,db),s)) schedRules (db,a:s) = let (db,pre,suf) = fold schedRule (db,[],[]) (triggers a) in (db,pre ++ s ++ suf) schedRule i (db,pre,suf) = if (eval (event-condition-query i) db) = {} then (db,pre,suf) else updateSched (actions i,mode i,db,pre,suf) 29th November 2001

  32. Analysing and Optimising ECA rules • Techniques are needed for analysing and optimising the behaviour of ECA rules • In a project that started in late 1990s, we have been using the functionalsemantics of ECA rule execution as the basis for developing such techniques (with James Bailey, Simon Courtenage, Pete Newson) • In particular, we have been investigating abstract interpretation and partial evaluation of the rule execution semantics for analysis and optimisation, respectively. 29th November 2001

  33. Abstract execution semantics execSched* (db*,s*) = if s* = [] then (db*,[]) else execSched* (schedRules* (exec* (head s*,db*),s*)) schedRules* (db*,a*:s*) = let (db*,pre*,suf*) = fold schedRule* (db*,[],[]) (triggers a*) in (db*,pre* ++ s* ++ suf*) schedRule* i (db*,pre*,suf*) = if (eval* (event-condition-query i) db*) = False then (db*,pre*,suf*) else updateSched (actions i,mode i,db*,pre*,suf*) 29th November 2001

  34. Correctness of the Abstract Execution • If for all queries q, abstract databases db*, and abstract actions a*: • conc (exec* (a*,db*)) is a superset of [exec (a,db) | (a,db)  conc (a*,db*)] • eval* q db* = False implies that for all db in conc db*,eval q db = {} • then execSched* is a conservative test for • rule termination • rule unreachability 29th November 2001

  35. Partial Evaluation of Rule Execution Produce a specialised equation for schedRules for each kind of rule action that may appear at the head of the schedule: schedRules (db,a1:s) = let (db,pre,suf) = fold schedRule (db,[],[]) (triggers a1) in (db,pre ++ s ++ suf) schedRules (db,a2:s) = let (db,pre,suf) = fold schedRule (db,[],[]) (triggers a2) in (db,pre ++ s ++ suf) . . . 29th November 2001

  36. Partial Evaluation of Rule Execution Suppose action a1 triggers rule 2 and rule 3 (in that order of priority). Then we can replace triggers a1 above by [2,3] and apply fold obtaining: schedRules (db,a1:s) = let (db,pre,suf) = schedRule (schedRule (db,[],[]) 2) 3 in (db,pre ++ s ++ suf) 29th November 2001

  37. Partial Evaluation of Rule Execution Now we can apply schedRule (assuming rule 2 has Immediate scheduling mode and rule 3 Deferred scheduling mode): schedRules (db,a1:s) = let (db,pre,suf) = if (eval (event-condition-query 2) db) = {} then if (eval (event-condition-query 3) db) = {} then (db,[],[]) else (db,[],bind (actions 3) db) else if (eval (event-condition-query 3) db) = {} then (db,bind (actions 2) db,[]) else (db,bind (actions 2) db,bind (actions 3) db) in (db,pre ++ s ++ suf) 29th November 2001

  38. Heterogeneous Databases • So far, I have been discussing single databases • However, larger-scale applications may need to integrate information from several databases, possibly supporting different data models • To integrate information stored in such heterogeneous databases it is necessary to form a single, integrated schema • Conflicts may existing between the various source schemas, which must be removed by applying transformations to these schemas 29th November 2001

  39. 29th November 2001

  40. Graphs for Schema Transformation • In work with Peter McBrien started in late 1990s, we have developed a general framework for transforming and integrating heterogeneous database schemas • We represent schemas expressed in higher-level data models, such as relational or object-oriented, in terms of a nested-graph data model, thus allowing us to transform between different data models • In our schema transformation framework, new schema constructs are defined using queries over existing constructs • In our framework, schema transformations are reversible, thus allowing query and data translation between schemas: 29th November 2001

  41. 29th November 2001

  42. 29th November 2001

  43. addClass Series [p|(p,S)category] addClass Doc [p|(p,D)category] addClass Film [p|(p,F)category] addClass Prog [p|(p,c)category] 29th November 2001

  44. addSubClass Film Prog addSubClass Doc Prog addSubClass Series Prog addClass Series [p|(p,S)category] addClass Doc [p|(p,D)category] addClass Film [p|(p,F)category] addClass Prog [p|(p,c)category] 29th November 2001

  45. addSubClass Film Prog addSubClass Doc Prog addSubClass Series Prog addClass Series [p|(p,S)category] addClass Doc [p|(p,D)category] addClass Film [p|(p,F)category] addClass Prog [p|(p,c)category] delRel category [(p,F)|pFilm] U [(p,D)|pDoc] U [(p,S)|pSeries] 29th November 2001

  46. addConstraint subset Film Prog addConstraint subset Doc Prog addConstraint subset Series Prog addNode Series [p|(p,S)category] addNode Doc [p|(p,D)category] addNode Film [p|(p,F)category] addNode Prog [p|(p,c)category] delEdge category [(p,F)|pFilm] U [(p,D)|pDoc] U [(p,S)|pSeries] delNode Programme Prog delNode Category [F,D,S] 29th November 2001

  47. delConstraint subset Film Prog delConstraint subset Doc Prog delConstraint subset Series Prog delNode Series [p|(p,S)category] delNode Doc [p|(p,D)category] delNode Film [p|(p,F)category] delNode Prog [p|(p,c)category] addEdge category [(p,F)|pFilm] U [(p,D)|pDoc] U [(p,S)|pSeries] addNode Programme Prog addNode Category [F,D,S] 29th November 2001

  48. Query Translation • Given a transformation from a schema S1 to a schema S2, and a query Q on S1, we use the delete transformation steps to substitute for constructs of S1 which are not in S2 e.g. from the previous slide: • [title p | p  Film U Doc] on: translates into • [title p | p  [p | (p,F)  category] U [p | (p,D)  category] on: 29th November 2001

  49. Functions for Database Integration • In the formal specification of our framework, each schema transformation is a function t : Database  Database where a database consists of schema+data • We are currently implementing our framework within the Automed project • We are planning to handle query language heterogeneity in Automed by translation into/from a functional intermediate query language 29th November 2001

  50. Future Research • Extending Automed to also handle materialised views and view updates, leading to a data warehousing approach to data integration • Data warehousing of genomic data (in collaboration with Profs Thornton, Orengo, Barton, and Drs Keller, Martin, Shepherd) • Moving beyond database integration and database dynamics to data integration on the Web and Web dynamics: • handling XML data sources within Automed • developing an ECA rule language for XML 29th November 2001