
Temporal Data Modeling

N. L. Sarda, IIT Bombay. All activities have a context of time, but database systems give no explicit support for it. The usual approaches are to keep a schema for current data only, with past data archived and not accessible through the DB schema, or to include history explicitly by time-stamping.


Presentation Transcript


  1. Temporal Data Modeling • N. L. Sarda, IIT Bombay

  2. Temporal Data Modeling • All activities have a context of time • No explicit support from database systems • Approaches • Schema for current data only • Past data is archived and not accessible through the DB schema • Or, include history explicitly by time-stamping • Alternative: provide built-in support for temporal data

  3. Applications • Administrative – personnel, accounting, … • Medical applications • Insurance • Reservation systems • Planning, …

  4. Conventional way • Add attributes for time, its granularity, and its meaning • Write statements differently • Explicitly correlate data over time • Example: Emp table with From and To columns to indicate the period of the employment data: Emp (Eno, Dno, Salary, Rank, From, To) • Insert new emp: what value for To? • Get current salary of Mahesh • Change salary of Mahesh: a plain UPDATE statement will not do! • Another table Dept: how to find when Mahesh worked in the Toy department?
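
The questions on this slide can be tried out with a minimal sketch using Python's built-in sqlite3 module. Several things here are assumptions rather than part of the slide: the Name column (the slide's Emp schema has none, yet it refers to Mahesh by name), the column names ValidFrom and ValidTo in place of From and To (which are SQL keywords), the far-future sentinel used as the open-ended To value, half-open periods, and the sample dates.

```python
import sqlite3

# Assumed sentinel for "no end yet"; the slide leaves the To value open.
FOREVER = "9999-12-31"

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Emp (Eno INTEGER, Name TEXT, Dno TEXT, Salary INTEGER, '
    '"Rank" TEXT, ValidFrom TEXT, ValidTo TEXT)'
)

def hire(eno, name, dno, salary, rank, start):
    # A new employee gets an open-ended period [start, FOREVER).
    conn.execute("INSERT INTO Emp VALUES (?, ?, ?, ?, ?, ?, ?)",
                 (eno, name, dno, salary, rank, start, FOREVER))

def current_salary(name, today):
    # ISO date strings compare correctly as text.
    row = conn.execute(
        "SELECT Salary FROM Emp "
        "WHERE Name = ? AND ValidFrom <= ? AND ? < ValidTo",
        (name, today, today),
    ).fetchone()
    return row[0] if row else None

def change_salary(name, new_salary, start):
    # Overwriting Salary would lose history, so close the open row
    # and insert a new row that starts where the old one ends.
    eno, dno, rank = conn.execute(
        'SELECT Eno, Dno, "Rank" FROM Emp WHERE Name = ? AND ValidTo = ?',
        (name, FOREVER),
    ).fetchone()
    conn.execute("UPDATE Emp SET ValidTo = ? WHERE Name = ? AND ValidTo = ?",
                 (start, name, FOREVER))
    hire(eno, name, dno, new_salary, rank, start)

hire(1, "Mahesh", "Toy", 10000, "Clerk", "2002-01-01")
change_salary("Mahesh", 12000, "2005-01-01")
```

Even in this small case a salary change takes three statements and every query has to carry the period predicate by hand, which is the burden the slide points at.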

  5. Time: Semantics and representation • Many clocks (a clock is a physical process that ‘measures’ time) and representations exist • Day clock, year clock, etc. • Representations differ in range, granularity, storage size, efficiency of operations • Models of time • Continuous or discrete • Linear or branching • Basic units: instant, interval (anchored), span (unanchored) • Temporal element: a set of instants or intervals • Operations on time: arithmetic and comparisons on instants and intervals (+, <, overlaps, …) • Calendar notion: human interpretation; defines new units and constants • Gregorian calendar most common
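
As a rough illustration of these units, the sketch below maps an instant to a `datetime.date` (day granularity, Gregorian calendar), a span to a `datetime.timedelta`, and an interval to a pair of instants. The `Interval` class and its closed-interval convention are assumptions made for the example, not definitions from the slides.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class Interval:
    start: date  # first day of the interval (inclusive)
    end: date    # last day of the interval (inclusive)

    def overlaps(self, other):
        # Two closed intervals overlap if each starts before the other ends.
        return self.start <= other.end and other.start <= self.end

    def duration(self):
        # Length of the interval as a span, counting both end days.
        return self.end - self.start + timedelta(days=1)

a = Interval(date(2002, 1, 1), date(2008, 12, 31))
b = Interval(date(2008, 6, 1), date(2009, 6, 30))
c = Interval(date(2009, 1, 1), date(2009, 1, 31))

# Arithmetic: instant + span gives another instant.
later = date(2002, 1, 1) + timedelta(days=30)
```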

  6. Time dimensions • Different interpretations • Basic dimensions: • Real-world or valid time • System or transaction time • The two are orthogonal • Leads to 4 basic database types: • No time: snapshot DBs • Only valid time: historical DBs • Only transaction time: rollback DBs • Both: bi-temporal DBs

  7. Transaction time • DBMS records system time with data • System time advances automatically: the open end is represented by UC (‘until changed’) • Example: insert Mohan with 10K salary at system time 15 • System is aware of this fact, and continues to ‘know’ it from 15 onwards (until changed) • Tuple <Mohan, 10K, 15, UC> captures this • Removing (deleting) this at time 35 means this data is not ‘known’ to the system after 35. Example: <Mohan, 10K, 15, 35> <Mohan, 12K, 36, UC> • We can roll back to any past system time
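
The Mohan example can be replayed with a small in-memory sketch. The `RollbackTable` class and its method names are hypothetical, UC is modelled as infinity, and system time is passed in explicitly instead of being read from a clock.

```python
import math

UC = math.inf  # "until changed": the row is still known to the system

class RollbackTable:
    def __init__(self):
        self.rows = []  # each row: [name, salary, tt_start, tt_end]

    def insert(self, name, salary, now):
        self.rows.append([name, salary, now, UC])

    def delete(self, name, now):
        # Logical delete: nothing is removed, the open row is only closed.
        for row in self.rows:
            if row[0] == name and row[3] == UC:
                row[3] = now

    def as_of(self, t):
        # Roll back: what did the system know at system time t?
        return [(n, s) for n, s, start, end in self.rows if start <= t <= end]

table = RollbackTable()
table.insert("Mohan", 10000, now=15)
table.delete("Mohan", now=35)
table.insert("Mohan", 12000, now=36)
```

Because rows are only ever closed and appended, `as_of` can answer for any past system time.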

  8. Valid Time • Is external to the system • Defines the real-world validity of data • Data can be past, current or future with respect to the current time, e.g. <Mahesh, 10K, 1-1-2002, 31-12-2008> • Updates change the valid time stamp • Some facts may be valid now and in the future • Use NOW as a ‘moving’ time variable
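
One way to read NOW as a moving variable is to store it as a marker and resolve it only when a query runs. The sketch below assumes integer time points instead of calendar dates, and the helper names are made up for the example.

```python
NOW = "NOW"  # marker stored in place of a fixed end time

def resolve(bound, current_time):
    # NOW is replaced by the clock reading at query time.
    return current_time if bound == NOW else bound

def valid_at(fact, t, current_time):
    # fact = (name, salary, valid_from, valid_to), closed interval.
    start, end = fact[2], fact[3]
    return start <= t <= resolve(end, current_time)

fact = ("Mahesh", 10000, 5, NOW)
```

The same stored tuple gives different answers as the clock advances: time 12 is outside the fact's validity when the clock reads 10, and inside it once the clock reads 15.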

  9. Bi-temporal data model • Each fact is associated with both times • Effectively, the system stores everything and never forgets anything! • A tuple in the pure bi-temporal model contains a set of bi-temporal ‘chronons’ telling when the fact was valid and when it was known to the system • A 2-dimensional temporal space • A fact is represented by a graph/area in this space • Example {time given by (valid, tran time) pairs}: <Mahesh, CSE, {(5,7), (5,9), (5,10), (6,7), …}>
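
A fact's area in the two-dimensional space can be held literally as a set of (valid, tran) pairs. The sketch below builds a hypothetical rectangular region (it is not the exact set from the slide, which is elided) and slices it along the transaction-time axis; the function names are assumptions.

```python
from itertools import product

def rectangle(valid_from, valid_to, tran_from, tran_to):
    # Every (valid, tran) chronon in a closed rectangular region.
    return set(product(range(valid_from, valid_to + 1),
                       range(tran_from, tran_to + 1)))

def valid_times_known_at(chronons, tran_time):
    # Slice the 2-D region at one transaction time.
    return {v for (v, t) in chronons if t == tran_time}

def believed(chronons, valid_time, tran_time):
    # Did the system, at tran_time, hold the fact true for valid_time?
    return (valid_time, tran_time) in chronons

fact = ("Mahesh", "CSE", rectangle(5, 6, 7, 10))
```

Storing one pair per chronon is simple to reason about but grows quickly, which is why an interval-based encoding is usually preferred in practice.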

  10. Update semantics • Need to be clearly understood for insert, delete • Insert <Meera, CSE> valid over <15..25>: • This fact exists and is current >> reject! • Does not exist at all: add with tran time UC, which will keep expanding • Exists but is not current: merge with the newer chronon values • Need to be carefully implemented

  11. Choosing interval-based representation • The conceptual model uses temporal elements • Not normalized and inefficient • Use valid and tran time intervals with a fact: <Mahesh, CSE, (15..35), (20..UC)> • Define update semantics carefully • At tran time 28, we find that Mahesh was in fact in EE during valid time 21 to 24 • Updating should now give us the following: <Mahesh, CSE, (15..35), (20..27)> <Mahesh, CSE, (15..20), (28..UC)> <Mahesh, EE, (21..24), (28..UC)> <Mahesh, CSE, (25..35), (28..UC)> • Need to split intervals carefully
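
The interval splitting in this example can be sketched as a single function. This is one possible reading of the update semantics, assuming closed integer intervals, UC modelled as infinity, and a hypothetical `correct` helper; it handles only the case shown on the slide (a correction to part of a currently believed fact).

```python
import math

UC = math.inf  # open transaction-time end ("until changed")

def correct(rows, name, value, v_from, v_to, now):
    """Record at tran time `now` that `name` had `value` over v_from..v_to."""
    result = []
    for n, val, (vs, ve), (ts, te) in rows:
        untouched = n != name or te != UC or ve < v_from or vs > v_to
        if untouched:
            result.append((n, val, (vs, ve), (ts, te)))
            continue
        # Close the old belief just before the correction.
        result.append((n, val, (vs, ve), (ts, now - 1)))
        # Re-assert the parts of the old fact that still hold.
        if vs < v_from:
            result.append((n, val, (vs, v_from - 1), (now, UC)))
        if ve > v_to:
            result.append((n, val, (v_to + 1, ve), (now, UC)))
    # Add the corrected fact itself.
    result.append((name, value, (v_from, v_to), (now, UC)))
    return result

before = [("Mahesh", "CSE", (15, 35), (20, UC))]
after = correct(before, "Mahesh", "EE", 21, 24, now=28)
```

Run on the slide's starting tuple, this yields the same four tuples: the closed original plus three rows known from tran time 28 onwards.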

  12. Temporal Normalization • A table now contains user defined and temporal attributes • Time applies to the whole tuple • User defined attributes may have different rates of change, and every change produces ‘history’ • Better to decompose a relation based on rate of change • Salary changes more frequently than Dept • Split Emp (Eno, Salary, Dept) into 2 relations

  13. Temporal algebra and query language • The algebra defines how time is to be treated by each operator • The usual selection, projection, etc. can be defined • Product gives two timestamps >> not a temporal relation! • Join should relate data with overlapping validity and currency • R JOIN S: check that tuple r in R and tuple s in S have overlapping valid time and that both are current • Result should contain the intersection of the time elements • Exact definition will depend on the representation
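
A valid-time-only version of this join can be sketched as follows. It assumes closed integer intervals and ignores transaction time (every tuple is taken to be current); the relation layouts and sample rows are hypothetical.

```python
def temporal_join(works, dept):
    """Join on department number, keeping only overlapping valid time."""
    out = []
    for name, dno, w_from, w_to in works:
        for d_dno, dname, d_from, d_to in dept:
            # The result timestamp is the intersection of the two intervals.
            lo, hi = max(w_from, d_from), min(w_to, d_to)
            if dno == d_dno and lo <= hi:
                out.append((name, dname, lo, hi))
    return out

works = [("Mahesh", "D1", 10, 30)]
dept = [("D1", "Toy", 5, 20), ("D1", "Games", 21, 40), ("D2", "Books", 5, 40)]
joined = temporal_join(works, dept)
```

Each result tuple carries a single timestamp, so the output is again a temporal relation, unlike a plain product.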

  14. Coalesce operation • Interval-based representation, while simpler to understand and implement, requires time intervals across result tuples to be compacted • Coalesce operation produces tuples with proper time intervals • A bitemporal tuple has a rectangular time element • Projection, union, etc. require 2 rectangles to be converted into multiple tuples with proper rectangular regions
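
For a single time dimension, coalescing amounts to merging value-equivalent tuples whose intervals overlap or touch. The sketch below assumes closed integer intervals and covers only this one-dimensional case, not the bitemporal rectangles; the sample history is made up.

```python
def coalesce(rows):
    """Merge (value, start, end) rows with equal values and adjacent
    or overlapping closed integer intervals."""
    out = []
    for value, start, end in sorted(rows):
        if out and out[-1][0] == value and start <= out[-1][2] + 1:
            prev = out[-1]
            out[-1] = (value, prev[1], max(prev[2], end))
        else:
            out.append((value, start, end))
    return out

# History of (dept, salary, valid_from, valid_to) for one employee.
history = [("CSE", 10000, 15, 26), ("CSE", 15000, 27, 40), ("EE", 15000, 41, 50)]

# Projecting away salary leaves two CSE rows that describe one period.
projected = [(dept, start, end) for dept, _, start, end in history]
compacted = coalesce(projected)
```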

  15. Event vs interval semantics • An application may choose to represent events or states of entities: a salary change or a salary interval • Events: <Mohan, 10K, 15> <Mohan, 15K, 27> • Intervals: <Mohan, 10K, 15..26> <Mohan, 15K, 27..NOW> • May be done for both time dimensions • Algebra and query language will need to be oriented towards a specific representation
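
The two representations carry the same information when each event is taken to hold until the next one. A minimal conversion sketch, assuming integer time points and a NOW marker for the open end (the function name is hypothetical):

```python
from itertools import groupby

NOW = "NOW"  # marker for an interval that is still open

def events_to_intervals(events):
    """Turn (name, value, at) change events into (name, value, from, to) states."""
    out = []
    ordered = sorted(events, key=lambda e: (e[0], e[2]))
    for name, group in groupby(ordered, key=lambda e: e[0]):
        changes = list(group)
        for i, (_, value, at) in enumerate(changes):
            # A state lasts until just before the next change, if any.
            end = changes[i + 1][2] - 1 if i + 1 < len(changes) else NOW
            out.append((name, value, at, end))
    return out

events = [("Mohan", 10000, 15), ("Mohan", 15000, 27)]
states = events_to_intervals(events)
```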

  16. Choosing a representation • The literature has suggested many representations • Tuple-level time stamping vs attribute-level time stamping • Event or interval or a set of these • Entity level or tuple level • Entire history in a single tuple • One tuple gives the full history across all attributes • Each attribute is multi-valued, each value time-stamped: <e21, Mohan, {(10K, 15), (12K, 27)}, {(CSE, 10), (EE, 18)}>
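
The single-tuple, attribute-level form maps naturally onto a nested structure. The sketch below stores the e21 example as a dictionary and reads off a snapshot at a given time, assuming each (value, time) pair marks when the value took effect; the dictionary layout and helper names are illustrative only.

```python
entity = {
    "id": "e21",
    "name": "Mohan",
    "salary": [(10000, 15), (12000, 27)],  # (value, effective from)
    "dept": [("CSE", 10), ("EE", 18)],
}

def value_at(history, t):
    # Latest value whose time stamp is not after t, or None if none yet.
    current = None
    for value, since in sorted(history, key=lambda pair: pair[1]):
        if since <= t:
            current = value
    return current

def snapshot(ent, t):
    # Flatten the multi-valued attributes to their values at time t.
    return {"id": ent["id"], "name": ent["name"],
            "salary": value_at(ent["salary"], t),
            "dept": value_at(ent["dept"], t)}
```

Each attribute keeps its own history here, so a department change does not force a new salary entry, unlike tuple-level time stamping.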

  17. Modeling Temporal Entities • Entity has key and a set of attributes • Temporal entities have ‘lifespan’, non-changing key (or, surrogate) • Identify non-temporal (non-changing) and temporal attributes • Attribute values also have a time stamp • Contained in the lifespan of the entity • Modeling using valid time characteristics • How much change an entity can ‘tolerate’ ? • Application dependent issue

  18. Modeling Temporal Relationships • Relationship also has a lifespan • May have changing temporal attributes • Example: Mohan works in CSE at various points in time in different roles (student, programmer, faculty) • Works relationship with attribute Role and a valid time interval • Relationship instance validity must be contained in the lifespans of the entities it relates • Temporal characteristics can be associated with inheritance and aggregation relationships also
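
The containment rule can be checked mechanically. In the sketch below the lifespans, the Works instances, and all time values are invented for illustration, with closed integer intervals assumed.

```python
# Hypothetical entity lifespans as (from, to) closed intervals.
lifespans = {"Mohan": (1, 100), "CSE": (0, 200)}

# Works relationship instances: (person, dept, role, valid_from, valid_to).
works = [
    ("Mohan", "CSE", "student", 10, 20),
    ("Mohan", "CSE", "programmer", 25, 40),
    ("Mohan", "CSE", "faculty", 60, 120),  # runs past Mohan's lifespan
]

def contained(inner, outer):
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def violations(instances, spans):
    # Instances whose validity is not inside both entities' lifespans.
    bad = []
    for person, dept, role, v_from, v_to in instances:
        period = (v_from, v_to)
        if not (contained(period, spans[person]) and contained(period, spans[dept])):
            bad.append((person, dept, role))
    return bad

problems = violations(works, lifespans)
```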

  19. Exercise! • Consider students, courses, registrations as per the bulletin, etc. • Courses are modified • Bulletins change every few years and may overlap • Students repeat courses when they fail • Students change departments, may get credits for a few courses, may have a special schedule (== bulletin!)

  20. Summary • Time is ubiquitous • Requirements must be understood clearly • Ad hoc solutions create problems • Specialized representations and query languages simplify time management
