Information Retrieval and Use

Presentation Transcript


  1. Information Retrieval and Use De-normalisation and Distributed database systems Geoff Leese September 2008, revised October 2009

  2. Mapping the logical model onto physical design • Entities become tables • More often than not! • Attributes become fields (columns) • Unique identifiers become primary keys • Relationships implemented by foreign key columns • Resolve M:N relationships by inserting intersection table
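
The mapping rules on this slide can be sketched with Python's built-in `sqlite3` module. The `student`, `module` and `enrolment` names are hypothetical and chosen only for illustration; they do not come from the lecture. Two entities become two tables, their unique identifiers become primary keys, and the M:N relationship between them is resolved by an intersection table holding two foreign key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.executescript("""
-- Entities become tables; unique identifiers become primary keys.
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL          -- attribute becomes a column
);
CREATE TABLE module (
    module_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
-- The M:N relationship "student takes module" becomes an intersection table
-- whose columns are foreign keys to both sides.
CREATE TABLE enrolment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    module_id  INTEGER NOT NULL REFERENCES module(module_id),
    PRIMARY KEY (student_id, module_id)
);
""")

conn.execute("INSERT INTO student VALUES (1, 'Ann')")
conn.executemany("INSERT INTO module VALUES (?, ?)",
                 [(10, 'Databases'), (11, 'Networks')])
conn.executemany("INSERT INTO enrolment VALUES (?, ?)", [(1, 10), (1, 11)])

# Following the relationship means joining through the intersection table.
modules_for_ann = [row[0] for row in conn.execute(
    "SELECT m.title FROM enrolment e "
    "JOIN module m ON m.module_id = e.module_id "
    "WHERE e.student_id = ? ORDER BY m.title", (1,))]
print(modules_for_ann)
```

With foreign keys switched on, an attempt to enrol a student who does not exist is rejected by the DBMS rather than by application code.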

  3. Mapping considerations • Independence • Privacy • Efficiency of queries

  4. Denormalisation • Joins take time! • Split or merge normalised entities based on frequent associated use • Remove redundant relationships • Merge entities with 1:1 relationships • Use summary fields • Use summary tables and views

  5. Using summary field (1) • Consider running the query “give the total value of all orders for customer X”. How many joins?

  6. Using summary field (2) • Note the summary field in the Orders table. How many joins now?
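
The diagrams for these two slides are not in the transcript, so the following is only a sketch under an assumed schema: hypothetical `customer`, `orders` and `order_line` tables, with a hypothetical `order_total` column playing the part of the summary field. It runs the "total value of all orders for customer X" query both ways, using Python's built-in `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    order_total INTEGER NOT NULL DEFAULT 0   -- summary field (pence), assumed name
);
CREATE TABLE order_line (
    order_id   INTEGER REFERENCES orders(order_id),
    line_no    INTEGER,
    quantity   INTEGER,
    unit_price INTEGER,                      -- pence
    PRIMARY KEY (order_id, line_no)
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'X')")
conn.executemany("INSERT INTO orders (order_id, customer_id) VALUES (?, ?)",
                 [(100, 1), (101, 1)])
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?, ?)",
                 [(100, 1, 2, 500), (100, 2, 1, 250), (101, 1, 3, 100)])

# The summary field is redundant data, so something has to keep it in step
# with the order lines. Here it is recomputed once, after loading.
conn.execute("""
UPDATE orders SET order_total = (
    SELECT COALESCE(SUM(quantity * unit_price), 0)
    FROM order_line WHERE order_line.order_id = orders.order_id)
""")

# Without the summary field: orders must be joined to order_line.
normalised_total = conn.execute(
    "SELECT SUM(l.quantity * l.unit_price) "
    "FROM orders o JOIN order_line l ON l.order_id = o.order_id "
    "WHERE o.customer_id = ?", (1,)).fetchone()[0]

# With the summary field: the orders table alone answers the question.
summary_total = conn.execute(
    "SELECT SUM(order_total) FROM orders WHERE customer_id = ?",
    (1,)).fetchone()[0]

print(normalised_total, summary_total)
```

The saving in joins is paid for on every insert, update or delete of an order line, because the stored total must be brought up to date each time.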

  7. Distributed database systems • Special rules apply!

  8. The traditional model • One centralised database • Terminals at remote locations • Disadvantages • Networks are slow (especially WANs!) • Central machine does all processing • If central machine fails, database is down (Integrity, redundancy and disaster recovery are considered in later lectures!)

  9. The Client/Server model • Client – application – “front end” • Server – DBMS – “back end” • Still dependent on central database

  10. Client responsibilities • Manages user interface • Accepts user data • Has local processing capability within the application • Generates database requests and transmits them via network to server • Receives results from server and formats them as required by application

  11. Server responsibilities • Accepts database requests from client • Processes database requests • Handles security issues • Deals with concurrency issues • Optimizes queries • Handles recovery/rollback issues • Returns results to client

  12. Distributed database architecture • A collection of logically related “sites”, connected together so that the user’s view is that of a single database at a single location. • Each site is a database in its own right • Not necessarily physically or geographically separated, but often are – and are logically separated.

  13. Advantages • Organisations are distributed, why shouldn’t their data be? • Improved efficiency • Store data close to where it’s used

  14. Types of DDS • Homogeneous – same type of RDBMS at each site (easy!) • Heterogeneous – different types of DBMS at each site (not so easy!)

  15. Implementation methods (1) • Fragmentation – splitting data between sites • Horizontal – row based – e.g. store all employee records for a location at that location • Vertical – column based – e.g. store all payroll columns in payroll department, all other employee data in HR • Either way, fragments must be able to be put back together!
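
A minimal sketch of the two kinds of fragmentation, in plain Python, with rows held as dictionaries instead of in a real DBMS. The employee data and all function names are hypothetical. It shows that horizontal fragments are put back together with a union, and vertical fragments with a join on the primary key, which is why each vertical fragment has to carry the key.

```python
employees = [
    {"emp_id": 1, "name": "Ann", "location": "North", "salary": 30000},
    {"emp_id": 2, "name": "Bob", "location": "South", "salary": 28000},
    {"emp_id": 3, "name": "Cy",  "location": "North", "salary": 35000},
]

def horizontal_fragments(rows, column):
    """Split rows by the value of one column (row-based fragmentation)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments

def union(fragments, pk="emp_id"):
    """Rebuild a horizontally fragmented table by taking the union of its fragments."""
    return sorted((row for frag in fragments for row in frag),
                  key=lambda row: row[pk])

def vertical_fragment(rows, columns, pk="emp_id"):
    """Keep only some columns, plus the primary key (column-based fragmentation)."""
    return [{pk: row[pk], **{c: row[c] for c in columns}} for row in rows]

def join_on_key(left, right, pk="emp_id"):
    """Rebuild a vertically fragmented table by joining fragments on the key."""
    right_by_key = {row[pk]: row for row in right}
    return [{**row, **right_by_key[row[pk]]} for row in left]

# Horizontal: each location stores its own employees.
by_site = horizontal_fragments(employees, "location")
rebuilt_from_rows = union(by_site.values())

# Vertical: payroll holds the salary column, HR holds everything else.
payroll = vertical_fragment(employees, ["salary"])
hr = vertical_fragment(employees, ["name", "location"])
rebuilt_from_columns = join_on_key(hr, payroll)

print(rebuilt_from_rows == employees, rebuilt_from_columns == employees)
```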

  16. Implementation methods (2) • Replication • Controlled duplication of data at more than one site • Update propagation?

  17. Objectives (1) • Local autonomy • Local data locally owned and managed – minimal data requirements from remote sites. • No reliance on central site • Continuous operation • Reliability • Availability

  18. Objectives (2) • Location independence • From user’s view, all data is at their site. • Fragmentation independence • Needs joins and unions to put fragments back together • Replication independence

  19. Objectives (3) • Distributed query processing • Distributed transaction management • Transactions carried out by “agents” at distributed sites • Two-phase commit • Locking issues (later lecture)
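
Two-phase commit can be sketched in a few lines of Python. This is a toy model only: the `Agent` class and `two_phase_commit` function are hypothetical names, and everything a real implementation depends on (logging to stable storage, timeouts, recovery after coordinator failure) is deliberately left out. In phase one the coordinator asks every agent whether it is able to commit; in phase two it tells all of them to commit only if every vote was yes, and otherwise tells all of them to roll back.

```python
class Agent:
    """Stands in for the part of a transaction running at one site (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be made permanent.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(agents):
    """Coordinator: collect votes, then issue one global decision."""
    votes = [agent.prepare() for agent in agents]   # phase 1: prepare
    if all(votes):
        for agent in agents:                        # phase 2: global commit
            agent.commit()
        return "committed"
    for agent in agents:                            # phase 2: global rollback
        agent.rollback()
    return "aborted"


good = [Agent("site A"), Agent("site B")]
outcome_good = two_phase_commit(good)

mixed = [Agent("site A"), Agent("site B", can_commit=False)]
outcome_mixed = two_phase_commit(mixed)

print(outcome_good, [a.state for a in good])
print(outcome_mixed, [a.state for a in mixed])
```

The point of the sketch is the all-or-nothing outcome: one "no" vote from any site rolls the transaction back at every site.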

  20. Objectives (4) • Hardware independence • Operating system independence • Network independence • DBMS independence

  21. DDS issues • Query processing • Optimisation even more important • Catalogue (data dictionary) management • Centralised? • Fully replicated? • Partitioned? • Combination of first and third?

  22. DDS issues • Update propagation • An issue where replication is used. • “Primary copy” system • Recovery • Two-phase commit • Locking strategies
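
One way to picture a "primary copy" scheme is sketched below in Python. The class and method names are hypothetical, and the choice to propagate changes later rather than immediately is an assumption made for the example, not something the slide specifies. Every update is applied to the designated primary copy first and is then passed on to the other replicas.

```python
class PrimaryCopyReplicator:
    """Toy model: one primary copy, any number of replicas (all plain dicts)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.pending = []          # updates accepted but not yet propagated

    def write(self, key, value):
        # An update counts as done once the primary copy has it.
        self.primary[key] = value
        self.pending.append((key, value))

    def propagate(self):
        # Pass queued updates on to every replica, in the order they were made.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()


primary, replica_1, replica_2 = {}, {}, {}
replicator = PrimaryCopyReplicator(primary, [replica_1, replica_2])

replicator.write("stock:widget", 40)
stale_before = "stock:widget" not in replica_1   # replicas lag behind the primary

replicator.propagate()
in_step_after = replica_1 == primary and replica_2 == primary

print(stale_before, in_step_after)
```

Between `write` and `propagate` a reader at a replica site sees old data, which is the update propagation issue the slide refers to.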

  23. Summary • Mapping the logical model • Denormalisation • Traditional database architecture • Client/server model • Distributed Database systems • Advantages • Objectives • Implementation methods • Issues

  24. Further reading • Rolland chapter 10 • Hoffer chapters 12 • Denormalisation - click to follow the link!
