
F28DM: DATABASE MANAGEMENT SYSTEMS Recent Database Development


zulema

Presentation Transcript


  1. F28DM: DATABASE MANAGEMENT SYSTEMS Recent Database Development

    Recent Developments
  2. Database Evolution: up to 2000ish
  3. Flat file with delimiters
  4. Hierarchical databases
  5. Relational Databases
  6. Some of the reasons for dominance:
    Simplicity of the relational model, with a solid theoretical basis and normalization rules.
    Easy data manipulation through the simple and very powerful SQL language.
    Widely used and understood.
    Strong for transaction processing: concurrency control and ACID properties.
    High level of standardization, including standardized APIs.
    They provide an integration mechanism (i.e. multiple applications can access the same data source).
  7. ACID - Atomicity
    The entire transaction is treated as a single, indivisible unit of work, which must be performed completely or not at all. For example, if a transaction is updating 100 rows and fails after 20, the database must roll back the changes to those 20 rows.
  8. ACID - Consistency
    Only valid data is written to the database. A successful transaction takes the database from one state that is consistent with the rules to another state that is also consistent with the rules. For example, any change to a department number in the Department table (the primary key) must also update all associated department numbers in related tables (foreign keys).
  9. ACID - Isolation
    Multiple transactions occurring at the same time must not interfere with each other. Data used within a transaction cannot be used by another transaction until the first transaction is completed (or it must at least appear that this is what happened).
  10. ACID - Durability
    Once the transaction changes have been made, they will survive failure (the recovery system must ensure this).
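The atomicity example in slide 7 (a multi-row update that fails part-way) can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the `accounts` table, the row count and the simulated failure are assumptions, not part of the slides.

```python
import sqlite3


def make_db(rows=100):
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(i, 0) for i in range(rows)])
    conn.commit()
    return conn


def update_all_or_nothing(conn, fail_after=None):
    """Add 10 to every balance in one transaction; undo everything on failure."""
    ids = [row[0] for row in conn.execute("SELECT id FROM accounts ORDER BY id")]
    try:
        for count, row_id in enumerate(ids):
            if fail_after is not None and count == fail_after:
                raise RuntimeError("simulated failure part-way through")
            conn.execute(
                "UPDATE accounts SET balance = balance + 10 WHERE id = ?", (row_id,)
            )
        conn.commit()      # all rows updated: make the changes permanent
        return True
    except RuntimeError:
        conn.rollback()    # some rows updated: discard every change so far
        return False


def total(conn):
    return conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]
```

Calling `update_all_or_nothing(conn, fail_after=20)` leaves every balance untouched, because the 20 completed updates are rolled back together.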
  11. Extracting data for Business Intelligence
    ETL = extract, transform, load.
    Data mart = small local data warehouse.
    Operational Data Store = recent data en route to the data warehouse.
    OLAP = Online Analytical Processing.
  12. Summary from Stonebraker
    “Historically, Online Transaction Processing (OLTP) was performed by customers submitting traditional transactions (order something, withdraw money, cash a check, etc.) to a relational DBMS. Large enterprises might have dozens to hundreds of these systems.
  13. Summary from Stonebraker - 2
    Invariably, enterprises wanted to consolidate the information in these OLTP systems for business analysis, cross selling, or some other purpose. Hence, Extract-Transform-and-Load (ETL) products were used to convert OLTP data to a common format and load it into a data warehouse.
  14. Summary from Stonebraker - 3
    Data warehouse activity rarely shared machine resources with OLTP because of lock contention in the DBMS and because business intelligence (BI) queries were so resource-heavy that they got in the way of timely responses to transactions. This combination of a collection of OLTP systems, connected to ETL, and connected to one or more data warehouses is the gold standard in enterprise computing.”
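The ETL step described in slides 11 to 14 (convert data from several OLTP systems to a common format, then load it into a warehouse) can be sketched as below. The two source systems, their field names and the list standing in for the warehouse are all hypothetical.

```python
def transform_till(record):
    # Hypothetical till system: amount in pounds, customer under "cust".
    return {"customer": record["cust"], "pence": round(record["total_gbp"] * 100)}


def transform_web(record):
    # Hypothetical web shop: amount already in pence, customer under "customer_id".
    return {"customer": record["customer_id"], "pence": record["amount_pence"]}


def run_etl(sources, warehouse):
    """Extract each record, transform it to the common format, load it."""
    for records, transform in sources:
        for record in records:                    # extract
            warehouse.append(transform(record))   # transform, then load
    return warehouse
```

Once both sources share one format, a business-intelligence query such as total spend per customer can run over `warehouse` without touching the OLTP systems.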
  15. Object Oriented Databases
  16. XML Databases
  17. Newer web-intensive applications: from 2000ish
  18. Big Data explosion
    ‘Big Data’ represents the explosive growth in online data. This has outpaced the increases in CPU processing power, memory and storage capacity in the last few years. It has outgrown the processing capacity of a single relational database.
  21. Where does big data come from?
    Web searching.
    Multi-player online games.
    Online message systems like Twitter.
    Social network analysis.
    Photo processing.
    Media streaming.
    Sensor networks.
    Enterprise data (no longer just data entry): point-of-sale devices, GPS, customer info.
    Satellite images.
  22. Summary - 1
    Consider new Web-based applications such as multi-player games, social networking sites, and online gambling networks. The aggregate number of interactions per second is skyrocketing for the successful Web properties in this category.
  23. Summary - 2
    In addition, the explosive growth of smartphones has created a market for applications that use the phone as a geographic sensor and provide location-based services. Again, successful applications are seeing explosive growth in transaction requirements.
  24. Summary - 3
    Hence, the Web and smartphones are driving the volume of interactions with a DBMS through the roof, and New OLTP developers need vastly better DBMS performance and enhanced scalability.
  25. 3Vs – Big Volume
    The application consumes terabytes (TB) or even petabytes (PB) of data. For example, website visitor traffic, which can quickly grow to petabyte scale, is increasingly analysed by website owners to help them learn about visitor patterns, reactions to promotional offers, and seasonal behaviours.
  26. 3Vs – Big Velocity
    An application has so much data, moving so fast, that it’s like drinking from a fire-hose. For example, an internet service provider samples hundreds of thousands of messages from its routers, in real time, to recognize and mitigate large-scale denial-of-service (DoS) attacks.
  27. 3Vs – Big Variety
    The application needs to integrate data from a large variety of data sources. For example, a social networking website needs to efficiently store and retrieve social graphs for its members, each of which can have thousands of endpoints.
  28. Improving performance: some solutions
  29. Scalability
    Scalability is the ability of a system to accommodate a growing amount of work.
    Scale up (scale vertically): buy bigger servers as database load increases. There is a limit to this.
    Scale out (scale horizontally): distribute the database across multiple hosts as load increases. This is more flexible but more complex.
  30. Partitioning
    Partitioning the database improves performance and scalability.
    Partitioning by functionality puts, for example, all data about Users on one server and all data about Products on another. Foreign keys can be used to link the data, but this means that these links span servers.
    ‘Sharding’ involves partitioning the data across functionality (maybe by location), e.g. data for Scottish users and the Scottish product warehouse on one server, and data for Welsh users and the Welsh product warehouse on another.
  31. Sharding
    Sharding breaks the database into smaller ones that are more manageable and responsive.
    Most queries work within the same database. Queries on different shards need to be consolidated, and we don’t want to have to do this often.
    For reliability, we need more than one live copy, so we need a replication mechanism. Consistency across copies needs to be considered.
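A minimal sketch of the location-based sharding described in slides 30 and 31, assuming one Python dictionary stands in for each database server. The class and method names are illustrative only.

```python
class ShardedUserStore:
    """Routes each user record to the shard for that user's region."""

    def __init__(self, regions):
        # One dict per shard stands in for a separate database server.
        self.shards = {region: {} for region in regions}

    def put(self, user_id, region, record):
        self.shards[region][user_id] = dict(record, region=region)

    def get(self, user_id, region):
        # The common case: the query is answered by a single shard.
        return self.shards[region].get(user_id)

    def count_all(self):
        # A cross-shard query: every shard is visited and the partial
        # results are consolidated, which is the costly case to avoid.
        return sum(len(shard) for shard in self.shards.values())
```

For example, a store created with `ShardedUserStore(["scotland", "wales"])` answers `get(1, "scotland")` from one shard only, while `count_all()` has to touch both.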
  32. Facebook
    In 2011, Facebook split its MySQL database into 4,000 shards in order to handle the site’s massive data volume, and is running 9,000 instances of memcached in order to keep up with the number of transactions the database must serve.
  33. Memory caching
    Speeds up access to frequently accessed data.
    A memcached architecture is a distributed memory caching system using a giant hash table across multiple machines. It has been in use since 2003 by many sites such as YouTube, Twitter and Facebook.
    It uses a key/value associative array. When memory runs out, the least recently used values are discarded.
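The key/value cache with eviction described in slide 33 can be sketched on a single machine as follows. This is not memcached's API; `LRUCache` is a hypothetical class that discards the least recently used entry once a fixed capacity is exceeded.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size key/value cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()   # ordered from least to most recently used

    def get(self, key, default=None):
        if key not in self._items:
            return default            # cache miss: caller falls back to the database
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
```

A distributed cache additionally hashes each key to choose which machine holds it; that routing step is omitted here.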
  34. In-memory databases - 1
    All DBMSs try to keep portions of a database resident in random access memory (RAM), to reduce I/O transfers and enable quicker queries on frequently or recently accessed data.
    Now there is a resurgence of interest in databases held entirely in memory.
  35. In-memory databases - 2
    There are various ways of ensuring persistence, such as:
    snapshots, which periodically write the state of the database to disk;
    transaction logging, which records each change to the database in a journal file;
    data replication, which keeps more than one copy of the data;
    and non-volatile RAM, which retains its contents even when the computer is switched off.
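One of the persistence options for an in-memory database, recording each change in a log file and replaying it on restart, can be sketched as below. The `LoggedStore` class and its JSON-lines log format are assumptions made for illustration, not any product's actual mechanism.

```python
import json
import os


class LoggedStore:
    """In-memory key/value store that appends every write to a log file."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # On start-up, rebuild the in-memory state from the log, if one exists.
        try:
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    entry = json.loads(line)
                    self.data[entry["key"]] = entry["value"]
        except FileNotFoundError:
            pass  # first run: nothing to recover

    def set(self, key, value):
        # Write to the log before updating memory, so an acknowledged
        # change can be recovered after a crash.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.data[key] = value
```

Creating a second `LoggedStore` on the same path recovers the latest value of every key, which is the durability property from slide 10.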
  36. The CAP theorem - 1
    In 2000, Eric Brewer argued that as applications become more web-based we should stop worrying about data consistency, because if we want high availability, then guaranteed consistency of data is something we cannot have.
  37. The CAP theorem - 2
    A distributed computer system cannot simultaneously provide all three of the following guarantees:
    Consistency: all nodes see the same data at the same time; the client perceives that a set of operations has occurred all at once.
    Availability: node failures do not prevent survivors from continuing to operate; every operation must terminate in an intended response.
    Partition tolerance: the system continues to operate despite arbitrary message loss; operations will complete even if individual components are unavailable (a few nodes can fail but the system will keep going).
  38. New partitioned database architectures moved towards non-relational databases, and sacrificed consistency to get high availability, scalability and partition tolerance.
    These new architectures do NOT support ACID, but BASE (Basically Available, Soft state, Eventually consistent).
    This is acceptable in many cases, for example if users are mostly updating their own data, or adding data. Also, the time that humans spend looking at a screen means that we could all be looking at ‘stale’ data anyway.
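The "eventually consistent" part of BASE can be illustrated with a toy replicated store: a write is acknowledged by one replica immediately and reaches the others only when a later synchronisation runs. The classes, the version counter and the explicit `sync` call are assumptions made for the sketch, not how any particular NoSQL product works.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)   # the newest version wins

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


class EventuallyConsistentStore:
    def __init__(self, replica_count):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []   # writes not yet propagated to the other replicas
        self.version = 0

    def write(self, key, value):
        # Acknowledge after updating one replica only: available, not consistent.
        self.version += 1
        self.replicas[0].write(key, value, self.version)
        self.pending.append((key, value, self.version))

    def read(self, key, replica=0):
        return self.replicas[replica].read(key)   # may return stale data

    def sync(self):
        # Background propagation: afterwards all replicas agree.
        for key, value, version in self.pending:
            for replica in self.replicas[1:]:
                replica.write(key, value, version)
        self.pending.clear()
```

Between `write` and `sync`, a read from replica 1 returns the old value; this is the ‘stale’ data that slide 38 argues is often acceptable.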
  39. NoSQL databases
  40. NoSQL Definition
    Next-generation databases mostly addressing some of these points: being non-relational, distributed, open-source and horizontally scalable.
    The original intention was modern web-scale databases. The movement began in early 2009 and is growing rapidly.
    Often more characteristics apply, such as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge amount of data, and more.
    "nosql" ("not only sql") should be seen as an alias for something like the definition above.
  41. Implementations…
    Various NoSQL implementations differ considerably on:
    consistency;
    data types;
    data structure and how it is stored;
    language for access.
  42. Types of NoSQL System
    Key/value store: schema-less, with minimal datatype support.
    Document-oriented data stores: highly structured key/value stores, often using JSON.
    Object oriented.
    XML based.
  43. Columnar table-oriented data stores
    Like key/value, but the value is defined as a set of columns, e.g. Google’s Bigtable.
    It can be very fast to search or sum a column. They work particularly well in analytic workloads which are dominated by numeric values and dimensions such as the employee-status attribute.
  44. Types of NoSQL System
    Graph databases: use a graph structure with nodes, edges and properties.
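Why summing a column is fast in the column-oriented stores of slide 43 can be seen by comparing two layouts of the same data. The employee records below are made up for illustration, and the sketch assumes every row has the same fields.

```python
def to_columns(rows):
    """Turn a list of row dictionaries into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Row-oriented layout: each record is stored together.
rows = [
    {"name": "Ann", "status": "full-time", "salary": 30000},
    {"name": "Bob", "status": "part-time", "salary": 12000},
    {"name": "Cat", "status": "full-time", "salary": 28000},
]

# Column-oriented layout: each attribute is stored together.
columns = to_columns(rows)

# An analytic query reads one contiguous column and ignores the rest.
total_salary = sum(columns["salary"])
full_time_count = columns["status"].count("full-time")
```

With the row layout, the same sum has to visit every record and skip over the name and status fields each time.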
  45. NewSQL
    New database products which support SQL and ACID transactions, whilst providing high performance and scalability.
    They are designed to bring the benefits of the relational model to distributed architectures, or to improve the performance of relational databases to the extent that horizontal scalability is no longer a necessity (e.g. in-memory databases).
  46. Evolving database landscape
  47. ‘SPRAIN’ six key factors
    SPRAIN refers to the six key factors driving the adoption of data management alternatives to traditional relational databases, which are being ‘sprained’ as a result of being s-t-r-e-t-c-h-e-d beyond their normal capacity by the needs of high-volume, highly distributed, highly complex applications.
  48. ‘SPRAIN’ drivers
    Those six key drivers, and their associated sub-drivers, are as follows:
    Scalability: hardware economics.
    Performance: SQL limitations.
    Relaxed consistency: CAP theorem.
    Agility: polyglot persistence.
    Intricacy: big data, total data.
    Necessity: open source.
  49. What next
    RDBMSs enabled the conversion of business data into information, driving business processes and business intelligence.
    NoSQL databases must not just support high availability, scalability and low latency needs.
    They also need to provide state-of-the-art analytics to power business intelligence.
    They need to be elastic to fit the cloud environment.
  50. Polyglot Persistence
    This involves businesses using many different data storage technologies, choosing the most appropriate one for each task. Read the article by Martin Fowler on this.
  51. BUT
    Most applications should stick with the relational model until NoSQL databases are more mature.