The cloud and databases

What kind of data management is a good fit with the cloud?

  • Analytical data management: data attributes

    • Far more reads than writes, so security and privacy less of an issue

    • Tend to have far greater data needs, so there is a need for more servers

    • The size of the data set grows over time and does not stabilize, so a better fit with expanding cloud server availability

    • Analytical applications often want data from multiple sources, and availability is much better in a cloud environment

More on analytical processing

  • Analytical data management: system attributes

    • Shared nothing works better when access is mostly reads

    • ACID transactions do not need to be enforced as there is no need for a single, global state for all users

    • Generally, statistical results are acceptable even if some highly secured data cannot be reached
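
The shared-nothing point above can be sketched in a few lines. This is a minimal illustration, not taken from the slides: each list stands in for one node's private partition, and the names Partial, local_aggregate, and global_mean are hypothetical. Each node reduces its own rows to a small partial result, and only those partials are combined, so no global state or cross-node transaction is needed.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Partial aggregate produced by one node (hypothetical helper type)."""
    total: float = 0.0
    count: int = 0

    def merge(self, other):
        # Combining partials needs no coordination between nodes.
        return Partial(self.total + other.total, self.count + other.count)


def local_aggregate(rows):
    # Runs on a single node, over that node's private partition only.
    return Partial(sum(rows), len(rows))


def global_mean(partitions):
    # The coordinator sees only the small partial results, never raw rows.
    combined = Partial()
    for rows in partitions:
        combined = combined.merge(local_aggregate(rows))
    return combined.total / combined.count if combined.count else None


# Three partitions standing in for three shared-nothing nodes (made-up data).
partitions = [[10.0, 20.0], [30.0], [40.0, 50.0, 60.0]]
mean = global_mean(partitions)
```

Because the workload is read-mostly, the partitions never need to agree on a single current state; the merge step is the only place their results meet.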

What is needed for a new generation of cloud dbs?

  • Focus on making use of broad parallelism and on shifting/expanding set of servers

  • Looser notion of fault tolerance, as there is often no need to restart a query when it is interrupted or when one of its branches is killed

  • Need to be able to operate on data in multiple formats, encryptions, attribute domains, namespaces, schemas, database products – heterogeneity!

  • Must be able to sit underneath business intelligence systems
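
One way to picture the looser notion of fault tolerance is a query that fans out over the current set of servers and simply drops any branch that dies, reporting how much of the data it covered. The sketch below is an assumption-laden illustration: the server dictionaries, run_branch, and tolerant_mean are hypothetical names, and a "down" flag stands in for a real network failure.

```python
from concurrent.futures import ThreadPoolExecutor


def run_branch(server):
    # One branch of the query; a "down" server simulates a killed branch.
    if server["down"]:
        raise ConnectionError(server["name"])
    return sum(server["rows"]), len(server["rows"])


def tolerant_mean(servers):
    total, count, answered = 0.0, 0, 0
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        futures = [pool.submit(run_branch, s) for s in servers]
        for future in futures:
            try:
                branch_total, branch_count = future.result()
            except ConnectionError:
                continue  # Branch lost: skip it, do not restart the query.
            total += branch_total
            count += branch_count
            answered += 1
    mean = total / count if count else None
    coverage = answered / len(servers) if servers else 0.0
    return mean, coverage


# Made-up server set; server "b" is unavailable when the query runs.
servers = [
    {"name": "a", "down": False, "rows": [1.0, 2.0, 3.0]},
    {"name": "b", "down": True, "rows": [100.0]},
    {"name": "c", "down": False, "rows": [4.0, 5.0]},
]
mean, coverage = tolerant_mean(servers)
```

Returning the coverage alongside the answer lets a business intelligence layer on top decide whether a partial result is good enough.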

Hybrid databases: is this the answer?

  • Folks don’t want to learn/buy/program new data management products

  • But folks do want commercial grade systems with professional support

  • Would make the transition from transaction apps to analytical apps easier – like with relational data warehousing

  • But would we end up with an inelegant mess?

What about Object Databases? A return?

  • Blending a host language with a query language makes sense when queries involve complex calculations

  • It is easy to extend an o-o language with statistical procedures

  • The encapsulation of o-o languages is a good match with the wide and independent distribution of data in a cloud environment

  • O-O procedures could be built and deployed by distributed volunteers

More on O-O DBS

  • Partial results could be maintained and kept up to date, with batch updating of raw data only infrequently

  • We know how to build multiple language interfaces to accommodate multiple o-o languages

  • O-O databases are a good match with service-based interfaces – see diagram on page 29
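
The idea of maintained partial results with infrequent batch updates maps naturally onto an encapsulated object. A minimal sketch, assuming a hypothetical MaintainedMean class that is not from the slides: raw values are queued cheaply, folded into the stored aggregate only when a batch is applied, and callers see nothing but the method interface.

```python
class MaintainedMean:
    """Keeps a partial result (a mean) current via infrequent batch updates."""

    def __init__(self):
        self._total = 0.0    # encapsulated partial result
        self._count = 0
        self._pending = []   # raw data waiting for the next batch

    def record(self, value):
        # Cheap write path: just queue the raw value.
        self._pending.append(value)

    def apply_batch(self):
        # Fold all queued raw data into the partial result in one step.
        applied = len(self._pending)
        self._total += sum(self._pending)
        self._count += applied
        self._pending.clear()
        return applied

    def mean(self):
        # Readers get the result as of the last applied batch.
        return self._total / self._count if self._count else None


stats = MaintainedMean()
stats.record(2.0)
stats.record(4.0)
mean_before = stats.mean()      # batch not yet applied
applied = stats.apply_batch()
mean_after = stats.mean()
```

Since the state is reachable only through record, apply_batch, and mean, the same object could sit behind a service-based interface without exposing how the partial result is stored.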

Object-oriented dbs: relevant research & dev.

  • Adaptive query processing and optimization in real time

  • Parallel and distributed database technology

  • Massively parallel systems

  • Shared nothing systems

  • Data stream management technology

Problem: most business data right now is in a relational format

  • We don’t have truly massively parallel and distributed query models for relational data

  • We don’t have truly massively parallel and distributed data partitioning for relational data

  • To perform efficient and fluid analytical processing of data in the cloud, we would need to create new links quickly, but we won’t have a focused, fixed schema as we do in standard relational systems

  • Object extensions to relational systems don’t include method encapsulation, only expanded domains

More cloud issues: centralized control?

  • Is the cloud trusted or anonymous?

    • Trusted, provider-specific commercial cloud solutions are much safer, centrally managed, and optimized as a single network, not as a mesh of networks

    • In many environments, even trusted, centralized environments, many machines are not properly managed and are controlled by immediate users

    • People don’t like their machines being co-opted, and so trust is not enough to guarantee dependability

More on the cloud: Other applications?

  • Is analytical processing the only likely application?

  • There are many data sharing applications

  • There are many applications for selling access to bulk data

  • Data mining is a more focused form of analytical processing, but demands a very precise level of heterogeneity resolution and integration in the case of most medical and financial applications (and others)

Data mining

  • Kinds of data (from Data Mining by Han and Kamber)

    • Relational dbs

    • Data warehouses

    • Transaction processing systems

    • Object-relational dbs

    • Time sequence and temporal dbs

    • Spatial dbs

    • Text dbs

    • Multimedia dbs

    • Legacy dbs

    • Data streams

    • The Web…

Heterogeneity in databases: data mining implications

  • Note how broad the “Web” is on the previous slide

    • Includes countless hand-rolled dbs

    • Includes databases hidden by web development frameworks like Ruby on Rails

    • Includes data accessible only via specific APIs

    • Includes data accessible via XML with XPath and XQuery technology

    • Includes data stored in proprietary databases for applications like CAD, finance, animation, geography

  • The heterogeneity problem will only be solved by widespread collaboration on unifying standards
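
To make the heterogeneity problem concrete, the sketch below normalizes three differently shaped sources into one record layout. Everything specific here is an assumption for illustration: the field names (cust_id, amt, customerId, total), the XML layout, and the target shape {"id", "amount"} are invented, and each adapter has to be written by hand, which is exactly the cost that shared standards would reduce.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def from_csv(text):
    # Hand-rolled export with its own column names (hypothetical).
    reader = csv.DictReader(io.StringIO(text))
    return [{"id": int(r["cust_id"]), "amount": float(r["amt"])} for r in reader]


def from_json(text):
    # Data reachable only through an API that returns JSON (hypothetical).
    return [{"id": int(o["customerId"]), "amount": float(o["total"])}
            for o in json.loads(text)]


def from_xml(text):
    # XML source; ElementTree supports a limited subset of XPath in findall().
    root = ET.fromstring(text)
    return [{"id": int(e.get("id")), "amount": float(e.findtext("amount"))}
            for e in root.findall("./order")]


# Made-up inputs, one per source format.
unified = (
    from_csv("cust_id,amt\n1,10.5\n")
    + from_json('[{"customerId": 2, "total": "4"}]')
    + from_xml('<orders><order id="3"><amount>7.5</amount></order></orders>')
)
```

Each source disagrees on names, types, and structure, so even this tiny example needs three separate mappings before a mining step can treat the records uniformly.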

More on the Cloud: the future of transaction processing?

  • Will the rigidly centralized notion of OLTP survive?

    • Corporations are adapting to the cloud incrementally and using middleware to leverage their own clouds

    • With global business comes global data processing across time zones, which is often managed in a widely distributed fashion

    • There are large corporations that handle financial and retail transactions for other companies

    • Are people warming to the idea of managing their personal and small business data in the cloud, including document and other services?

But the cloud is process-centric and not data-centric

  • Is the process-centric vs. data-centric issue about to reawaken?

    • The process folks kind of lost…

    • Data is seen more and more as a valuable resource, even if it is only “sold” indirectly

    • More of us are buying multimedia data

  • There are actually 3 models: process-centric, data-centric, and encapsulated

    • Some argue that the cloud is actually an encapsulated model and that, in fact, data movement is difficult to optimize due to the dynamic nature of the network

    • Object-oriented databases…?
