
Large dataset processing in the Cloud


Presentation Transcript


  1. Large dataset processing in the Cloud • Kevin Glenny and GridwiseTech team

  2. Simplified data-oriented system • Applications working on data • Internal or external data sources

  3. IT systems are constantly growing • Increased number of users • Increased number of applications • Increased amount of data

  4. IT systems are constantly growing • Infrastructure bottleneck

  5. Example • Electronics manufacturer • 24/7 production • Report computation takes too long for timely decision-making • 2.5 million transactions daily • 4 TB of data to manage
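
The figures on this slide imply a modest average ingest rate, which is consistent with the slide's point that the pain is in report computation over the accumulated data. The sketch below assumes the 2.5 million transactions are spread evenly over 24 hours; real 24/7 production traffic will have peaks above this average.

```python
# Average transaction rate for the example workload.
# Assumption: transactions are evenly spread across the day (no peaks).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_tps(transactions_per_day: int) -> float:
    """Average transactions per second for a given daily volume."""
    return transactions_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    tps = average_tps(2_500_000)
    print(f"{tps:.1f} transactions/s on average")
```

For 2.5 million transactions per day this works out to roughly 29 transactions per second.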

  6. What is Cloud computing? • “Transparent access to capabilities using a pay-per-use business model” • Benefits: • Dynamic scaling • Pay-per-use • Off-shored administration
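
The pay-per-use benefit is easiest to see with a small worked comparison. The sketch below uses a made-up 24-hour demand profile and assumes the same price per server-hour in both models, which real offerings do not guarantee; it only illustrates why paying for consumed capacity can beat provisioning for the peak when load is uneven. `DEMAND`, `fixed_cost` and `pay_per_use_cost` are hypothetical names introduced for illustration.

```python
from typing import Sequence

# Hypothetical demand profile (servers needed in each hour of one day):
# 18 quiet hours at 2 servers, 6 busy hours at 10 servers.
DEMAND = [2] * 18 + [10] * 6


def fixed_cost(hourly_demand: Sequence[int], price_per_server_hour: float) -> float:
    """Provision for the peak hour and pay for that capacity every hour."""
    return max(hourly_demand) * len(hourly_demand) * price_per_server_hour


def pay_per_use_cost(hourly_demand: Sequence[int], price_per_server_hour: float) -> float:
    """Pay only for the server-hours actually consumed."""
    return sum(hourly_demand) * price_per_server_hour


if __name__ == "__main__":
    price = 1.0  # hypothetical unit price, assumed identical in both models
    print("fixed:", fixed_cost(DEMAND, price))               # fixed: 240.0
    print("pay-per-use:", pay_per_use_cost(DEMAND, price))   # pay-per-use: 96.0
```

Under these assumptions the peak-provisioned setup pays for 240 server-hours while pay-per-use pays for 96; with flat demand the two would coincide.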

  7. What are the delivery models? • SaaS (Software as a Service) • SalesForce.com, 63,000 clients • PaaS (Platform as a Service) • Google App Engine (2008), Microsoft Azure (2008) • IaaS (Infrastructure as a Service) • Amazon Elastic Compute Cloud, 8.2 million instances launched since 2006

  8. Application data processing • Database sharding (MySQL, PostgreSQL, etc.) • NoSQL (Google's Bigtable, Amazon's Dynamo, etc.) • Data grid (GigaSpaces XAP, Oracle Coherence, Infinispan, etc.)
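
Database sharding splits one logical dataset across several database instances by routing each record key to a shard. The following is a minimal sketch of hash-based routing, not the presenters' implementation; the key format and shard count are hypothetical, and in a real deployment each shard index would map to a separate MySQL or PostgreSQL instance.

```python
import hashlib
from collections import defaultdict


def shard_for(key: str, num_shards: int) -> int:
    """Route a record key to a shard index in [0, num_shards).

    SHA-1 is used instead of the built-in hash(), because hash() on strings
    is randomized per process and would send the same key to different
    shards after a restart.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards):
    """Group keys by the shard that would store them."""
    shards = defaultdict(list)
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return dict(shards)


if __name__ == "__main__":
    # Hypothetical transaction IDs spread over 4 hypothetical shards.
    keys = [f"txn-{i}" for i in range(1_000)]
    for shard, members in sorted(partition(keys, 4).items()):
        print(f"shard {shard}: {len(members)} keys")
```

Plain modulo routing is simple, but changing the shard count remaps most keys, so adding capacity later means a large data migration.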

  9. Data grid and sharding in the Cloud • Achievements: • Near real-time processing • Dynamic scaling (application and resources) • Pay-per-use • Reduced administration • High availability (HA) • All data processing and persistence in the Cloud

  10. Remaining issues • Getting large datasets in and out of the Cloud • Limited client-side bandwidth • Resorting to mailing hard drives! • Performance: 2 to 50% slowdown • Data security/privacy: trust • SLAs: plan for the worst
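
A quick calculation shows why mailing hard drives can be the pragmatic option. The sketch below estimates how long the 4 TB dataset from the example slide would take to upload; the link speeds and the `efficiency` factor (the fraction of nominal bandwidth actually achieved) are hypothetical inputs, and 1 TB is taken as 10^12 bytes.

```python
SECONDS_PER_DAY = 86_400


def transfer_time_seconds(num_bytes: float, link_bits_per_second: float,
                          efficiency: float = 1.0) -> float:
    """Time to push num_bytes over a link at a given fraction of its nominal speed."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return num_bytes * 8 / (link_bits_per_second * efficiency)


if __name__ == "__main__":
    dataset_bytes = 4 * 10**12  # 4 TB (decimal), from the example slide
    # Hypothetical client-side uplinks, assumed fully utilized.
    for label, bps in [("10 Mbit/s", 10e6), ("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)]:
        days = transfer_time_seconds(dataset_bytes, bps) / SECONDS_PER_DAY
        print(f"{label:>10}: {days:6.2f} days")
```

Even at a fully used 100 Mbit/s the upload takes about 3.7 days, and at 10 Mbit/s about 37 days, whereas a courier delivery time does not grow with the size of the dataset.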

  11. Conclusions • In data-oriented systems, growing datasets cause bottlenecks • Datasets in the Cloud can be processed using scalable technologies • Challenges remain • Main challenge: how to get the data into the Cloud?
