1. Azure MapReduce
Thilina Gunarathne
Salsa group, Indiana University
2. Agenda
Recap of Azure Cloud Services
Recap of MapReduce
Azure MapReduce Architecture
Pairwise distance alignment implementation
Next steps
AzureMapReduce is a distributed, decentralized MapReduce runtime for Windows Azure, developed using Azure cloud infrastructure services.
3. Cloud Computing
On-demand computational services over the web
Backed by massive commercial infrastructures giving economies of scale
Spiky compute needs of scientists
Horizontal scaling with no additional cost
Increased throughput
Cloud infrastructure services
Storage, messaging, tabular storage
Cloud-oriented service guarantees
Virtually unlimited scalability
Future seems to be CLOUDY!!!
Horizontal scaling increases throughput: 100 hours on 10 nodes costs the same as 10 hours on 100 nodes (1,000 node-hours either way).
Virtually unlimited resource availability.
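The node-hours arithmetic behind that note can be checked with a short sketch. It assumes a flat price per node-hour and work that divides evenly across nodes; both are simplifying assumptions, and the function names are illustrative rather than taken from the slides.

```python
def node_hours(nodes: int, hours: float) -> float:
    """Total compute consumed, assuming a flat per-node-hour price."""
    return nodes * hours


def wall_clock_hours(total_node_hours: float, nodes: int) -> float:
    """Ideal wall-clock time, assuming the work divides evenly across nodes."""
    return total_node_hours / nodes


small_cluster = node_hours(nodes=10, hours=100)
large_cluster = node_hours(nodes=100, hours=10)

# Both configurations consume the same number of node-hours,
# but the larger cluster finishes ten times sooner.
print(small_cluster, large_cluster)
print(wall_clock_hours(small_cluster, nodes=100))
```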
4. Azure Platform
Windows Azure Compute
.NET platform as a service
Worker roles & web roles
Azure Storage
Blobs
Queues
Table
Development SDK, fabric and storage
.NET and REST interfaces.
Blobs: blob containers and blobs; page blobs vs. block blobs.
Queues: provide loose synchronization; any number of messages; one week of persistence; maximum message size 8 KB; visibility timeout.
Tables: structured storage; tables and entities (an entity is a set of properties).
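The visibility timeout mentioned in the queue notes can be illustrated with a toy in-memory model. This is a sketch of the general idea only, not the Azure Queue API: the class `VisibilityQueue`, its methods, and the explicit `now` argument (used in place of a real clock) are all hypothetical.

```python
import itertools
from collections import OrderedDict


class VisibilityQueue:
    """Toy in-memory queue with a visibility timeout (not the Azure API)."""

    def __init__(self, visibility_timeout: float):
        self.visibility_timeout = visibility_timeout
        self._messages = OrderedDict()  # message id -> (body, visible_at)
        self._ids = itertools.count()

    def put(self, body: str, now: float = 0.0) -> int:
        msg_id = next(self._ids)
        self._messages[msg_id] = (body, now)
        return msg_id

    def get(self, now: float):
        """Return (id, body) of the first visible message and hide it, or None."""
        for msg_id, (body, visible_at) in self._messages.items():
            if visible_at <= now:
                # Hide the message from other readers until the timeout expires.
                self._messages[msg_id] = (body, now + self.visibility_timeout)
                return msg_id, body
        return None

    def delete(self, msg_id: int) -> None:
        """Remove a message permanently once its work is done."""
        self._messages.pop(msg_id, None)


q = VisibilityQueue(visibility_timeout=30)
q.put("map-task-0", now=0)
first = q.get(now=0)    # a reader takes the message
hidden = q.get(now=10)  # still hidden from other readers
retry = q.get(now=31)   # never deleted, so it becomes visible again
q.delete(retry[0])
```

A message that is read but never deleted reappears after the timeout, which is what lets another reader pick up work abandoned by a failed one.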
5. MapReduce
Automatic parallelization & distribution
Fault-tolerant
Provides status and monitoring tools
Clean abstraction for programmers
map (in_key, in_value) -> (out_key, intermediate_value) list
reduce (out_key, intermediate_value list) -> out_value list
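The two signatures above can be made concrete with a minimal single-process word count. This is a sketch of the programming model only, with no parallelism or fault tolerance; `map_fn`, `reduce_fn`, and `run_mapreduce` are illustrative names, not part of AzureMapReduce.

```python
from collections import defaultdict


def map_fn(in_key, in_value):
    """map(in_key, in_value) -> list of (out_key, intermediate_value)."""
    return [(word.lower(), 1) for word in in_value.split()]


def reduce_fn(out_key, intermediate_values):
    """reduce(out_key, intermediate_value list) -> out_value list."""
    return [sum(intermediate_values)]


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Apply map to every input, group by key, then reduce each group."""
    groups = defaultdict(list)
    for in_key, in_value in inputs:
        for out_key, value in map_fn(in_key, in_value):
            groups[out_key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


docs = [("doc1", "the cloud"), ("doc2", "the queue")]
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts)
```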
6. Motivation
Currently no parallel programming framework on Azure
No MPI, No Dryad
Well known, easy to use programming model
Cloud nodes are not as reliable as conventional cluster nodes
Need for a framework which handles the reliability issues.
7. Azure MapReduce Concepts
Take advantage of the cloud services
Distributed services, Unlimited scalability
Backed by industrial strength data centers and technologies
Decentralized control
Dynamically scale up/down
Eventual consistency
Large latencies
Coarser grained map tasks
Global queue based scheduling
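Global queue based scheduling with decentralized control can be sketched as workers that pull tasks from one shared queue, with no master assigning work. The example below is an assumption-laden illustration, not the AzureMapReduce implementation: Python's in-process `queue.Queue` and threads stand in for an Azure queue and worker roles, and `run_workers` is a hypothetical helper.

```python
import queue
import threading


def run_workers(tasks, map_task, num_workers):
    """Run map_task over tasks using workers that pull from a shared queue."""
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    results = {}
    lock = threading.Lock()

    def worker():
        # Each worker decides for itself when to take the next task.
        while True:
            try:
                task = task_queue.get_nowait()
            except queue.Empty:
                return
            output = map_task(task)
            with lock:
                results[task] = output

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Two coarse-grained tasks, each covering a large block of the input range.
blocks = [(0, 500), (500, 1000)]
results = run_workers(blocks, lambda block: sum(range(*block)), num_workers=4)
print(sum(results.values()))
```

Keeping tasks coarse means each queue round trip, which is slow in a cloud setting, is amortized over a larger amount of computation.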