SVC09

Windows Azure Tables and Queues Deep Dive

Jai Haridas

Software Design Engineer

Microsoft Corporation


Agenda

  • Overview of Windows Azure Tables

  • Patterns and Practices for Windows Azure Tables

  • Overview of Windows Azure Queues

  • Patterns and Practices for Windows Azure Queues

  • Q&A


Fundamental Storage Abstractions

  • Tables – Provide structured storage. A table is a set of entities, each of which contains a set of properties

  • Queues – Provide reliable storage and delivery of messages for an application

  • Blobs – Provide a simple interface for storing named files along with metadata for the file

  • Drives – Provide durable NTFS volumes for Windows Azure applications to use (new)


Windows Azure Tables

  • Provides Structured Storage

    • Massively Scalable Tables

      • Billions of entities (rows) and TBs of data

      • Can use thousands of servers as traffic grows

    • Highly Available & Durable

      • Data is replicated several times

  • Familiar and Easy to use API

    • ADO.NET Data Services – .NET 3.5 SP1

      • .NET classes and LINQ

      • REST – with any platform or language


Table Storage Concepts

[Diagram: the storage account "moviesonline" contains two tables. The Users table holds entities with Email and Name properties; the Movies table holds entities with Genre and Title properties.]


Table Data Model

  • Table

    • A storage account can create many tables

    • Table name is scoped by account

    • Set of entities (i.e. rows)

  • Entity

    • Set of properties (columns)

    • Required properties

      • PartitionKey, RowKey and Timestamp


Required Entity Properties

  • PartitionKey & RowKey

    • Uniquely identifies an entity

    • Defines the sort order

    • Use them to scale your application

  • Timestamp

    • Read only

    • Optimistic Concurrency


PartitionKey And Partitions

  • PartitionKey

    • Used to group entities in the table into partitions

  • A table partition

    • All entities with same partition key value

    • Unit of scale

    • Control entity locality

    • Row key provides uniqueness within a partition
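As a rough illustration of how the two keys work together, the sketch below groups a few made-up movie entities into partitions by PartitionKey and orders them by (PartitionKey, RowKey). The sample data and the helper names (`partitions_of`, `sorted_rows`, `has_unique_keys`) are assumptions for illustration; this is plain in-memory Python, not the storage service or its client library.

```python
from collections import defaultdict

# Hypothetical sample entities: the two keys plus one ordinary property.
entities = [
    {"PartitionKey": "Comedy", "RowKey": "Office Space", "ReleaseYear": 1999},
    {"PartitionKey": "Action", "RowKey": "The Bourne Ultimatum", "ReleaseYear": 2007},
    {"PartitionKey": "Action", "RowKey": "Die Hard", "ReleaseYear": 1988},
]

def partitions_of(rows):
    """Group rows by PartitionKey; each group is one partition."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["PartitionKey"]].append(row)
    return dict(groups)

def sorted_rows(rows):
    """Order rows by PartitionKey, then RowKey."""
    return sorted(rows, key=lambda r: (r["PartitionKey"], r["RowKey"]))

def has_unique_keys(rows):
    """(PartitionKey, RowKey) must identify exactly one entity."""
    keys = [(r["PartitionKey"], r["RowKey"]) for r in rows]
    return len(keys) == len(set(keys))

parts = partitions_of(entities)
ordered = [(r["PartitionKey"], r["RowKey"]) for r in sorted_rows(entities)]
```

Here `parts` has one entry per category, and `ordered` lists the Action movies before the Comedy one, sorted by title within each partition.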


Partitions and Partition Ranges

[Diagram: the Movies table is divided into partition ranges that can be served by different servers: Server A serves [Action - Comedy) and Server B serves [Comedy - Western).]


Table Operations

  • Table

    • Create

    • Query

    • Delete

  • Entities

    • Insert

    • Update

      • Merge – Partial Update

      • Replace – Update entire entity

    • Delete

    • Query

    • Entity Group Transaction (new)


Table Schema

Define the schema as a .NET class

[DataServiceKey("PartitionKey", "RowKey")]
public class Movie
{
    /// <summary>
    /// Category is the partition key
    /// </summary>
    public string PartitionKey { get; set; }

    /// <summary>
    /// Title is the row key
    /// </summary>
    public string RowKey { get; set; }

    public DateTime Timestamp { get; set; }

    public int ReleaseYear { get; set; }

    public string Language { get; set; }

    public string Cast { get; set; }
}


Table SDK Sample Code

// Requires: using System.Linq; using Microsoft.WindowsAzure;
//           using Microsoft.WindowsAzure.StorageClient;
StorageCredentialsAccountAndKey credentials = new StorageCredentialsAccountAndKey(
    "myaccount", "myKey");
string baseUri = "http://myaccount.table.core.windows.net";
CloudTableClient tableClient = new CloudTableClient(baseUri, credentials);
tableClient.CreateTable("Movies");

TableServiceContext context = tableClient.GetDataServiceContext();

CloudTableQuery<Movie> q = (from movie in context.CreateQuery<Movie>("Movies")
                            where movie.PartitionKey == "Action" && movie.RowKey == "The Bourne Ultimatum"
                            select movie).AsTableServiceQuery<Movie>();
Movie movieToUpdate = q.FirstOrDefault();

// Update movie
context.UpdateObject(movieToUpdate);
context.SaveChangesWithRetries();

// Add movie (movieToAdd is assumed to hold the new movie's title;
// AddObject takes the table name followed by the entity)
context.AddObject("Movies", new Movie { PartitionKey = "Action", RowKey = movieToAdd });
context.SaveChangesWithRetries();


Agenda

  • Overview of Windows Azure Tables

  • Patterns and Practices for Windows Azure Tables

  • Overview of Windows Azure Queues

  • Patterns and Practices for Windows Azure Queues

  • Q & A


Key Selection: Things to Consider

  • Scalability

    • Distribute load as much as possible

    • Hot partitions can be load balanced

    • PartitionKey is critical for scalability

  • Query Efficiency & Speed

    • Avoid frequent large scans

    • Parallelize queries

  • Entity group transactions (new)

    • Transactions across a single partition

    • Transaction semantics & Reduce round trips


Key Selection: Case Study 1

  • Table for listing all movies

    • Home page lists movies based on chosen category


Movie Listing – Solution 1

  • Why do I need multiple PartitionKeys?

    • Account name as Partition Key

    • Movie title as RowKey since movie names need to be sorted

    • Category as a separate property

  • Does this scale?


Movie Listing – Solution 1

  • Single partition - Entire table served by one server

  • All requests served by that single server

  • Does not scale

[Diagram: requests from every client go to Server A, the one server that holds the single partition.]

Movie Listing – Solution 2

  • All movies partitioned by category

  • Allows system to load balance hot partitions

  • Load distributed

  • Better than single partition

[Diagram: client requests are divided between Server A and Server B.]

Key Selection: Case Study 2

  • Log every transaction into a table for diagnostics

    • Scale Write Intensive Scenario

    • Logs can be retrieved for a given time range


Logging - Solution 1

  • Timestamp as Partition Key

    • Looks like an obvious choice

    • It is not a single partition as time moves forward

    • Append only

    • Requests to single partition range

    • Load balancing does not help

    • Server may throttle

[Diagram: client applications send their requests toward Server A and Server B; with timestamp keys the appends all fall in a single partition range.]

Logging Solution 2 - Distribute “Append Only”

  • Prefix timestamp such that load is distributed

    • Id of the node logging

    • Hash into N buckets

  • Write load is now distributed

  • Better throughput

  • To query logs in time range

    • Parallelize it across prefix values
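One way the prefixing idea could look is sketched below: the logging node's id is hashed into one of N buckets, and the bucket number is placed in front of the timestamp in the PartitionKey. The bucket count, the key layout (`NN_yyyyMMddHHmm`), and the function names are all assumptions for illustration, not something the slides prescribe.

```python
import zlib
from datetime import datetime, timezone

N_BUCKETS = 4  # assumed bucket count; tune to the write load

def bucket_for(node_id: str, n_buckets: int = N_BUCKETS) -> int:
    """Hash the logging node's id into one of n_buckets (stable across runs)."""
    return zlib.crc32(node_id.encode("utf-8")) % n_buckets

def log_partition_key(node_id: str, when: datetime) -> str:
    """Prefix the timestamp with the bucket so appends spread over N key ranges."""
    return f"{bucket_for(node_id):02d}_{when:%Y%m%d%H%M}"

def range_filters(start: datetime, end: datetime, n_buckets: int = N_BUCKETS):
    """One (low, high) PartitionKey range per prefix for a time-range query."""
    return [
        (f"{b:02d}_{start:%Y%m%d%H%M}", f"{b:02d}_{end:%Y%m%d%H%M}")
        for b in range(n_buckets)
    ]

when = datetime(2009, 11, 18, 16, 30, tzinfo=timezone.utc)
key = log_partition_key("worker-7", when)
filters = range_filters(when, datetime(2009, 11, 18, 17, 0, tzinfo=timezone.utc))
```

A time-range read then becomes one range query per entry in `filters`, issued in parallel and merged on the client. The cost of spreading the writes is that every read has to fan out across all prefixes.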

[Diagram: with prefixed keys, requests from the client applications are spread across Server A and Server B.]

Key Selection: Query Efficiency & Speed

  • Select keys that allow fast retrieval

  • Reduce scan range

  • Reduce scan frequency


Single Entity Query

  • Where PartitionKey=‘SciFi’ and RowKey = ‘Star Trek’

  • Efficient processing

  • No continuation tokens

[Diagram: the client sends one request, which is served by the server that holds the entity, and receives the result.]

Table Scan Query

  • Select * from Movies where Rating > 4

  • Returns Continuation token

    • 1000 movies in result set

    • Partition range boundary

  • Serial Processing: Wait for continuation token before proceeding

[Diagram: the client's first request returns 1000 movies and a continuation; a later response stops at a partition range boundary with another continuation. The client re-issues the request with each continuation, moving from Server A to Server B.]

Make Scans Faster

  • Split “Select * from Movies where Rating > 4” into

    • Where PartitionKey >= “A” and PartitionKey < “D” and Rating > 4

    • Where PartitionKey >= “D” and PartitionKey < “I” and Rating > 4

    • Etc.

  • Execute in parallel

  • Each query handles continuation
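The split-and-parallelize idea can be sketched as follows, using an in-memory list as a stand-in for the Movies table. The sample data, the range boundaries, and the function names are assumptions for illustration; a real client would issue one service query per range and follow each query's continuation tokens, which this sketch leaves out.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the Movies table.
MOVIES = [
    {"PartitionKey": "Action", "RowKey": "Die Hard", "Rating": 5},
    {"PartitionKey": "Comedy", "RowKey": "Office Space", "Rating": 4},
    {"PartitionKey": "Drama", "RowKey": "Heat", "Rating": 5},
    {"PartitionKey": "SciFi", "RowKey": "Star Trek", "Rating": 5},
    {"PartitionKey": "Western", "RowKey": "Unforgiven", "Rating": 3},
]

def scan_range(low, high, min_rating):
    """Scan one half-open PartitionKey range [low, high)."""
    return [
        m for m in MOVIES
        if low <= m["PartitionKey"] < high and m["Rating"] > min_rating
    ]

def parallel_scan(boundaries, min_rating):
    """Split the key space at the boundaries and scan the pieces concurrently."""
    ranges = list(zip(boundaries[:-1], boundaries[1:]))
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        parts = list(pool.map(lambda r: scan_range(r[0], r[1], min_rating), ranges))
    return [movie for part in parts for movie in part]

# "\uffff" acts as an upper bound above every key in this sample.
results = parallel_scan(["A", "D", "I", "\uffff"], min_rating=4)
titles = sorted(m["RowKey"] for m in results)
```

The boundaries should be chosen so that each range carries a similar share of the data; evenly spaced letters are only a placeholder here.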

[Diagram: the client sends the split queries to Server A and Server B at the same time, and each query follows its own continuations.]

Query Speed

  • Fast

    • Single PartitionKey and RowKey with equality

  • Medium

    • Single partition but a small range for RowKey

    • Entire partition or table that is small

  • Slow

    • Large single scan

    • Large table scan

    • “OR” predicates on keys => no query optimization => results in scan

  • Expect continuation tokens for every query type except the first (single PartitionKey and RowKey with equality)

    Make Queries Faster

    • Large Scans

      • Split the range and parallelize queries

      • Create and maintain own views that help queries

    • “Or” Predicates

      • Execute the individual queries in parallel instead of using “OR”

    • User Interactive

      • Cache the result to reduce scan frequency
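To make the “OR” advice concrete, the sketch below replaces a single query of the form "key1 OR key2 OR ..." with one point lookup per key, run concurrently and merged on the client. The dictionary stands in for the table, and `point_query` and `query_any` are hypothetical helper names, not client-library calls.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table indexed by (PartitionKey, RowKey), the way point lookups are.
TABLE = {
    ("SciFi", "Star Trek"): {"Rating": 5},
    ("Action", "Die Hard"): {"Rating": 5},
    ("Comedy", "Office Space"): {"Rating": 4},
}

def point_query(key):
    """Single-entity lookup with equality on both keys (the fast case)."""
    return key, TABLE.get(key)

def query_any(keys):
    """Run one point query per key concurrently instead of one 'OR' query."""
    with ThreadPoolExecutor(max_workers=max(1, len(keys))) as pool:
        hits = list(pool.map(point_query, keys))
    return {key: entity for key, entity in hits if entity is not None}

found = query_any([("SciFi", "Star Trek"), ("Action", "Die Hard"), ("Drama", "Heat")])
```

Keys that do not exist simply drop out of the merged result, so the caller sees the same answer an “OR” query would have produced, without the scan.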


    Expect Continuation Tokens – Seriously!

    • Maximum of 1000 rows in a response

    • At the end of partition range boundary

    • Maximum of 5 seconds to execute the query
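A client loop that always handles continuation tokens might look like the sketch below. `query_page` is a hypothetical stand-in for one round trip to the service, and the integer token is an assumption made to keep the example self-contained; real tokens are opaque values returned by the service.

```python
PAGE_LIMIT = 1000  # maximum rows per response, per the slide

# Hypothetical data set: 2,500 rows standing in for a table.
ROWS = [{"RowKey": f"{i:05d}"} for i in range(2500)]

def query_page(continuation=None):
    """Return (rows, next_continuation); None means the result set is complete."""
    start = continuation or 0
    page = ROWS[start:start + PAGE_LIMIT]
    next_start = start + len(page)
    token = next_start if next_start < len(ROWS) else None
    return page, token

def query_all():
    """Keep re-issuing the query until no continuation token comes back."""
    rows, requests, token = [], 0, None
    while True:
        page, token = query_page(token)
        rows.extend(page)
        requests += 1
        if token is None:
            return rows, requests

all_rows, request_count = query_all()
```

The loop tests only the token, never the page size: a response cut short by a partition range boundary or the time limit can carry fewer rows than the maximum and still come with a continuation.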


    Entity Group Transactions (EGT) (new)

    • Atomically perform multiple insert/update/delete operations over entities in the same partition in a single transaction

    • Maximum of 100 commands in a single transaction and payload < 4 MB

    • ADO.NET Data Services

      • Use SaveChangesOptions.Batch
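The rules above can be modelled in a few lines. The sketch below is an in-memory approximation, not the service's implementation: `apply_batch` and `BatchError` are hypothetical names, the table is a plain dictionary keyed by (PartitionKey, RowKey), and the 4 MB payload limit is not modelled.

```python
MAX_COMMANDS = 100  # limit from the slide

class BatchError(Exception):
    pass

def apply_batch(table, commands):
    """Apply all commands or none. Each command is (op, entity) with op in
    {'insert', 'update', 'delete'}; table maps (PartitionKey, RowKey) -> entity."""
    if not 0 < len(commands) <= MAX_COMMANDS:
        raise BatchError("a batch needs 1 to 100 commands")
    if len({e["PartitionKey"] for _, e in commands}) != 1:
        raise BatchError("all entities must share one PartitionKey")

    staged = dict(table)  # work on a copy so a failure leaves table untouched
    for op, entity in commands:
        key = (entity["PartitionKey"], entity["RowKey"])
        if op == "insert":
            if key in staged:
                raise BatchError(f"{key} already exists")
            staged[key] = dict(entity)
        elif op == "update":
            if key not in staged:
                raise BatchError(f"{key} not found")
            staged[key] = {**staged[key], **entity}
        elif op == "delete":
            if staged.pop(key, None) is None:
                raise BatchError(f"{key} not found")
        else:
            raise BatchError(f"unknown operation {op!r}")
    table.clear()
    table.update(staged)  # commit everything at once

table = {("acct1", "A_acct1"): {"PartitionKey": "acct1", "RowKey": "A_acct1", "Rentals": 0}}
apply_batch(table, [
    ("update", {"PartitionKey": "acct1", "RowKey": "A_acct1", "Rentals": 1}),
    ("insert", {"PartitionKey": "acct1", "RowKey": "R_Heat"}),
])
```

If any command fails, the exception is raised before the commit step, so the table keeps its previous contents.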


    Key Selection: Entity Group Transaction

    • Case Study

      • Maintain user account information

        • Account ID, User Name, Address, Number of rentals

      • Maintain information of checked out rentals

        • Account ID, Movie Title, Check out date, Due date

    • Solution 1 – Maintain two tables – Users & Rentals

      • Handle Cross table consistency

        • Insert into Rentals table succeeds

        • Update to Users table fails

        • Queue to maintain consistency


    Solution 2

    • Store Account Information and Rental details in same table

      • Maintain same PartitionKey to enforce transactions

        • Account ID as PartitionKey

      • Update total count and Insert new rentals using Entity Group Transaction

      • Prefix RowKey with “Kind” code: A = Account, R = Rental

        • Row key for account info: [Kind Code]_[AccountId]

        • Row Key for rental info: [Kind Code]_[Title]

      • Rental Properties not set for Account row and vice versa
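The key layout described above could be built as in the sketch below. The property names (`UserName`, `NumberOfRentals`, `CheckOutDate`, and so on) and the helper functions are assumptions for illustration; only the PartitionKey choice and the "A_" / "R_" RowKey prefixes come from the slide.

```python
from datetime import date

def account_row(account_id, user_name, address, rentals):
    """Account entity: kind code 'A'; rental-only properties are left unset."""
    return {
        "PartitionKey": account_id,
        "RowKey": f"A_{account_id}",
        "UserName": user_name,
        "Address": address,
        "NumberOfRentals": rentals,
    }

def rental_row(account_id, title, checked_out, due):
    """Rental entity: kind code 'R'; account-only properties are left unset."""
    return {
        "PartitionKey": account_id,
        "RowKey": f"R_{title}",
        "CheckOutDate": checked_out.isoformat(),
        "DueDate": due.isoformat(),
    }

def rentals_in(partition_rows):
    """Rows whose RowKey starts with 'R_' are the rentals of this account."""
    return [r for r in partition_rows if r["RowKey"].startswith("R_")]

rows = [
    account_row("1001", "sally", "1 Main St", 1),
    rental_row("1001", "Star Trek", date(2009, 11, 18), date(2009, 11, 25)),
]
```

Because both kinds of row share the account's PartitionKey, the rental insert and the rental-count update can go into one Entity Group Transaction, and a RowKey prefix filter on "R_" lists an account's rentals without touching other accounts.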


    Best Practices & Summary

    • Select PartitionKey and RowKey that help scale

      • Efficient for frequently used queries

      • Supports batch transactions

      • Distributes load

    • Distribute “Append only” patterns using prefix to PartitionKey

    • Always Handle continuation tokens

    • Clients can maintain their own cache/views instead of running frequent scans

      • Future Feature - Secondary Index

    • Execute parallel queries instead of “OR” predicates

    • Implement back-off strategy for retries
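One common back-off strategy is exponential back-off with jitter, sketched below. The slides do not say which strategy to use, so the doubling delay, the default limits, and the `retry_with_backoff` name are all assumptions. The `sleep` and `rng` parameters exist only so the example can run without actually waiting.

```python
import random
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0,
                       sleep=time.sleep, rng=random.random):
    """Call operation(); on failure wait base_delay * 2**attempt (capped, with
    jitter) and try again. Re-raises the last error after max_attempts."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay * (0.5 + rng() / 2))  # jitter: 50% to 100% of the delay

# Hypothetical operation that fails twice before succeeding.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("server busy")
    return "ok"

waits = []
result = retry_with_backoff(flaky, sleep=waits.append, rng=lambda: 1.0)
```

Production code would normally retry only errors known to be transient (for example, throttling or timeouts) rather than every exception as this sketch does.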


    Agenda

    • Overview of Windows Azure Tables

    • Patterns and Practices for Windows Azure Tables

    • Overview of Windows Azure Queues

    • Patterns and Practices for Windows Azure Queues

    • Q & A


    Windows Azure Queues

    • Queues are performance-efficient and highly available, and provide reliable message delivery

      • Simple, asynchronous work dispatch

      • Programming semantics ensure that a message is processed at least once

    • Access is provided via REST


    Queue Storage Concepts

    [Diagram: the storage account "sally" contains two queues. The thumbnailjobs queue holds the messages "128 x 128 http://..." and "256 x 256 http://..."; the traverselinks queue holds two "http://..." messages.]


    Account, Queues and Messages

    • An account can create many queues

      • Queue Name is scoped by the account

    • A Queue contains messages

      • No limit on number of messages stored in a queue

      • Set a limit for message expiration

    • Messages

      • Message size <= 8 KB

      • To store larger data, store data in blob/entity storage, and the blob/entity name in the message

      • Message now has dequeue count
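The advice on large payloads can be sketched as follows: small payloads travel in the message itself, while larger ones are written to blob storage and only the blob name is queued. The dictionaries and lists below stand in for the blob store and the queue, and the message layout (a `(kind, value)` tuple) and function names are assumptions for illustration.

```python
import uuid

MAX_MESSAGE_BYTES = 8 * 1024  # message size limit from the slide

blobs = {}   # hypothetical blob store: name -> bytes
queue = []   # hypothetical queue of (kind, value) messages

def enqueue(payload: bytes):
    """Queue small payloads inline; park large ones in a blob and queue its name."""
    if len(payload) <= MAX_MESSAGE_BYTES:
        queue.append(("inline", payload))
    else:
        name = f"payload-{uuid.uuid4()}"
        blobs[name] = payload          # write the blob first...
        queue.append(("blob", name))   # ...then the message that points at it

def resolve(message) -> bytes:
    """Turn a dequeued message back into its payload."""
    kind, value = message
    return blobs[value] if kind == "blob" else value

enqueue(b"resize 128x128")
enqueue(b"x" * 20000)
```

Writing the blob before the message means a consumer never sees a reference to data that does not exist yet; the trade-off is that a crash between the two steps leaves an orphaned blob to clean up later.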


    Queue Operations

    • Queue

      • Create Queue

      • Delete Queue

      • List Queues

      • Get/Set Queue Metadata

    • Messages

      • Add Message (i.e. Enqueue Message)

      • Get Message(s) (i.e. Dequeue Message)

      • Peek Message(s)

      • Delete Message


    Queue Programming API

    CloudQueueClient queueClient = new CloudQueueClient(baseUri, credentials);
    CloudQueue queue = queueClient.GetQueueReference("test1");
    queue.CreateIfNotExist();

    // MessageCount is populated via FetchAttributes
    queue.FetchAttributes();

    CloudQueueMessage message = new CloudQueueMessage("Some content");
    queue.AddMessage(message);

    message = queue.GetMessage(TimeSpan.FromMinutes(10) /* visibility timeout */);

    // Process the message here …

    queue.DeleteMessage(message);


    Agenda

    • Overview of Windows Azure Tables

    • Patterns and Practices for Windows Azure Tables

    • Overview of Windows Azure Queues

    • Patterns and Practices for Windows Azure Queues

    • Q & A


    Removing Poison Messages

    [Diagram: producers P1 and P2 add messages to queue Q; consumers C1 and C2 take messages from it.]

    1. C1: GetMessage(Q, 30 s) → msg 1

    2. C2: GetMessage(Q, 30 s) → msg 2

    Removing Poison Messages

    [Diagram: producers P1 and P2 add messages to queue Q; consumers C1 and C2 take messages from it.]

    1. C1: GetMessage(Q, 30 s) → msg 1

    2. C2: GetMessage(Q, 30 s) → msg 2

    3. C2 consumed msg 2

    4. C2: DeleteMessage(Q, msg 2)

    5. C1 crashed

    6. msg 1 visible 30 s after Dequeue

    7. C2: GetMessage(Q, 30 s) → msg 1

    Removing Poison Messages

    [Diagram: producers P1 and P2 add messages to queue Q; consumers C1 and C2 take messages from it.]

    1. C1: Dequeue(Q, 30 sec) → msg 1

    2. C2: Dequeue(Q, 30 sec) → msg 2

    3. C2 consumed msg 2

    4. C2: Delete(Q, msg 2)

    5. C1 crashed

    6. msg 1 visible 30 s after Dequeue

    7. C2: Dequeue(Q, 30 sec) → msg 1

    8. C2 crashed

    9. msg 1 visible 30 s after Dequeue

    10. C1 restarted

    11. C1: Dequeue(Q, 30 sec) → msg 1

    12. DequeueCount > 2

    13. C1: Delete(Q, msg 1)

    Best Practices & Summary

    • Make message processing idempotent

      • No need to deal with failures

    • Do not rely on order

      • Invisible messages result in out of order

    • Use Dequeue count to remove poison messages

      • Enforce threshold on message’s dequeue count

    • Use message count to dynamically increase/reduce workers

    • Use blob to store message data with reference in message

      • Messages > 8KB

      • Batch messages

      • Garbage collect orphaned blobs
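The dequeue-count threshold can be sketched with a small in-memory simulation. Everything here is an assumption for illustration: the `Message` class, the `handle` function that crashes on a message with the body "bad", and the choice to model an expired visibility timeout by putting the message back on the queue. The threshold of 2 follows the "DequeueCount > 2" step in the walkthrough.

```python
from collections import deque

MAX_DEQUEUE_COUNT = 2  # assumed threshold, as in "DequeueCount > 2"

class Message:
    def __init__(self, body):
        self.body = body
        self.dequeue_count = 0

def get_message(queue):
    """Dequeue the next message and bump its dequeue count."""
    message = queue.popleft()
    message.dequeue_count += 1
    return message

def handle(body):
    """Hypothetical work item handler; the body 'bad' always crashes the worker."""
    if body == "bad":
        raise RuntimeError("worker crashed")
    return body.upper()

def drain(queue):
    """Process every message, setting aside those dequeued too many times."""
    processed, poisoned = [], []
    while queue:
        message = get_message(queue)
        if message.dequeue_count > MAX_DEQUEUE_COUNT:
            poisoned.append(message.body)  # delete instead of retrying forever
            continue
        try:
            processed.append(handle(message.body))
        except RuntimeError:
            queue.append(message)  # stands in for the visibility timeout expiring
    return processed, poisoned

processed, poisoned = drain(deque([Message("bad"), Message("msg 2")]))
```

In a real worker the poisoned message would usually be logged or copied somewhere for inspection before it is deleted, so the failure can be diagnosed.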


    Future Features

    • Allow workers to extend invisibility time

      • Time to process message unknown at dequeue time

      • Worker can extend the time as needed

    • Allow longer invisibility time

      • Long running work items may need more than 2 hours

    • Allow messages to not expire

      • Large backlogs will not cause messages to expire


    Takeaways

    • Table

      • Scalable & Reliable Structured Storage System

      • Partitioning is critical to scalability

      • Entity Group Transactions (new)

    • Queue

      • Scalable & Reliable Messaging System

      • Dequeue count returned with message (new)

    • Use back-off strategy on retries

    • Official Storage Client Library (new)


    Windows Azure Session Alerts!!

    • Storing and Manipulating Blobs and Files with Windows Azure Storage – 11/18 (4:30 PM)

    • Patterns for building Reliable & Scalable Applications with Windows Azure – 11/19 (8:30 AM)

    • Automating the Application Lifecycle with Windows Azure – 11/19 (10:00 AM)


    Q&A


