Windows Azure Tables and Queues Deep Dive

SVC09. Windows Azure Tables and Queues Deep Dive. Jai Haridas Software Design Engineer Microsoft Corporation. Agenda. Overview of Windows Azure Tables Patterns and Practices for Windows Azure Tables Overview of Windows Azure Queues Patterns and Practices for Windows Azure Queues


Presentation Transcript


SVC09

Windows Azure Tables and Queues Deep Dive

Jai Haridas

Software Design Engineer

Microsoft Corporation


Agenda

  • Overview of Windows Azure Tables

  • Patterns and Practices for Windows Azure Tables

  • Overview of Windows Azure Queues

  • Patterns and Practices for Windows Azure Queues

  • Q&A



Fundamental Storage Abstractions

  • Tables – Provide structured storage. A table is a set of entities, each of which contains a set of properties

  • Queues – Provide reliable storage and delivery of messages for an application

  • Blobs – Provide a simple interface for storing named files along with metadata for the file

  • Drives – Provide durable NTFS volumes for Windows Azure applications to use (new)



Windows Azure Tables

  • Provides Structured Storage

    • Massively Scalable Tables

      • Billions of entities (rows) and TBs of data

      • Can use thousands of servers as traffic grows

    • Highly Available & Durable

      • Data is replicated several times

  • Familiar and Easy to use API

    • ADO.NET Data Services – .NET 3.5 SP1

      • .NET classes and LINQ

    • REST – with any platform or language



Table Storage Concepts

Accounts > Tables > Entities

  • Account: moviesonline

    • Table: Users

      • Entity: Email = …, Name = …

      • Entity: Email = …, Name = …

    • Table: Movies

      • Entity: Genre = …, Title = …

      • Entity: Genre = …, Title = …


Table Data Model

  • Table

    • A storage account can create many tables

    • Table name is scoped by account

    • Set of entities (i.e. rows)

  • Entity

    • Set of properties (columns)

    • Required properties

      • PartitionKey, RowKey and Timestamp



Required Entity Properties

  • PartitionKey & RowKey

    • Uniquely identifies an entity

    • Defines the sort order

    • Use them to scale your application

  • Timestamp

    • Read only

    • Optimistic Concurrency
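The optimistic concurrency mentioned above can be pictured with a small in-memory sketch. It is not the storage service's implementation: `VersionedStore` and its integer `Version` are hypothetical stand-ins for a table and its server-maintained Timestamp. An update succeeds only if the caller still holds the latest version.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory store illustrating optimistic concurrency.
// The integer Version is a stand-in for the server-maintained Timestamp.
public class VersionedStore
{
    private readonly Dictionary<string, (string Value, int Version)> rows =
        new Dictionary<string, (string Value, int Version)>();

    // Inserts a new row and returns its initial version.
    public int Insert(string key, string value)
    {
        rows.Add(key, (value, 1));
        return 1;
    }

    public (string Value, int Version) Read(string key) => rows[key];

    // Succeeds only if the caller read the latest version; otherwise the
    // caller is expected to re-read the row and retry.
    public bool TryUpdate(string key, string newValue, int expectedVersion)
    {
        var current = rows[key];
        if (current.Version != expectedVersion) return false;
        rows[key] = (newValue, current.Version + 1);
        return true;
    }
}
```

If two clients read the same row, the first `TryUpdate` wins and the second returns false, because the version it read is now stale.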



PartitionKey And Partitions

  • PartitionKey

    • Used to group entities in the table into partitions

  • A table partition

    • All entities with same partition key value

    • Unit of scale

    • Control entity locality

    • Row key provides uniqueness within a partition



Partitions and Partition Ranges

[Diagram: the Movies table split into partition ranges served by different servers]

  • Server A: Table = Movies, range [Action - Comedy)

  • Server B: Table = Movies, range [Comedy - Western)
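As an illustration of half-open partition ranges, here is a minimal sketch of mapping a PartitionKey to the server that owns its range. The range table is hard-coded from the example above, `PartitionRanges` and `ServerFor` are hypothetical names, and ordinal string comparison is an assumption of the sketch, not a statement about how the service orders keys.

```csharp
using System;

public static class PartitionRanges
{
    // Hypothetical range map: each server owns a half-open range
    // [Low, High) of PartitionKey values.
    private static readonly (string Low, string High, string Server)[] Ranges =
    {
        ("Action", "Comedy", "Server A"),
        ("Comedy", "Western", "Server B"),
    };

    // Returns the owning server, or null if the key is outside every range.
    public static string ServerFor(string partitionKey)
    {
        foreach (var r in Ranges)
        {
            bool atOrAfterLow = string.CompareOrdinal(partitionKey, r.Low) >= 0;
            bool beforeHigh = string.CompareOrdinal(partitionKey, r.High) < 0;
            if (atOrAfterLow && beforeHigh) return r.Server;
        }
        return null;
    }
}
```

Because the upper bound is exclusive, "Comedy" itself belongs to Server B, and every key maps to at most one server.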



Table Operations

  • Table

    • Create

    • Query

    • Delete

  • Entities

    • Insert

    • Update

      • Merge – Partial Update

      • Replace – Update entire entity

    • Delete

    • Query

    • Entity Group Transaction (new)


Table Schema

Define the schema as a .NET class

[DataServiceKey("PartitionKey", "RowKey")]
public class Movie
{
    /// <summary>
    /// Category is the partition key
    /// </summary>
    public string PartitionKey { get; set; }

    /// <summary>
    /// Title is the row key
    /// </summary>
    public string RowKey { get; set; }

    public DateTime Timestamp { get; set; }
    public int ReleaseYear { get; set; }
    public string Language { get; set; }
    public string Cast { get; set; }
}


Table SDK Sample Code

StorageCredentialsAccountAndKey credentials = new StorageCredentialsAccountAndKey(
    "myaccount", "myKey");
string baseUri = "http://myaccount.table.core.windows.net";
CloudTableClient tableClient = new CloudTableClient(baseUri, credentials);
tableClient.CreateTable("Movies");

TableServiceContext context = tableClient.GetDataServiceContext();

// Query a single entity by PartitionKey and RowKey
CloudTableQuery<Movie> q = (from movie in context.CreateQuery<Movie>("Movies")
                            where movie.PartitionKey == "Action" && movie.RowKey == "The Bourne Ultimatum"
                            select movie).AsTableServiceQuery<Movie>();
Movie movieToUpdate = q.FirstOrDefault();

// Update movie
context.UpdateObject(movieToUpdate);
context.SaveChangesWithRetries();

// Add movie (assumes a Movie(partitionKey, rowKey) constructor; movieToAdd holds the title)
context.AddObject("Movies", new Movie("Action", movieToAdd));
context.SaveChangesWithRetries();




Key Selection: Things to Consider

  • Scalability

    • Distribute load as much as possible

    • Hot partitions can be load balanced

    • PartitionKey is critical for scalability

  • Query Efficiency & Speed

    • Avoid frequent large scans

    • Parallelize queries

  • Entity group transactions (new)

    • Transactions across a single partition

    • Transaction semantics & Reduce round trips



Key Selection: Case Study 1

  • Table for listing all movies

    • Home page lists movies based on chosen category



Movie Listing – Solution 1

  • Why do I need multiple PartitionKeys?

    • Account name as Partition Key

    • Movie title as RowKey since movie names need to be sorted

    • Category as a separate property

  • Does this scale?



Movie Listing – Solution 1

  • Single partition - Entire table served by one server

  • All requests served by that single server

  • Does not scale

[Diagram: every client request goes to Server A]


Movie Listing – Solution 2

  • All movies partitioned by category

  • Allows system to load balance hot partitions

  • Load distributed

  • Better than single partition

[Diagram: client requests are spread across Server A and Server B]


Key Selection: Case Study 2

  • Log every transaction into a table for diagnostics

    • Scale Write Intensive Scenario

    • Logs can be retrieved for a given time range



Logging - Solution 1

  • Timestamp as Partition Key

    • Looks like an obvious choice

    • It is not a single partition as time moves forward

    • Append only

    • Requests to single partition range

    • Load balancing does not help

    • Server may throttle

[Diagram: applications and a client sending requests; Server A and Server B]


Logging Solution 2 - Distribute "Append Only"

  • Prefix timestamp such that load is distributed

    • Id of the node logging

    • Hash into N buckets

  • Write load is now distributed

  • Better throughput

  • To query logs in time range

    • Parallelize it across prefix values
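A minimal sketch of the prefixing idea follows, under several assumptions that are not in the slides: eight buckets, an FNV-1a hash of the node id, an hourly timestamp granularity, and the key layout `NN_yyyyMMddHH`. `LogKeys` and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class LogKeys
{
    public const int BucketCount = 8; // Hypothetical value of N.

    // Stable hash (FNV-1a) so the same node always maps to the same bucket.
    public static int BucketFor(string nodeId)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (char c in nodeId) { hash ^= c; hash *= 16777619; }
            return (int)(hash % BucketCount);
        }
    }

    private static string Format(int bucket, DateTime utc) =>
        bucket.ToString("D2", CultureInfo.InvariantCulture) + "_" +
        utc.ToString("yyyyMMddHH", CultureInfo.InvariantCulture);

    // PartitionKey for a log entry written by the given node.
    public static string PartitionKeyFor(string nodeId, DateTime utc) =>
        Format(BucketFor(nodeId), utc);

    // One key per bucket; a time-range query is issued against each key,
    // and those queries can run in parallel.
    public static IEnumerable<string> KeysForHour(DateTime utc)
    {
        for (int b = 0; b < BucketCount; b++)
            yield return Format(b, utc);
    }
}
```

Writes from different nodes now land in up to N different partitions for the same hour. Reading a time range costs N queries instead of one, which is the trade-off the slide describes.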

[Diagram: applications and a client sending requests; Server A and Server B]


Key Selection: Query Efficiency & Speed

  • Select keys that allow fast retrieval

  • Reduce scan range

  • Reduce scan frequency



Single Entity Query

  • Where PartitionKey=‘SciFi’ and RowKey = ‘Star Trek’

  • Efficient processing

  • No continuation tokens

[Diagram: the client sends one request to one server and receives the result]


Table Scan Query

  • Select * from Movies where Rating > 4

  • Returns Continuation token

    • 1000 movies in result set

    • Partition range boundary

  • Serial Processing: Wait for continuation token before proceeding

[Diagram: the client sends a request; Server A returns 1000 movies with a continuation token; the client sends follow-up requests with the token; a continuation is also returned when a partition range boundary is hit, and the scan moves on to Server B]


Make Scans Faster

  • Split “Select * from Movies where Rating > 4” into

    • Where PartitionKey >= “A” and PartitionKey < “D” and Rating > 4

    • Where PartitionKey >= “D” and PartitionKey < “I” and Rating > 4

    • Etc.

  • Execute in parallel

  • Each query handles continuation
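One way to produce the per-range queries is to generate one filter string per pair of adjacent boundaries. The sketch below is illustrative only: `ScanSplitter` is a hypothetical helper, the boundary values are the caller's choice, and the output uses the `ge`/`lt`/`gt` comparison operators of the REST query syntax.

```csharp
using System;
using System.Collections.Generic;

public static class ScanSplitter
{
    // Builds one filter per half-open PartitionKey range
    // [boundaries[i], boundaries[i + 1]), each combined with extraPredicate.
    public static List<string> BuildFilters(
        IReadOnlyList<string> boundaries, string extraPredicate)
    {
        var filters = new List<string>();
        for (int i = 0; i + 1 < boundaries.Count; i++)
        {
            filters.Add(
                "PartitionKey ge '" + boundaries[i] + "'" +
                " and PartitionKey lt '" + boundaries[i + 1] + "'" +
                " and " + extraPredicate);
        }
        return filters;
    }
}
```

For example, `BuildFilters(new[] { "A", "D", "I" }, "Rating gt 4")` yields two filters, one for keys from "A" up to "D" and one for "D" up to "I". Each filter can then be run as its own query, with its own continuation handling.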

[Diagram: the client sends requests to Server A and Server B in parallel; each query follows its own continuation tokens]


Query Speed

  • Fast

    • Single PartitionKey and RowKey with equality

  • Medium

    • Single partition but a small range for RowKey

    • Entire partition or table that is small

  • Slow

    • Large single scan

    • Large table scan

    • “OR” predicates on keys => no query optimization => results in scan

  • Expect continuation tokens for all of these except the Fast case (single PartitionKey and RowKey with equality)


    Make Queries Faster

    • Large Scans

      • Split the range and parallelize queries

      • Create and maintain own views that help queries

    • “Or” Predicates

      • Execute individual query in parallel instead of using “OR”

    • User Interactive

      • Cache the result to reduce scan frequency
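Replacing an "OR" predicate with individual queries run in parallel might look like the sketch below. `ParallelLookups` and the `runQuery` delegate are hypothetical; the delegate stands in for whatever executes a single key-equality query against the table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelLookups
{
    // Runs one query per key concurrently and merges the results.
    // runQuery is a hypothetical delegate that executes a single query.
    public static async Task<List<T>> QueryEachAsync<T>(
        IEnumerable<string> keys, Func<string, Task<IEnumerable<T>>> runQuery)
    {
        // Start all queries before awaiting any of them.
        Task<IEnumerable<T>>[] tasks = keys.Select(runQuery).ToArray();
        IEnumerable<T>[] results = await Task.WhenAll(tasks);
        // Results come back in the same order as the keys.
        return results.SelectMany(r => r).ToList();
    }
}
```

Each individual query keeps its efficient key-based access path, and the total latency is roughly that of the slowest query, not the sum.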



    Expect Continuation Tokens – Seriously!

    • Maximum of 1000 rows in a response

    • At the end of partition range boundary

    • Maximum of 5 seconds to execute the query
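Because any of these three conditions can end a response early, a client typically loops until no continuation token comes back. The following is a generic sketch, not tied to a specific client library: `Page<T>` and the `fetchPage` delegate are hypothetical stand-ins for one request/response round trip.

```csharp
using System;
using System.Collections.Generic;

public class Page<T>
{
    public List<T> Items = new List<T>();
    public string Continuation; // null means there are no more results.
}

public static class ContinuationLoop
{
    // fetchPage is a hypothetical delegate: it takes the previous token
    // (null for the first request) and returns one page of results.
    public static List<T> FetchAll<T>(Func<string, Page<T>> fetchPage)
    {
        var all = new List<T>();
        string token = null;
        do
        {
            Page<T> page = fetchPage(token);
            all.AddRange(page.Items);   // A page may be empty and still carry a token.
            token = page.Continuation;
        } while (token != null);
        return all;
    }
}
```

The loop deliberately keys off the token and not the item count, since a page with few or no rows can still be followed by more data.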



    Entity Group Transactions (EGT) (new)

    • Atomically perform multiple insert/update/delete operations over entities in the same partition in a single transaction

    • Maximum of 100 commands in a single transaction and payload < 4 MB

    • ADO.NET Data Services

      • Use SaveChangesOptions.Batch



    Key Selection: Entity Group Transaction

    • Case Study

      • Maintain user account information

        • Account ID, User Name, Address, Number of rentals

      • Maintain information of checked out rentals

        • Account ID, Movie Title, Check out date, Due date

    • Solution 1 – Maintain two tables – Users & Rentals

      • Handle Cross table consistency

        • Insert into Rentals table succeeds

        • Update to Users table fails

        • Queue to maintain consistency



    Solution 2

    • Store Account Information and Rental details in same table

      • Maintain same PartitionKey to enforce transactions

        • Account ID as PartitionKey

      • Update total count and Insert new rentals using Entity Group Transaction

      • Prefix RowKey with “Kind” code: A = Account, R = Rental

        • Row key for account info: [Kind Code]_[AccountId]

        • Row Key for rental info: [Kind Code]_[Title]

      • Rental Properties not set for Account row and vice versa
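The key layout above can be sketched as two small helpers plus a pre-flight check for an entity group transaction. `RentalKeys` is a hypothetical name, and the check covers only the single-partition and 100-operation limits stated earlier. The payload size limit is not checked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RentalKeys
{
    // RowKey = [Kind Code]_[identifier]; A = Account, R = Rental.
    public static string AccountRowKey(string accountId) => "A_" + accountId;
    public static string RentalRowKey(string title) => "R_" + title;

    // True if the operations could go into one entity group transaction:
    // between 1 and 100 operations, all in the same partition.
    public static bool CanBatch(
        IReadOnlyCollection<(string PartitionKey, string RowKey)> ops)
    {
        if (ops.Count == 0 || ops.Count > 100) return false;
        return ops.Select(o => o.PartitionKey).Distinct().Count() == 1;
    }
}
```

With the account ID as PartitionKey, updating the account row and inserting a rental row share a partition, so both can be submitted in one batch.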



    Best Practices & Summary

    • Select PartitionKey and RowKey that help scale

      • Efficient for frequently used queries

      • Supports batch transactions

      • Distributes load

    • Distribute “Append only” patterns using prefix to PartitionKey

    • Always Handle continuation tokens

    • Client can maintain their own cache/views instead of frequent scans

      • Future Feature - Secondary Index

    • Execute parallel queries instead of “OR” predicates

    • Implement back-off strategy for retries
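A back-off strategy for retries can be as simple as doubling the wait after each failure, up to a cap. The sketch below is one possible shape, with illustrative parameter values chosen by the caller. `Backoff` is a hypothetical helper, and random jitter, which is often added in practice, is left out to keep the example deterministic.

```csharp
using System;
using System.Threading;

public static class Backoff
{
    // Delay before retry number `attempt` (0-based):
    // baseDelay * 2^attempt, capped at maxDelay.
    public static TimeSpan DelayFor(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        double ms = baseDelay.TotalMilliseconds * Math.Pow(2, attempt);
        return TimeSpan.FromMilliseconds(Math.Min(ms, maxDelay.TotalMilliseconds));
    }

    // Runs `action`, sleeping between failed attempts.
    // The last failure is rethrown once maxAttempts is reached.
    public static T Retry<T>(
        Func<T> action, int maxAttempts, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        for (int attempt = 0; ; attempt++)
        {
            try { return action(); }
            catch (Exception) when (attempt + 1 < maxAttempts)
            {
                Thread.Sleep(DelayFor(attempt, baseDelay, maxDelay));
            }
        }
    }
}
```

With a 100 ms base and a 5 s cap, the waits are 100 ms, 200 ms, 400 ms, and so on until they reach 5 s. A production version would usually retry only on errors known to be transient.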





    Windows Azure Queues

    • Queues are performance efficient, highly available, and provide reliable message delivery

      • Simple, asynchronous work dispatch

      • Programming semantics ensure that a message is processed at least once

    • Access is provided via REST



    Queue Storage Concepts

    Accounts > Queues > Messages

    • Account: sally

      • Queue: thumbnailjobs

        • Message: 128 x 128 http://...

        • Message: 256 x 256 http://...

      • Queue: traverselinks

        • Message: http://...

        • Message: http://...


    Account, Queues and Messages

    • An account can create many queues

      • Queue Name is scoped by the account

    • A Queue contains messages

      • No limit on number of messages stored in a queue

      • Set a limit for message expiration

    • Messages

      • Message size <= 8 KB

      • To store larger data, store data in blob/entity storage, and the blob/entity name in the message

      • Message now has dequeue count
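The "store the data elsewhere and enqueue its name" pattern can be sketched as follows. Everything here is an assumption made for illustration: `MessagePayloads` is a hypothetical helper, a dictionary stands in for blob storage, the `blob:` prefix is an invented convention, and any encoding overhead on the wire is ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class MessagePayloads
{
    public const int MaxMessageBytes = 8 * 1024; // Limit stated above.

    // blobStore is a hypothetical stand-in for blob storage: name -> content.
    // Returns the text to enqueue: the payload itself, or "blob:<name>".
    public static string ToMessage(string payload, IDictionary<string, string> blobStore)
    {
        if (Encoding.UTF8.GetByteCount(payload) <= MaxMessageBytes) return payload;
        string blobName = Guid.NewGuid().ToString("N");
        blobStore[blobName] = payload;
        return "blob:" + blobName;
    }

    // Resolves a dequeued message back to its payload.
    public static string FromMessage(string message, IDictionary<string, string> blobStore) =>
        message.StartsWith("blob:", StringComparison.Ordinal)
            ? blobStore[message.Substring(5)]
            : message;
}
```

A real implementation would need an unambiguous way to tell references from inline payloads, since a small payload that happens to start with "blob:" would be misread by this sketch.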



    Queue Operations

    • Queue

      • Create Queue

      • Delete Queue

      • List Queues

      • Get/Set Queue Metadata

    • Messages

      • Add Message (i.e. Enqueue Message)

      • Get Message(s) (i.e. Dequeue Message)

      • Peek Message(s)

      • Delete Message



    Queue Programming API

    CloudQueueClient queueClient = new CloudQueueClient(baseUri, credentials);
    CloudQueue queue = queueClient.GetQueueReference("test1");
    queue.CreateIfNotExist();

    // MessageCount is populated via FetchAttributes
    queue.FetchAttributes();

    CloudQueueMessage message = new CloudQueueMessage("Some content");
    queue.AddMessage(message);

    message = queue.GetMessage(TimeSpan.FromMinutes(10) /* visibility timeout */);

    // Process the message here …

    queue.DeleteMessage(message);




    Removing Poison Messages

    [Diagram: producers P1 and P2 add messages to queue Q; consumers C1 and C2 dequeue them; each message is shown with its dequeue count]

    1. C1: GetMessage(Q, 30 s) → msg 1

    2. C2: GetMessage(Q, 30 s) → msg 2


    Removing Poison Messages

    [Diagram: producers P1 and P2, queue Q, consumers C1 and C2; each message is shown with its dequeue count]

    1. C1: GetMessage(Q, 30 s) → msg 1

    2. C2: GetMessage(Q, 30 s) → msg 2

    3. C2 consumed msg 2

    4. C2: DeleteMessage(Q, msg 2)

    5. C1 crashed

    6. msg 1 visible 30 s after Dequeue

    7. C2: GetMessage(Q, 30 s) → msg 1


    Removing Poison Messages

    [Diagram: producers P1 and P2, queue Q, consumers C1 and C2; each message is shown with its dequeue count]

    1. C1: Dequeue(Q, 30 sec) → msg 1

    2. C2: Dequeue(Q, 30 sec) → msg 2

    3. C2 consumed msg 2

    4. C2: Delete(Q, msg 2)

    5. C1 crashed

    6. msg 1 visible 30 s after Dequeue

    7. C2: Dequeue(Q, 30 sec) → msg 1

    8. C2 crashed

    9. msg 1 visible 30 s after Dequeue

    10. C1 restarted

    11. C1: Dequeue(Q, 30 sec) → msg 1

    12. DequeueCount > 2

    13. C1: Delete(Q, msg 1)


    Best Practices & Summary

    • Make message processing idempotent

      • No need to deal with failures

    • Do not rely on order

      • Invisible messages result in out-of-order processing

    • Use Dequeue count to remove poison messages

      • Enforce threshold on message’s dequeue count

    • Use message count to dynamically increase/reduce workers

    • Use blob to store message data with reference in message

      • Messages > 8KB

      • Batch messages

      • Garbage collect orphaned blobs
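The dequeue-count threshold can be sketched as a small decision function that a worker calls for every message it receives. `QueuedMessage`, `PoisonHandling`, and the string outcomes are hypothetical names for this illustration. The threshold of 2 mirrors the "DequeueCount > 2" step in the walkthrough and would be tuned per application.

```csharp
using System;
using System.Collections.Generic;

public class QueuedMessage
{
    public string Body;
    public int DequeueCount; // How many times the message has been dequeued.
}

public static class PoisonHandling
{
    public const int MaxDequeueCount = 2; // Illustrative threshold.

    // Returns "processed", "poison", or "retry-later".
    // On "processed" or "poison" the caller deletes the message from the queue.
    public static string Handle(
        QueuedMessage msg, Action<string> process, List<string> poisonLog)
    {
        if (msg.DequeueCount > MaxDequeueCount)
        {
            poisonLog.Add(msg.Body);  // Keep a record before the message is deleted.
            return "poison";
        }
        try
        {
            process(msg.Body);        // Should be idempotent: it may run more than once.
            return "processed";
        }
        catch (Exception)
        {
            // Leave the message alone; it becomes visible again after the
            // visibility timeout and its dequeue count increases.
            return "retry-later";
        }
    }
}
```

Recording poison messages somewhere before deleting them, as the `poisonLog` list does here, keeps them available for later inspection.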



    Future Features

    • Allow workers to extend invisibility time

      • Time to process message unknown at dequeue time

      • Worker can extend the time as needed

    • Allow longer invisibility time

      • Long running work items may need more than 2 hours

    • Allow messages to not expire

      • Large backlogs will not cause messages to expire



    Takeaways

    • Table

      • Scalable & Reliable Structured Storage System

      • Partitioning is critical to scalability

      • Entity Group Transactions (new)

    • Queue

      • Scalable & Reliable Messaging System

      • Dequeue count returned with message (new)

    • Use back-off strategy on retries

    • Official Storage Client Library (new)



    Windows Azure Session Alerts!!

    • Storing and Manipulating Blobs and Files with Windows Azure Storage – 11/18 (4:30 PM)

    • Patterns for building Reliable & Scalable Applications with Windows Azure – 11/19 (8:30 AM)

    • Automating the Application Lifecycle with Windows Azure – 11/19 (10:00 AM)


    Q&A


