
A Software-Defined Networking based Approach for Performance Management of Analytical Queries on Distributed Data Stores







Pengcheng Xiong (NEC Labs America)

Hakan Hacigumus (NEC Labs America)

Jeffrey F. Naughton (Univ. of Wisconsin)



Agenda

  • Why?

    • Motivation and background

  • How?

    • System architecture and implementation

  • So what?

    • Real system and benchmark query evaluation

  • Conclusion



Motivation

  • Data analytics applications and data scientists query data held in distributed data stores.

    • Queries generate a huge amount of data traffic on the network

      • e.g., joins across data stores

    • Many applications share the same cluster

      • Data backup, video streaming, etc.

    • Response time is critical

      • Deadline-driven reports

    • Query service differentiation

      • Batch queries, interactive queries



An example query (TPC-H Q14)

We assume that tables are distributed across relational data stores.

The relational data stores are connected by a network.
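For reference, Q14 joins lineitem with part and reports the percentage of revenue that comes from promotional parts shipped in a one-month window. When the two tables sit in different stores, one side of that join has to cross the network. A minimal in-memory sketch of the query's logic follows; the class and function names are illustrative only, and in the presented system the query runs as SQL inside the data stores.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class LineItem:
    l_partkey: int
    l_extendedprice: float
    l_discount: float
    l_shipdate: date


@dataclass
class Part:
    p_partkey: int
    p_type: str


def q14_promo_revenue(lineitems: List[LineItem], parts: List[Part],
                      start: date, end: date) -> float:
    """TPC-H Q14 promo_revenue over in-memory rows (illustrative sketch).

    lineitems and parts stand in for tables that live in two different stores.
    """
    part_type = {p.p_partkey: p.p_type for p in parts}  # hash the part side
    promo = total = 0.0
    for li in lineitems:
        if not (start <= li.l_shipdate < end):  # one-month shipdate window
            continue
        p_type = part_type.get(li.l_partkey)
        if p_type is None:  # inner join: skip lineitems with no matching part
            continue
        revenue = li.l_extendedprice * (1 - li.l_discount)
        total += revenue
        if p_type.startswith("PROMO"):  # SQL: p_type LIKE 'PROMO%'
            promo += revenue
    return 100.0 * promo / total if total else 0.0
```

Whichever store evaluates the join loop needs both inputs locally, which is exactly why network bandwidth between the stores drives plan cost.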



Network change implies plan performance change

Figure: plan performance across Phases 1, 2, and 3 as the network status changes.

(1) There is a huge gap between plans.

(2) The best plan can become the worst one.



What if?

What if the query optimizer could dynamically monitor the network bandwidth and adaptively choose a plan?

Figure: across Phases 1, 2, and 3, an adaptive plan is chosen and query execution time is kept short.
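The what-if above amounts to re-running plan selection whenever the monitored bandwidth changes. A minimal sketch, assuming each candidate plan is summarized by the bytes it sends over each link and that its transfers run one after another; the plan and link names are hypothetical.

```python
from typing import Mapping

# Hypothetical plan summary: bytes the plan sends over each network link.
Plan = Mapping[str, float]


def estimated_network_time(plan: Plan, bandwidth: Mapping[str, float]) -> float:
    """Sum of bytes / currently available bandwidth over the links the plan uses."""
    return sum(nbytes / bandwidth[link] for link, nbytes in plan.items())


def choose_plan(plans: Mapping[str, Plan], bandwidth: Mapping[str, float]) -> str:
    """Pick the plan with the lowest estimated network time for the current bandwidth."""
    return min(plans, key=lambda name: estimated_network_time(plans[name], bandwidth))
```

Called once per phase with fresh bandwidth readings, the same function can return a different plan: with both links at full speed, a plan shipping the smaller table wins, but if its link becomes congested, shipping the larger table over an idle link can become cheaper.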



A busy network implies no good plan

User: Run the query right now. I need the result ASAP to meet my deadline!

Distributed DBMS: Well… I am sorry. None of the candidate plans can meet your deadline due to the current busy network status.



What if?

User: Run the query right now. I need the result ASAP to meet my deadline!

Distributed DBMS: OK. Although the network is currently busy, I can control it to prioritize bandwidth for your query.

What if the query optimizer could control the network?



Distributed query optimizer monitors and controls the network?



Sounds like mission impossible

  • Databases have always treated the underlying network as a black box

    • unable to monitor it

    • let alone control it

  • With software-defined networking (SDN), a database can

    • inquire about the current status of the network, or

    • control the network with directives

Figure: without SDN, networking is a black box the database is unable to monitor, let alone control; with SDN, the database is able to inquire about and control the network.



Sounds interesting, but how?

Figure: an Ethernet switch/router, consisting of a control path (software) and a data path (hardware).



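Figure: the distributed query optimizer (our contribution) calls an API exposed by an OpenFlow controller, which implements the control path and communicates with the switch's data path (hardware) over the OpenFlow protocol (SSL/TCP).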
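The architecture places an API between the optimizer and the OpenFlow controller but does not spell it out. Below is a hypothetical version of that interface: one monitoring call and two control calls matching the options listed under cost estimation, plus an in-memory fake controller for exercising optimizer logic without a switch. None of these names come from Beacon or the OpenFlow specification; they are assumptions for illustration.

```python
from typing import Dict, Protocol, Tuple

# Hypothetical flow key: (source store, destination store).
Flow = Tuple[str, str]


class NetworkController(Protocol):
    """Hypothetical optimizer-facing API (not Beacon's or OpenFlow's actual interface)."""

    def available_bandwidth(self, flow: Flow) -> float: ...  # monitor

    def set_priority(self, flow: Flow, level: int) -> None: ...  # control

    def reserve_bandwidth(self, flow: Flow, rate: float) -> bool: ...  # control


class FakeController:
    """In-memory stand-in for a controller managing one shared link (testing only)."""

    def __init__(self, capacity: float, background: float) -> None:
        self.capacity = capacity      # link capacity, bytes/s
        self.background = background  # competing traffic, bytes/s (e.g. from iperf)
        self.priority: Dict[Flow, int] = {}
        self.reserved: Dict[Flow, float] = {}

    def available_bandwidth(self, flow: Flow) -> float:
        # Monitoring view: whatever the background traffic leaves over.
        return max(self.capacity - self.background, 0.0)

    def set_priority(self, flow: Flow, level: int) -> None:
        self.priority[flow] = level

    def reserve_bandwidth(self, flow: Flow, rate: float) -> bool:
        # Admit the reservation only if all reservations together fit in the link.
        others = sum(r for f, r in self.reserved.items() if f != flow)
        if others + rate > self.capacity:
            return False
        self.reserved[flow] = rate
        return True


def probe(ctl: NetworkController, flow: Flow) -> float:
    """Optimizer-side code written only against the protocol, not a concrete controller."""
    return ctl.available_bandwidth(flow)
```

Coding the optimizer against the protocol keeps it independent of any particular controller, so a real adapter and the fake can be swapped freely.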



System architecture



System implementation

  • OpenFlow switch: NEC PFS5240

  • OpenFlow controller: Beacon



Plan generation

Figure: candidate plans for Q14, where one data store stores the lineitem table and another stores the part table.
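Only the table placement survives from the plan-generation figure. A minimal sketch of candidate generation, assuming the only decision is which store runs the join (ship one input to the store holding the other); semijoin or third-site plans are left out, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CandidatePlan:
    """Hypothetical two-site join plan: ship one input to the site holding the other."""
    join_site: str  # data store that executes the join
    shipped: str    # table sent over the network
    src: str        # store the shipped table comes from
    nbytes: float   # estimated bytes transferred (after local filters)


def generate_plans(tables: Dict[str, Tuple[str, float]]) -> List[CandidatePlan]:
    """Enumerate both ship-one-side plans for a join of exactly two tables.

    tables maps table name -> (store that holds it, estimated bytes to ship).
    """
    if len(tables) != 2:
        raise ValueError("this sketch handles a two-table join only")
    (t1, (s1, b1)), (t2, (s2, b2)) = tables.items()
    return [
        CandidatePlan(join_site=s2, shipped=t1, src=s1, nbytes=b1),  # move t1 to t2's store
        CandidatePlan(join_site=s1, shipped=t2, src=s2, nbytes=b2),  # move t2 to t1's store
    ]
```

For Q14 this yields two candidates: ship (filtered) lineitem to the part store, or ship part to the lineitem store. Which one is cheaper depends on both the data sizes and the bandwidth each transfer will get.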



Cost estimation

  • Cost model for the network operator

    • Amount of data transferred

    • Real-time transfer speed, obtained with SDN support

      • Monitor: take any bandwidth that is left

      • Control: assign the highest priority, or make a bandwidth reservation
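One plausible reading of this cost model is network cost = amount of data transferred / real-time transfer speed, with the speed depending on which SDN option is used. The sketch encodes that reading under deliberately simple assumptions that are not taken from the slides: monitoring yields link capacity minus background traffic, highest priority yields the full link, and a reservation yields exactly the reserved rate. Function names are illustrative.

```python
from enum import Enum


class Mode(Enum):
    MONITOR = "monitor"    # take any bandwidth that is left
    PRIORITY = "priority"  # assign the highest priority
    RESERVE = "reserve"    # make a bandwidth reservation


def transfer_speed(mode: Mode, capacity: float, background: float,
                   reserved: float = 0.0) -> float:
    """Estimated real-time transfer speed in bytes/s (simplified assumptions)."""
    if mode is Mode.MONITOR:
        return max(capacity - background, 0.0)
    if mode is Mode.PRIORITY:
        # Assumption: strict priority lets the query's flow use the whole link.
        return capacity
    if mode is Mode.RESERVE:
        # Assumption: the flow gets exactly its reserved rate, capped by the link.
        return min(reserved, capacity)
    raise ValueError(f"unknown mode: {mode}")


def network_cost(nbytes: float, speed: float) -> float:
    """Network operator cost in seconds: bytes transferred / transfer speed."""
    return float("inf") if speed <= 0 else nbytes / speed
```

For 2 GB on a 1 GB/s link carrying 0.9 GB/s of background traffic, this gives 20 s when only monitoring, 2 s with highest priority, and 4 s with a 0.5 GB/s reservation.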



Evaluation

  • Setup

    • TPC-H, scale factor 100, query Q14

    • Small tables (supplier, nation, region) are replicated.

    • Other tables are each placed at a single data store site.

    • Neighbor traffic generator: iperf

    • Summary of case studies



Case 1: single user, single-thread, iperf

With SDN, the query optimizer can dynamically monitor the network bandwidth and adaptively choose the best plan.

Figure: query performance across Phases 1, 2, and 3, with network bottlenecks marked.



Case 3: multiple users, multi-thread, no contention traffic, priority queue

Based on SDN, premium queries run faster than regular ones.

Based on SDN, all queries run faster.
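Why a priority queue favors premium queries can be seen in a toy fluid model, which is an assumption of this note rather than the authors' model: two transfers start together on one link, and we compare strict priority for the premium transfer against an equal split of the link.

```python
from typing import Tuple


def strict_priority(premium: float, regular: float, capacity: float) -> Tuple[float, float]:
    """Finish times (s) of (premium, regular) when premium is always served first."""
    t_premium = premium / capacity
    t_regular = (premium + regular) / capacity  # regular gets the link only after premium ends
    return t_premium, t_regular


def equal_share(premium: float, regular: float, capacity: float) -> Tuple[float, float]:
    """Finish times (s) of (premium, regular) when both split the link while active."""
    t_small = 2 * min(premium, regular) / capacity  # each gets capacity / 2 until one ends
    t_large = (premium + regular) / capacity        # link stays fully busy until all bytes are sent
    return (t_small, t_large) if premium <= regular else (t_large, t_small)
```

With 1 GB per transfer on a 1 GB/s link, strict priority finishes the premium transfer at 1 s and the regular one at 2 s, while an equal split finishes both at 2 s: the premium query gains and the regular query finishes no later.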



Case study 5: single user, multi-thread, iperf, weighted fair queue

With SDN, larger bandwidth reservations make queries run faster.
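Under weighted fair queueing, a larger reservation can be expressed as a larger weight. A minimal sketch of the idealized (GPS-style) rate split, assuming every flow always has data queued and ignoring packet granularity; the flow names are hypothetical.

```python
from typing import Dict


def wfq_rates(weights: Dict[str, float], capacity: float) -> Dict[str, float]:
    """Idealized weighted fair queueing: backlogged flow i gets capacity * w_i / sum(w)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {flow: capacity * w / total for flow, w in weights.items()}


def transfer_time(nbytes: float, rate: float) -> float:
    """Seconds to move nbytes at a constant rate (bytes/s)."""
    return nbytes / rate
```

For a query moving 3 GB over a 1 GB/s link shared with iperf, weights 1:1 give the query 0.5 GB/s (6 s), while 3:1 give it 0.75 GB/s (4 s), which matches the direction of the case-study result.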



Conclusion

  • SDN can be effectively exploited for performance management of analytical queries on distributed data stores

    • Directly monitor the network and adaptively pick the best plan.

    • Control the priority of network traffic or make network bandwidth reservations to differentiate the query service.

  • Lots of opportunities



Thanks!

