distributed data storage and parallel processing engine l.
Skip this Video
Loading SlideShow in 5 Seconds..
Distributed Data Storage and Parallel Processing Engine PowerPoint Presentation
Download Presentation
Distributed Data Storage and Parallel Processing Engine

Loading in 2 Seconds...

play fullscreen
1 / 33

Distributed Data Storage and Parallel Processing Engine - PowerPoint PPT Presentation

  • Uploaded on

Sector & Sphere. Distributed Data Storage and Parallel Processing Engine. Yunhong Gu Univ. of Illinois at Chicago. What is Sector/Sphere?. Sector: Distributed File System Sphere: Parallel Data Processing Engine (generic MapReduce) Open source software, GPL/BSD, written in C++.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Distributed Data Storage and Parallel Processing Engine' - talasi

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
what is sector sphere
What is Sector/Sphere?
  • Sector: Distributed File System
  • Sphere: Parallel Data Processing Engine (generic MapReduce)
  • Open source software, GPL/BSD, written in C++.
  • Started since 2006, current version 1.23
  • http://sector.sf.net
  • Motivation
  • Sector
  • Sphere
  • Experimental Results

Super-computer model:

Expensive, data IO bottleneck

Sector/Sphere model:

Inexpensive, parallel data IO,

data locality


Parallel/Distributed Programming with MPI, etc.:

Flexible and powerful.

But too complicated

Sector/Sphere model (cloud model):

Clusters are a unity to the developer, simplified programming interface.

Limited to certain data parallel applications.


Systems for single data centers:

Requires additional effort to locate and move data.

Sector/Sphere model:

Support wide-area data collection and distribution.

sector distributed file system
Sector Distributed File System

User account

Data protection

System Security



Service provider

System access tools

App. Programming Interfaces

Security Server







Encryption optional



Storage and Processing

sector distributed file system8
Sector Distributed File System
  • Sector stores files on the native/local file system of each slave node.
  • Sector does not split files into blocks
    • Pro: simple/robust, suitable for wide area, fast and flexible data processing
    • Con: users need to handle file size properly
  • The master nodes maintain the file system metadata. No permanent metadata is needed.
  • Topology aware
sector performance
Sector: Performance
  • Data channel is set up directly between a slave and a client
  • Multiple active-active masters (load balance), starting from 1.24
  • UDT is used for high speed data transfer
    • UDT is a high performance UDP-based data transfer protocol.
    • Much faster than TCP over wide area
udt udp based data transfer
UDT: UDP-based Data Transfer
  • http://udt.sf.net
  • Open source UDP based data transfer protocol
    • With reliability control and congestion control
  • Fast, firewall friendly, easy to use
  • Already used in many commercial and research software
sector fault tolerance
Sector: Fault Tolerance
  • Sector uses replications for better reliability and availability
  • Replicas can be made either at write time (instantly) or periodically
  • Sector supports multiple active-active masters for high availability
sector security
Sector: Security
  • Sector uses a security server to maintain user account and IP access control for masters, slaves, and clients
  • Control messages are encrypted
    • not completely finished in the current version
  • Data transfer can be encrypted as an option
  • Data transfer channel is set up by rendezvous, no listening server.
sector tools and api
Sector: Tools and API
  • Supported file system operation: ls, stat, mv, cp, mkdir, rm, upload, download
    • Wild card characters supported
  • System monitoring: sysinfo.
  • C++ API: list, stat, move, copy, mkdir, remove, open, close, read, write, sysinfo.
  • FUSE
sphere simplified data processing
Sphere: Simplified Data Processing
  • Data parallel applications
  • Data is processed at where it resides, or on the nearest possible node (locality)
  • Same user defined functions (UDF) are applied on all elements (records, blocks, or files)
  • Processing output can be written to Sector files or sent back to the client
  • Generalized Map/Reduce
sphere simplified data processing15
Sphere: Simplified Data Processing

for each file F in (SDSS datasets)

for each image I in F

findBrownDwarf(I, …);

SphereStream sdss;

sdss.init("sdss files");

SphereProcess myproc;

myproc->run(sdss,"findBrownDwarf", …);


findBrownDwarf(char* image, int isize, char* result, int rsize);

sphere data movement
Sphere: Data Movement
  • Slave -> Slave Local
  • Slave -> Slaves (Shuffle/Hash)
  • Slave -> Client
sphere udf vs mapreduce
Record Offset Index


Hashing / Bucket




Parser / Input Reader





Output Writer

Sphere/UDF vs. MapReduce
sphere udf vs mapreduce18
Sphere/UDF vs. MapReduce
  • Sphere is more straightforward and flexible
    • UDF can be applied directly on records, blocks, files, and even directories
    • Native binary data support
    • Sorting is required by Reduce, but it is optional in Sphere
  • Sphere uses PUSH model for data movement, faster than the PULL model used by MapReduce
why sector doesn t split files
Why Sector doesn’t Split Files?
  • Certain applications need to process a whole file or even directory
  • Certain legacy applications need a file or a directory as input
  • Certain applications need multiple inputs, e.g., everything in a directory
  • In Hadoop, all blocks would have to be moved to one node for processing, hence no data locality benefit.
load balance
Load Balance
  • The number of data segments is much more than the number of SPEs. When an SPE completes a data segment, a new segment will be assigned to the SPE.
  • Data transfer is balanced across the system to optimize network bandwidth usage.
fault tolerance
Fault Tolerance
  • Map failure is recoverable
    • If one SPE fails, the data segment assigned to it will be re-assigned to another SPE and be processed again.
  • Reduce failure is unrecoverable
    • In small-medium systems, machine failure during run time is rare
    • If necessary, developers can split the input into multiple sub-tasks to reduce the cost of reduce failure.
open cloud testbed
Open Cloud Testbed
  • 4 Racks in Baltimore (JHU), Chicago (StarLight and UIC), and San Diego (Calit2)
  • 10Gb/s inter-site connection on CiscoWave
  • 2Gb/s inter-rack connection
  • Two dual-core AMD CPU, 12GB RAM, 1TB single disk
  • Will be doubled by Sept. 2009.
the terasort benchmark
The TeraSort Benchmark
  • Data is split into small files, scattered on all slaves
  • Stage 1: On each slave, an SPE scans local files, sends each record to a bucket file on a remote node according to the key.
  • Stage 2: On each destination node, an SPE sort all data inside each bucket.

100 bytes record

Stage 2:

Sort each bucket

on local node











Stage 1:

Hash based on

the first 10 bits



performance results terasort
Performance Results: TeraSort

Run time: seconds

Sector v1.16 vs Hadoop 0.17

performance results terasort27
Performance Results: TeraSort
  • Sorting 1.2TB on 120 nodes
  • Sphere Hash & Local Sort: 981sec + 545sec
  • Hadoop: 3702/6675 seconds
  • Sphere Hash
    • CPU: 130% MEM: 900MB
  • Sphere Local Sort
    • CPU: 80% MEM: 1.4GB
  • Hadoop: CPU 150% MEM 2GB
the malstone benchmark
The MalStone Benchmark
  • Drive-by problem: visit a web site and get comprised by malware.
  • MalStone-A: compute the infection ratio of each site.
  • MalStone-B: compute the infection ratio of each site from the beginning to the end of every week.



Text Record

Event ID | Timestamp | Site ID | Compromise Flag | Entity ID

00000000005000000043852268954353585368|2008-11-08 17:56:52.422640|3857268954353628599|1|000000497829


Stage 2:

Compute infection rate

for each merchant

Site ID











Stage 1:

Process each record and hash into buckets according to site ID



performance results malstone
Performance Results: MalStone

Process 10 billions records on 20 OCT nodes (local).

* Courtesy of Collin Bennet and Jonathan Seidman of Open Data Group.

for more information
For More Information
  • Sector/Sphere code & docs: http://sector.sf.net
  • Open Cloud Consortium: http://www.opencloudconsortium.org
  • NCDM: http://www.ncdm.uic.edu