
The Hadoop RDBMS

Replace Oracle with Hadoop

John Leach, CTO and Co-Founder


Who We Are

The Hadoop RDBMS

  • Standard ANSI SQL

  • Horizontal Scale-Out

  • Real-Time Updates

  • ACID Transactions

  • Powers OLAP and OLTP

  • Seamless BI Integration

Splice Machine Proprietary and Confidential


Serialization and Write Pipelining

  • Serialization Goals

    • Disk Usage Parity with Data Supplied

    • Predicate evaluation uses byte[] comparisons (sort-order preserving)

    • Memory and CPU efficient (fast)

    • Lazy Serialization and Deserialization

  • Write Pipelining Goals

    • Non-blocking Writes

    • Transactional Awareness

    • Small Network Footprint

    • Handle Failure, Location, and Retry Semantics
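The sort-order goal above is the key serialization constraint: encoded values must compare correctly as raw bytes, so predicates can run without deserializing. A minimal sketch of one way to achieve this for signed integers (a fixed 8-byte encoding for simplicity; the format described here actually uses 1-9 byte variable-length scalars):

```python
import struct

def encode_sortable_int(value: int) -> bytes:
    """Encode a signed 64-bit integer so that unsigned byte-wise
    comparison of the encodings matches numeric order: flip the
    sign bit, then emit big-endian bytes."""
    return struct.pack(">Q", (value + (1 << 63)) % (1 << 64))

# Raw byte[] comparison now agrees with numeric comparison, so
# predicates can be evaluated on the encoded form directly:
assert encode_sortable_int(-5) < encode_sortable_int(3) < encode_sortable_int(400)
```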


Single Column Encoding

  • All Columns encoded in a single cell

    • separated by 0x00 byte

  • Nulls are encoded either as “explicit null” or as an absent field

  • Cell value prefixed by an Index containing

    • which fields are present in cell

    • whether the field is

      • Scalar (1-9 Bytes)

      • Float (4 Bytes)

      • Double (8 Bytes)

      • Other (1 – N Bytes)
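A toy model of this cell layout might look as follows. The textual `{1(s),2(o)}` index and the `pack_row` helper are illustrative stand-ins for the real binary format (which would also need to escape 0x00 bytes inside values, omitted here):

```python
SEP = b"\x00"

def pack_row(fields):
    """Pack non-null column values into one cell body, separated by
    0x00 bytes, prefixed by an index naming which 1-based field
    positions are present and their type tag ('s' scalar, 'o' other).
    Null columns are simply absent from both index and body."""
    present = [(i, tag, val)
               for i, (tag, val) in enumerate(fields, 1) if val is not None]
    index = "{" + ",".join(f"{i}({t})" for i, t, _ in present) + "}"
    body = SEP.join(v for _, _, v in present)
    return index.encode() + SEP + body

# Insert of row (1, 'bob') for schema (a int, b string):
assert pack_row([("s", b"1"), ("o", b"bob")]) == b"{1(s),2(o)}\x001\x00bob"
# Row (1, NULL): field 2 is left absent
assert pack_row([("s", b"1"), ("o", None)]) == b"{1(s)}\x001"
```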


Example Insert

  • Table Schema: (a int, b string)

  • Insert row (1,’bob’):

    • All columns packed together

      • 1 0x00 ‘bob’

    • Index prepended

      • {1(s),2(o)}0x00 1 0x00 ‘bob’


Example Insert w/ Nulls

  • Row (1,null)

    • nulls left absent

      • 1

    • Index prepended (field B is not present)

      • {1(s)} 0x00 1


Example Update

  • Row already present: {1(s),2(o)}

  • set a = 2

    • Pack entry

      • 2

    • prepend index (field B is not present)

      • {1(s)}0x00 2


Decoding

  • Indexes are cached

    • Most data looks like its predecessor

  • Values are read in reverse timestamp order

    • Updates before inserts

  • Seek through bytes for fields of interest

  • Once a field is populated, ignore all other values for that field.


Example Decoding

  • Start with (NULL,NULL)

  • 2 KeyValues present:

    • {1(s)}0x00 2

    • {1(s),2(o)} 0x00 1 0x00 ‘bob’

  • Read first KeyValue, fill field 1

    • Row: (2,NULL)

  • Read second KeyValue, skip field 1(already filled), fill field 2:

    • Row: (2,’bob’)
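The fill-once merge over cell versions can be sketched like this; parsed key-values are modeled as plain dicts rather than the real byte-encoded index + body:

```python
def decode_versions(keyvalues, ncols):
    """Fill a row from cell versions read newest-first: each field
    is populated at most once, so older versions can never
    overwrite a newer value. Each keyvalue is modeled as a dict of
    {1-based field position: value}."""
    row = [None] * ncols
    filled = set()
    for kv in keyvalues:              # reverse timestamp order
        for pos, val in kv.items():
            if pos not in filled:     # skip already-filled fields
                row[pos - 1] = val
                filled.add(pos)
    return row

# Update {a=2} is newer than insert (1, 'bob'); the result is (2, 'bob'):
assert decode_versions([{1: 2}, {1: 1, 2: "bob"}], 2) == [2, "bob"]
```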


Index Decoding

  • The index is encoded differently depending on the number and types of the columns present

    • Uncompressed: 1 presence bit, 2 type bits per field

    • Compressed: Run-length encoded (e.g., fields 1-3 scalar, fields 5-8 double…)

    • Sparse: Delta encoded (index,type) pairs

    • Sparse compressed: Run-length encoded (index,type) pairs
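As an illustration of the uncompressed variant, here is a bit-string builder using one presence bit plus two type bits per present field. The exact bit layout and the two-bit type codes are assumptions for the sketch, not the documented format:

```python
# Illustrative type codes; the real two-bit assignments are not
# specified in the slides.
TYPES = {"scalar": 0b00, "float": 0b01, "double": 0b10, "other": 0b11}

def encode_uncompressed_index(fields, ncols):
    """Build the 'uncompressed' index variant as a bit string:
    one presence bit per column, followed by two type bits when
    the column is present (an absent column contributes only a 0)."""
    bits = ""
    for pos in range(1, ncols + 1):
        if pos in fields:
            bits += "1" + format(TYPES[fields[pos]], "02b")
        else:
            bits += "0"
    return bits

# Fields 1 (scalar) and 2 (other) present:
assert encode_uncompressed_index({1: "scalar", 2: "other"}, 2) == "100111"
# Only field 1 present (field 2 null):
assert encode_uncompressed_index({1: "scalar"}, 2) == "1000"
```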


Write Pipeline

  • Asynchronous but guaranteed delivery

  • Operate in Bulk

    • Row or Size bounded

    • Highly Configurable

  • Utilizes Cached Region Locations

  • Server component modeled after Java’s NIO

    • Attach Handlers for different RDBMS features

  • Handle retries, failure, and SQL semantics

    • Wrong Region, Region Too Busy, Primary Key Violation, Unique Constraint Violation


Write Pipeline Base Element

  • Rows are encoded into custom KVPairs

    • all rows for a family and column are grouped together

    • <byte[],byte[]>

  • Exploded into a Put only when writing to HBase

    • Timestamps added on server side

  • Supports Snappy compression


Write Pipeline Client

  • Tree Based Buffer

    • Table -> Region -> N Buffers

    • Rows are buffered on client side in memory

    • N is configurable

  • When buffer fills

    • asynchronously write batch to Region

  • Handles HBase “difficulties” gracefully

    • Wrong Region

      • Re-bucket

    • Too Busy

      • Add delay and possibly back-off

    • etc.
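The buffer-and-flush behavior above might be sketched as below. `RegionBuffer` is hypothetical: it models only the row-bounded case with a single buffer per region, and the asynchronous region write is replaced by a callback:

```python
from collections import defaultdict

class RegionBuffer:
    """Client-side tree buffer sketch: table -> region -> buffer.
    Rows accumulate in memory per region; when a buffer reaches its
    row bound, the batch is handed to a flush callback (standing in
    for an asynchronous write to the region)."""
    def __init__(self, max_rows, flush_fn):
        self.max_rows = max_rows
        self.flush_fn = flush_fn          # e.g. async batch write
        self.buffers = defaultdict(list)  # region -> pending rows

    def add(self, region, row):
        buf = self.buffers[region]
        buf.append(row)
        if len(buf) >= self.max_rows:     # row-bounded flush
            self.flush_fn(region, buf)
            self.buffers[region] = []

flushed = []
buf = RegionBuffer(max_rows=2, flush_fn=lambda r, b: flushed.append((r, list(b))))
for i in range(5):
    buf.add("region-A", i)
assert flushed == [("region-A", [0, 1]), ("region-A", [2, 3])]
```

On a "Wrong Region" error the real client re-buckets the batch against fresh region locations; here that, size bounds, and the N buffers per region are all omitted for brevity.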


Write Pipeline Server Side

  • Coprocessor based

  • Limited number of concurrent writes to a server

    • excess write requests are rejected

    • prevents IPC thread starvation

  • SQL Based Handlers for parallel writes

    • Indexes, Primary Key Constraints, Unique Constraints

  • Writes occur in a single WALEdit on each region
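The admission-control idea (cap concurrent writes, reject rather than queue so IPC threads are never starved) can be sketched with a non-blocking semaphore; `WriteGate` is illustrative, not Splice Machine's actual coprocessor code:

```python
import threading

class WriteGate:
    """Server-side admission sketch: cap concurrent writes with a
    non-blocking semaphore; excess requests are rejected outright
    rather than queued, so request-handler threads stay free."""
    def __init__(self, max_concurrent):
        self._sem = threading.Semaphore(max_concurrent)

    def try_write(self, do_write):
        if not self._sem.acquire(blocking=False):
            return "REJECTED"   # caller retries later with back-off
        try:
            return do_write()
        finally:
            self._sem.release()

gate = WriteGate(max_concurrent=1)
# An admitted write succeeds; a write arriving while the gate is
# full (here, nested inside the first) is rejected, not queued:
assert gate.try_write(lambda: "OK") == "OK"
assert gate.try_write(lambda: gate.try_write(lambda: "OK")) == "REJECTED"
```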


Interests

  • Other items we have done or are interested in…

    • Burstable Trie Implementation of the MemStore

    • Pluggable Cost Based Genetic Algorithm for Assignment Manager

    • Columnar Representations and In-Memory Processing

    • Concurrent Bloom Filter (i.e., a Thread-Safe BitSet)

  • We are hiring

