The Hadoop RDBMS

Replace Oracle with Hadoop

John Leach CTO and Co-Founder



Who We Are

The Hadoop RDBMS

  • Standard ANSI SQL

  • Horizontal Scale-Out

  • Real-Time Updates

  • ACID Transactions

  • Powers OLAP and OLTP

  • Seamless BI Integration



Serialization and Write Pipelining

  • Serialization Goals

    • Disk Usage Parity with Data Supplied

    • Predicate evaluation uses byte[] comparisons (encoding preserves sort order)

    • Memory and CPU efficient (fast)

    • Lazy Serialization and Deserialization

  • Write Pipelining Goals

    • Non-blocking Writes

    • Transactional Awareness

    • Small Network Footprint

    • Handle Failure, Location, and Retry Semantics


Single Column Encoding

  • All Columns encoded in a single cell

    • separated by 0x00 byte

  • Nulls are encoded either as “explicit null” or as an absent field

  • Cell value prefixed by an Index containing

    • which fields are present in cell

    • whether the field is

      • Scalar (1-9 Bytes)

      • Float (4 Bytes)

      • Double (8 Bytes)

      • Other (1 – N Bytes)
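As a minimal sketch of the layout above, assuming a simplified ASCII index notation (the real encoding bit-packs the index, and these class and method names are illustrative, not Splice Machine's API):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: pack several column values into one cell, with a
// prepended index listing which fields are present and their types
// ('s' = scalar, 'o' = other), followed by 0x00-separated values.
public class SingleCellCodec {
    public static final class Field {
        final int pos; final char type; final byte[] bytes;
        public Field(int pos, char type, byte[] bytes) {
            this.pos = pos; this.type = type; this.bytes = bytes;
        }
    }

    // Pack the present fields into one cell: index first, then each value
    // preceded by a 0x00 separator. Absent (null) fields are simply omitted.
    public static byte[] pack(Field... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        StringBuilder index = new StringBuilder();
        for (Field f : fields) index.append(f.pos).append(f.type);
        byte[] idx = index.toString().getBytes(StandardCharsets.US_ASCII);
        out.write(idx, 0, idx.length);
        for (Field f : fields) {
            out.write(0x00);                          // 0x00 separator before each value
            out.write(f.bytes, 0, f.bytes.length);
        }
        return out.toByteArray();
    }
}
```

Packing row (1, 'bob') this way yields 1s2o 0x00 1 0x00 'bob'; packing (1, null) simply omits field 2, producing 1s 0x00 1 — mirroring the worked examples that follow.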


Example Insert

  • Table Schema: (a int, b string)

  • Insert row (1,’bob’):

    • All columns packed together

      • 1 0x00 ‘bob’

    • Index prepended

      • {1(s),2(o)}0x00 1 0x00 ‘bob’


Example Insert w/ Nulls

  • Row (1,null)

    • nulls left absent

      • 1

    • Index prepended (field B is not present)

      • {1(s)} 0x00 1


Example: Update

  • Row already present: {1(s),2(o)}

  • set a = 2

    • Pack entry

      • 2

    • prepend index (field B is not present)

      • {1(s)}0x00 2


Decoding

  • Indexes are cached

    • Most data looks like its predecessor

  • Values are read in reverse timestamp order

    • Updates before inserts

  • Seek through bytes for fields of interest

  • Once a field is populated, ignore all other values for that field.
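The newest-wins merge above can be sketched as follows, assuming each KeyValue version has already been decoded into an array of field values (null where a field is absent from that version):

```java
import java.util.List;

// Sketch of the "fill each field once" decode: versions are visited in
// reverse timestamp order (updates before inserts), and a field already
// populated by a newer version is never overwritten by an older one.
public class VersionDecoder {
    public static String[] decode(int fieldCount, List<String[]> versionsNewestFirst) {
        String[] row = new String[fieldCount];
        for (String[] version : versionsNewestFirst) {
            for (int i = 0; i < fieldCount; i++) {
                if (row[i] == null && version[i] != null) {
                    row[i] = version[i];   // first (i.e. newest) value wins
                }
            }
        }
        return row;
    }
}
```

Feeding it the two KeyValues from the Example Decoding slide, newest first, reproduces the row (2, 'bob').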


Example Decoding

  • Start with (NULL,NULL)

  • 2 KeyValues present:

    • {1(s)}0x00 2

    • {1(s),2(o)} 0x00 1 0x00 ‘bob’

  • Read first KeyValue, fill field 1

    • Row: (2,NULL)

  • Read second KeyValue, skip field 1(already filled), fill field 2:

    • Row: (2,’bob’)


Index Decoding

  • Index is encoded differently depending on the number and types of the columns present

    • Uncompressed: 1 bit for present, 2 bits for type

    • Compressed: Run-length encoded (field 1-3, scalar, 5-8 double…)

    • Sparse: Delta encoded (index,type) pairs

    • Sparse compressed: Run-length encoded (index,type) pairs
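The uncompressed variant can be sketched as below. The actual bit layout is internal to Splice Machine, so the 3-bits-per-field packing here (1 present bit plus 2 type bits) is only illustrative:

```java
// Sketch of an uncompressed index: per field, 1 bit "present" followed by
// 2 bits of type. Fields are packed most-significant-first into an int.
public class UncompressedIndex {
    public static final int SCALAR = 0, FLOAT = 1, DOUBLE = 2, OTHER = 3;

    public static int pack(boolean[] present, int[] type) {
        int bits = 0;
        for (int i = 0; i < present.length; i++) {
            int entry = present[i] ? (4 | type[i]) : 0;  // 4 = present bit
            bits = (bits << 3) | entry;
        }
        return bits;
    }

    public static boolean isPresent(int bits, int field, int fieldCount) {
        int shift = 3 * (fieldCount - 1 - field);
        return ((bits >> shift) & 4) != 0;
    }

    public static int typeOf(int bits, int field, int fieldCount) {
        int shift = 3 * (fieldCount - 1 - field);
        return (bits >> shift) & 3;
    }
}
```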


Write Pipeline

  • Asynchronous but guaranteed delivery

  • Operate in Bulk

    • Row or Size bounded

    • Highly Configurable

  • Utilizes Cached Region Locations

  • Server component modeled after Java’s NIO

    • Attach Handlers for different RDBMS features

  • Handle retries, failure, and SQL semantics

    • Wrong Region, Region Too Busy, Primary Key Violation, Unique Constraint Violation
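The failure-handling semantics above might be sketched as a mapping from server outcome to client reaction (the outcome and action names here are illustrative, not the actual API):

```java
// Sketch of the retry semantics: transient HBase conditions are retried
// (after re-bucketing or backing off), while SQL constraint violations
// are surfaced to the caller as errors.
public class RetryPolicy {
    public enum Outcome { SUCCESS, WRONG_REGION, REGION_TOO_BUSY,
                          PRIMARY_KEY_VIOLATION, UNIQUE_CONSTRAINT_VIOLATION }
    public enum Action { DONE, REBUCKET_AND_RETRY, BACKOFF_AND_RETRY, FAIL }

    public static Action react(Outcome o) {
        switch (o) {
            case SUCCESS:         return Action.DONE;
            case WRONG_REGION:    return Action.REBUCKET_AND_RETRY;  // refresh cached region locations
            case REGION_TOO_BUSY: return Action.BACKOFF_AND_RETRY;   // add delay, possibly back off
            default:              return Action.FAIL;                // constraint violations are not retried
        }
    }
}
```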


Write Pipeline Base Element

  • Rows are encoded into custom KVPairs

    • all rows for a family and column are grouped together

    • <byte[],byte[]>

  • Exploded into Put only to write to HBase

    • Timestamps added on server side

  • Supports snappy compression


Write Pipeline Client

  • Tree Based Buffer

    • Table -> Region -> N Buffers

    • Rows are buffered on client side in memory

    • N is configurable

  • When buffer fills

    • asynchronously write batch to Region

  • Handles HBase “difficulties” gracefully

    • Wrong Region

      • Re-bucket

    • Too Busy

      • Add delay and possibly back-off

    • etc.
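A minimal sketch of the tree buffer's flush trigger, assuming a row-count bound and a pluggable row-to-region routing function (both stand-ins for the cached-region-location machinery):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch of the client-side buffer: rows are grouped per region and a
// region's batch is flushed once its row-count bound is hit. In the real
// pipeline the flush is an asynchronous bulk write; here it is only counted.
public class RegionBuffer {
    private final int maxRows;
    private final Function<String, String> rowToRegion;  // routing via cached region locations
    private final Map<String, List<String>> buffers = new HashMap<>();
    private int flushes = 0;

    public RegionBuffer(int maxRows, Function<String, String> rowToRegion) {
        this.maxRows = maxRows;
        this.rowToRegion = rowToRegion;
    }

    public void add(String row) {
        List<String> buf =
            buffers.computeIfAbsent(rowToRegion.apply(row), k -> new ArrayList<>());
        buf.add(row);
        if (buf.size() >= maxRows) {  // bound reached: flush this region's batch
            buf.clear();
            flushes++;
        }
    }

    public int flushCount() { return flushes; }
}
```

Because each region has its own buffer, a hot region can flush repeatedly while cold regions keep accumulating, which is what makes the row/size bounds worth configuring per workload.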


Write Pipeline Server Side

  • Coprocessor based

  • Limited number of concurrent writes to a server

    • excess write requests are rejected

    • prevents IPC thread starvation

  • SQL Based Handlers for parallel writes

    • Indexes, Primary Key Constraints, Unique Constraints

  • Writes occur in a single WALEdit on each region
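The admission-control idea above — a bounded number of in-flight writes, with excess requests rejected rather than queued — can be sketched with a semaphore (a standard-library stand-in for the actual implementation):

```java
import java.util.concurrent.Semaphore;

// Sketch of server-side admission control: only a limited number of writes
// may proceed concurrently; excess requests are rejected immediately so
// HBase IPC threads are never blocked waiting on permits.
public class WriteThrottle {
    private final Semaphore permits;

    public WriteThrottle(int maxConcurrentWrites) {
        this.permits = new Semaphore(maxConcurrentWrites);
    }

    // Returns false (rejecting the batch, so the client retries later)
    // instead of blocking the IPC thread.
    public boolean tryStartWrite() {
        return permits.tryAcquire();
    }

    public void finishWrite() {
        permits.release();
    }
}
```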


Interests

  • Other items we have done or are interested in…

    • Burstable Tries Implementation of Memstore

    • Pluggable Cost Based Genetic Algorithm for Assignment Manager

    • Columnar Representations and in-memory processing.

    • Concurrent Bloom Filter (i.e. Thread Safe BitSet)

  • We are hiring

    • Just Completed $15M Series B Raise

    • careers@splicemachine.com