Exploring Collaboration Between Blockchain and Distributed Databases

Bo Wang. Nov. 27, 2018.


Presentation Transcript


  1. Exploring Collaboration Between Blockchain and Distributed Databases • Bo Wang • Nov. 27, 2018

  2. Blockchain • A chained list of blocks • Block: a hash pointer to the previous block, a timestamp, and transaction data • The head of the list is stored as a hash pointer to the latest block
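The structure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not any particular blockchain's format: the `Block` and `Chain` names, the JSON serialization, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Block:
    prev_hash: Optional[str]        # hash pointer to the previous block (None for the first block)
    timestamp: float
    transactions: Tuple[str, ...]   # transaction data, kept as plain strings here

    def hash(self) -> str:
        # Hypothetical serialization: canonical JSON hashed with SHA-256.
        payload = json.dumps(
            {
                "prev_hash": self.prev_hash,
                "timestamp": self.timestamp,
                "transactions": list(self.transactions),
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Chain:
    def __init__(self) -> None:
        self.blocks: Dict[str, Block] = {}  # block hash -> block
        self.head: Optional[str] = None     # hash pointer to the latest block

    def append(self, transactions, timestamp: float) -> str:
        block = Block(self.head, timestamp, tuple(transactions))
        block_hash = block.hash()
        self.blocks[block_hash] = block
        self.head = block_hash
        return block_hash

    def verify(self) -> bool:
        # Walk back from the head; every stored block must hash to its own key.
        current = self.head
        while current is not None:
            block = self.blocks.get(current)
            if block is None or block.hash() != current:
                return False
            current = block.prev_hash
        return True
```

Because each block's hash covers the previous block's hash, changing any earlier block changes every hash after it, so holding only the head is enough to detect tampering anywhere in the list.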

  3. Blockchain • Managed by a peer-to-peer network • Consensus algorithm: Proof-of-Work (PoW). Nodes compete to find a nonce that solves the puzzle H(nonce || prev_hash || tx || tx || ... || tx) < target
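The puzzle H(nonce || prev_hash || tx || ... || tx) < target can be illustrated with a brute-force search. This is a simplified sketch: the `|` separator, the single SHA-256 pass, and the function names are assumptions for the example (Bitcoin itself hashes a fixed-format block header twice with SHA-256).

```python
import hashlib
from typing import List, Optional


def puzzle_hash(nonce: int, prev_hash: str, transactions: List[str]) -> int:
    """Interpret SHA-256(nonce || prev_hash || tx || ... || tx) as an integer."""
    fields = [str(nonce), prev_hash, *transactions]
    digest = hashlib.sha256("|".join(fields).encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def mine(
    prev_hash: str,
    transactions: List[str],
    target: int,
    max_nonce: int = 2**32,
) -> Optional[int]:
    """Try nonces in order and return the first one whose hash is below target."""
    for nonce in range(max_nonce):
        if puzzle_hash(nonce, prev_hash, transactions) < target:
            return nonce
    return None  # no solution within the nonce range


if __name__ == "__main__":
    # Hypothetical easy target: roughly 1 in 4096 hashes qualifies.
    easy_target = 2**244
    found = mine("00" * 32, ["alice->bob:1", "bob->carol:2"], easy_target)
    print("nonce:", found)
```

Lowering `target` makes the puzzle harder: halving it doubles the expected number of hashes a node must try, while checking a claimed nonce always costs a single hash.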

  4. Features of Blockchain • A special kind of distributed DB: provides data storage • Decentralization: no central authority • Immutability / tamper resistance

  5. Drawbacks of Blockchain • Low throughput: Bitcoin averages 1 tps (transactions per second) with a maximum of 7 tps; Visa handles 2,000 tps typically and 10,000 tps at peak • Long latency: Bitcoin takes about 10 minutes per block and 1 hour per transaction; financial applications need 30 to 100 ms • Low capacity: the Bitcoin chain was 180 GB in 2018; big data means petabytes (1 PB = 1,000,000 GB) • No query capability

  6. Features of Distributed Databases • High throughput: Cassandra reached 174,000 writes/second on 50 nodes (2011) and 1 million writes/second on a few dozen nodes (2014) • Large capacity: each node stores a subset of the data, so capacity grows linearly with the number of nodes • Short latency: Cassandra showed 10 ms reads/writes in tests by UoT (2012), and latency does not worsen as the number of nodes increases • Rich queries

  7. Drawbacks of Distributed Databases • Centralized: they depend on trusted third parties • Vulnerable to cyber attacks • Tampering with data, e.g., alteration or deletion, can go undetected • Data integrity can hardly be restored once lost

  8. Collaboration: Blockchain-based DB

  9. Collaboration: Blockchain-based DB • Blockchain: low performance, high security • Distributed databases: high performance, low security • Blockchain-based DBs: high performance, high security

  10. Technology Choices in Blockchain • Consensus algorithm: PoW • In Bitcoin, finding a nonce that validates a block takes about 10 minutes. • Adding computing power won't improve performance, because the puzzle difficulty is adjusted to keep the block time constant. • Full replication • Each node stores a copy of all the data. • This copy is typically kept on a single hard drive.

  11. Technology Choices in Distributed DB • Partial replication • Each node keeps some of the data. • Each piece of data is replicated on several nodes. • Paxos consensus algorithm • Fault-tolerant: reaches consensus even when some nodes are unresponsive. • A long lineage of systems that handle high throughput, low latency, high capacity, efficient network utilization, and any shape of data well.
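Partial replication can be sketched as a placement function that maps each key to a few of the nodes. The modulo-based placement and the `replica_nodes` name below are assumptions for illustration; production systems such as Cassandra use more elaborate schemes (token rings, configurable replication strategies).

```python
import hashlib
from typing import List


def replica_nodes(key: str, nodes: List[str], replication_factor: int = 3) -> List[str]:
    """Pick `replication_factor` distinct nodes that should store `key`."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    # Hash the key to a starting position, then take the next nodes in order.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


if __name__ == "__main__":
    cluster = ["n0", "n1", "n2", "n3", "n4"]
    for key in ("user:1", "user:2", "order:17"):
        print(key, "->", replica_nodes(key, cluster))
```

With 5 nodes and a replication factor of 3, each node ends up holding roughly 3/5 of the keys rather than all of them, which is why adding nodes adds capacity.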

  12. Principles of Integration • To increase performance: keep more distributed DB features • To increase security: add more blockchain features

  13. Blockchain-based DB: BigchainDB • Built on top of a distributed DB, e.g., RethinkDB • Inherits high performance from distributed DB. • Nodes can be added to increase throughput and capacity. • Add blockchain features • Decentralized control • Immutability • The ability to create & transfer assets

  14. BigchainDB Architecture • Presents an API to clients as a single DB. • Each node has two distributed DBs: S and C. • Each DB runs its own internal consensus algorithm, e.g., Paxos. • S and C are connected by the BigchainDB consensus algorithm.

  15. BigchainDB Architecture • S: unordered set of txns • Validate new txn • Assign to other nodes • Node K: signing node • Sk: set of txns assigned to node K • Create a block of Sk • Put the block into C • C: ordered list of blocks
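The flow on this slide (validate, assign, build a block from Sk, write it to C) can be sketched with two in-memory containers. This is a toy model, not BigchainDB's code: the `Cluster` class, its method names, and the random assignment policy are assumptions made for the example.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple


class Cluster:
    def __init__(self, node_names: List[str], seed: int = 0) -> None:
        self.node_names = list(node_names)
        self.backlog: Dict[str, Tuple[dict, str]] = {}  # S: tx_id -> (tx, assigned node), unordered
        self.chain: List[dict] = []                     # C: ordered list of blocks
        self._rng = random.Random(seed)

    def submit(
        self,
        tx_id: str,
        tx: dict,
        is_valid: Callable[[dict], bool] = lambda tx: True,
    ) -> Optional[str]:
        """Validate a new transaction and assign it to a node (assumed: at random)."""
        if not is_valid(tx):
            return None
        assignee = self._rng.choice(self.node_names)
        self.backlog[tx_id] = (tx, assignee)
        return assignee

    def create_block(self, node_name: str) -> Optional[dict]:
        """Node K moves its assigned transactions Sk from S into a new block in C."""
        assigned = {
            tx_id: tx
            for tx_id, (tx, assignee) in self.backlog.items()
            if assignee == node_name
        }
        if not assigned:
            return None
        for tx_id in assigned:
            del self.backlog[tx_id]
        block = {"creator": node_name, "transactions": assigned}
        self.chain.append(block)
        return block
```

Calling `submit` a few times and then `create_block` for each node empties S and leaves one block per node that had work, in the order the blocks were written.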

  16. BigchainDB Architecture • Voting Mechanism • Each signing node votes whether a block is valid or invalid. • Check validity of every transaction in the block. • Quorum is a majority of votes.
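The voting rule can be written as a small tally function. A minimal sketch, assuming a strict majority of all signing nodes is needed either way and that a block stays undecided until one side reaches it; the `block_status` name and the handling of ties are assumptions, not taken from the slides.

```python
from collections import Counter
from typing import Dict, List


def block_status(votes: Dict[str, bool], voters: List[str]) -> str:
    """Return 'valid', 'invalid', or 'undecided' for a block.

    `votes` maps a node name to its vote (True = valid, False = invalid).
    Votes from nodes that are not in `voters` are ignored.
    """
    counted = Counter(votes[v] for v in voters if v in votes)
    quorum = len(voters) // 2 + 1  # strict majority of the signing nodes
    if counted[True] >= quorum:
        return "valid"
    if counted[False] >= quorum:
        return "invalid"
    return "undecided"


if __name__ == "__main__":
    signing_nodes = ["a", "b", "c", "d", "e"]
    print(block_status({"a": True, "b": True, "c": True}, signing_nodes))
    print(block_status({"a": True, "b": False}, signing_nodes))
```

With five signing nodes the quorum is three, so a block can be decided as soon as three matching votes arrive, without waiting for the remaining two.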

  17. Behavioral Description • Left: The backlog S starts empty and the chain C starts with only a genesis block. • Right: Clients have inserted transactions into backlog S, and the transactions have been assigned to nodes 1, 3, and 2.

  18. Behavioral Description • Left: Node 1 has moved its assigned transactions from backlog S to chain C. • Right: Node 3 has processed its assigned transactions too.

  19. Behavioral Description • Transactions from an invalid block (on right, shaded) get re-inserted into backlog S for re-consideration.

  20. Behavioral Description: Multiple Machines • More than one client may talk to a given node.

  21. Behavioral Description: Multiple Machines • There are multiple nodes. • Each node has a view into S and C. • Typically a client connects to just one node.

  22. BigchainDB Consensus Algorithm (BCA) • BCA is a state machine running on each signing node. • mainLoop()

  23. Blockchain of BigchainDB • Each block is written before a quorum of nodes votes on it. • Chainification happens at voting time. • Every block has an id equal to the hash of its transactions, timestamp, voters list and public key of its creator-node. • A block does not include the hash (id) of the previous block when it first gets written. • Votes get appended to the block over time, and each vote has a “previous block” attribute equal to the hash of the previous block.
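The idea that the chain is formed by votes rather than by the blocks themselves can be sketched as follows. The field names, the JSON serialization, and SHA-256 are assumptions for the example; only the list of hashed fields and the "previous block" attribute on votes come from the slide.

```python
import hashlib
import json
from typing import List


def block_id(
    transactions: List[dict],
    timestamp: str,
    voters: List[str],
    creator_pubkey: str,
) -> str:
    """Hash the fields the slide lists; note there is no previous-block field."""
    payload = json.dumps(
        {
            "transactions": transactions,
            "timestamp": timestamp,
            "voters": voters,
            "node_pubkey": creator_pubkey,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_vote(voter: str, voting_for_block: str, previous_block: str, is_valid: bool) -> dict:
    """A vote links the block being voted on to the block that precedes it."""
    return {
        "voter": voter,
        "voting_for_block": voting_for_block,
        "previous_block": previous_block,
        "is_block_valid": is_valid,
    }


if __name__ == "__main__":
    nodes = ["pkA", "pkB", "pkC"]
    genesis = block_id([], "0", nodes, "pkA")
    block_1 = block_id([{"id": "tx1"}], "1", nodes, "pkB")
    print(make_vote("pkA", block_1, genesis, True))
```

Because the block id does not cover a previous-block hash, a node can write a block without knowing its final position; the ordering only becomes fixed once votes that name the previous block are appended.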

  24. Experimental Results • Throughput increased in proportion to the number of nodes. • Write throughput exceeded 1 million writes per second with 32 nodes.

  25. Experimental Results • Linear scaling in write performance with the number of nodes.

  26. Summary of BigchainDB • Main purpose: increase performance, targeting 1 million writes per second, sub-second latency, and petabyte capacity. • Uses a lightweight consensus algorithm: voting instead of PoW to validate blocks. • Sacrifices some security guarantees. • If a majority of nodes are malicious, it can no longer ensure data integrity. • Data redundancy: each node runs two DBs. • Inherits the limitations of distributed DBs.

  27. Blockchain-based DB to Ensure Data Integrity • Modern database systems use logging mechanisms to track data changes, e.g., the Redo Log in Oracle. • If log files are forged, recognizing an attack or a failure is difficult. • Typically, Remote Data Auditing mitigations are employed, but they come with high costs and rely on trusted third parties. • Blockchain's tamper resistance can provide strong data integrity guarantees in trustless networks.

  28. 2-Layer Blockchain Architecture (2LBC) • First layer: a permissioned blockchain • Uses a lightweight consensus protocol that assures low latency and high throughput. • Aims at quickly and reliably storing evidence of every operation. • Provides weak data integrity guarantees. • Second layer: a public permissionless blockchain • A PoW-based blockchain that stores evidence of the database operations logged by the first layer.

  29. 2-Layer Blockchain Architecture (2LBC)

  30. 2-Layer Blockchain Architecture (2LBC) • “Mining Rotation” consensus algorithm • Divides time into rounds; in each round, one miner is elected as leader. • The leader receives new operations, signs them with its private key, and broadcasts them to the other miners. • Blockchain Anchoring technique • Handles the interaction between the first and second layers. • Periodically, the hash of the first-layer blockchain is sent to the second-layer blockchain via the Anchoring Manager.
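Both mechanisms on this slide can be sketched briefly. Everything concrete here is an assumption: the slide does not say how the leader is elected (round-robin is used below), how the first-layer hash is computed, or how often anchoring happens, and the `anchors` list merely stands in for the second-layer PoW blockchain.

```python
import hashlib
from typing import List, Tuple


def leader_for_round(round_number: int, miners: List[str]) -> str:
    """Assumed election rule: miners take turns as leader, one per round."""
    return miners[round_number % len(miners)]


def chain_digest(block_hashes: List[str]) -> str:
    """Assumed summary of the first-layer chain: SHA-256 over its block hashes."""
    return hashlib.sha256("".join(block_hashes).encode("utf-8")).hexdigest()


class AnchoringManager:
    def __init__(self, interval: int) -> None:
        self.interval = interval                  # anchor every `interval` first-layer blocks
        self.anchors: List[Tuple[int, str]] = []  # stand-in for the second-layer blockchain

    def on_new_block(self, block_hashes: List[str]) -> None:
        height = len(block_hashes)
        if height % self.interval == 0:
            self.anchors.append((height, chain_digest(block_hashes)))

    def verify(self, block_hashes: List[str]) -> bool:
        """Check the first-layer chain against every anchor recorded so far."""
        return all(
            len(block_hashes) >= height
            and chain_digest(block_hashes[:height]) == digest
            for height, digest in self.anchors
        )
```

Once a digest is anchored, rewriting any first-layer block at or below that height makes `verify` fail unless the attacker can also rewrite the second-layer record.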

  31. Strengths of 2LBC Architecture • The second-layer PoW-based blockchain ensures data integrity. • Changing evidence of an operation in the first layer requires compromising all the replicas. • If the hash of the evidence has been stored in the second layer, the effort required of an attacker is close to infinite. • The first-layer lightweight consensus ensures performance. • The second-layer blockchain runs in the background. • From a client's point of view, an operation on the database is complete as soon as it has been processed by the first-layer blockchain.

  32. Weaknesses of 2LBC Architecture • Availability • The first-layer blockchain is built on a total consensus mechanism. • Availability can be critically affected by compromising only a single miner. • Scalability • Overall system performance does not scale when new nodes are added. • The total consensus algorithm performs worse with additional nodes.

  33. Compare BigchainDB and 2LBC Architecture • Both use lightweight consensus algorithms instead of PoW to ensure high throughput and low latency. • BigchainDB: voting, where a quorum is a majority. • 2LBC: Mining Rotation, where all miners sign to validate a block. • BigchainDB stores all transactions in the blockchain. • 2LBC stores only database operations in the blockchain. • BigchainDB: good scalability, with partial replication in each DB. • 2LBC: poor scalability, with full replication in the blockchain DB. • 2LBC has an extra PoW blockchain layer to ensure security.

  34. Alternative Collaborations • Divide a whole business system into data-intensive modules and non-data-intensive modules. • Data-intensive modules can be built on traditional databases. • Non-data-intensive modules can be built on blockchain. • Divide a whole business system into a trust-related part and a non-trust-related part. • The trust-related part should be reduced in data volume, e.g., by using hash values, to fit on a blockchain. • The non-trust-related part can be built on traditional databases.
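The second split (full data in a traditional database, only hash values on the blockchain) can be sketched as below. The `HybridStore` class and both dictionaries are hypothetical stand-ins: `database` for a traditional DB and `ledger` for an append-only blockchain.

```python
import hashlib
import json
from typing import Dict


def record_digest(record: dict) -> str:
    """Fixed-size fingerprint of a record (canonical JSON hashed with SHA-256)."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()


class HybridStore:
    def __init__(self) -> None:
        self.database: Dict[str, dict] = {}  # stand-in for a traditional database (full records)
        self.ledger: Dict[str, str] = {}     # stand-in for a blockchain (digests only)

    def put(self, key: str, record: dict) -> None:
        self.database[key] = record
        self.ledger[key] = record_digest(record)

    def is_intact(self, key: str) -> bool:
        """Recompute the digest from the database and compare it with the ledger."""
        return record_digest(self.database[key]) == self.ledger[key]
```

Each record costs the ledger a 64-character digest regardless of the record's size, which is what lets the trust-related part fit within a blockchain's limited capacity.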

  35. Conclusions • Blockchain's decentralization and immutability ensure data integrity, but its PoW consensus algorithm and full replication limit performance. • Distributed DBs have high performance but rely on trusted third parties, and data integrity can hardly be restored once lost. • Blockchain-based DBs combine features of blockchain and distributed DBs to achieve high performance and high security.

  36. Thank you!
