
ICS 214B: Transaction Processing and Distributed Data Management

ICS 214B: Transaction Processing and Distributed Data Management. Lecture 17: Providing Database as a Service. Professor Chen Li. Based on slides developed by Hakan Hacigumus, Bala Iyer, and Sharad Mehrotra, ICDE 2002, San Jose, CA, USA.


Presentation Transcript


  1. ICS 214B: Transaction Processing and Distributed Data Management • Lecture 17: Providing Database as a Service • Professor Chen Li • Based on slides developed by Hakan Hacigumus, Bala Iyer, and Sharad Mehrotra (ICDE 2002, San Jose, CA, USA)

  2. Talk Outline • Software as a Service • Database as a Service • NetDB2 System • Challenges for Database as a Service • User Interface Issues • Performance Issues • Data Privacy Issues • Data Encryption in DBMSs for Data Privacy • Conclusion

  3. Software as a Service • Get … • what you need • when you need • Pay … • what you use • Don’t worry … • how to deploy, implement, maintain, upgrade

  4. Software as a Service • Driving forces behind the paradigm shift • Faster, cheaper, more accessible networks • Rise of distributed architectures • Virtualization in server and storage technologies • Established e-business infrastructures • Hardware/software is not the largest component of total cost of ownership • User Operations 46% • Technical Support 24% • Capital Cost (HW/SW) 21% (Source: Gartner Group) • Hardware, software, and network costs have been decreasing more sharply than personnel cost

  5. Software as a Service • Already in the market as • storage services, disaster recovery services, e-mail services, rent-a-spreadsheet services, etc. • Sun ONE, Oracle Online Services, Microsoft .NET My Services, etc. • Why not Database as a Service?

  6. Database as a Service - Why? • Organizations need data management • DBMSs are complex systems to deploy, set up, and maintain • They require highly skilled people (DBAs etc.) at high cost

  7. Database as a Service - Offerings • Inherits all advantages of software as a service, plus … • Service provider offers mechanisms to • create, store, access databases • DB management transferred to service provider for • backup, administration, restoration, space management, upgrades • Clients use the service provider's HW, SW, and personnel instead of their own

  8. NetDB2 - Database Service Provision • Developed in collaboration between University of California, Irvine and IBM • Deployed on the Internet over a year ago • Used by 15 universities and more than 2500 students to help teach database classes • Currently offered through the IBM Scholars Program

  9. NetDB2 System Architecture • Three-tier architecture • Client as thin as possible: just a browser • Java-based implementation • Backed by fail-over solutions • Allows expansion and user-driven integration for application development • [Figure: User (Web Browser), HTTP Server, Servlet Engine, Database (User Data), Warm Standby, Backup/Recovery, Standby System]

  10. Database as a Service - Issues Issues to address: • User Interface • Performance • Data Privacy

  11. User Interface • Simple yet powerful • supports SQL queries, scripts, UDFs, stored procedures, metadata, data upload • Consistent • Region-based composition • Expansion/Integration • User-defined interfaces

  12. Performance • Interaction happens over a different medium: the network • Performance should at least match what we already have • Experimented with TPC-H database and queries

  13. Data Privacy • Users give control of their data to the service provider • Attacks on stored data are a well-known problem • So, they need data security in place • Security of data over the network is well studied • SSL, TLS • Establish security for stored data • even if it is stolen, it should not make sense • Encryption!

  14. Encryption Alternatives • Implementation Level • Software vs. hardware encryption • Granularity of Data • Field (attribute) level • Row (record) level • (Disk) page level • [Figure: a sample table with columns ID, NAME, DEPTID, SALARY and rows (20, John White, 2, 40000), (41, Linda Cone, 3, 90000), (43, Bob Drake, 3, 85000), (50, Sarah Brown, 7, 95000), shown in plaintext and encrypted at field, row, and page granularity]

  15. Encryption Alternatives (2) • Field level encryption • Pros: • Easier to implement and integrate • Flexible • Allows selective encryption, reduces number of bytes to encrypt/decrypt • Cons: • Increases encryption overhead significantly due to invocation cost • Data size expansion (for block cipher algorithms) • Current optimization technologies do not handle foreign functions well

  16. Encryption Alternatives (3) • Row level encryption • Pros: • Reduces the data size expansion problem • Reduces invocation cost • Better security because of total encryption • Cons: • Does not allow selective encryption, increases the number of bytes to encrypt/decrypt • Implementation and integration can be hard when row functions are not supported

  17. Encryption Alternatives (4) • Page level encryption • Pros: • Significantly reduces encryption/decryption overhead due to reduced invocation cost • Eliminates data size expansion problem (for block ciphers) • Better security because of total encryption • Cons: • Implementation and integration are not straightforward • Increases the number of bytes to encrypt/decrypt each time • Higher update/delete cost, requires re-encryption of all affected pages
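The trade-offs listed for field, row, and page granularity can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions: the table layout (`field_widths`, `n_rows`, `page_size`) is hypothetical, every field is encrypted in the field-level case, and plaintext is simply padded up to a whole number of 8-byte blocks (the Blowfish block size).

```python
import math

BLOCK = 8  # bytes; Blowfish uses a 64-bit block


def padded(n_bytes: int, block: int = BLOCK) -> int:
    """Ciphertext size when plaintext is padded up to a whole number of blocks."""
    return math.ceil(n_bytes / block) * block


# Hypothetical layout (assumption): 1000 rows, four fields per row with these
# byte widths, stored in 4096-byte pages.
field_widths = [4, 10, 2, 4]
n_rows = 1000
page_size = 4096

row_width = sum(field_widths)        # plaintext bytes per row
plain_total = row_width * n_rows     # plaintext bytes in the table

# Field level: one cipher invocation per field, each field padded separately.
field_level = {
    "calls": n_rows * len(field_widths),
    "bytes": n_rows * sum(padded(w) for w in field_widths),
}

# Row level: one invocation per row, padding applied once per row.
row_level = {
    "calls": n_rows,
    "bytes": n_rows * padded(row_width),
}

# Page level: one invocation per page; a page is already a multiple of the
# block size, so no padding is added on top of the page itself.
n_pages = math.ceil(plain_total / page_size)
page_level = {
    "calls": n_pages,
    "bytes": n_pages * page_size,
}

for name, cost in [("field", field_level), ("row", row_level), ("page", page_level)]:
    print(f"{name:5s} calls={cost['calls']:5d} bytes={cost['bytes']:6d}")
```

Under these assumptions the number of invocations drops sharply from field to row to page level, and the padding overhead shrinks with it. Field level keeps one advantage the model does not show: it can encrypt only the sensitive fields.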

  18. Encryption Alternatives Experiments • Experimented with TPC-H database and queries • Encryption scheme alternatives by implementation and data granularity (V: evaluated, ×: not evaluated)
      Implementation        | Field Level | Row Level | Page Level
      Software Encryption   | V           | ×         | ×
      Hardware Encryption   | ×           | V         | V

  19. Software - Field Level Encryption • Block cipher algorithm: Blowfish • Implemented as a foreign function (UDF) • Sample insert: insert into lineitem (discount) values (encrypt(10, key)); • Sample select: select decrypt(discount, key) from lineitem where custid = 300;
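The sample insert and select above can be tried end to end with user-defined functions registered in SQLite. This is a minimal sketch, not the NetDB2 implementation: it assumes Python's `sqlite3` module in place of DB2, and the cipher is a toy XOR keystream built from SHA-256 that merely stands in for Blowfish (it is not secure). The `key` value and the table definition are hypothetical.

```python
import hashlib
import sqlite3


def _keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from the key (toy construction, NOT Blowfish)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def encrypt(value: int, key: str) -> bytes:
    """UDF: turn an integer field into an opaque blob."""
    plain = str(value).encode()
    ks = _keystream(key.encode(), len(plain))
    return bytes(p ^ k for p, k in zip(plain, ks))


def decrypt(blob: bytes, key: str) -> int:
    """UDF: recover the integer from a blob produced by encrypt()."""
    ks = _keystream(key.encode(), len(blob))
    return int(bytes(c ^ k for c, k in zip(blob, ks)).decode())


conn = sqlite3.connect(":memory:")
conn.create_function("encrypt", 2, encrypt)   # register the UDFs with the engine
conn.create_function("decrypt", 2, decrypt)
conn.execute("create table lineitem (custid integer, discount blob)")

key = "client-secret"  # hypothetical; supplied by the data owner, never stored

# Same shape as the sample insert on the slide.
conn.execute(
    "insert into lineitem (custid, discount) values (?, encrypt(?, ?))",
    (300, 10, key),
)

# What the service provider sees, versus what the key holder can read back.
stored = conn.execute(
    "select discount from lineitem where custid = 300"
).fetchone()[0]
clear = conn.execute(
    "select decrypt(discount, ?) from lineitem where custid = 300", (key,)
).fetchone()[0]
print(stored, clear)
```

The engine calls `decrypt` once for every row and every place the function appears in a query, which is the per-invocation cost the field-level approach pays.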

  20. Software - Field Level Encryption (2) • The creator supplies the key • An unauthorized person cannot get hold of the key • protection even from the service provider at some level • Users can easily implement a different encryption algorithm and check it into the system • a different encryption algorithm or key can be used for each field

  21. Software - Field Level Encryption (3) • TPC-H queries, except Q#1 • Only one field (l_discount of the lineitem table) encrypted • Introduced very large overhead

  22. TPC-H Query #1 • Problem: multiple decryptions on the same field (l_discount)
      select l_returnflag, l_linestatus,
             sum(l_quantity) as sum_qty,
             sum(l_extendedprice) as sum_base_price,
             sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
             sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
             avg(l_quantity) as avg_qty,
             avg(l_extendedprice) as avg_price,
             avg(l_discount) as avg_disc,
             count(*) as count_order
      from tpcd.lineitem
      where l_shipdate <= date ('1998-12-01') - 90 day
      group by l_returnflag, l_linestatus
      order by l_returnflag, l_linestatus;

  23. Query Rewrite to Improve Performance • Problem: multiple decryptions on the same field (e.g., TPC-H Q#1) • CSE-based (common subexpression elimination) algorithm to eliminate redundant decryptions • Use a temporary view
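The idea of decrypting each value once and reusing it can be sketched as follows. Several things here are assumptions rather than the authors' method: SQLite (through Python's `sqlite3`) replaces DB2, a toy XOR cipher replaces Blowfish, and a materialized temporary table named `lineitem_clear` (hypothetical) replaces the temporary view, because SQLite may inline a view and evaluate `decrypt` again for each reference. Discount and tax are stored as integer percentages so both queries can be compared exactly.

```python
import hashlib
import sqlite3

calls = {"decrypt": 0}  # counts how often the engine invokes decrypt()


def _xor(data: bytes, key: str) -> bytes:
    """Toy cipher (NOT Blowfish): XOR with a SHA-256 digest of the key."""
    ks = hashlib.sha256(key.encode()).digest()  # 32 bytes, enough for short fields
    return bytes(d ^ k for d, k in zip(data, ks))


def encrypt(value: int, key: str) -> bytes:
    return _xor(str(value).encode(), key)


def decrypt(blob: bytes, key: str) -> int:
    calls["decrypt"] += 1
    return int(_xor(blob, key).decode())


conn = sqlite3.connect(":memory:")
conn.create_function("encrypt", 2, encrypt)
conn.create_function("decrypt", 2, decrypt)
conn.execute(
    "create table lineitem (l_extendedprice integer, l_tax integer, l_discount blob)"
)
key = "client-secret"  # hypothetical key held by the data owner
rows = [(1000, 8, 5), (2000, 4, 10), (1500, 6, 0)]  # price, tax %, discount %
conn.executemany(
    "insert into lineitem values (?, ?, encrypt(?, ?))",
    [(price, tax, disc, key) for price, tax, disc in rows],
)

# Naive form: decrypt(l_discount) is written three times, as in Q#1.
calls["decrypt"] = 0
naive = conn.execute(
    """select sum(l_extendedprice * (100 - decrypt(l_discount, :k))),
              sum(l_extendedprice * (100 - decrypt(l_discount, :k)) * (100 + l_tax)),
              sum(decrypt(l_discount, :k))
       from lineitem""",
    {"k": key},
).fetchone()
naive_calls = calls["decrypt"]

# Rewritten form: decrypt each row once, then aggregate over the plaintext.
calls["decrypt"] = 0
conn.execute(
    "create temp table lineitem_clear "
    "(l_extendedprice integer, l_tax integer, l_discount integer)"
)
conn.execute(
    "insert into lineitem_clear "
    "select l_extendedprice, l_tax, decrypt(l_discount, ?) from lineitem",
    (key,),
)
rewritten = conn.execute(
    """select sum(l_extendedprice * (100 - l_discount)),
              sum(l_extendedprice * (100 - l_discount) * (100 + l_tax)),
              sum(l_discount)
       from lineitem_clear"""
).fetchone()
rewritten_calls = calls["decrypt"]

print(naive_calls, rewritten_calls, naive == rewritten)
```

The rewritten query performs exactly one decryption per row, however many times the decrypted field is referenced in the aggregates.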

  24. Hardware - Row Level Encryption • Specialized hardware: IBM S/390 Cryptographic Coprocessor under IBM OS/390 • “editproc” facility • invoked for the “whole row” • upon a read/write request, encrypt/decrypt is invoked from hardware for the row
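The whole-row hook described above can be pictured as a pair of functions that the storage layer calls once per row on write and once per row on read. The sketch below is purely illustrative and does not reproduce the editproc interface: `encode_row`, `decode_row`, the JSON serialization, the `KEY` constant, and the toy XOR cipher standing in for the cryptographic coprocessor are all assumptions.

```python
import hashlib
import json

KEY = b"coprocessor-key"  # hypothetical; real hardware would hold the key
invocations = {"encode": 0, "decode": 0}


def _xor(data: bytes) -> bytes:
    """Toy stream cipher standing in for the hardware cipher (assumption)."""
    ks = b""
    counter = 0
    while len(ks) < len(data):
        ks += hashlib.sha256(KEY + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ k for d, k in zip(data, ks))


def encode_row(row: dict) -> bytes:
    """Write path: serialize the whole row, then encrypt it in one invocation."""
    invocations["encode"] += 1
    return _xor(json.dumps(row, sort_keys=True).encode())


def decode_row(stored: bytes) -> dict:
    """Read path: decrypt the whole row in one invocation, then deserialize."""
    invocations["decode"] += 1
    return json.loads(_xor(stored).decode())


# Two rows written through the hook; the "table" only ever holds ciphertext.
table = []
for row in [
    {"ID": 20, "NAME": "John White", "DEPTID": 2, "SALARY": 40000},
    {"ID": 43, "NAME": "Bob Drake", "DEPTID": 3, "SALARY": 85000},
]:
    table.append(encode_row(row))

first = decode_row(table[0])
print(first, invocations)
```

Every field of the row is protected with a single cipher invocation per read or write, in contrast to the field-level UDF approach, which pays one invocation per encrypted field.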

  25. SW Field Level vs. HW Row Level • Experimented on TPC-H Q#1 • Software Field Level: Only one field is encrypted • Hardware Row Level: All fields are encrypted

  26. Hardware - Page Level Encryption • Page level encryption is simulated • It gives significant improvement due to reduction in start-up cost

  27. Conclusion • Database as a Service is a new model that alleviates the need to • hire professionals • purchase expensive hardware/software • deal with administrative and maintenance tasks • It is a viable model and can emerge as a successful offering • Encryption is a solution for privacy (the most important issue) • Hardware encryption has a clear superiority over software • Hardware makes encryption practical for databases • There are trade-offs for granularity of data
