
Data Inglorious


una

Presentation Transcript


  1. Data Inglorious Atlas: “All this data sure is heavy.” Data: “Indeed, may I suggest moving it to the cloud.”

  2. database defined • A database is a collection of data, which is organized into files called tables. • These tables provide a systematic way of accessing, managing, and updating data. • A relational database is one that contains multiple tables of data that relate to each other through special key fields. • Relational databases are far more flexible (though harder to design and maintain) than what are known as flat file databases, which contain a single table of data.
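To make "accessing, managing, and updating data" concrete, here is a minimal sketch using Python's built-in sqlite3 module. This is an assumption for illustration: the slides use MySQL, but SQLite keeps the example self-contained. The table name `products` and its columns are hypothetical.

```python
import sqlite3


def demo_table_operations():
    # In-memory database; nothing is written to disk.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # A table organizes data into named columns; each row is one record.
    # "products" is a hypothetical table used only for this sketch.
    cur.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

    # Adding data: several records in one call.
    cur.executemany(
        "INSERT INTO products (name, price) VALUES (?, ?)",
        [("hammer", 12.5), ("wrench", 8.0), ("saw", 20.0)],
    )

    # Updating data: one statement changes the matching row in place.
    cur.execute("UPDATE products SET price = ? WHERE name = ?", (9.0, "wrench"))

    # Managing data: remove a record that is no longer needed.
    cur.execute("DELETE FROM products WHERE name = ?", ("saw",))

    # Accessing data: read the remaining rows back in insertion order.
    rows = cur.execute("SELECT name, price FROM products ORDER BY id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo_table_operations())
```

The `?` placeholders let the driver insert values safely; MySQL drivers follow the same Python DB-API pattern but typically use `%s` placeholders instead.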

  3. overview, the payload • Oracle Internet Directory (OID) • Zynga Games/Farmville • Facebook • bioinformatics • Calmail

  4. ex. oracle OID • Oracle Internet Directory: 400,000 operations per second on a 500 million user database

  5. ex. zynga games • 65 million players a day, millions of web browsers open, millions of farms (Farmville game), millions of frontiers, millions of objects bought and sold…all recorded on a database • 500,000 operations-per-second database behind Farmville • http://www.readwriteweb.com/cloud/2010/08/membase-the-database-powering.php

  6. ex. facebook • 60,000 servers • 1,800 MySQL servers • 400 million active users • 200 million a day • 50 million operations per second

  7. ex. bioinformatics • DNA sequence data is a prime candidate for study with database systems • Homologous strings • Nucleotide bases: Adenine, Guanine, Cytosine, Thymine • 3.4 billion base pairs in the human genome, expressed as a string of A, G, C, and T • Human Genome Project: 3.4 billion letters of the human genome; Sanger Institute: 1 billion on MySQL
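Because a DNA sequence is just a string over A, G, C, and T, it fits naturally in a database column. The sketch below is a toy illustration under stated assumptions: it uses SQLite (not MySQL), hypothetical sequence IDs, and exact substring matching via SQLite's `instr()`. Finding homologous (similar but not identical) strings requires alignment algorithms, which this does not attempt.

```python
import sqlite3
from collections import Counter

VALID_BASES = set("AGCT")


def base_counts(seq: str) -> dict:
    """Count how often each base (A, G, C, T) occurs in a sequence."""
    seq = seq.upper()
    if not set(seq) <= VALID_BASES:
        raise ValueError("sequence contains characters other than A, G, C, T")
    counts = Counter(seq)
    return {base: counts[base] for base in "AGCT"}


def find_motif(sequences: dict, motif: str) -> list:
    """Store sequences in a table and return (seq_id, position) for each
    sequence containing the motif. Positions are 1-based, and instr()
    reports only the first occurrence in each sequence."""
    if not motif:
        raise ValueError("motif must be non-empty")
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per sequence, keyed by an ID string.
    conn.execute("CREATE TABLE sequences (seq_id TEXT PRIMARY KEY, bases TEXT NOT NULL)")
    conn.executemany("INSERT INTO sequences VALUES (?, ?)", sequences.items())
    rows = conn.execute(
        "SELECT seq_id, instr(bases, ?) FROM sequences "
        "WHERE instr(bases, ?) > 0 ORDER BY seq_id",
        (motif, motif),
    ).fetchall()
    conn.close()
    return rows
```

For example, with `seqs = {"s1": "AGCTTAGC", "s2": "GGGCCC", "s3": "TTAGCA"}`, `find_motif(seqs, "TAG")` returns `[("s1", 5), ("s3", 2)]`, and `base_counts("AGCTTAGC")` reports two of each base.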

  8. ex. calmail • Calmail: 4 million e-mails offered a day, 1 million served; MySQL backend, which recently failed

  9. flat file v. relational • Imagine the needs of two small companies that take customer orders for their products. Company A uses a flat file database with a single table named orders to record orders they receive, while Company B uses a relational database with two tables: orders and customers. • When a customer places an order with Company A, a new record (or row) in the table orders is created. Because Company A has only one table of data, all the information pertaining to that order must be put into a single record. This means that the customer's general information, such as name and address, is stored in the same record as the order information, such as product description, quantity, and price. If customers place more than one order, their general information will need to be re-entered and thus duplicated for each order they place. • Whenever there is duplicate data, as in the case above, many inconsistencies may arise when users try to query the database. Additionally, a customer's change of address would require the database manager to find all records in orders that the customer placed, and change the address data for each one. • Company B is much better off with its relational database. Each of its customers has one and only one record of general information stored in the table customers. Each customer's record is identified by a unique customer code which will serve as the relational key. When a customer orders from Company B, the record in orders need contain only a reference to the customer's code, because all of the customer's general information is already stored in customers. • This approach to entering data solves the problems of duplicate data and making changes to customer information. The database manager need change only one record in customers if someone changes addresses. • This is document ahrp in domain all. Last modified on April 24, 2006. • Indiana University, Knowledge Base http://kb.iu.edu/data/ahrp.html
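The Company A / Company B comparison can be sketched with SQLite as follows. The column names, the customer code `C001`, and the sample orders are hypothetical. The sketch counts how many rows a single change of address touches in each design, then shows Company B's orders picking up the new address through the customer key with a join.

```python
import sqlite3


def build_flat(conn):
    # Company A: one table; customer details repeat on every order row.
    conn.execute("""CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY, customer_name TEXT, address TEXT,
        product TEXT, quantity INTEGER, price REAL)""")
    conn.executemany(
        "INSERT INTO orders (customer_name, address, product, quantity, price) "
        "VALUES (?, ?, ?, ?, ?)",
        [("Ada", "1 Elm St", "widget", 2, 3.5),
         ("Ada", "1 Elm St", "gadget", 1, 10.0),
         ("Ada", "1 Elm St", "widget", 5, 3.5)])


def build_relational(conn):
    # Company B: customer details stored once, referenced by customer_code.
    conn.execute("CREATE TABLE customers (customer_code TEXT PRIMARY KEY, name TEXT, address TEXT)")
    conn.execute("""CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_code TEXT REFERENCES customers(customer_code),
        product TEXT, quantity INTEGER, price REAL)""")
    conn.execute("INSERT INTO customers VALUES ('C001', 'Ada', '1 Elm St')")
    conn.executemany(
        "INSERT INTO orders (customer_code, product, quantity, price) VALUES (?, ?, ?, ?)",
        [("C001", "widget", 2, 3.5), ("C001", "gadget", 1, 10.0), ("C001", "widget", 5, 3.5)])


def change_address():
    flat = sqlite3.connect(":memory:")
    build_flat(flat)
    # Every order row carrying the old address must be rewritten. Matching
    # by name is itself fragile: the flat table has no unique customer key.
    flat_rows = flat.execute(
        "UPDATE orders SET address = ? WHERE customer_name = ?",
        ("9 Oak Ave", "Ada")).rowcount

    rel = sqlite3.connect(":memory:")
    build_relational(rel)
    # Only the single customers row changes; orders hold just the key.
    rel_rows = rel.execute(
        "UPDATE customers SET address = ? WHERE customer_code = ?",
        ("9 Oak Ave", "C001")).rowcount

    # Orders still see the new address by joining on the relational key.
    joined = rel.execute(
        """SELECT o.order_id, c.name, c.address, o.product
           FROM orders o JOIN customers c ON o.customer_code = c.customer_code
           ORDER BY o.order_id""").fetchall()
    flat.close()
    rel.close()
    return flat_rows, rel_rows, joined
```

Here `change_address()` reports three rows rewritten in the flat design versus one in the relational design; with thousands of orders per customer the gap, and the risk of missing a row, grows accordingly.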

  10. flat file v. relational • Single table (flat file) v multiple tables (relational)

  11. web connection • Example: Plone Content Management System connection to a MySQL database

  12. go graphic, phpMyAdmin • A graphical interface tool for working with MySQL

  13. phpMyAdmin • GSPP and phpMyAdmin • localhost

  14. other database systems • Hadoop: distributed processing of large data sets • http://code.zynga.com/2011/06/deciding-how-to-store-billions-of-rows-per-day/ • Membase: new for games and other apps • http://www.readwriteweb.com/cloud/2010/08/membase-the-database-powering.php • CouchDB: no schema • http://couchdb.apache.org/docs/intro.html
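Hadoop's "distributed processing of large data sets" follows the MapReduce model: a map step emits key/value pairs, the framework groups them by key, and a reduce step combines each group. The sketch below mimics those three phases in a single Python process. It is not Hadoop's API (Hadoop's native API is Java, and it runs many map and reduce tasks in parallel across a cluster), and the "player action" log format is hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record: str):
    """Emit (action, 1) for one hypothetical log line of the form 'player action'."""
    _player, action = record.split()
    yield (action, 1)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key; here, a simple total."""
    return key, sum(values)


def map_reduce(records):
    # A real cluster would run these phases as parallel tasks on many
    # machines; this sketch runs everything sequentially in one process.
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))
```

For instance, `map_reduce(["p1 plant", "p2 harvest", "p1 harvest", "p3 plant", "p1 plant"])` returns `{"harvest": 2, "plant": 3}`. Because each map call looks at one record and each reduce call at one key, both phases can be split across machines without coordination, which is what makes the model suit very large data sets.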
