1 / 12

source: Alex Szalay

source: Alex Szalay. Example: Sloan Digital Sky Survey. The SDSS telescope array is systematically mapping ¼ of the entire sky Discoveries are made by querying the database , not through a zero-sum wrestling match for telescope time Managed by an RDBMS

marc
Download Presentation

source: Alex Szalay

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. source: Alex Szalay

  2. Example: Sloan Digital Sky Survey The SDSS telescope array is systematically mapping ¼ of the entire sky Discoveries are made by querying the database, not through a zero-sum wrestling match for telescope time Managed by an RDBMS (MS SQL Server), equipped with a hierarchical triangular mesh index, among other customizations 15 TB in the final release in 2007 818 GB in the RDBMS (13.6B tuples)

  3. source: Alex Szalay

  4. Drowning in data; starving for information Empirical X  Analytical X  Computational X  X-informatics Acquisition eventually outpaces analysis • Medicine: Online publishing, digital charts • Astronomy: Big telescopes (more in a bit) • Genetics: PCR, Shotgun Sequencing • Oceanography: ?? • Marine Microbiology: ?? “Increase Data Collection Exponentially in Less Time, with FlowCAM”

  5. Cyber-Observatories • Arctic Observing Network (AON) • Ocean Observing Initiative (OOI) • National Ecological Observatory Network (NEON) • The Waters Network • The Long-Term Ecological Research (LTER) network • The Geosciences Network (GEON) • Earthscope/Incorporated Research Institutions for Seismology (IRIS) • Virtual Solar-Terrestrial Observatory (VSTO) • Linked Environments for Atmospheric Discovery (LEAD)

  6. source: Alex Szalay

  7. source: Jim Gray

  8. Relational Databases (In Codd we Trust…) At IBM Almaden in 60s and 70s, Ted Codd worked out a formal basis for tabular data representation, organization, and access1. The early systems were buggy and slow (and sometimes reviled), but programmers only had to write 5% of the code the previously did. Key Idea: Programs that manipulate tabular data exhibit an algebraic structure; proposed a relational algebra to manipulate these data sets in their logical form, indpendently of their physical representation phsyical data independence logical data independence 1 E. F. Codd, “A Relational Model of Data for Large Shared Data Banks”, Communications of the ACM 13(6), pp 377-387, 1970

  9. source: Raghu Ramakrishnan

  10. Characteristicsof Cloud Computing • Virtual– Physical location and underlying infrastructure details are transparent to users • Scalable – Able to break complex workloads into pieces to be served across an incrementally expandable infrastructure • Efficient – “Services Oriented Architecture” for dynamic provisioning of shared compute resources • Flexible – Can serve a variety of workload types – both consumer and commercial

  11. Cloud Computing as Hosted Data Management Services • Yahoo • Yahoo Distributed Hash Tables: Key/value pairs • Yahoo Distributed Ordered Tables: Ordered ranges • PNUTS: Relational-style storage, indexing and query • Amazon • S3: Simple Storage • SimpleDB: Quasi-Relational features • Google • APIs for: Storage, Visualization, Document processing, Images, Mail • Microsoft: • CloudDB: Relational-style features

  12. Workflow at CMOP Washington University Cloud PNNL OHSU Cloning/ cDNA/… plates Sequencing FASTA files Inspection FASTA files e.g., trim bad reads at the end Cleaning BLAST Post processing Hit tables Link Analyze Hit tables + metadata synopsis Shared Knowledge

More Related