“How Perl Saved the Human Genome Project”

DATE: Early February, 1996. LOCATION: Cambridge, England, in the conference room of the largest DNA sequencing center in Europe.

Presentation Transcript


  1. “How Perl Saved the Human Genome Project”
     • DATE: Early February, 1996
     • LOCATION: Cambridge, England, in the conference room of the largest DNA sequencing center in Europe.
     • OCCASION: A high level meeting between the computer scientists of this center and the largest DNA sequencing center in the United States.
     • THE PROBLEM: Although the two centers use almost identical laboratory techniques, almost identical databases, and almost identical data analysis tools, they still can't interchange data or meaningfully compare results.
     • THE SOLUTION: Perl.
     • Lincoln Stein, TPJ Vol 1 #2 Summer 1996

  2. “How Perl Saved the Human Genome Project” Perl solved issues of:
     • a rapidly-changing situation
     • text-manipulation to convert between data formats
     • building pipelines to glue data analysis programs together

  3. 10 years on

  4. Obligatory tenuous coding analogy The genome is the source of a program to build and run a human

  5. Obligatory tenuous coding analogy But: the author is not available for comment

  6. Obligatory tenuous coding analogy It’s 3GB in size

  7. Obligatory tenuous coding analogy Due to constant forking, there are about 7 billion different versions

  8. Obligatory tenuous coding analogy It’s full of copy-and-paste and cruft

  9. Obligatory tenuous coding analogy And it’s completely undocumented

  10. Obligatory tenuous coding analogy Q: How do you debug it?

  11. Obligatory tenuous coding analogy A: Diff a working copy and a broken copy
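
The diff analogy corresponds to comparing two copies of a sequence base by base. Below is a minimal sketch in Python, assuming the two copies are already aligned and of equal length; real genome comparison also has to cope with insertions, deletions, and rearrangements. The function name `diff_sequences` and the example sequences are made up for illustration.

```python
def diff_sequences(working, broken):
    """Return (position, working_base, broken_base) for every mismatch.

    Positions are 1-based. Assumes both sequences are pre-aligned and
    of equal length (an illustrative simplification).
    """
    if len(working) != len(broken):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(working, broken))
        if a != b
    ]


# Hypothetical example sequences, not real genomic data.
working_copy = "GATTACAGATTACA"
broken_copy = "GATTACTGATTGCA"

for position, good, bad in diff_sequences(working_copy, broken_copy):
    print(f"position {position}: {good} -> {bad}")
```

With these two toy sequences the loop reports substitutions at positions 7 and 12.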

  12. Same as it ever was We still have the same problems as in 1996:
      • a rapidly-changing situation
      • text-manipulation to convert between data formats
      • building pipelines to glue data analysis programs together

  13. A rapidly changing situation • MR Stratton et al., Nature 458, 719-724 (2009)

  14. Many data formats
      • “a sea of incompatible data formats”
      • “[for each new piece of software] you could always count on it to sport its own idiosyncratic user interface and data format.”
      • Lincoln Stein, TPJ Vol 1 #2 Summer 1996
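
As one concrete case of the text manipulation involved in converting between formats, the sketch below turns FASTQ records into FASTA. It is written in Python for illustration and assumes the simple four-line FASTQ layout (header, sequence, `+` separator, qualities) with no wrapped lines. The function name `fastq_to_fasta` and the sample record are hypothetical, and the slides do not say which formats were actually being converted.

```python
def fastq_to_fasta(fastq_text):
    """Convert four-line FASTQ records to FASTA, discarding quality scores.

    Assumes each record is exactly four lines with unwrapped sequence
    (an illustrative simplification of the FASTQ format).
    """
    lines = [line for line in fastq_text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected a multiple of four non-empty lines")

    fasta_lines = []
    for i in range(0, len(lines), 4):
        header, sequence, separator, _qualities = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        fasta_lines.append(">" + header[1:])  # FASTA headers start with '>'
        fasta_lines.append(sequence)
    return "\n".join(fasta_lines) + "\n"


# Hypothetical sample record.
sample = "@read1\nGATTACA\n+\nIIIIIII\n@read2\nCCGGTTAA\n+\nIIIIIIII\n"
print(fastq_to_fasta(sample), end="")
```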

  15. Building pipelines [Pipeline flow diagram. Stage labels, in extraction order rather than pipeline order: Sample reception; Recalibration; Collaborator data; Library prep; Data QC; Library QC; Mapping to reference; Sequence ordering; Merging libraries; Sequencing; Build release BAM files; Tracking; SNP calling; Structural variants; To collaborators; Initial data QC; Filtering; Genotype check; Visualization; Submission to public archives; Downstream analysis]
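
The idea of gluing analysis steps into a pipeline can be sketched in a few lines. The example below is in Python for illustration only. The stage functions (`initial_qc`, `filter_short`), their thresholds, and the reads are hypothetical toy stand-ins, not the stages of the real pipeline. Only the general pattern comes from the slide: named steps run in order, with tracking of what each one did.

```python
def initial_qc(reads):
    """Toy QC step: drop reads containing an ambiguous base (N)."""
    return [read for read in reads if "N" not in read]


def filter_short(reads, min_length=5):
    """Toy filter step: drop reads shorter than min_length."""
    return [read for read in reads if len(read) >= min_length]


def run_pipeline(stages, data):
    """Run (name, function) stages in order.

    Returns the final data plus a log of (stage name, items in, items out),
    a simple stand-in for pipeline tracking.
    """
    log = []
    for name, stage in stages:
        items_in = len(data)
        data = stage(data)
        log.append((name, items_in, len(data)))
    return data, log


# Hypothetical stages and input reads.
stages = [("initial QC", initial_qc), ("filtering", filter_short)]
reads = ["GATTACA", "GANTACA", "GAT", "CCGGTTAA"]

result, log = run_pipeline(stages, reads)
for name, items_in, items_out in log:
    print(f"{name}: {items_in} reads in, {items_out} reads out")
```

Keeping each stage as a plain function that takes and returns a list means stages can be reordered or swapped without touching `run_pipeline`.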

  16. In conclusion • “Although it's not perfect, Perl fills the needs of the genome centers remarkably well, and is usually the first tool we turn to when we have a problem to solve.”
