
BF528 - Applications in Translational Bioinformatics

BF528 - Applications in Translational Bioinformatics. 1/23/2019. Instructor Introductions. Instructor: Adam Labadorf. TAs: Emma Briars, Dakota Hawkins, Zhe Wang. Course Overview. Survey course in bioinformatics. Focus on high-throughput sequencing data, tools, and techniques.



Presentation Transcript


  1. BF528 - Applications in Translational Bioinformatics 1/23/2019

  2. Instructor Introductions • Instructor: Adam Labadorf • TAs: • Emma Briars • Dakota Hawkins • Zhe Wang

  3. Course Overview • Survey course in bioinformatics • Focus on high-throughput sequencing data, tools, and techniques • Focus on practical skills • Group work simulates real-world collaborative environment

  4. Course Goals • Survey current bioinformatics techniques in translational studies • Give you hands-on experience working with high-throughput biological data and tools • Read and understand papers that use bioinformatics in translational studies • Develop shared vocabulary between biology and computation

  5. Prerequisites • Molecular and cell biology • BF527, BE505/605 or equivalent • Good-to-haves: • Basic statistics knowledge • Programming/Linux cluster experience • But don’t panic...

  6. Course Organization • http://bf528.readthedocs.io • Wed/Fri 2:30-4:15 STH 318 • Some online content early in semester • Online content limited to ~1 hr/class • Class period split into two segments: • Lecture or discussion of online material • Project group meeting and discussion

  7. Course Organization cont’d • Students assigned into groups of 4 • Each group has a primary TA • 4 projects over the course of the semester • The last project is an individual project • No homework • No exams

  8. Schedule of Topics

  9. Schedule of Topics cont’d

  10. Projects • Assigned into groups based on experience • Groups are for the entire semester • You will reproduce findings from published manuscripts • Each project has a full writeup

  11. Project Groups • Group members will play one of four roles: • Data Curator - find, download, and organize data • Programmer - process data into analyzable form • Analyst - transform processed data into interpretable form • Biologist - understand paper and biological context, help interpret results • Roles rotate for each project • Structured class time to help facilitate group work and help each other!

  12. Project Group Meeting: Wednesdays • Time allotted for groups to meet and discuss progress • “Stand-up” meeting structure: • “What did I work on since our last meeting?” • “What challenges did I encounter?” • “Are there any obstacles to completing my work?” • “What will I be working on for next meeting?” • Each group will make a brief status report at the end of class

  13. Project Group Meeting: Fridays • Time allotted for roles to meet and discuss progress • Similar structure to Wednesdays • Share challenges and solutions among roles • Each role group will make a brief status report at the end of class

  14. Project Report • Organized like a published study • Sections (primary role): • Intro - background and motivation (Biologist) • Data - data description (Data Curator) • Methods - processing and tools (Programmer) • Results - findings (Analyst) • Discussion - interpret findings (Biologist) • Conclusion (all)

  15. Assessment • Each project is 25% of your total grade • Broken down: • Intro, Conclusion - 2.5% • Data, Methods, Results, Discussion - 20% • Stand-up participation - 15%

  16. Translational Bioinformatics

  17. Biology as Data Science • 1953 - DNA structure published in Nature • 1972 - first genetic sequence determined, Protein Data Bank • 1977 - Sanger sequencing, first genome sequenced • 1983 - PCR technique invented • 1990 - Human Genome Project begun • 1995 - first bacterial genome sequenced, microarray technology first described • 1997 - yeast genome on a microarray, sequencing by synthesis concept established • 1998 - first multicellular eukaryote sequenced • 2001 - first draft of human genome • 2006 - Solexa Genome Analyzer released

  18. “Big” Data • Single Microarray dataset: ~500Mb • Single short read dataset: ~2Gb-300Gb • Human genome reference sequence: ~2Gb • One run of Illumina instruments: • HiSeq 2500: ~1Tb • NovaSeq 6000: ~6Tb • Gene Expression Omnibus (GEO): • 2014: 1,237,138 samples, ~28 Tb • 2018: 2,335,694 samples, ?? Tb
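To put the figures on this slide in proportion, here is a rough back-of-the-envelope sketch in Python. It assumes that "Tb" on the slide means terabytes in decimal units (10^12 bytes), which the slide does not state, and the variable names are illustrative only. It does not try to fill in the missing 2018 total.

```python
# Figures copied from the slide; the unit interpretation is an assumption.
TB = 10**12  # bytes per terabyte (assumed decimal terabytes)

geo_2014_samples = 1_237_138
geo_2014_bytes = 28 * TB

hiseq_2500_run_bytes = 1 * TB
novaseq_6000_run_bytes = 6 * TB

# Average storage per GEO sample in 2014, in megabytes.
avg_mb_per_sample = geo_2014_bytes / geo_2014_samples / 10**6

# Number of instrument runs whose output equals the 2014 GEO total.
hiseq_runs = geo_2014_bytes / hiseq_2500_run_bytes
novaseq_runs = geo_2014_bytes / novaseq_6000_run_bytes

print(f"Average GEO sample (2014): {avg_mb_per_sample:.1f} MB")
print(f"HiSeq 2500 runs to match GEO 2014: {hiseq_runs:.0f}")
print(f"NovaSeq 6000 runs to match GEO 2014: {novaseq_runs:.1f}")
```

Under these assumptions, fewer than five NovaSeq 6000 runs would produce as many bytes as the whole of GEO held in 2014, which is one way to read the slide's point about data volume.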

  19. What is Bioinformatics? “Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data. As an interdisciplinary field of science, bioinformatics combines computer science, statistics, mathematics, and engineering to study and process biological data.” Wikipedia

  20. Conceptual History of Bioinformatics • Biological sequences digitized • Biological databases needed to store sequences • Search tools needed for databases • Tools for analyzing data from searches • Computational tools required to analyze human genome • Sophisticated sequence analysis tools enable analysis of large amounts of sequencing data • Sequencing data volume explodes, requiring new tools • And here we are

  21. The Biologist’s Tools Wet lab biologists: Bioinformaticians:

  22. Sequence: The Fundamental Datatype Sequence • Computer Science • genome assembly, homology, phylogeny • Physics • DNA/RNA/protein structure, drug prediction • Statistics • gene expression, population genetics, biomarkers • Mathematics • metabolic modeling, synthetic biology, systems biology
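As a small illustration of sequence as a datatype, the sketch below treats a DNA sequence as a plain Python string and computes two common summaries. The function names and the example sequence are hypothetical and are not taken from the course material; real projects would more likely use an established library.

```python
# Minimal sketch: a DNA sequence represented as a plain uppercase string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (0.0 if seq is empty)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


example = "ATGCGTAC"  # hypothetical sequence, for illustration only
print(reverse_complement(example))
print(gc_content(example))
```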

  23. Genbank Sequences

  24. Translational Bioinformatics “Translational Bioinformatics is an emerging field in the study of health informatics, focused on the convergence of molecular bioinformatics, biostatistics, statistical genetics, and clinical informatics.” Wikipedia

  25. For Next Time Assignment: familiarize yourself with the material on basic command line usage found here: Workshop 0. Basic Linux and Command Line Usage

  26. SSH and SCC • SCC - Shared Compute Cluster • You all have accounts on SCC • You will need an SSH client program to connect: • Mac, Linux: Terminal (included) • Windows: MobaXterm • Connect to: scc1.bu.edu with your BU username/password • Demonstration
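For readers connecting from a terminal, the following Python sketch assembles the `ssh` command for the host named on the slide. The helper `build_ssh_command` and the placeholder username are hypothetical; only the host name `scc1.bu.edu` comes from the slide. The sketch prints the command rather than opening a connection.

```python
import shlex
from typing import List

SCC_HOST = "scc1.bu.edu"  # host name taken from the slide


def build_ssh_command(username: str, host: str = SCC_HOST) -> List[str]:
    """Build the argument list for an interactive ssh login (hypothetical helper)."""
    if not username or any(ch.isspace() for ch in username):
        raise ValueError("username must be non-empty and contain no whitespace")
    return ["ssh", f"{username}@{host}"]


# "your_bu_username" is a placeholder; substitute your own BU login name.
command = build_ssh_command("your_bu_username")
print(shlex.join(command))
```

Typing the printed command into a terminal, or passing the list to `subprocess.run`, would start the login and prompt for the BU password.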

  27. 2019 Survey Results

  28. Programming/Computer Skills

  29. Statistics

  30. Biology

  31. Bioinformatics

  32. Rank the following roles that you might play in a project in order of preference

  33. Rank the following roles that you might play in a project in order of preference (2019 vs. 2018)

  34. 2018 Survey Results

  35. How comfortable are you with the following programming languages/concepts?

  36. How comfortable are you with the following statistics concepts?

  37. How comfortable are you with the following biology concepts?

  38. How comfortable are you with the following bioinformatics concepts?

  39. Rank the following roles that you might play in a project in order of preference

  40. What do you hope to learn?

  41. How do you plan to use what you learn?
