BioInformatics (2)

Physical Mapping - I
  • Low resolution
    • Megabase-scale
  • High resolution
    • Kilobase-scale or better
  • Methods for low resolution mapping
    • Somatic cell hybrids (human and mouse or hamster)
      • Fast chromosomal localisation of genes
      • Subchromosomal mapping possible
    • Fluorescence in situ hybridisation (FISH)
    • Chromosome painting
    • Fractionation of chromosomes by flow cytometry
Physical Mapping - II
  • Methods for high resolution mapping
    • Long-range restriction mapping
    • Pulsed-field gel electrophoresis (PFGE)
    • Assembly of clone contigs
    • The double digest problem
      • Ordering the fragments from a digest with two restriction enzymes
    • Sequence Tagged Sites (STSs)
      • Sequence fragments in the genome described uniquely by a pair of PCR primers
      • Usually 200-300 bases
      • Very useful as ‘landmarks’ on the physical map
      • Can be mapped to individual clones by FISH
    • Assembly of STS-content physical maps
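The double digest problem listed above can be illustrated with a small brute-force sketch. It assumes a linear molecule and exact integer fragment lengths, which real gel data never provide; the function names `cut_sites` and `solve_double_digest` are hypothetical. It tries every ordering of the fragments from enzyme A and enzyme B and keeps the orderings whose combined cut sites reproduce the fragments of the double digest.

```python
from itertools import permutations

def cut_sites(fragments):
    """Cumulative cut positions implied by an ordered list of fragment lengths."""
    sites, pos = [], 0
    for length in fragments[:-1]:  # the last fragment ends at the molecule end, not at a cut
        pos += length
        sites.append(pos)
    return sites

def solve_double_digest(a, b, ab):
    """Return all (order_a, order_b) pairs consistent with the double digest ab."""
    total = sum(a)
    if sum(b) != total or sum(ab) != total:
        raise ValueError("all three digests must sum to the same total length")
    target = sorted(ab)
    solutions = []
    for order_a in set(permutations(a)):
        for order_b in set(permutations(b)):
            # Merge the cut sites of both enzymes and derive the implied fragments.
            sites = sorted(set(cut_sites(order_a)) | set(cut_sites(order_b)))
            bounds = [0] + sites + [total]
            fragments = sorted(hi - lo for lo, hi in zip(bounds, bounds[1:]))
            if fragments == target:
                solutions.append((order_a, order_b))
    return solutions

if __name__ == "__main__":
    for order_a, order_b in solve_double_digest([3, 5], [2, 6], [1, 2, 5]):
        print(order_a, order_b)
```

The search space grows factorially with the number of fragments, and every solution has a mirror image, so this approach is only practical for a handful of fragments.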
Physical Mapping - III
  • Map units (human genome)
    • 1 cM ≈ 1 Mb
    • 1 cR ≈ 30 kb
      • 1 centiRay = 1% chance of a radiation-induced break between 2 markers
  • Major information resources
    • Stanford Human Genome Center (RH maps)
    • Whitehead/MIT Genome Center (STS content maps)
    • Centre d’Etude du Polymorphisme Humain - CEPH (YAC maps)
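The map units above can be turned into rough physical distances with a few lines of arithmetic. This is a minimal sketch: the two constants are the approximate human-genome figures from the slide, the function names are hypothetical, and the conversion from breakage frequency to centiRays uses the commonly cited mapping function D = -ln(1 - θ), which is an assumption not stated in the slides.

```python
import math

MB_PER_CM = 1.0   # slide figure: 1 cM is roughly 1 Mb (human genome average)
KB_PER_CR = 30.0  # slide figure: 1 cR is roughly 30 kb

def cm_to_mb(cm):
    """Approximate physical distance in Mb for a genetic distance in cM."""
    return cm * MB_PER_CM

def cr_to_kb(cr):
    """Approximate physical distance in kb for a radiation hybrid distance in cR."""
    return cr * KB_PER_CR

def breakage_to_cr(theta):
    """Convert a breakage frequency theta (0 <= theta < 1) to centiRays.

    Assumed mapping function: D = -ln(1 - theta) Rays, i.e. 100 * D centiRays.
    """
    if not 0 <= theta < 1:
        raise ValueError("theta must be in [0, 1)")
    return -math.log(1.0 - theta) * 100.0

if __name__ == "__main__":
    print(cm_to_mb(2.5), "Mb")
    print(cr_to_kb(10), "kb")
    print(round(breakage_to_cr(0.01), 3), "cR")
```

For small breakage frequencies the result stays close to 100 × θ, which matches the slide's definition of 1 cR as a 1% chance of a break between two markers.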
Physical Mapping - IV
  • Conclusions
    • The value of physical mapping
      • Confirmation of chromosomal location of clones and genes
      • Correction of genetic map errors
      • Correlation to the genetic map reveals ‘hot’ and ‘cold’ regions of recombinational activity on chromosomes
      • Provides useful information for duplicated regions
      • High resolution mapping provides the framework necessary for high quality sequencing of large genomic regions
DNA Sequencing
  • Ordered clone library
    • Sequencing of overlapping clones of known order as determined by restriction analysis
    • Advantage
      • Easy ordering of resulting sequence reads
    • Disadvantage
      • Detailed mapping is time-consuming
  • Shotgun sequencing
    • Partial digestion of DNA with a 4-cutter enzyme
    • Sequencing of randomly overlapping clones
    • Computer-aided assembly of reads
    • Advantage
      • Speed
    • Disadvantages
      • High data redundancy due to random sequencing
      • Not suitable for large genomes (>300 Mb)
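The redundancy cost of shotgun sequencing noted above can be estimated with a simple Poisson model (often attributed to Lander and Waterman). This is a sketch under assumptions that are not in the slides: reads are placed uniformly at random, all reads have the same length, and the function names are hypothetical.

```python
import math

def coverage(num_reads, read_length, genome_size):
    """Average sequencing depth c = N * L / G."""
    return num_reads * read_length / genome_size

def fraction_covered(c):
    """Expected fraction of bases covered at least once at depth c (Poisson model)."""
    return 1.0 - math.exp(-c)

def reads_needed(genome_size, read_length, target_fraction):
    """Number of reads needed so that target_fraction of the bases is expected to be covered."""
    c = -math.log(1.0 - target_fraction)
    return math.ceil(c * genome_size / read_length)

if __name__ == "__main__":
    c = coverage(1000, 500, 100_000)
    print(c, fraction_covered(c))
    print(reads_needed(100_000, 500, 0.99))
```

Under this model the uncovered fraction shrinks only exponentially with depth, so closing the last few gaps by random reads alone requires many redundant reads.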
Assembly of Sequence Contigs
  • The problem:
    • Semi-automated assembly of a contiguous DNA sequence from overlapping gel readings
  • Steps
    • Base identification
    • Trimming of ends
    • Vector clipping
    • Assembly of fragments
  • Major software packages
    • Sequencher™ from Gene Codes Corp., Ann Arbor, Michigan
      • Platforms: PowerMac, Windows NT
      • Up to 70 kb contigs
    • The Staden package by Staden et al., MRC, Cambridge
    • PHRED/PHRAP by Green et al., University of Washington, Seattle
      • Platforms: Unix
      • Megabase range contigs
      • Mutation detection capabilities
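The steps listed above (trimming of ends, vector clipping, assembly of fragments) can be sketched as a toy pipeline. This is not how Sequencher, the Staden package or PHRED/PHRAP work internally; it assumes error-free reads, exact overlaps, a single known vector prefix and per-base quality values, and all function names are hypothetical.

```python
def trim_ends(seq, quals, min_q=20):
    """Drop bases below min_q from both ends of a read."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end]

def clip_vector(seq, vector_prefix):
    """Remove a known vector sequence if the read starts with it."""
    return seq[len(vector_prefix):] if seq.startswith(vector_prefix) else seq

def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two reads with the longest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # no remaining overlaps: these are the final contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]

if __name__ == "__main__":
    print(trim_ends("NNACGTNN", [5, 5, 30, 30, 30, 30, 5, 5]))
    print(clip_vector("GGATCCACGT", "GGATCC"))
    print(greedy_assemble(["ATGCGT", "GCGTAC", "TACGGA"]))
```

A greedy merge like this can be misled by repeats, which is one reason production assemblers weigh base quality and overlap consistency instead of taking the longest exact match.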
Quality Control of Sequence Data (Source: US DOE Joint Genome Institute)
  • Goals
    • Complete sequence continuity across a target region (both within and between clones)
      • No more than one gap in 200 kb
      • Size of all gaps no larger than 1% of the size of the total region
    • ‘Allowable gaps’ include
      • regions unclonable/unstable in conventional cloning vectors
      • repetitive regions
      • regions with significant secondary structure or abnormally high GC content
      • Gap size measured by PCR or restriction digest analysis
    • Accuracy of finished sequence: 1 error in 10,000 bases
      • At least 95% double-strand coverage
    • Assembly Verification
      • a minimum of three independent restriction digests
      • reassembly with an independent algorithm
      • re-sequencing of random clones
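The numeric goals above can be written down as a simple checklist function. This is a sketch of one possible reading of the slide, not JGI's procedure: "no more than one gap in 200 kb" is interpreted as at most one gap per started 200 kb of region, the 1% limit is applied to the summed gap size, and the function and parameter names are hypothetical.

```python
import math

def check_finished_region(region_bp, gap_sizes_bp, errors_per_base, double_strand_fraction):
    """Return pass/fail flags for the finishing goals listed on the slide."""
    allowed_gaps = math.ceil(region_bp / 200_000)  # assumed reading: one gap per started 200 kb
    return {
        "gap_count": len(gap_sizes_bp) <= allowed_gaps,
        "gap_total_size": sum(gap_sizes_bp) * 100 <= region_bp,  # gaps total at most 1% of region
        "accuracy": errors_per_base <= 1 / 10_000,               # at most 1 error in 10,000 bases
        "double_strand": double_strand_fraction >= 0.95,         # at least 95% double-strand coverage
    }

if __name__ == "__main__":
    report = check_finished_region(400_000, [1500, 2000], 1 / 20_000, 0.97)
    for goal, passed in report.items():
        print(goal, "ok" if passed else "FAILED")
```

Returning one flag per goal, rather than a single boolean, makes it clear which criterion a region fails.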
Submission and Annotation of Sequence Data (Source: US DOE Joint Genome Institute)
  • The size of the starting clone is the minimum size for submission to public databases
    • 95% of the sequence represented on both strands
    • all ambiguities resolved or annotated
    • missing data from the end of a clone allowed if sequence overlap is detected with the adjacent clone in the tiling path
  • Level of annotation
    • all sequences annotated in a largely automated fashion
    • identification of putative or known genes, repetitive elements, EST matches and any other useful “miscellaneous features”
    • computationally-derived predictions must be indicated as such
  • Immediate release of finished annotated sequence
    • Global assembly of meta-contigs from previously submitted data will be performed periodically
International Strategy Meeting on Human Genome Sequencing (Bermuda, 25th-28th February 1996; sponsored by the Wellcome Trust)
  • Summary of agreed principles
    • Primary genomic sequence should be in the public domain
    • Primary genomic sequence should be rapidly released
    • Assemblies of greater than 1 kb should be automatically released on a daily basis
    • Finished annotated sequence should be immediately submitted to the public databases
  • Coordination
    • Large-scale sequencing centres should inform HUGO of their intention to sequence particular regions of the human genome
Annotating the Human Genome Sequence
  • Identification of coding regions
    • Exon/intron prediction
  • High throughput comparison of genomic sequence to protein information
    • Full-length protein sequences
    • Databases of protein domains
  • How automated is automated annotation in reality?
    • Advantages
      • High speed
      • Good for tRNA genes, repetitive regions
      • Good for high-scoring matches in databases, but subject to the disadvantages below
    • Disadvantages
      • Error propagation can be detrimental
      • Domain ‘recycling’ in evolution causes misinterpretation, e.g. in the case of transcription factors similar to peptidases
  • Very computer-intensive task!
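As a very rough illustration of the first step above, identification of coding regions, the sketch below scans the forward strand for open reading frames. It is a deliberately naive stand-in: real exon/intron prediction relies on splice-site and codon-usage models, which this code does not attempt. The function name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    start is 0-based, end is exclusive and includes the stop codon;
    min_codons counts the codons before the stop, including the ATG.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG after this stop
    return orfs

if __name__ == "__main__":
    dna = "CCATGAAATTTTAGCC"
    for start, end, frame in find_orfs(dna):
        print(frame, dna[start:end])
```

Even this trivial scan has to be repeated for three frames (six with the reverse strand) across the whole sequence, which hints at why full annotation is so computer-intensive.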