Pindel user manual
Kai Ye (k.ye@lumc.nl)
Preparation of Pindel input
• Alignment BAM file generated by BWA: convert it with (1) bam2pindel.pl, which needs Adaptor.pm.
• Alignment BAM file generated by other aligners: convert it with (2) sam2pindel.cpp.
• Either route produces Pindel input with a sample tag.
• (3) FilterPindelReads.cpp turns this into filtered Pindel input with a sample tag.
• For paired or population sequence data, merge the Pindel input files of the individual samples.
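Because every Pindel input record carries its own sample tag, the merge step for paired or population data can plausibly be done by concatenating the per-sample files. The Python sketch below assumes exactly that (no sorting, filtering or deduplication); the function name `merge_pindel_inputs` and the file names in the usage line are hypothetical.

```python
from typing import Iterable


def merge_pindel_inputs(inputs: Iterable[str], merged: str) -> int:
    """Concatenate per-sample Pindel input files into one file.

    Assumption: every record already carries its own sample tag, so merging
    is plain concatenation in the given order; nothing is filtered or sorted.
    Returns the number of files merged.
    """
    n_files = 0
    with open(merged, "w") as out:
        for path in inputs:
            with open(path) as src:
                for line in src:
                    # Guard against a last line without '\n', which would
                    # otherwise be glued to the next file's first line.
                    out.write(line if line.endswith("\n") else line + "\n")
            n_files += 1
    return n_files
```

Usage (hypothetical file names): `merge_pindel_inputs(["tumour_pindel.txt", "normal_pindel.txt"], "paired_pindel.txt")`. When every file ends with a newline, this behaves like `cat tumour_pindel.txt normal_pindel.txt > paired_pindel.txt`.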
(1) bam2pindel.pl
• Written by Keiran Raine at the Sanger Institute (kr2@sanger.ac.uk).
• Designed for BWA-based Illumina data in BAM/SAM format.
• The input BAM file must be sorted by read name.
• Set the BAM_2_PINDEL_ADAPT environment variable to the path of Adaptor.pm:
  setenv BAM_2_PINDEL_ADAPT <path to Adaptor.pm>   (csh/tcsh)
  export BAM_2_PINDEL_ADAPT=<path to Adaptor.pm>   (bash)
• Arguments:
  -i|input: input BAM file (required)
  -o|output: output file ready for Pindel
  -s|sample: sample name or label, e.g. sampA, sampB (required)
  -pi|insert: insert size; required if the RG record in the BAM header has no PI tag
  -r|restrict: restrict to chromosome xx
• Example:
  ./bam2pindel_bwa.pl -i NameSorted.bam -o output_prefix -s tumour -om -pi 300
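The two preparation steps for bam2pindel.pl (name-sorting the BAM, then calling the script with BAM_2_PINDEL_ADAPT set) can be scripted as below. This is a minimal Python sketch, assuming samtools 1.x (where `samtools sort -n -o out.bam in.bam` sorts by read name) and using only the documented -i/-o/-s/-pi/-r options; `-om` from the example is left out because its meaning is not documented here. The helper names and the default script path are hypothetical, since the text names the script both bam2pindel.pl and bam2pindel_bwa.pl.

```python
import os
import subprocess
from typing import List, Optional


def name_sort_cmd(in_bam: str, out_bam: str) -> List[str]:
    # 'samtools sort -n' orders alignments by read name (samtools 1.x syntax).
    return ["samtools", "sort", "-n", "-o", out_bam, in_bam]


def bam2pindel_cmd(bam: str, out_prefix: str, sample: str,
                   insert: Optional[int] = None, restrict: Optional[str] = None,
                   script: str = "./bam2pindel.pl") -> List[str]:
    # Hypothetical helper: builds the argument list from the documented options only.
    cmd = [script, "-i", bam, "-o", out_prefix, "-s", sample]
    if insert is not None:    # needed when the RG header record has no PI tag
        cmd += ["-pi", str(insert)]
    if restrict is not None:  # restrict the conversion to one chromosome
        cmd += ["-r", restrict]
    return cmd


def run_bam2pindel(in_bam: str, sorted_bam: str, out_prefix: str, sample: str,
                   adaptor_pm: str, insert: Optional[int] = None) -> None:
    # Same effect as 'setenv BAM_2_PINDEL_ADAPT <path>', but only for the child process.
    env = dict(os.environ, BAM_2_PINDEL_ADAPT=adaptor_pm)
    subprocess.run(name_sort_cmd(in_bam, sorted_bam), check=True)
    subprocess.run(bam2pindel_cmd(sorted_bam, out_prefix, sample, insert),
                   check=True, env=env)
```

Keeping the command builders separate from `run_bam2pindel` makes it easy to print or log the exact commands before running them.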
(2) sam2pindel.cpp
• Written by Kai Ye at Leiden University Medical Center (k.ye@lumc.nl).
• Designed for Illumina data in BAM/SAM format from any aligner.
• Compile the C++ source first:
  g++ sam2pindel.cpp -o sam2pindel -O3
• sam2pindel takes 5 required arguments, in this order:
  1. input SAM file (- reads from standard input);
  2. output file for Pindel;
  3. insert size;
  4. sample tag;
  5. number of extra lines at the beginning of the file that do not start with @.
• Starting from a standard SAM file (Input.sam, insert size 300):
  ./sam2pindel Input.sam Output4Pindel.txt 300 tumour 0
• Starting from a BAM file:
  ./samtools view Input.bam | ./sam2pindel - Output4Pindel.txt 300 tumour 0
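The BAM route for sam2pindel is a two-process pipe. A minimal Python sketch of the same pipeline follows, assuming the `./samtools` and `./sam2pindel` paths from the examples; the function names are hypothetical. The fifth argument is 0 because `samtools view` without `-h` writes alignment lines only.

```python
import subprocess
from typing import List


def sam2pindel_cmd(sam: str, output: str, insert_size: int, tag: str,
                   extra_lines: int = 0, exe: str = "./sam2pindel") -> List[str]:
    # The five positional arguments, in the order the manual lists them.
    return [exe, sam, output, str(insert_size), tag, str(extra_lines)]


def bam_to_pindel(bam: str, output: str, insert_size: int, tag: str,
                  samtools: str = "./samtools", exe: str = "./sam2pindel") -> None:
    """Run: samtools view <bam> | sam2pindel - <output> <insert_size> <tag> 0"""
    view = subprocess.Popen([samtools, "view", bam], stdout=subprocess.PIPE)
    conv = subprocess.run(sam2pindel_cmd("-", output, insert_size, tag, 0, exe),
                          stdin=view.stdout)
    view.stdout.close()  # samtools sees a closed pipe if sam2pindel stopped early
    view_rc = view.wait()
    if view_rc != 0 or conv.returncode != 0:
        raise RuntimeError(f"pipeline failed: samtools={view_rc}, sam2pindel={conv.returncode}")
```

Usage: `bam_to_pindel("Input.bam", "Output4Pindel.txt", 300, "tumour")` mirrors the shell example above. Checking both return codes matters because a plain shell pipe reports only the status of the last command.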
Running Pindel
Pindel takes five arguments:
1. input: the reference genome sequences in FASTA format;
2. input: the unmapped reads in a modified FASTQ format (the Pindel input prepared above);
3. the output folder;
4. the chromosome or fragment to analyse;
5. a BreakDancer result file, one event per line in the format: ChrA LocA stringA ChrB LocB stringB others. If you have no BreakDancer result, supply an empty file here.
Example (empty is an empty file):
./pindel hg19.fa pindel_input_chr1.txt Output_Folder chr1 empty
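The Pindel call and the optional BreakDancer file can be prepared as in this minimal Python sketch. It assumes the five positional arguments above and whitespace-separated BreakDancer lines; the function names, the `empty` default path and the dictionary keys are hypothetical, and the meaning of the string fields is not interpreted.

```python
from pathlib import Path
from typing import Dict, List, Optional


def pindel_cmd(reference: str, pindel_input: str, out_dir: str, chrom: str,
               breakdancer: Optional[str] = None, empty_path: str = "empty",
               exe: str = "./pindel") -> List[str]:
    if breakdancer is None:
        # No BreakDancer result: create (or truncate) an empty file and pass that instead.
        Path(empty_path).write_text("")
        breakdancer = empty_path
    return [exe, reference, pindel_input, out_dir, chrom, breakdancer]


def parse_breakdancer_line(line: str) -> Dict[str, object]:
    """Split 'ChrA LocA stringA ChrB LocB stringB others' on whitespace."""
    f = line.split()
    if len(f) < 6:
        raise ValueError(f"expected at least 6 fields: {line!r}")
    return {"chr_a": f[0], "loc_a": int(f[1]), "string_a": f[2],
            "chr_b": f[3], "loc_b": int(f[4]), "string_b": f[5],
            "others": f[6:]}
```

Usage: `pindel_cmd("hg19.fa", "pindel_input_chr1.txt", "Output_Folder", "chr1")` returns the argument list of the example and leaves an empty file named `empty` in the working directory.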
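Input format of Pindel
Each read is stored as three lines, for example:
@9113
TGGGGACCGGTGGAATGCTTCCACTGGCTGGGGGGC
+ chr2 41149518 50 Tumor
• Line 1: read name, prefixed with @.
• Line 2: sequence of the unmapped read.
• Line 3: strand, chromosome, 3' coordinate and mapping quality of the mapped read (the anchor on the reference), followed by the sample tag.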
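A reader for this three-line record format makes the field layout explicit. The Python sketch below is an illustration, not the parser Pindel uses: it assumes the first four fields of line 3 are strand, chromosome, coordinate and mapping quality, and that the sample tag is the last field. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class PindelRead:
    name: str        # read name without the leading '@'
    seq: str         # sequence of the unmapped read
    strand: str      # strand of the mapped (anchor) read
    chrom: str       # chromosome of the anchor
    anchor_pos: int  # 3' coordinate of the anchor
    mapq: int        # mapping quality of the anchor
    tag: str         # sample tag


def read_pindel_input(handle: TextIO) -> Iterator[PindelRead]:
    """Yield records of the form '@name' / sequence / '<strand> <chr> <pos> <mapq> ... <tag>'."""
    lines = (line.rstrip("\n") for line in handle)
    for header in lines:
        if not header.strip():
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@name', got {header!r}")
        seq = next(lines, None)
        info = next(lines, None)
        if seq is None or info is None:
            raise ValueError(f"truncated record for {header!r}")
        fields = info.split()
        if len(fields) < 5:
            raise ValueError(f"too few fields on line 3: {info!r}")
        strand, chrom, pos, mapq = fields[:4]
        yield PindelRead(header[1:], seq, strand, chrom, int(pos), int(mapq), fields[-1])
```

Taking the tag from the last field rather than a fixed position keeps the reader working if extra numeric fields sit between the mapping quality and the tag.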
Output format: deletions
D 321 ChrID 0 56173880 56174202 Supports: 15 70 130.916
TAAGAATGAGTTGGCAAATAAAGAGTTTGGTGAGTTTATAGAAATATAGGggccg<311>ataggACAAGGTACAAGGAATGGCTGAAGGAGAGAGGTTG
GAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG + 56173670 normal
GTGAGTTTATAGAAATATAGG ACAAGGTACAAGGAA + 56173677 normal
GAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG + 56173681 normal
TGGTGAGTTTATAGAAATATAGG ACAAGGTACAAGG + 56173687 normal
GAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG - 56173690 normal
GTGAGTTTATAGAAATATAGG ACAAGGTACAAGGAA - 56173695 normal
AGTTTGGTGAGTTTATAGAAATATAGG ACAAGGTACAAGGA - 56173697 normal
GTGAGTTTATAGAAATATAGG ACAAGGTACAAGGAA + 56173700 tumor
AGTTTATAGAAATATAGG ACAAGGTACAAGGAATGG + 56173710 tumor
TTTGGTGAGTTTATAGAAATATAGG ACAAGGTACAA + 56174339 tumor
TGAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG + 56174356 tumor
TGAGTTTATAGAAATATAGG ACAAGGTACAAGGAAT - 56174357 tumor
GTTTATAGAAATATAGG ACAAGGTACAAGGAATGGC - 56174358 tumor
GAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG - 56174365 tumor
AGTTTATAGAAATATAGG ACAAGGTACAAGGAATGG - 56174373 tumor
The header gives the event type (D) and size (321 bases), the chromosome ID, the two breakpoint coordinates and the number of supporting reads (15). In the reference line, deleted bases are in lowercase and <311> stands for 311 deleted bases that are not printed (5 + 311 + 5 = 321). Each supporting read is listed as its two parts on either side of the deletion, followed by strand, coordinate and sample tag.
Deletion size range: 1 base to 1 million bases.
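The deletion record can be checked mechanically. This Python sketch assumes the header field positions of the single example above (type, size, `ChrID`, chromosome, two breakpoints, `Supports:`, count, then fields it does not interpret) and the five-field layout of the read lines; the function names are hypothetical. It recomputes the deleted length from the reference line and tallies supporting reads per sample.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def parse_deletion_header(line: str) -> Dict[str, object]:
    """Parse e.g. 'D 321 ChrID 0 56173880 56174202 Supports: 15 70 130.916'.

    Field positions are inferred from this one example (an assumption);
    the numbers after the support count are kept unparsed in 'rest'.
    """
    f = line.split()
    if len(f) < 8 or f[0] != "D" or f[2] != "ChrID":
        raise ValueError(f"not a deletion header: {line!r}")
    return {"size": int(f[1]), "chrom": f[3],
            "left_bp": int(f[4]), "right_bp": int(f[5]),
            "supports": int(f[7]), "rest": f[8:]}


def deleted_length(ref_context: str) -> int:
    """Deleted bases = lowercase bases shown + the N of every <N> placeholder."""
    shown = sum(1 for c in ref_context if c.islower())
    hidden = sum(int(n) for n in re.findall(r"<(\d+)>", ref_context))
    return shown + hidden


def support_by_sample(read_lines: Iterable[str]) -> Counter:
    """Count supporting reads per sample tag.

    Each line: '<left part> <right part> <strand> <coordinate> <sample>'.
    """
    return Counter(line.split()[4] for line in read_lines if line.strip())
```

For the example, `deleted_length` gives 321, matching the header. In this example (and in the other two events in this manual) the right breakpoint minus the left one equals the size plus one, which suggests the breakpoints are the reference bases flanking the deletion. For the 15 reads above, `support_by_sample` reports 7 normal and 8 tumor.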
Allow mismatches to accommodate sequencing errors and SNPs
D 10 ChrID 13 BP 32913041 32913052
AAATCAACTAGTGACCTTCCAGGGACAACCCGAACGTGATGAAAAGATCAaagaacctacTCTATTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTTTGGACAAAGT
GATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTTTGGACAA
CAACCCGAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGA
CGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTTTGGA
CGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTTTGGA
TGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTTTGGACAAAG
GTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTTTGGAC
TAGTGACCTTCCAGGGACAACCCGAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAA
CCTTCCAGGGACAACCCGAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAA
ACAACCCGAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGG
CGAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTT
CCCGAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATC
AACCCGAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAA
TGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTTTGGACA
ACCTTCCAGGGACAACCCGAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAA
GATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTTTGGACAA
AACCCGAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAA
GAACGTGATGAAAAGATCA TCTGTTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTTT
In every supporting read, the part after the deletion starts with TCTGTT where the reference has TCTATT, yet the read is still placed across the 10-base deletion (aagaacctac).
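To see what "allowing mismatches" means on this example, the Python sketch below counts mismatches (Hamming distance) between each read part and the reference flank it is placed against. It is an illustration only, not Pindel's alignment procedure; the flanks are copied from the D 10 reference line, and the function names and the `max_mismatches` threshold are hypothetical.

```python
# Reference flanks copied from the D 10 example: everything before and after
# the lowercase deleted bases 'aagaacctac'.
LEFT_FLANK = "AAATCAACTAGTGACCTTCCAGGGACAACCCGAACGTGATGAAAAGATCA"
RIGHT_FLANK = "TCTATTGGGTTTTCATACAGCTAGCGGGAAAAAAGTTAAAATTGCAAAGGAATCTTTGGACAAAGT"


def mismatches(fragment: str, flank: str, side: str) -> int:
    """Hamming distance between a read part and the reference flank.

    side='left'  : the part ends at the breakpoint, so compare with the end of flank.
    side='right' : the part starts at the breakpoint, so compare with the start of flank.
    """
    if len(fragment) > len(flank):
        raise ValueError("fragment is longer than the flank")
    if side == "left":
        ref = flank[len(flank) - len(fragment):]
    elif side == "right":
        ref = flank[:len(fragment)]
    else:
        raise ValueError("side must be 'left' or 'right'")
    return sum(a != b for a, b in zip(fragment, ref))


def accept(fragment: str, flank: str, side: str, max_mismatches: int) -> bool:
    # max_mismatches is a hypothetical threshold for illustration only.
    return mismatches(fragment, flank, side) <= max_mismatches
```

For the second supporting read, the left part `CAACCCGAACGTGATGAAAAGATCA` matches the left flank exactly, while the right part differs from the right flank at one position (0-based index 3, G in the read versus A in the reference). A zero-mismatch rule would therefore reject every read of this event.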
Inversions
[Figure: sample vs. ref]
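The inversion slide carries only a figure. As a reminder of the event itself: an inversion replaces a reference segment with its reverse complement. The toy Python sketch below builds such a sample sequence; the sequence, the coordinates and the function names are hypothetical and are not taken from any Pindel output.

```python
# Complement table for DNA bases (upper and lower case, N maps to N).
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def apply_inversion(ref: str, start: int, end: int) -> str:
    """Return ref with ref[start:end] (0-based, half-open) reverse-complemented."""
    if not 0 <= start <= end <= len(ref):
        raise ValueError("inversion falls outside the reference")
    return ref[:start] + reverse_complement(ref[start:end]) + ref[end:]
```

For example, inverting `AACG` inside `GGAACGTT` (positions 2 to 6) yields `GGCGTTTT`: reads from such a sample match the reference on the outer flanks but the reverse strand inside the inverted segment.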
Non-template sequence in deletions, inversions and tandem duplications
[Figure: ref vs. sample]
Non-template sequence: deletion of 4 bases with 2 bases inserted
D 4 I 2 ChrID 3 BP 156978978 156978983 Supports 12 + 0 - 12 S1 13 SUM_MS 627 NumSupSamples 1 HCC1599a 12
CATGGCTGACTTATAAATCCCTACAGATATGTGGTTACTTCTCTACTTTCCCTTTCTTTGGCTTGGGCAACTGCCACGTTGATGCACTGGAGCCATTCTTCTGCATTCTTCTCATCCTTGGCCTTAAAGACATAGGTTTTATTGTC
TTATAAATCCCTACAGATATGTGGTTACTTCTCTACTTTCCCTTTCTTTGCCTTGGGCAACTGCCAAA GATGCACT
ATGTGGTTACTTCTCTACTTTCCCTTTCTTTGGCTTGGGCAACTGCCAAA GATGCACTGGAGCCATTCTTCTGCAT
CTCTACTTTCCCTTTCTTTGGCTTGGGCAACTGCCAAA GATGCACTGGAGCCATTCTTCTGCATTCTTCTCATCCT
AGATATGTGGTTACTTCTCTACTTTCCCTTTCTTTGGCTTGGGCAACTGCCAAA GATGCACTGGAGCCATTCTTCT
TTTCCCTTTCTTTGGCTTGGGCAACTGCCAAA GATGCACTGGAGCCATTCTTCTGCATTCTTCTCATCCTTGGCCT
TTCCCTTTCTTTGGCTTGGGCAACTGCCAAA GATGCACTGGAGCCATTCTTCTGCATTCTTCTCATCCTTGGCCTT
TTACTTCTCTACTTTCCCTTTCTTTGGCTTGGGCAACTGCCAAA GATGCACTGGAGCCATTCTTCTGCATTCTTCT
CTTGGGCAACTGCCAAA GATGCACTGGAGCCATTCTTCTGCATTCTTCTCATCCTTGGCCTTAAAGACATAGGTTT
CTACAGATATGTGGTTACTTCTCTACTTTCCCTTTCTTTGGCTTGGGCAACTGCCAAA GATGCACTGGAGCCATTC
AAATCCCTACAGATATGTGGTTACTTCTCTACTTTCCCTTTCTTTGGCTTGGGCAACTGCCAAA GATGCACTGGAG
CTTGGGCAACTGCCAAA GATGCACTGGAGCCATTCTTCTGCATTCTTCTCATCCTTGGCCTTAAAGACATAGGTTT
TTCCCTTTCTTTGGCTTGGGCAACTGCCAAA GATGCACTGGAGCCATTCTTCTGCATTCTTCTCATCCTTGGCCTT
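A deletion with non-template sequence replaces some reference bases with bases that come from nowhere in the reference. The Python sketch below applies such an event to the local reference context `CTGCCACGTTGATGCACT` taken from the reference line of this example. Deleting `CGTT` and inserting `AA` is one placement consistent with the reads (it reproduces their junction `CTGCCAAA` + `GATGCACT`); the exact placement Pindel reports may differ, and the function name is hypothetical.

```python
def apply_del_ins(ref: str, start: int, del_len: int, inserted: str) -> str:
    """Delete ref[start:start+del_len] (0-based) and put `inserted` in its place."""
    if start < 0 or del_len < 0 or start + del_len > len(ref):
        raise ValueError("deletion falls outside the reference")
    return ref[:start] + inserted + ref[start + del_len:]


# Local reference context around the event, copied from the reference line above.
REF_CONTEXT = "CTGCCACGTTGATGCACT"
SAMPLE = apply_del_ins(REF_CONTEXT, 6, 4, "AA")  # drop 'CGTT' (D 4), insert 'AA' (I 2)
```

`SAMPLE` is `CTGCCAAAGATGCACT`, which appears in every supporting read once its two parts are joined. The net length change is 4 - 2 = 2 bases, and as with the plain deletions, the BP coordinates differ by the deletion size plus one (156978983 - 156978978 = 5).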