The Assignment for Programming T raining . Fu Yu. Usage. “python mapping_report INPUT_FILE”. The INPUT_FILE is the SAM file that you want to process.
What’s more, my script deals with a SAM file that is more than 2G in only 356.87s
In about 18s, the script finishes dealing with a SAM file that is approximately 160megebytes. And in the current directory there is a file named “gross result” that includes all the result. And there is a “Distribution_of_scores.pdf” in which you can find the quality score.
To get everything that the header contains. Besides, it handles possible exceptions in case the SAM file is corrupted.
Put the score of each tag in to the “f_out_quality_score”, thus I can use rscript to deal with the score and draw the distribution.
Here, this Python script creates an R script and call it in the terminal so that we do not have to run the rscript by ourselves.
They share the same loop because they use identical loop. This way, I can improve the efficiency of the script.
Use the XA field to decide how many genomic locations there are and what are the exact place the tags are back.
If a line has ‘0’ or ’16’, together with the 19th field, then it is a tag that fulfills the condition given. Count the number and we get the result.