Uploaded by linus
7 SLIDES

Utilizing LARS for Regression Analysis on SNP Data and Wolbachia Infection Preprocessing

DESCRIPTION

In this project, we run a regression analysis with the LARS (Least Angle Regression) algorithm on SNP data from 163 subjects, each with 5,222,888 SNPs. We preprocess the data by recoding the SNP values (0, 0.5, 1, N) and the Wolbachia infection statuses as numbers, then normalize both, using a multithreaded algorithm for the SNP matrix. The results show that a SNP's importance does not depend on how many zeros it contains, which suggests that the reference sequence is not reliable.


Presentation Transcript


  1. Regression Analysis
     • DataSet
     • Data Preprocess
     • Normalize
     • LARS

  2. DataSet
     • X: the SNP sequences of 163 subjects; each sequence has 5222888 SNPs
     • Y: the Wolbachia infection table

  3. Preprocess of X
     • As the email said, we get a data array of 0, 0.5, 1 and N
     • Set the values: 0->0; 0.5->1; 1->2; N->1
     • Get the new X file (DataSet) at http://gdm.fudan.edu.cn/attach/Lasso_on_GPU/x.rar
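
The recoding on this slide can be sketched as a simple lookup table. This is a minimal illustration, not the authors' code: the function name `recode_snps` and the assumption that the raw values arrive as text tokens are hypothetical, since the slides do not describe the input file layout.

```python
# Recoding from the slide: 0->0, 0.5->1, 1->2, N->1.
# Note that N (unknown) is given the same code as 0.5.
SNP_CODE = {"0": 0, "0.5": 1, "1": 2, "N": 1}


def recode_snps(tokens):
    """Map raw SNP tokens (strings) to the integer codes from the slide.

    Raises KeyError on any token outside {0, 0.5, 1, N}, so unexpected
    input is not silently miscoded.
    """
    return [SNP_CODE[t] for t in tokens]


# Made-up tokens for illustration only.
example = recode_snps(["0", "0.5", "1", "N"])
print(example)
```

Because N and 0.5 share the code 1, the two cannot be told apart after this step.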

  4. Preprocess of Y
     • Choose the sheet of Wolbachia status
     • Set the values: y->1; n->0 (Y is normalized afterwards, so y->2; n->0 would give the same result)
     • Get y here: http://gdm.fudan.edu.cn/attach/lasso_on_GU/y.txt
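
The remark that y->1 and y->2 give the same result can be checked with a small sketch. It assumes that "normalize" means centering to zero mean and scaling to unit Euclidean length, which the slides do not state explicitly. The helper names and the labels are made up for illustration.

```python
import math


def encode_status(labels, yes_value):
    """Encode infection labels: 'y' -> yes_value, 'n' -> 0."""
    codes = {"y": float(yes_value), "n": 0.0}
    return [codes[s] for s in labels]


def normalize(values):
    """Center to zero mean, then scale to unit Euclidean length (assumed)."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("constant vector cannot be scaled to unit length")
    return [c / norm for c in centered]


labels = ["y", "n", "n", "y", "n"]  # made-up statuses
y1 = normalize(encode_status(labels, 1))
y2 = normalize(encode_status(labels, 2))
same = all(math.isclose(a, b) for a, b in zip(y1, y2))
print(same)
```

Under this assumption, multiplying the encoding by a positive constant scales both the centered values and their norm by that constant, so the ratio is unchanged.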

  5. Normalize X and Y
     • Use a multithreaded algorithm (2048 threads) to get the normalized X (bigger than 8G)
     • Normalize Y
     • The normalized X and Y are packaged here: http://gdm.fudan.edu.cn/attach/Lasso_on_GPU/normalize.rar
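
Each SNP column can be normalized independently of the others, which is what makes the step easy to parallelize. The sketch below uses a small Python thread pool purely to show that structure. It is not the 2048-thread implementation from the slide, and it again assumes that normalization means zero mean and unit Euclidean length. The handling of constant columns is also an assumption.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def normalize_column(column):
    """Center one SNP column and scale it to unit length (assumed scheme)."""
    mean = sum(column) / len(column)
    centered = [v - mean for v in column]
    norm = math.sqrt(sum(c * c for c in centered))
    # A constant column has zero norm after centering; leave it as zeros.
    if norm == 0.0:
        return centered
    return [c / norm for c in centered]


def normalize_columns(columns, workers=4):
    """Normalize every column; columns are independent, so they can run in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(normalize_column, columns))


# Made-up recoded data: 3 SNP columns over 4 subjects.
columns = [[0, 1, 2, 1], [2, 2, 0, 0], [1, 1, 1, 1]]
normalized = normalize_columns(columns)
```

In CPython a thread pool does not speed up pure-Python arithmetic. The point here is only that no column needs data from any other column.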

  6. LARS
     • Run LARS for 163 iterations
     • Each line of the result contains:
       the max angle between the remaining error and the 5222888 vectors;
       the SNP at which that max angle occurs in that iteration
     • Here is the result: http://gdm.fudan.edu.cn/attach/Lasso_on_GPU/result.txt
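
The selection step that each LARS iteration starts from can be sketched as follows. In the standard algorithm, the chosen column is the one with the largest absolute correlation with the current residual, that is, the smallest angle to it. Reading the slide's "max angle" as this quantity is an assumption. The sketch covers only the selection step, not the equiangular update that follows it, and the function names and data are made up.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def most_correlated_column(columns, residual):
    """Return (index, cosine) of the column with the largest |cosine| to the residual."""
    r_norm = math.sqrt(dot(residual, residual))
    best_index, best_cos = -1, 0.0
    for j, col in enumerate(columns):
        c_norm = math.sqrt(dot(col, col))
        if c_norm == 0.0 or r_norm == 0.0:
            continue  # a zero vector has no defined angle
        cos = dot(col, residual) / (c_norm * r_norm)
        if abs(cos) > abs(best_cos):
            best_index, best_cos = j, cos
    return best_index, best_cos


# Made-up centered columns and response; at the first step the residual is y itself.
columns = [[1, -1, 0, 0], [0, 0, 1, -1], [1, 0, -1, 0]]
y = [2, -2, 1, -1]
index, cosine = most_correlated_column(columns, y)
print(index, cosine)
```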

  7. Findings
     • Is a SNP's importance related to how many 0s it contains?
     • The result file http://gdm.fudan.edu.cn/attach/Lasso_on_GPU/rstAnd0s.txt shows: NO!
     • This means the reference sequence is not reliable.
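
One way to set up such a comparison is to list, for each SNP in the order LARS selected it, how many 0s its recoded column contains. The sketch below is a guess at the kind of table involved, not a description of what rstAnd0s.txt actually holds. The function names, the data and the selection order are all made up.

```python
def count_zeros(column):
    """Number of entries equal to 0 in a recoded SNP column."""
    return sum(1 for v in column if v == 0)


def zeros_by_selection_order(columns, selected):
    """Return (step, snp_index, zero_count) for each SNP in selection order."""
    return [
        (step, j, count_zeros(columns[j]))
        for step, j in enumerate(selected, start=1)
    ]


# Made-up recoded columns (before normalization) and a made-up selection order.
columns = [[0, 0, 2, 1], [1, 2, 2, 1], [0, 0, 0, 2]]
selected = [1, 2, 0]
table = zeros_by_selection_order(columns, selected)
for row in table:
    print(row)
```

If importance depended on the number of 0s, the zero counts would rise or fall steadily with the selection step. The slide reports that they do not.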
