
Regression Analysis


linus

Presentation Transcript


  1. Regression Analysis • DataSet • Data Preprocessing • Normalization • LARS

  2. DataSet • X: the SNP sequences of 163 subjects; each sequence has 5222888 SNPs • Y: the Wolbachia infection tables

  3. Preprocess of X • As described in the email, start from a data array with the values 0, 1, 0.5 and N • Recode the values: 0->0; 0.5->1; 1->2; N->1 • The recoded X (DataSet) file is available at http://gdm.fudan.edu.cn/attach/Lasso_on_GPU/x.rar
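The recoding on this slide can be sketched as a simple lookup table. This is a minimal illustration, not the authors' code: the names `GENOTYPE_CODES` and `encode_genotypes` are hypothetical, and the input is assumed to arrive as string tokens.

```python
# Hypothetical helper: the slide gives only the mapping, not the code.
# Mapping from the slide: 0 -> 0, 0.5 -> 1, 1 -> 2, N -> 1.
GENOTYPE_CODES = {"0": 0, "0.5": 1, "1": 2, "N": 1}


def encode_genotypes(tokens):
    """Map raw genotype tokens (assumed to be strings) to integer codes."""
    return [GENOTYPE_CODES[token] for token in tokens]


encoded = encode_genotypes(["0", "1", "0.5", "N"])
print(encoded)  # [0, 2, 1, 1]
```

Note that under this mapping N receives the same code as 0.5.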

  4. Preprocess of Y • Choose the Wolbachia status sheet • Recode the values: y->1, n->0 (Y is normalized afterwards, so y->2, n->0 would give the same result) • Y is available at http://gdm.fudan.edu.cn/attach/lasso_on_GU/y.txt
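The remark that y->1 and y->2 give the same result can be checked on a toy example. The slides do not say which normalization is used; the sketch below assumes centering to zero mean and scaling to unit Euclidean norm, and the labels are made up for illustration.

```python
import math


def normalize(values):
    """Center to zero mean and scale to unit Euclidean norm (assumed scheme).

    Raises ZeroDivisionError if all values are identical.
    """
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


status = ["y", "n", "n", "y", "n"]  # made-up labels, not the real data
y_as_1 = normalize([1 if s == "y" else 0 for s in status])
y_as_2 = normalize([2 if s == "y" else 0 for s in status])

# Both encodings collapse to the same normalized vector.
same = all(math.isclose(a, b) for a, b in zip(y_as_1, y_as_2))
print(same)  # True
```

Any positive rescaling of the y code is removed by the division by the norm, which is why the choice of 1 or 2 does not matter under this assumption.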

  5. Normalize X and Y • Use a multithreaded algorithm (2048 threads) to compute the normalized X (larger than 8 GB) • Normalize Y • The normalized X and Y are packaged here: http://gdm.fudan.edu.cn/attach/Lasso_on_GPU/normalize.rar
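Each SNP column can be normalized independently of the others, which is what makes the step easy to spread over many threads. The sketch below shows that structure with a small CPU thread pool as a stand-in; the 2048 threads on the slide are not reproduced, the normalization scheme (zero mean, unit norm) is an assumption, and the function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def normalize_column(column):
    """Center one column and scale it to unit norm (assumed scheme)."""
    mean = sum(column) / len(column)
    centered = [v - mean for v in column]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        # A constant column carries no information; return zeros.
        return [0.0] * len(column)
    return [c / norm for c in centered]


def normalize_columns(columns, workers=4):
    """Normalize every column; each column is an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(normalize_column, columns))


toy_x = [[0, 1, 2, 1], [2, 2, 2, 2], [0, 0, 2, 2]]  # 3 SNP columns, 4 subjects
normalized_x = normalize_columns(toy_x)
```

In standard CPython the thread pool mainly illustrates the per-column independence rather than delivering a speedup for this kind of arithmetic.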

  6. LARS • Run LARS for 163 iterations • Each line of the result contains: the max angle between the remaining error (the residual) and the 5222888 SNP vectors, and the SNP at which that max angle occurs in that iteration • The result is here: http://gdm.fudan.edu.cn/attach/Lasso_on_GPU/result.txt
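The per-iteration selection can be sketched as follows. In standard LARS the predictor chosen at each step is the one with the largest absolute correlation with the residual, which for unit-norm columns is the one with the largest absolute cosine (smallest angle); the sketch assumes this is the quantity recorded in the result file. It covers only the selection step, not the full LARS update, and `most_correlated` is a hypothetical name.

```python
import math


def most_correlated(columns, residual):
    """Return (index, cosine) of the unit-norm column best aligned with
    the residual. Returns (-1, 0.0) if every column is orthogonal to it."""
    residual_norm = math.sqrt(sum(r * r for r in residual))
    best_index, best_cos = -1, 0.0
    for j, column in enumerate(columns):
        # For a unit-norm column, dot(column, residual) / |residual|
        # is the cosine of the angle between them.
        cos = sum(x * r for x, r in zip(column, residual)) / residual_norm
        if abs(cos) > abs(best_cos):
            best_index, best_cos = j, cos
    return best_index, best_cos


# Toy unit-norm columns and residual, made up for illustration.
columns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
residual = [0.0, 2.0, 0.0]
print(most_correlated(columns, residual))  # (1, 1.0)
```

With 163 subjects, running this selection for 163 iterations yields one (value, SNP index) pair per line, matching the layout described on the slide.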

  7. Findings • Is a SNP's importance related to how many 0s it contains? • The result file http://gdm.fudan.edu.cn/attach/Lasso_on_GPU/rstAnd0s.txt shows: NO! • This means the reference sequence is not reliable.
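One way to pose the question on this slide is to count the 0s in each SNP column and correlate that count with an importance score. The slides do not say how the comparison in rstAnd0s.txt was made, so the sketch below is only an assumed approach; the helper names, the toy columns, and the scores are all made up.

```python
import math


def count_zeros(column):
    """Number of entries equal to 0 in one recoded SNP column."""
    return sum(1 for v in column if v == 0)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy recoded columns and hypothetical importance scores (not real results).
snp_columns = [[0, 0, 1, 2], [0, 1, 1, 2], [2, 2, 1, 0], [0, 0, 0, 1]]
importance = [0.30, 0.90, 0.10, 0.50]

zero_counts = [count_zeros(column) for column in snp_columns]
r = pearson(zero_counts, importance)
```

A value of `r` near 0 would correspond to the slide's conclusion that importance does not depend on the number of 0s.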
