
Research Meeting

Research Meeting. 2009-12-28 Jaeseok Myung. Summary. Classes (grade entry). Undergraduate graduation theses (이승재, 김홍찬). Seoul National University mentoring in progress. Research: SPARQL BGP Processing with Iterative MR. Implementation: HBase. WAIM 2010 (1/29), VLDB 2010 (3/9). How MR works for triples? Why do we need iterative MRs? Outline.


Presentation Transcript


  1. Research Meeting 2009-12-28 Jaeseok Myung

  2. Summary • Classes (grade entry) • Undergraduate graduation theses (이승재, 김홍찬) • Seoul National University mentoring in progress • Research • SPARQL BGP Processing with Iterative MR • Implementation: HBase • WAIM 2010 (1/29), VLDB 2010 (3/9) • How MR works for triples? • Why do we need iterative MRs? Center for E-Business Technology

  3. Outline • Introduction • Related Work • BGP Processing with MR • MR Iteration (why MR iterations arise during joins, N-Triples storage structure) • Naïve Approach (Single-Random) • Our Approach • Multi-Greedy Algorithm • Discussion (edge preserving, performance by type, key selection) • Experiments • Environmental Settings (Hadoop, LUBM, Complex Query, Amazon EC2, Converter) • SPARQL Processing Results (varying number of nodes, varying data size) • Dealing with Intermediate Result (intermediate file I/O cost is high, CGL-MR, MR-Online) • Conclusion (further research needed on compressible storage structures more complex than N-Triples, and on indexing) • Reference

  4. MapReduce (source: 한재선, SearchDay2008, http://nexr.tistory.com)

  5. How MR works for triples? (1/2) SELECT ?a ?b WHERE { ?a dbpedia:spouse ?b . ?a dbpedia:wikilink dbpediares:actor . ?b dbpedia:wikilink dbpediares:actor . ?a dbpedia:placeOfBirth ?c . ?b dbpedia:placeOfBirth ?c } Actors who are married to each other and born in the same place. [Figure: the five triple patterns are numbered (1) to (5). Mapper output keyed by value: a1: (1), (2), (4); b1: (1), (3), (5); c1: (4), (5).]

  6. How MR works for triples? (2/2) SELECT ?a ?b WHERE { ?a dbpedia:spouse ?b . ?a dbpedia:wikilink dbpediares:actor . ?b dbpedia:wikilink dbpediares:actor . ?a dbpedia:placeOfBirth ?c . ?b dbpedia:placeOfBirth ?c } Actors who are married to each other and born in the same place. [Figure: each Reducer receives the triples grouped under one key: a1 with patterns (1, 2, 4), b1 with patterns (1, 3, 5), c1 with patterns (4, 5).]
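The map and reduce steps on slides 5 and 6 can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not the implementation from the talk: the toy triples, the function names (`match`, `map_phase`, `shuffle`, `reduce_phase`), and the choice to key a single round on `?a` are all hypothetical, and no Hadoop or HBase API is used.

```python
from collections import defaultdict

# Triple patterns from the query on the slides, numbered (1) to (5).
PATTERNS = {
    1: ("?a", "dbpedia:spouse", "?b"),
    2: ("?a", "dbpedia:wikilink", "dbpediares:actor"),
    3: ("?b", "dbpedia:wikilink", "dbpediares:actor"),
    4: ("?a", "dbpedia:placeOfBirth", "?c"),
    5: ("?b", "dbpedia:placeOfBirth", "?c"),
}

# Hypothetical toy data, chosen to mirror the a1/b1/c1 values in the figure.
TRIPLES = [
    ("a1", "dbpedia:spouse", "b1"),
    ("a1", "dbpedia:wikilink", "dbpediares:actor"),
    ("b1", "dbpedia:wikilink", "dbpediares:actor"),
    ("a1", "dbpedia:placeOfBirth", "c1"),
    ("b1", "dbpedia:placeOfBirth", "c1"),
]


def match(pattern, triple):
    """Return the variable bindings if the triple matches the pattern, else None."""
    bindings = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if bindings.get(p, t) != t:
                return None
            bindings[p] = t
        elif p != t:
            return None
    return bindings


def map_phase(triples, key_var):
    """Emit (value bound to key_var, (pattern id, bindings)) for every match."""
    for triple in triples:
        for pid, pattern in PATTERNS.items():
            if key_var not in pattern:
                continue
            bindings = match(pattern, triple)
            if bindings is not None:
                yield bindings[key_var], (pid, bindings)


def shuffle(pairs):
    """Group mapper output by key, as the MR framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values, required):
    """Join the bindings of all required patterns that arrived under one key."""
    by_pattern = defaultdict(list)
    for pid, bindings in values:
        by_pattern[pid].append(bindings)
    if not all(pid in by_pattern for pid in required):
        return []
    results = [{}]
    for pid in required:
        results = [
            {**partial, **bindings}
            for partial in results
            for bindings in by_pattern[pid]
            if all(partial.get(v, val) == val for v, val in bindings.items())
        ]
    return results


groups = shuffle(map_phase(TRIPLES, "?a"))
for key, values in groups.items():
    print(key, reduce_phase(key, values, required=[1, 2, 4]))
```

In this sketch, one round keyed on `?a` joins patterns (1), (2) and (4) only. Patterns (3) and (5) do not contain `?a`, so they are left for a later round.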

  7. Why do we need iterative MR? SELECT ?a ?b WHERE { ?a dbpedia:spouse ?b . ?a dbpedia:wikilink dbpediares:actor . ?b dbpedia:wikilink dbpediares:actor . ?a dbpedia:placeOfBirth ?c . ?b dbpedia:placeOfBirth ?c } Actors who are married to each other and born in the same place. [Figure: the triple patterns are labelled with the variables they contain: (1) a|b, (2) a, (3) b, (4) a|c, (5) b|c. Reducer groups: a1 with patterns (1, 2, 4), b1 with patterns (1, 3, 5), c1 with patterns (4, 5).]

  8. Why do we need iterative MR? SELECT ?a ?b WHERE { ?a dbpedia:spouse ?b . ?a dbpedia:wikilink dbpediares:actor . ?b dbpedia:wikilink dbpediares:actor . ?a dbpedia:placeOfBirth ?c . ?b dbpedia:placeOfBirth ?c } Actors who are married to each other and born in the same place. [Figure: four panels, (a) to (d), each showing numbered triple patterns labelled with the variables they contain (for example a|b, b|c, c|d, d|e).]
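The question on slides 7 and 8 can be made concrete by listing which patterns each variable touches. The sketch below is an illustration under a simplifying assumption (each MR job joins on exactly one variable); it is not the Multi-Greedy algorithm named in the outline, whose details are not in this transcript. The function names and the key order passed in are hypothetical.

```python
# Variables contained in each triple pattern of the query, as labelled on slide 7.
PATTERN_VARS = {
    1: {"?a", "?b"},
    2: {"?a"},
    3: {"?b"},
    4: {"?a", "?c"},
    5: {"?b", "?c"},
}


def patterns_by_variable(pattern_vars):
    """Invert the mapping: for each variable, the set of patterns containing it."""
    index = {}
    for pid, variables in pattern_vars.items():
        for var in variables:
            index.setdefault(var, set()).add(pid)
    return index


def rounds_one_key_per_job(pattern_vars, key_order):
    """Simulate joining on one variable per MR job.

    Returns a list of (key variable, pattern groups after that round).
    Assumption: a job merges every group that contains its key variable.
    """
    groups = [(frozenset([pid]), frozenset(vs)) for pid, vs in pattern_vars.items()]
    history = []
    for key in key_order:
        joined = [g for g in groups if key in g[1]]
        if len(joined) < 2:
            continue  # nothing to join on this variable
        rest = [g for g in groups if key not in g[1]]
        merged = (
            frozenset().union(*(g[0] for g in joined)),
            frozenset().union(*(g[1] for g in joined)),
        )
        groups = rest + [merged]
        history.append((key, [sorted(g[0]) for g in groups]))
        if len(groups) == 1:
            break
    return history


index = patterns_by_variable(PATTERN_VARS)
print({var: sorted(pids) for var, pids in sorted(index.items())})
for key, state in rounds_one_key_per_job(PATTERN_VARS, ["?a", "?b"]):
    print(key, state)
```

No single variable appears in all five patterns, so under this one-key-per-job model a single job cannot join the whole query. With the hypothetical order `?a` then `?b`, the simulation needs two rounds; a different key order or a different job model could behave differently.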

  9. Naïve vs. Our Approach • Write-up in progress

  10. Outline • Introduction • Related Work • Preliminaries • BGP Processing with MR • MR Iteration (why MR iterations arise during joins, N-Triples storage structure) • Naïve Approach (Single-Random) • Our Approach • Multi-Greedy Algorithm • Improvement • Using Advanced Storage for Selection Task • Using Selectivity Info. for Minimizing BGP Iteration • Discussion (edge preserving, performance by type, key selection) • Experiments • Environmental Settings (Hadoop, LUBM, Complex Query, Amazon EC2, Converter) • SPARQL Processing Results (varying number of nodes, varying data size) • Dealing with Intermediate Result (intermediate file I/O cost is high, CGL-MR, MR-Online) • Conclusion (further research needed on compressible storage structures more complex than N-Triples, and on indexing) • Reference
