Research Meeting Summary: MapReduce Semi-Join Efficiency Exploration

This summary covers progress on a 2-phase semi-join in MapReduce. It includes test set generation from a music-listening log, a local comparison of one-shot and semi-join execution, and planned Amazon EC2 experiments that vary the number of nodes. Upcoming work includes a KCC 2011 cost comparison of iterative MR-Join and one-shot join, a seminar on the PEGASUS graph mining system, and quadruple-based matrix storage with matrix operations for graph analysis. Load balancing and caching are listed as further optimizations.



Presentation Transcript


  1. Research Meeting, 2011-04-12, Jaeseok Myung

  2. Summary
     • Progress this week
       • 2-Phase Semi-Join in MapReduce
         • Test set generation
           • User(u_others, u_id) ⨝ Listen(l_id, u_id, m_id) ⨝ Music(m_id, m_others)
           • Attribute sizes, in schema order: u_others 1k, u_id 20, l_id 20, u_id 20, m_id 20, m_id 20, m_others 1k
           • Bugs' log: |U| = 17M, |M| = 1M; for one day, |L| = 7M (|Ud| = 0.1M, |Um| = 0.2M)
         • Local machine test (1/1000 scale, 19 MB)
           • One-shot: Map Input 19,197,000, Map Output 187,601,000, 12,355 ms
           • Semi-join: Map Input 19,197,000 × 2, Map Output 5,979,000 + 885,400, 4,157 + 3,054 ms
         • Amazon EC2 experiments in preparation (19 GB -> ?)
           • Compare one-shot vs. semi-join
           • For semi-join, measure how speed changes as the number of nodes is varied
         • Additional optimizations: load balancing / caching
     Center for E-Business Technology
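The slide lists test set generation with the Bugs log cardinalities scaled down to 1/1000. The sketch below is a hypothetical generator, not the author's tool. It assumes the "1k" and "20" figures are character widths of the `*_others` payloads and the id fields. It also assumes that listens are drawn uniformly from a fixed set of active users and active songs. The names `generate`, `fixed_id`, and `raw_size` are illustrative.

```python
import random

# Full-size cardinalities of the Bugs log, as listed on the slide.
FULL = {"users": 17_000_000, "music": 1_000_000, "listens": 7_000_000,
        "active_users": 100_000, "active_music": 200_000}
ID_WIDTH = 20         # assumption: id fields are 20 characters wide
OTHERS_WIDTH = 1000   # assumption: u_others / m_others payloads are 1,000 characters

def fixed_id(n):
    """Zero-pad an integer id to the assumed fixed width."""
    return str(n).zfill(ID_WIDTH)

def generate(scale=1000, seed=0):
    """Hypothetical generator: returns (users, listens, music) at 1/scale of full size."""
    rng = random.Random(seed)
    n = {k: v // scale for k, v in FULL.items()}
    assert n["listens"] >= max(n["active_users"], n["active_music"])
    u_payload, m_payload = "u" * OTHERS_WIDTH, "m" * OTHERS_WIDTH
    users = [(u_payload, fixed_id(i)) for i in range(n["users"])]    # User(u_others, u_id)
    music = [(fixed_id(i), m_payload) for i in range(n["music"])]    # Music(m_id, m_others)
    active_u = rng.sample(range(n["users"]), n["active_users"])
    active_m = rng.sample(range(n["music"]), n["active_music"])
    listens = []                                                     # Listen(l_id, u_id, m_id)
    for l in range(n["listens"]):
        # Cover every active id once so the distinct counts are exact, then sample uniformly.
        u = active_u[l] if l < len(active_u) else rng.choice(active_u)
        m = active_m[l] if l < len(active_m) else rng.choice(active_m)
        listens.append((fixed_id(l), fixed_id(u), fixed_id(m)))
    return users, listens, music

def raw_size(*tables):
    """Total characters across all fields, ignoring delimiters and newlines."""
    return sum(len(field) for table in tables for row in table for field in row)

users, listens, music = generate(scale=1000)
```

Under these assumed widths, the raw fields at 1/1000 scale total 18,780,000 characters. That is close to the 19 MB local test set on the slide. The gap is plausibly delimiters and line breaks, which this sketch does not write.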
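The slide reports results but does not describe how the two semi-join jobs are built. The following is a minimal in-memory sketch of one plausible 2-phase design, not the author's implementation.

* **Phase 1** extracts the distinct `u_id` and `m_id` values that occur in Listen.
* **Phase 2** drops User and Music rows whose keys are absent, then runs a reduce-side join on `u_id`.

The sketch assumes that the filtered Music rows are small enough to hold in memory as a side input. The helper `run_mr` stands in for a MapReduce job, and all function names are hypothetical.

```python
from collections import defaultdict

def run_mr(records, mapper, reducer):
    """Simulate one MapReduce job in memory and count the map-output pairs."""
    groups = defaultdict(list)
    map_output = 0
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)  # shuffle: group values by key
            map_output += 1
    result = []
    for key, values in groups.items():
        result.extend(reducer(key, values))
    return result, map_output

# Phase 1: emit the join keys that actually occur in Listen; the reducer deduplicates them.
def phase1_mapper(rec):
    if rec[0] == "L":
        _, _, u_id, m_id = rec
        yield ("u", u_id), None
        yield ("m", m_id), None

def phase1_reducer(key, _values):
    yield key  # each distinct (kind, id) key exactly once

# Phase 2: filter User/Music by the phase-1 keys, then reduce-side join on u_id.
def make_phase2(u_keys, m_keys, music_rows):
    # Assumed side input: Music rows that survive the m_id filter, kept in memory.
    music_by_id = {m_id: m_others for _, m_id, m_others in music_rows if m_id in m_keys}

    def mapper(rec):
        if rec[0] == "U" and rec[2] in u_keys:
            yield rec[2], rec      # User(u_others, u_id) keyed by u_id
        elif rec[0] == "L":
            yield rec[2], rec      # Listen(l_id, u_id, m_id) keyed by u_id
        # Music rows and non-matching User rows produce no map output.

    def reducer(u_id, values):
        users = [v for v in values if v[0] == "U"]
        listens = [v for v in values if v[0] == "L"]
        for _, u_others, _ in users:
            for _, l_id, _, m_id in listens:
                if m_id in music_by_id:
                    yield (u_id, u_others, l_id, m_id, music_by_id[m_id])

    return mapper, reducer

def semi_join(records):
    """Run both phases; return the joined rows and (phase-1, phase-2) map-output counts."""
    keys, out1 = run_mr(records, phase1_mapper, phase1_reducer)
    u_keys = {k for kind, k in keys if kind == "u"}
    m_keys = {k for kind, k in keys if kind == "m"}
    music_rows = [r for r in records if r[0] == "M"]
    joined, out2 = run_mr(records, *make_phase2(u_keys, m_keys, music_rows))
    return joined, (out1, out2)

# Toy rows tagged by relation: ("U", u_others, u_id), ("L", l_id, u_id, m_id), ("M", m_id, m_others).
users = [("U", "alice", 1), ("U", "bob", 2), ("U", "carol", 3)]
listens = [("L", 10, 1, 100), ("L", 11, 1, 101), ("L", 12, 2, 100)]
music = [("M", 100, "song-a"), ("M", 101, "song-b"), ("M", 102, "song-c")]
records = users + listens + music
joined, map_outputs = semi_join(records)
```

In the toy run, user 3 and song 102 never appear in Listen, so phase 2 emits nothing for them. The slide's own figures show the same trade-off at scale:

* The semi-join reads its input twice (38,394,000 vs. 19,197,000).
* Its map output is about 27 times smaller (6,864,400 vs. 187,601,000).
* Its two jobs together take 7,211 ms, versus 12,355 ms for the one-shot join. That is roughly a 1.7× speedup.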

  3. Upcoming Plans
     • KCC 2011: cost comparison of Iterative MR-Join vs. One-shot Join
     • This week's seminar
       • PEGASUS: A Peta-Scale Graph Mining System – Implementation and Observations, ICDM 2009
     • Additional research items
       • A Single-Pass MR Algorithm for the Transitive Closure and the Connected Component Problem
       • Implement quadruple-based matrix storage for graph analysis
         • Implement matrix operations: sum, difference, product, inverse, transition matrix, etc.
         • Compute matrix-vector products
       • Implement a MapReduce version of the quadruple store
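The slide does not define the quadruple layout for matrix storage. This sketch assumes each nonzero entry is stored as `(matrix_name, row, col, value)`, so several matrices can share one store. It shows sum, difference, product, and matrix-vector product, each written as "key the entries, then aggregate per key" so the logic maps onto MapReduce jobs. Inverse and transition matrix are not covered. The matrix-vector product is the primitive that PEGASUS generalizes as GIM-V. All function names here are hypothetical.

```python
from collections import defaultdict

# Hypothetical quadruple layout: (matrix_name, row, col, value); zero entries are not stored.
def to_quads(name, dense):
    return [(name, i, j, v) for i, row in enumerate(dense) for j, v in enumerate(row) if v != 0]

def _emit(out, cells):
    # Turn a {(row, col): value} dict back into sorted, sparse quadruples.
    return [(out, i, j, v) for (i, j), v in sorted(cells.items()) if v != 0]

def add(quads, left, right, out, sign=1):
    """Sum (sign=1) or difference (sign=-1): key each entry by (row, col), reduce by adding."""
    cells = defaultdict(float)
    for name, i, j, v in quads:
        if name == left:
            cells[(i, j)] += v
        elif name == right:
            cells[(i, j)] += sign * v
    return _emit(out, cells)

def multiply(quads, left, right, out):
    """Product left x right: join on the shared index k, then sum partial products per (i, j)."""
    by_k = defaultdict(lambda: ([], []))
    for name, i, j, v in quads:
        if name == left:
            by_k[j][0].append((i, v))    # left entry A[i][k] keyed by its column k
        elif name == right:
            by_k[i][1].append((j, v))    # right entry B[k][j] keyed by its row k
    cells = defaultdict(float)
    for a_entries, b_entries in by_k.values():
        for i, a in a_entries:
            for j, b in b_entries:
                cells[(i, j)] += a * b
    return _emit(out, cells)

def mat_vec(quads, name, x):
    """y = M x with x as {index: value}: join M[i][j] with x[j], then sum per row i."""
    y = defaultdict(float)
    for m, i, j, v in quads:
        if m == name and j in x:
            y[i] += v * x[j]
    return dict(y)

quads = to_quads("A", [[1, 2], [0, 3]]) + to_quads("B", [[4, 0], [1, 1]])
```

In a real MapReduce pipeline, `multiply` would usually be two jobs: a join on `k`, then a sum per cell. Here both steps run in one pass for brevity.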
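For contrast with the single-pass algorithm cited on the slide, here is a sketch of the iterative baseline for connected components. It is not that single-pass algorithm. Every node repeatedly takes the minimum component id among itself and its neighbours until nothing changes. PEGASUS expresses this min-label propagation as repeated GIM-V rounds. On a cluster each round would be one MapReduce job, and the number of rounds grows with the graph's diameter, which is the cost a single-pass algorithm would aim to avoid. The function name is hypothetical.

```python
from collections import defaultdict

def connected_components(edges, nodes):
    """Iterative min-label propagation; returns ({node: component_id}, rounds run)."""
    adj = defaultdict(set)
    for u, v in edges:          # undirected graph: store both directions
        adj[u].add(v)
        adj[v].add(u)
    label = {n: n for n in nodes}   # initially every node is its own component
    rounds = 0
    while True:
        rounds += 1
        # One matrix-vector-style round: combine neighbour labels with min, keep the smaller one.
        new_label = {n: min([label[n]] + [label[m] for m in adj[n]]) for n in nodes}
        if new_label == label:  # converged: no label changed in this round
            return label, rounds
        label = new_label

labels, rounds = connected_components([(1, 2), (2, 3), (4, 5)], range(1, 7))
```

Each component ends up labelled with its smallest node id. In the example, node 3 needs two rounds to receive label 1, and a third round confirms convergence.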
