This research-meeting update reports on a 2-phase semi-join in MapReduce: test-set generation, a local-machine comparison against a one-shot join, and planned Amazon EC2 experiments that vary the number of nodes. It also lists upcoming work: a KCC 2011 cost comparison of iterative MR-Join and one-shot join, a seminar on the PEGASUS graph mining system, quadruple-based matrix storage and matrix operations for graph analysis, and further optimization through load balancing and caching.
Research Meeting, 2011-04-12, Jaeseok Myung
Summary
• Progress this week
  • 2-Phase Semi-Join in MapReduce
    • Test set generation
      • User(u_others, u_id) ⨝ Listen(l_id, u_id, m_id) ⨝ Music(m_id, m_others)
      • Attribute sizes, in schema order: 1k, 20, 20, 20, 20, 20, 1k
      • Bugs' log: |U| = 17M, |M| = 1M; for one day, |L| = 7M (|Ud| = 0.1M, |Um| = 0.2M)
    • Local machine test (1/1000 of the data, 19 MB)
      • One-shot: map input 19,197,000, map output 187,601,000, 12,355 ms
      • Semi-join: map input 19,197,000 × 2, map output 5,979,000 + 885,400, 4,157 + 3,054 ms
    • Preparing Amazon EC2 experiments (19 GB -> ?)
      • Compare one-shot vs. semi-join
      • For the semi-join, measure how speed changes as the number of nodes varies
      • Further optimization: load balancing / caching
Center for E-Business Technology
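As a quick check on the local-test figures, the ratios below compare the two approaches. They use only the reported numbers and say nothing yet about behavior at EC2 scale.

```latex
\begin{aligned}
\text{map output ratio} &= \frac{187{,}601{,}000}{5{,}979{,}000 + 885{,}400} = \frac{187{,}601{,}000}{6{,}864{,}400} \approx 27.3 \\
\text{speedup} &= \frac{12{,}355\ \text{ms}}{(4{,}157 + 3{,}054)\ \text{ms}} = \frac{12{,}355}{7{,}211} \approx 1.71 \\
\text{semi-join map input} &= 2 \times 19{,}197{,}000 = 38{,}394{,}000
\end{aligned}
```

On the 1/1000 sample, the semi-join therefore emits roughly 27 times less map output while reading the input twice, and finishes about 1.7 times faster overall.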
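The slide does not spell out how the two phases are organized, so the following is only a minimal in-memory sketch under stated assumptions. Phase 1 scans the input and collects the distinct u_id and m_id values referenced by Listen; phase 2 uses those key sets to drop unreferenced User and Music rows before joining. The helper names (`run_job`, `two_phase_semijoin`), the tagged-input format, and the choice to join User with Listen on the reduce side while looking up the filtered Music rows in memory (assuming that filtered table is small, as |Um| = 0.2M suggests) are hypothetical illustrations, not the author's implementation.

```python
from collections import defaultdict


def run_job(inputs, mapper, reducer):
    """Simulate one MapReduce job in memory: map, shuffle (group by key), reduce.

    Returns the reducer output and the number of intermediate (key, value)
    pairs emitted by the mappers, i.e. the data that would be shuffled."""
    groups = defaultdict(list)
    shuffled = 0
    for tag, record in inputs:
        for key, value in mapper(tag, record):
            groups[key].append(value)
            shuffled += 1
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output, shuffled


def two_phase_semijoin(users, listens, musics):
    """Hypothetical 2-phase semi-join for User JOIN Listen JOIN Music.

    users: (u_others, u_id); listens: (l_id, u_id, m_id); musics: (m_id, m_others).
    """
    tagged = ([("U", r) for r in users] + [("L", r) for r in listens]
              + [("M", r) for r in musics])

    # Phase 1: scan all input, but only Listen emits anything: its join keys.
    def keys_mapper(tag, rec):
        if tag == "L":
            _, u_id, m_id = rec
            yield ("U", u_id), None
            yield ("M", m_id), None

    def keys_reducer(key, _values):
        yield key  # one output per distinct (table, key) pair

    keys, shuffled1 = run_job(tagged, keys_mapper, keys_reducer)
    u_keys = {k for table, k in keys if table == "U"}
    m_keys = {k for table, k in keys if table == "M"}

    # Assumption: the filtered Music table is small enough to broadcast in memory.
    music_small = {m_id: m_others for m_id, m_others in musics if m_id in m_keys}

    # Phase 2: reduce-side join of User and Listen on u_id, with the semi-join
    # filter applied in the mapper so unreferenced users are never shuffled.
    def join_mapper(tag, rec):
        if tag == "U":
            u_others, u_id = rec
            if u_id in u_keys:
                yield u_id, ("U", u_others)
        elif tag == "L":
            l_id, u_id, m_id = rec
            yield u_id, ("L", (l_id, m_id))

    def join_reducer(u_id, values):
        user_rows = [v for t, v in values if t == "U"]
        listen_rows = [v for t, v in values if t == "L"]
        for u_others in user_rows:
            for l_id, m_id in listen_rows:
                if m_id in music_small:  # in-memory lookup of filtered Music
                    yield (u_others, u_id, l_id, m_id, music_small[m_id])

    joined, shuffled2 = run_job(tagged, join_mapper, join_reducer)
    return joined, shuffled1 + shuffled2


# Toy data (illustrative only).
users = [("alice-profile", 1), ("bob-profile", 2), ("carol-profile", 3)]
listens = [(100, 1, 10), (101, 1, 11), (102, 2, 10)]
musics = [(10, "song-a"), (11, "song-b"), (12, "song-c")]
result, shuffled = two_phase_semijoin(users, listens, musics)
print(sorted(result))
print("shuffled pairs:", shuffled)
```

The shuffle counter is the quantity the semi-join is meant to shrink: unreferenced users (here `carol-profile`) never leave the mapper in phase 2, at the price of an extra full scan in phase 1.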
Planned
• KCC 2011: cost comparison of Iterative MR-Join and One-shot Join
• This week's seminar
  • PEGASUS: A Peta-Scale Graph Mining System – Implementation and Observations, ICDM 2009
• Additional research items
  • A Single-Pass MR Algorithm for the Transitive Closure and the Connected Component Problem
  • Implement quadruple-based matrix storage for graph analysis
    • Implement matrix operations: sum, difference, product, inverse, transition matrix, etc.
    • Matrix-vector multiplication
    • Implement a MapReduce version of the quadruple store
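A minimal sketch of quadruple-based storage and some of the listed operations, assuming each quadruple is (matrix_id, row, col, value) with only nonzero entries stored. The slide does not define the quadruple fields, so that layout and every function name here are hypothetical. Inverse is omitted because it does not reduce to a simple join or aggregate over sparse entries, and "transition matrix" is interpreted as the row-normalized matrix, which is also an assumption.

```python
from collections import defaultdict

# Hypothetical quadruple layout: (matrix_id, row, col, value), nonzeros only.


def to_quads(matrix_id, entries):
    """Flatten a sparse matrix {(row, col): value} into quadruples."""
    return [(matrix_id, r, c, v) for (r, c), v in sorted(entries.items()) if v != 0]


def from_quads(quads, matrix_id):
    """Select one matrix back out of a shared quadruple list."""
    return {(r, c): v for m, r, c, v in quads if m == matrix_id}


def add(a, b):
    out = dict(a)
    for key, v in b.items():
        out[key] = out.get(key, 0) + v
    return {key: v for key, v in out.items() if v != 0}  # keep the store sparse


def subtract(a, b):
    return add(a, {key: -v for key, v in b.items()})


def transpose(a):
    return {(c, r): v for (r, c), v in a.items()}


def multiply(a, b):
    # Join a's column index with b's row index, then sum products per (i, j).
    b_by_row = defaultdict(list)
    for (r, c), v in b.items():
        b_by_row[r].append((c, v))
    out = defaultdict(int)
    for (i, k), av in a.items():
        for j, bv in b_by_row[k]:
            out[(i, j)] += av * bv
    return {key: v for key, v in out.items() if v != 0}


def transition(a):
    """Row-normalize a non-negative matrix so that each stored row sums to 1."""
    row_sums = defaultdict(int)
    for (r, _), v in a.items():
        row_sums[r] += v
    return {(r, c): v / row_sums[r] for (r, c), v in a.items()}


A = {(0, 0): 1, (0, 1): 2, (1, 1): 3}
B = {(0, 0): 4, (1, 0): 5, (1, 1): 6}
store = to_quads("A", A) + to_quads("B", B)
print(multiply(from_quads(store, "A"), from_quads(store, "B")))
```

Each operation here is a join or group-by over the stored entries (multiplication joins on the shared index k), which is the property a MapReduce version of the store would likely build on.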
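The matrix-vector multiplication item, together with the PEGASUS paper from this week's seminar, can be illustrated with a small in-memory sketch modeled on the GIM-V primitive that PEGASUS describes (combine2, combineAll, assign). Stage 1 joins matrix entries with vector elements on the column index; stage 2 aggregates per row. The function names, and the use of a Python dict lookup in place of a distributed join, are hypothetical simplifications. The same function covers both the ordinary product and a min-label connected-components iteration.

```python
from collections import defaultdict


def gimv(matrix, vector, combine2, combine_all, assign):
    """One generalized matrix-vector step, v' = M x_G v (GIM-V style).

    matrix: iterable of (i, j, m_ij) entries; vector: {j: v_j}.
    """
    # Stage 1 (join on j): pair each m_ij with v_j, keyed by row i.
    partials = defaultdict(list)
    for i, j, m_ij in matrix:
        if j in vector:
            partials[i].append(combine2(m_ij, vector[j]))
    # Stage 2 (group by i): aggregate the partials and merge with the old v_i.
    return {
        i: assign(vector.get(i), combine_all(partials.get(i, [])))
        for i in set(vector) | set(partials)
    }


def mat_vec(matrix, vector):
    """Ordinary matrix-vector product as a GIM-V instance."""
    return gimv(matrix, vector, lambda m, x: m * x, sum, lambda old, cand: cand)


def connected_components(edges, nodes):
    """Label propagation: each node repeatedly takes the minimum label it sees."""
    sym = [(u, w, 1) for u, w in edges] + [(w, u, 1) for u, w in edges]
    labels = {n: n for n in nodes}
    while True:
        new_labels = gimv(sym, labels, lambda m, x: m * x,
                          lambda xs: min(xs, default=float("inf")),
                          lambda old, cand: min(old, cand))
        if new_labels == labels:
            return labels
        labels = new_labels


M = [(0, 0, 1), (0, 1, 2), (1, 1, 3)]
print(mat_vec(M, {0: 1, 1: 2}))
print(connected_components([(0, 1), (1, 2), (3, 4)], range(5)))
```

Swapping the three operators lets one job shape serve several graph computations, which is the design point that seems most relevant to a MapReduce version of the quadruple store.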