The most professional mobile app analytics and developer services platform

Presentation Transcript


  1. The most professional mobile app analytics and developer services platform • 王春国 (Wang Chunguo) • email: wangchunguo@umeng.com • WeChat/QQ: 715356603

  2. Agenda • Mobile Big Data • Tech Stack • Real-Time Data Flow • Hadoop Architecture • Data Warehouse • Solutions • Q&A

  3. Mobile Big Data

  4. Mobile Data Features • Diversity • Fragmentation • Multi-dimensional • High frequency • High-speed growth • Low quality

  5. 10+ billion installations • ~3+ billion requests, peak 60,000/s • ~5 TB+ per day • ~1,000 nodes • 2–2.5 billion messages • 500+ jobs • 16 thousand+ apps • 65 thousand+ developers

  6. Tech Stack

  7. Java, Scala, Python, Shell, C … • Kafka, Storm • Hive, Pig • MapReduce • Redis, MongoDB, HBase • Excel, R • Finagle • Git

  8. Data Collection

  9. Architecture

  10. Real Time Data Flow

  11. Batch Mode

  12. Data Warehouse

  13. Solutions

  14. Protobuf • Serializes structured data (think XML) • Flexible, efficient, simple • Language-independent • Smaller • Faster • Simpler format • Less ambiguous
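To make the "smaller" point concrete, here is a minimal sketch of the base-128 varint encoding that Protocol Buffers uses for integer values on the wire. It is hand-rolled for illustration and does not use the protobuf library; the function names `encode_varint` and `decode_varint` are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (low 7 bits first)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)         # last byte: continuation bit stays clear
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single varint from the start of `data`."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")
```

A small value such as `1` occupies a single byte here, whereas a textual form like `<id>1</id>` takes ten characters. That gap is the kind of saving the slide refers to.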

  15. Hive ORCFile Features • Reduces the NameNode's load • Lightweight indexes: skip row groups, seek to a given row • Block-mode compression • Bounds the amount of memory needed for reading or writing • Metadata stored using Protocol Buffers
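The "skip row groups" idea can be sketched as follows. This is a simplified illustration, not ORC's actual reader: it assumes each row group stores only a min and max for a single integer column, and the names `RowGroup`, `build_row_groups`, and `scan_equal` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    rows: List[int]   # column values held in this row group
    min_value: int    # lightweight index entry: smallest value in the group
    max_value: int    # lightweight index entry: largest value in the group


def build_row_groups(values: List[int], stride: int) -> List[RowGroup]:
    """Split a column into fixed-size row groups and record min/max per group."""
    groups = []
    for start in range(0, len(values), stride):
        chunk = values[start:start + stride]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_equal(groups: List[RowGroup], target: int) -> Tuple[List[int], int]:
    """Return matching values and the number of row groups actually read."""
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        # The index proves the target cannot be in this group, so skip it unread.
        if target < group.min_value or target > group.max_value:
            continue
        groups_read += 1
        matches.extend(v for v in group.rows if v == target)
    return matches, groups_read
```

With 100 sorted values in groups of 10, `scan_equal(groups, 42)` reads only one of the ten groups. The benefit shrinks when values are not clustered, because more groups' min/max ranges then overlap the target.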

  16. Hive ORCFile Structure

  17. HQL: SELECT COUNT(1) FROM TABLE (ORCFile vs. TextFile)

  18. LZMA Compression • Faster compression speed • Faster decompression speed • Smaller memory requirements for decompression • Smaller code size for decompression
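A rough way to compare codecs on your own data is sketched below using Python's standard `gzip` and `lzma` modules. LZO is omitted because it is not in the standard library. The sample payload is made up for illustration, and the function name `compare_codecs` is hypothetical. This does not reproduce the benchmark from the talk, and results depend heavily on the data.

```python
import gzip
import lzma


def compare_codecs(payload: bytes) -> dict:
    """Compress `payload` with gzip and LZMA and report sizes in bytes."""
    gz = gzip.compress(payload)
    xz = lzma.compress(payload)
    # Round-trip check: both codecs must restore the original bytes exactly.
    assert gzip.decompress(gz) == payload
    assert lzma.decompress(xz) == payload
    return {"raw": len(payload), "gzip": len(gz), "lzma": len(xz)}


# Made-up, highly repetitive log-like lines (an assumption, not real Umeng data).
sample = b"2014-01-01\tappkey_demo\tlaunch\t1\n" * 10000
sizes = compare_codecs(sample)
print(sizes)
```

To compare speed as well as size, the two `compress` calls could be wrapped with `time.perf_counter()`.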

  19. gz vs. lzo vs. lzma

  20. Blend Scheduler • Fair Scheduler • Map Slot <-> Reduce Slot • More efficient • Full use of cluster resources

  21. Data Skew • Row key design: date+appkey • Row key design: md5(date+app_key)[0:4] + date + appkey
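The two row key designs on this slide can be sketched as below. One assumption is labeled in the code: the slide does not say whether `[0:4]` slices the hex digest or the raw MD5 bytes, and this sketch uses the hex digest. The function names are hypothetical.

```python
import hashlib


def plain_row_key(date: str, app_key: str) -> str:
    """First design: keys for one date sort next to each other, which can create a hot spot."""
    return date + app_key


def salted_row_key(date: str, app_key: str) -> str:
    """Second design: a 4-character hash prefix spreads keys across the key space."""
    # Assumption: the prefix is the first 4 characters of the hex digest.
    prefix = hashlib.md5((date + app_key).encode("utf-8")).hexdigest()[0:4]
    return prefix + date + app_key


if __name__ == "__main__":
    for app in ("app_a", "app_b", "app_c"):
        print(plain_row_key("20140101", app), salted_row_key("20140101", app))
```

Because the prefix is derived from the same `date` and `app_key`, a point lookup can recompute the full key. The trade-off is that all rows for one date are no longer contiguous, so a scan over a whole date cannot be a single range scan.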

  22. Bulk Load • MapReduce -> put -> HBase table • HDFS -> HFile -> table: 4 min 10 s

  23. Welcome to Umeng!