YZStack: Provisioning Customizable Solution for Big Data

Sai Wu, Chun Chen, Gang Chen, Lidan Shou, Ke Chen (Zhejiang University); Hui Cao (yzBigData Co. Ltd.); He Bai (City Cloud Technology).

Presentation Transcript


  1. YZStack: Provisioning Customizable Solution for Big Data
     Sai Wu, Chun Chen, Gang Chen, Lidan Shou, Ke Chen (Zhejiang University)
     Hui Cao (yzBigData Co. Ltd.)
     He Bai (City Cloud Technology)

  2. 3H Problem in Deploying the Big Data System
     • How can I build and deploy a big data system without background knowledge?
     • How can I migrate existing applications to the big data system?
     • How can I use my big data system for analysis jobs?

  3. Too Many Choices
     • Virtualization: OpenStack, CloudStack, VMware
     • Cloud storage:
         • Key-value stores (HBase, Cassandra, Redis, …)
         • Relational services (AWS, Spanner, …)
     • Processing engines: MapReduce/Hadoop, Dryad, Pregel, GraphLab, Spark, epiC
     • Application services: Mahout, Hive, Spatial Hadoop

  4. Can I Deploy a Big Data System Like Installing Windows Software?
     • Treat configuring the installation as a customization process
     • The installer copies the binaries to all servers and configures them automatically
     • A browser-based management system starts/stops the services and monitors their status

  5. YZStack: the Architecture
     • Layers are loosely connected
     • Each layer includes many selectable modules
     • Modules of different layers are linked via the common interfaces
     • Optimizations are implemented as special plugins
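
The layered design on this slide (one selectable module per layer, common interfaces between layers, optimizations as plugins) can be illustrated with a minimal Python sketch. Everything here is hypothetical: `StorageModule`, `DictStore`, `CachingPlugin`, and `build_stack` are invented names, not YZStack's API, and the caching wrapper is used only because it is short; YZStack's own plugins (column-oriented, index, query optimization, iterative job) are not shown.

```python
from typing import Callable, Dict, List, Optional, Protocol


class StorageModule(Protocol):
    """Hypothetical common interface shared by all storage-layer modules."""

    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...


class DictStore:
    """Toy storage module standing in for a real key-value store."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class CachingPlugin:
    """Hypothetical optimization plugin: wraps any StorageModule and exposes
    the same interface, so it can be added without touching other layers."""

    def __init__(self, inner: StorageModule) -> None:
        self._inner = inner
        self._cache: Dict[str, Optional[str]] = {}
        self.hits = 0

    def put(self, key: str, value: str) -> None:
        self._cache.pop(key, None)  # invalidate any stale cached value
        self._inner.put(key, value)

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        value = self._inner.get(key)
        self._cache[key] = value
        return value


# Registries: layer -> {module name -> factory}, plugin name -> wrapper.
MODULES: Dict[str, Dict[str, Callable[[], object]]] = {"storage": {"dict": DictStore}}
PLUGINS: Dict[str, Callable[[object], object]] = {"cache": CachingPlugin}


def build_stack(selection: Dict[str, str], plugins: Dict[str, List[str]]) -> Dict[str, object]:
    """Instantiate the selected module for each layer, then wrap it with the
    requested plugins in list order."""
    stack: Dict[str, object] = {}
    for layer, name in selection.items():
        if name not in MODULES.get(layer, {}):
            raise ValueError(f"unknown module {name!r} for layer {layer!r}")
        module = MODULES[layer][name]()
        for plugin_name in plugins.get(layer, []):
            module = PLUGINS[plugin_name](module)
        stack[layer] = module
    return stack


store = build_stack({"storage": "dict"}, {"storage": ["cache"]})["storage"]
store.put("company:42", "registered")
print(store.get("company:42"), store.get("company:42"), store.hits)  # registered registered 1
```

Because the plugin implements the same interface as the module it wraps, the layer above cannot tell whether an optimization is installed, which is one way to keep layers loosely connected.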

  6. Features of YZStack
     • Adaptive Image
         • Based on OpenStack; partitions the big image into small chunks
         • Different images share the same chunks
     • Optimization Plugins
         • Column-oriented plugin
         • Index plugin
         • Query optimization plugin
         • Iterative job plugin
     • Visualization Tool
         • Zoom in/out along different dimensions
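
The Adaptive Image idea (split large images into chunks that different images can share) resembles content-addressed deduplication. The sketch below assumes fixed-size chunks identified by their SHA-256 digest; the slide does not say how YZStack chunks images or detects shared chunks, no OpenStack API is used, and `ChunkStore` is a hypothetical name.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # bytes; tiny for illustration only (assumed value)


class ChunkStore:
    """Content-addressed store: each distinct chunk is kept once, keyed by its
    SHA-256 digest; an image is the ordered list of its chunk digests."""

    def __init__(self, chunk_size: int = CHUNK_SIZE) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}
        self.images: Dict[str, List[str]] = {}

    def add_image(self, name: str, data: bytes) -> None:
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i : i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # shared chunks are stored once
            digests.append(digest)
        self.images[name] = digests

    def read_image(self, name: str) -> bytes:
        """Reassemble an image from its chunk list."""
        return b"".join(self.chunks[d] for d in self.images[name])


store = ChunkStore()
store.add_image("hadoop", b"BASEOS__HADOOP__")
store.add_image("spark", b"BASEOS__SPARK___")
print(len(store.chunks))  # 6 distinct chunks instead of 8
```

The two toy images share their first two chunks (the common "base OS" prefix), so only six chunks are stored.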

  7. Optimization Plugin

  8. Use Case: the Smart Financial System • Built for the Zhejiang Provincial Department of Finance (ZPDF)

  9. Economic Prediction
     • Collaboration with researchers from the College of Economics, Zhejiang University
     • Step 1:
         • Use the OLAP module to provide a basic view of each registered company
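
The slide does not define the "basic view", so the following is only an assumed example of what an OLAP roll-up per company could look like: a hypothetical fact table `records(company, year, category, amount)` in an in-memory SQLite database, aggregated to one row per company and year. The table, columns, and rows are illustrative, not ZPDF data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table: one row per financial record of a registered company.
conn.execute("CREATE TABLE records (company TEXT, year INTEGER, category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        ("A", 2012, "tax", 100.0),
        ("A", 2013, "tax", 120.0),
        ("A", 2013, "fee", 30.0),
        ("B", 2013, "tax", 50.0),
    ],
)


def company_view(conn: sqlite3.Connection):
    """Roll the fact table up to one row per (company, year)."""
    return conn.execute(
        """
        SELECT company, year, SUM(amount) AS total, COUNT(*) AS n_records
        FROM records
        GROUP BY company, year
        ORDER BY company, year
        """
    ).fetchall()


print(company_view(conn))
# [('A', 2012, 100.0, 1), ('A', 2013, 150.0, 2), ('B', 2013, 50.0, 1)]
```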

  10. Economic Prediction (cont.)
     • Step 2:
         • Healthy Model: based on historical data, the healthy model discovers risks and predicts the prospects of an industry
         • Energy Consumption Model: we link the financial data with electricity, water, and environmental data to rank each industry by its energy consumption per unit of output value
         • Economic Impact Model: by connecting the financial data to human resource data, we study how many workers each industry employs and their average salary
         • Combine all three models to rank all industries accordingly
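
The slide says the three models are combined into one ranking but not how. As an assumption, the sketch below ranks industries separately under each model and orders them by mean rank (a simple Borda-style aggregation). The energy ranking follows the slide (energy per unit of output value, lower is better); the health scores and the employment figures used for the economic-impact ranking are hypothetical placeholders, not ZPDF data.

```python
from typing import Dict, List


def rank(scores: Dict[str, float], higher_is_better: bool) -> Dict[str, int]:
    """Return 1-based ranks, where 1 is the best industry under this model."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {name: i + 1 for i, name in enumerate(ordered)}


def combine(rankings: List[Dict[str, int]]) -> List[str]:
    """Order industries by mean rank across models (ties broken by name)."""
    names = rankings[0].keys()
    mean = {n: sum(r[n] for r in rankings) / len(rankings) for n in names}
    return sorted(names, key=lambda n: (mean[n], n))


# Illustrative inputs only (hypothetical numbers).
health = {"textiles": 0.6, "software": 0.9, "steel": 0.4}  # higher = healthier outlook
energy = {"textiles": (50.0, 100.0), "software": (5.0, 100.0), "steel": (300.0, 100.0)}
energy_per_output = {n: e / o for n, (e, o) in energy.items()}  # (energy, output value)
employment = {"textiles": 8000, "software": 6000, "steel": 5000}  # workers employed

final = combine([
    rank(health, higher_is_better=True),
    rank(energy_per_output, higher_is_better=False),
    rank(employment, higher_is_better=True),
])
print(final)  # ['software', 'textiles', 'steel']
```

Mean rank ignores the magnitude of differences between industries; a weighted score would be an alternative if the models' outputs were on comparable scales.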

  11. Economic Prediction (cont.)
     • Step 3: Economic Index (ongoing work)
         • Predict the status of the whole Zhejiang Province using statistics generated by the previous two steps
         • Involves multiple complex economic models
         • Our economics researchers are using the visualization tools to build and study their models

  12. Detection of Improper Payment
     • What is improper payment?
         • A person classified as low-income buys a house reserved for low- and medium-wage earners, but is actually employed by an IT company.
         • A company submits different registration files to different government departments (e.g., it registers as a high-tech company with the Department of Science but as a labor-intensive one with the Department of Labor) to collect various government allowances.

  13. Why ZPDF?
     • A hub for financial data in Zhejiang Province, covering data from:
         • Electricity department
         • Traffic department
         • Tax department
         • …
     • Strong motivation
         • Expected to save more than 1 billion CNY

  14. Improper Payment
     • Step 1 (Consistency Problem):
         • To detect improper payment from two databases, D0 and D1, we first generate two star-join queries, Q0 and Q1, which selectively join the fact tables with the dimension tables.
         • The key point is that entities returned by Q0 should not appear in the results of Q1; an entity found in both is a suspected improper payment.
         • E.g., Q0 returns high-income persons, while Q1 returns people who own a house reserved for low- and medium-wage earners.
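
A minimal sketch of Step 1 using an in-memory SQLite database. The schemas, the `HIGH_INCOME` threshold, and the rows are hypothetical. For brevity the two result sets are matched on an exact shared identifier (`id_no`); the slides address the case without a reliable shared key through the LSH step on the next slide. Naming the results `t0`/`t1` assumes they correspond to the T0 and T1 mentioned there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- D0: income database (fact table: salary, dimension table: person)
    CREATE TABLE d0_person (pid INTEGER PRIMARY KEY, id_no TEXT, employer TEXT);
    CREATE TABLE d0_salary (pid INTEGER, year INTEGER, amount REAL);
    -- D1: housing database (fact table: purchase, dimension table: house)
    CREATE TABLE d1_house (hid INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE d1_purchase (id_no TEXT, hid INTEGER, year INTEGER);

    INSERT INTO d0_person VALUES (1, 'ID001', 'IT company'), (2, 'ID002', 'Factory');
    INSERT INTO d0_salary VALUES (1, 2013, 400000), (2, 2013, 40000);
    INSERT INTO d1_house VALUES (10, 'affordable'), (11, 'commercial');
    INSERT INTO d1_purchase VALUES ('ID001', 10, 2013), ('ID002', 10, 2013);
    """
)

HIGH_INCOME = 200000  # hypothetical income threshold

# Q0: star join of the salary fact table with the person dimension.
q0 = """SELECT DISTINCT p.id_no FROM d0_salary s
        JOIN d0_person p ON p.pid = s.pid
        WHERE s.amount > ?"""
# Q1: star join of the purchase fact table with the house dimension.
q1 = """SELECT DISTINCT pu.id_no FROM d1_purchase pu
        JOIN d1_house h ON h.hid = pu.hid
        WHERE h.category = 'affordable'"""

t0 = {row[0] for row in conn.execute(q0, (HIGH_INCOME,))}
t1 = {row[0] for row in conn.execute(q1)}
suspicious = t0 & t1  # entities in Q0's result should NOT appear in Q1's
print(sorted(suspicious))  # ['ID001']
```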

  15. Consistency Problem
     • We apply LSH (Locality-Sensitive Hashing) to generate k hash values for each tuple from T0 and T1.
     • Tuples sharing a hash value are treated as a candidate group.
     • We define a similarity function sim(ti, tj) that estimates the probability that two tuples represent the same entity. If sim(ti, tj) exceeds a predefined threshold, the pair is forwarded to the verification module, where a human-aided algorithm filters out false positives.
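
The candidate-generation step can be sketched as follows. The slide does not name the LSH family or the similarity function, so this assumes MinHash over character 3-grams of each tuple, Jaccard similarity as `sim`, and hypothetical values for `K` and `THRESHOLD`. Each of the k MinHash values acts as a bucket key; tuples from T0 and T1 that share any bucket become candidates. The human-aided verification module is represented only by returning the surviving pairs.

```python
import hashlib
from collections import defaultdict
from itertools import product
from typing import Dict, List, Set, Tuple

K = 8            # hash values per tuple (assumed)
THRESHOLD = 0.5  # similarity threshold (assumed)


def shingles(row: Tuple[str, ...], n: int = 3) -> Set[str]:
    """Character n-grams of the normalized, space-joined fields."""
    text = " ".join(field.lower().strip() for field in row)
    return {text[i : i + n] for i in range(max(1, len(text) - n + 1))}


def _h(seed: int, token: str) -> int:
    """Salted 64-bit hash; each seed acts as a different hash function."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(tokens: Set[str], k: int = K) -> List[int]:
    """k MinHash values, one per salted hash function."""
    return [min(_h(seed, t) for t in tokens) for seed in range(k)]


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def candidate_pairs(t0, t1) -> Set[Tuple[int, int]]:
    """Pairs (i, j) whose tuples share at least one (position, hash) bucket."""
    buckets: Dict[Tuple[int, int], Tuple[List[int], List[int]]] = defaultdict(lambda: ([], []))
    for side, table in enumerate((t0, t1)):
        for idx, row in enumerate(table):
            for pos, h in enumerate(minhash(shingles(row))):
                buckets[(pos, h)][side].append(idx)
    pairs: Set[Tuple[int, int]] = set()
    for left, right in buckets.values():
        pairs.update(product(left, right))
    return pairs


def matches(t0, t1, threshold: float = THRESHOLD):
    """Candidate pairs whose similarity exceeds the threshold; these would be
    passed on to verification."""
    result = []
    for i, j in sorted(candidate_pairs(t0, t1)):
        s = jaccard(shingles(t0[i]), shingles(t1[j]))
        if s > threshold:
            result.append((i, j, round(s, 2)))
    return result


t0 = [("Zhang Wei", "Hangzhou", "1980-05-01"), ("Li Na", "Ningbo", "1975-11-23")]
t1 = [("zhang wei ", "Hangzhou", "1980-05-01"), ("Wang Fang", "Wenzhou", "1990-02-14")]
print(matches(t0, t1))  # [(0, 0, 1.0)]
```

Bucketing on each MinHash value individually favors recall; grouping several values into bands would reduce false candidates at the cost of missing some true matches.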

  16. Conclusion
     • YZStack is tailored to users with little or no experience deploying and maintaining cloud systems.
     • It reduces the development of a new big data application to a process of module selection and customization.
     • To show the flexibility and usability of YZStack, we demonstrate how we built a smart financial system for the Zhejiang Provincial Department of Finance using YZStack.
