
Big Data Workloads Drawn from Real-time Analytics Scenarios Across Three Deployed Solutions

Tao Zhong, K. Doshi, Xi Tang, Ting Lou, Zhongyan Lu, Hong Li. Software and Services Group, Intel.


Presentation Transcript


  1. Big Data Workloads Drawn from Real-time Analytics Scenarios Across Three Deployed Solutions. Tao Zhong, K. Doshi, Xi Tang, Ting Lou, Zhongyan Lu, Hong Li. Software and Services Group, Intel.

  2. Statement of faith: Real-time (low-latency) analytics will become more important to end users; if not for all queries, then for a non-trivial fraction of them.

  3. We walk through three workload scenarios in this short presentation. Objective: generate ideas for workloads that reflect low-latency and high-throughput demands simultaneously. All three use cases described here are in deployment or in pre-deployment testing among Intel partners in the PRC.

  4. 1. Smart City Application: Detect and Prevent License Plate Fraud

  5. [Diagram] Stages: Capture, Extract, Store, Compute. Data: Registration and Traffic History Records (RDBMS).

  6. Smart City Workload: Solution Flow. [Diagram] Components: Extraction System, Real-time Analytics, File System, Registration Records, Enforcement. Step labels (marked A-F and 1-5 in the original figure): Feed, Persist, Retrieve, Integrate, Merge, Evolve, Detect, Query, Notify.

  7. Smart City Workload: Characteristics • Structured and unstructured data • Transactional and analytic activities • Scale-out in-memory processing combined with distributed persistent data stores • Real-time and batch operations • Information inflows from sensor and non-sensor devices
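
As a rough illustration of the Detect and Notify steps in the Smart City flow, the following Python sketch checks each plate sighting from the extraction feed against registration records and collects the mismatches that would be passed to enforcement. The `Registration` and `Sighting` types, their fields (`plate`, `color`, `model`, `camera_id`), and the mismatch rule are all assumptions made for illustration; the slides do not say how fraud is actually detected.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Registration:
    """Hypothetical registration record (fields are assumptions)."""
    plate: str
    color: str
    model: str


@dataclass(frozen=True)
class Sighting:
    """Hypothetical output of the extraction system for one camera capture."""
    plate: str
    color: str
    model: str
    camera_id: str


def detect_fraud(sighting: Sighting,
                 registry: Dict[str, Registration]) -> Optional[str]:
    """Return a reason string if the sighting looks suspicious, else None."""
    record = registry.get(sighting.plate)
    if record is None:
        return "plate not found in registration records"
    if (record.color, record.model) != (sighting.color, sighting.model):
        return "vehicle does not match registered color/model"
    return None


def process_feed(feed: Iterable[Sighting],
                 registry: Dict[str, Registration]) -> Iterator[Tuple[Sighting, str]]:
    """Yield (sighting, reason) pairs that would be sent to enforcement."""
    for sighting in feed:
        reason = detect_fraud(sighting, registry)
        if reason is not None:
            yield sighting, reason


# Made-up sample data: one registered vehicle, three camera captures.
registry = {"A12345": Registration("A12345", "white", "sedan")}
feed = [
    Sighting("A12345", "white", "sedan", "cam-1"),  # matches registration
    Sighting("A12345", "black", "suv", "cam-2"),    # same plate, different car
    Sighting("Z99999", "red", "truck", "cam-3"),    # unknown plate
]
alerts = list(process_feed(feed, registry))
```

In a deployment like the one described, the registry lookup would go to the persistent registration store and the feed would be a continuous stream; the in-memory dictionary and list here only stand in for those.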

  8. 2. Content Management and Integration

  9. Rapid Content Management -- Solution Flow. [Diagram] Labels: New Media and Traditional Media feeds (information accumulation over time); Log Extract and Transform; Digest and Cross Reference; Hive; HBase; RDBMS; sparse edits; Sqoop (bulk move older data); Data Analysis Logic; Search; Hibernate Driver; HBase Driver; Hive Dialect.

  10. Rapid Content Management -- Workload Characteristics • Structured and unstructured data • Transactional and analytic activities • Fast searches over “hot” data, slow searches over the rest • RDBMS operations mixed with HBase
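
The "fast searches over hot data, slow searches over the rest" characteristic can be pictured as a two-tier lookup with periodic bulk movement of older items. The sketch below is a minimal illustration that uses plain Python dictionaries as stand-ins for the hot store and the archive. The class name `TieredStore`, the `hot_window_days` parameter, the age-based tiering rule, and the sample items are assumptions, not details taken from the slides.

```python
from datetime import date, timedelta


class TieredStore:
    """Toy two-tier store: recent ("hot") items in one dict, older ones in another."""

    def __init__(self, hot_window_days=7):
        self.hot_window = timedelta(days=hot_window_days)
        self.hot = {}    # stand-in for the low-latency store
        self.cold = {}   # stand-in for the bulk archive

    def put(self, key, text, published, today):
        """Place an item in the hot or cold tier based on its age."""
        tier = self.hot if today - published <= self.hot_window else self.cold
        tier[key] = (text, published)

    def age_out(self, today):
        """Bulk-move items that fell outside the hot window into the archive."""
        stale = [k for k, (_, d) in self.hot.items() if today - d > self.hot_window]
        for k in stale:
            self.cold[k] = self.hot.pop(k)
        return len(stale)

    def search(self, term, include_cold=False):
        """Search hot data first; scan the archive only when asked."""
        hits = [k for k, (t, _) in self.hot.items() if term in t]
        if include_cold:
            hits += [k for k, (t, _) in self.cold.items() if term in t]
        return hits


# Made-up sample items and dates.
today = date(2013, 1, 10)
store = TieredStore(hot_window_days=7)
store.put("a1", "city budget vote", date(2013, 1, 9), today)
store.put("a2", "budget archive report", date(2012, 6, 1), today)
```

Calling `store.search("budget")` touches only the hot tier, while `store.search("budget", include_cold=True)` also scans the archive; `store.age_out(...)` plays the role of the periodic bulk move of older data.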

  11. 3. Fraud Detection

  12. Telecom Payment Fraud Detection/Prevention -- Solution Flow. [Diagram] Labels: Recharge Transaction, Mid-transaction Analytics, Credit Records, Transactions History, ALERT. Query: SELECT phone_number, SUM(charge_time), SUM(charge_amount) FROM trans_table GROUP BY phone_number HAVING SUM(charge_time) > threshold_1 AND SUM(charge_amount) > threshold_2
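
Standard SQL does not allow aggregates such as SUM in a WHERE clause, so a per-phone-number aggregate filter like the one on this slide is usually written with GROUP BY and HAVING. The sketch below runs such a query against an in-memory SQLite database. The table and column names follow the slide; the sample rows, the threshold values, the column types, and the reading of `charge_time` as a per-transaction count are assumptions for illustration only.

```python
import sqlite3

# In-memory database standing in for the transaction history store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trans_table ("
    "phone_number TEXT, charge_time INTEGER, charge_amount REAL)"
)

# Made-up recharge transactions: (phone_number, charge_time, charge_amount).
rows = (
    [("555-0001", 1, 40.0)] * 4     # many recharges, high total
    + [("555-0002", 1, 500.0)] * 2  # few recharges, high total
    + [("555-0003", 1, 10.0)] * 5   # many recharges, low total
)
conn.executemany("INSERT INTO trans_table VALUES (?, ?, ?)", rows)

THRESHOLD_1 = 3      # assumed limit on summed charge_time
THRESHOLD_2 = 100.0  # assumed limit on summed charge_amount

# Aggregate per phone number, then filter on the aggregates with HAVING.
query = """
SELECT phone_number, SUM(charge_time), SUM(charge_amount)
FROM trans_table
GROUP BY phone_number
HAVING SUM(charge_time) > ? AND SUM(charge_amount) > ?
ORDER BY phone_number
"""
suspects = conn.execute(query, (THRESHOLD_1, THRESHOLD_2)).fetchall()
conn.close()
```

With these sample rows, only the first phone number exceeds both thresholds, so it is the only row in `suspects`.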

  13. Summary • Workload scenarios from several “real life” use cases • Blend of SQL and NoSQL approaches • Recent data is available for queries nearly instantaneously • Real-time responsiveness combined with high data volumes • Mix of slow and fast operations (low-latency analytics on recent data, complex analytics on historical data)
