Building real-time data processing and model inferencing platform with Kafka Streams

Navinder Pal Singh Brar, ML @ Walmart. Data Science Model Life cycle: 50%+ of the time is spent on data collection and cleaning; the remaining 30-40% goes into making the model production ready with the help of developers.

Presentation Transcript


  1. Building real-time data processing and model inferencing platform with Kafka Streams • Navinder Pal Singh Brar

  2. ML @ Walmart

  3. Data Science Model Life cycle: 50%+ of the time is spent on data collection and cleaning; the remaining 30-40% goes into making the model production ready with the help of developers. Courtesy: http://www.oogazone.com, https://www.vectorstock.com

  4. Mission Statement

  5. Customer Backbone - CBB

  6. CBB Platform [diagram labels: CBB Data Pipeline; CBB Internal Kafka; Kafka Streams (shown twice); Partition: 0; Partition: 1; Recommendation; Personalization; Fraud Detection; ….]

  7. Why Streams? Other alternatives

  8. Multitenancy: the challenges: (1) Sequential execution of tenant models; (2) Any corrupt model can bring down the JVM; (3) Any model upgrade requires JVM restart; (4) Client isolation

  9. CBB Platform [diagram labels: CBB Data Pipeline; CBB Internal Kafka; Kafka Streams; Recommendation; Personalization; Fraud Detection; ….]

  10. CBB Internals [diagram, Before and After; labels: CBB Processor; CBB Store; Model A, Model B, Model C; A store, B store, C store]. KIP-408: Add Asynchronous Processing To Kafka Streams

  11. Multitenancy: the solution

  12. Data Model

  13. Sequence Store [diagram: entries 0 1 2 3 4 5 6 7 8 9 10 11 …; "CBB Processor writes here"; Model A (offset=3); Model B (offset=8)]

  14. Model Inferencing: Problem and Solution

  15. Global Datastores [diagram labels: Global Topic; App Cluster; VM 1; VM 2; VM 3; VM 4]

  16. Global Datastores: Problem and Solution

  17. Walmart Scale Source: https://corprate.walmart.com/our-story/our-business

  18. Identity Graph Processing

  19. Customer Identity Graph [diagram labels: id1, id2, id3, id4, id5, id6; Node A; Node B]. Graph processing co-locates the data of two or more customer identities linked to each other on the same physical node.

  20. Benchmarks

  21. Benefits: Money; Time; Effort; Minimal duplication; Reduces maintenance overhead; Low Latency. Courtesy: https://www.vectorstock.com

  22. Thank You! navinderpalsinghbrar
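
Slide 13 shows the CBB Processor writing to a Sequence Store, with Model A at offset=3 and Model B at offset=8. One plausible reading of that diagram is an append-only log in which every model keeps its own read offset, so a slow model does not hold back the others. The Python sketch below illustrates only that reading; the class `SequenceStore`, its methods, and the event names are hypothetical and are not taken from the talk or from the Kafka Streams API.

```python
from collections import defaultdict


class SequenceStore:
    """Append-only log with an independent read offset per model (hypothetical)."""

    def __init__(self):
        self._events = []                 # entries 0, 1, 2, ... in arrival order
        self._offsets = defaultdict(int)  # model name -> next index to read

    def append(self, event):
        """Called by the processor; returns the sequence number of the new entry."""
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, model, max_events=None):
        """Return unread events for `model` and advance only that model's offset."""
        start = self._offsets[model]
        end = len(self._events)
        if max_events is not None:
            end = min(end, start + max_events)
        self._offsets[model] = end
        return self._events[start:end]

    def offset(self, model):
        """Next index this model will read."""
        return self._offsets[model]


# Assumed scenario mirroring the slide: 12 entries, two models at different offsets.
store = SequenceStore()
for i in range(12):
    store.append(f"event-{i}")

store.poll("model_a", max_events=3)  # Model A has consumed entries 0..2
store.poll("model_b", max_events=8)  # Model B has consumed entries 0..7
```

After these calls `store.offset("model_a")` is 3 and `store.offset("model_b")` is 8, matching the offsets drawn on the slide. Whether the platform persists offsets this way is not stated in the transcript.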

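Slide 19 states that graph processing co-locates the data of linked customer identities on the same physical node. A simple way to get that property, sketched here as an assumption rather than as the approach used in the talk, is to group linked identities with a union-find structure and route each identity by a hash of its group's representative. The class `IdentityGraph`, the links between ids, and the use of CRC32 for routing are all hypothetical choices for illustration.

```python
import zlib


class IdentityGraph:
    """Union-find over identities; linked identities share one representative."""

    def __init__(self):
        self._parent = {}

    def find(self, identity):
        """Return the representative of the group containing `identity`."""
        self._parent.setdefault(identity, identity)
        while self._parent[identity] != identity:
            # Path halving: point this entry at its grandparent as we walk up.
            self._parent[identity] = self._parent[self._parent[identity]]
            identity = self._parent[identity]
        return identity

    def link(self, a, b):
        """Record that identities `a` and `b` belong to the same customer."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller id as root for determinism.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            self._parent[root_b] = root_a

    def node_for(self, identity, nodes):
        """Route by the group's representative so linked ids share a node."""
        root = self.find(identity)
        return nodes[zlib.crc32(root.encode("utf-8")) % len(nodes)]


# Assumed links; the slide does not say which ids are connected.
graph = IdentityGraph()
for a, b in [("id1", "id2"), ("id2", "id5"), ("id3", "id4")]:
    graph.link(a, b)

nodes = ["Node A", "Node B"]
placement = {i: graph.node_for(i, nodes)
             for i in ["id1", "id2", "id3", "id4", "id5", "id6"]}
```

In this sketch a new link can change a group's representative, which in a real deployment would mean moving stored data between nodes; the transcript does not describe how the platform handles that.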