Accelerate AI with a Data Pipeline Strategy

Brian Schwarz, VP Product Management and Development


Presentation Transcript


  1. Accelerate AI with a Data Pipeline Strategy Brian Schwarz, VP Product Management and Development

  2. Pure Storage is a Consumer and Producer
     Share our learning and contribute to the community.
     We are passionate about deep learning.
     Our data pipeline:
     • IoT use case
     • 1000s of devices
     • Proactive support
     FLASHBLADE: Scale-Out Software + Flash + Networking

  3. Data is the New Oil
     Why data?
     • "We don't have better algorithms, we just have more data."
     What changes?
     • Deep Learning = Innovation
     • New SW model → new compute model
     • How to blend this with traditional analytics and AI?
     [Chart: accuracy vs. amount of data, comparing deep learning with older learning algorithms. Deep learning chart courtesy of Andrew Ng.]

  4. Many Choices Between You and Success
     Let your data guide you.
     • Deep learning is the best choice when dozens, hundreds, or thousands of variables are at play
     • Traditional app vendor extensions are limited in scope
     • Data + infrastructure is the platform
     • Pre-processed and structured data play a role
     • Data has gravity → operate where the data resides
     • Data scientists, data explorers, and data curators
     Choices: software, infrastructure, training, deployment
     Options: public cloud, your cloud

  5. Data Pipeline for Deep Learning Training
     Stages: Collect → Extract → Transform → Tag → Debug → Training
     Collect/extract data (images, video, audio, sensors, etc.) → files and objects → keep a copy of the raw data
     Optimize data management:
     • Tagging and testing/debugging the model are bigger challenges than many anticipate
     • Training: more data wins → optimize steps 1-5
     Optimize infrastructure:
     • Collect, Extract, Transform, and Tag run on traditional x86 infrastructure
     • Model Debug and Training run on large GPU clusters
     • Avoid slow shared storage; otherwise, copying data between local storage at each stage will slow progress
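     The stage-to-infrastructure split on this slide can be sketched in a few lines of Python. This is only an illustrative model, not anything shown in the presentation: the `Stage` class, the `PIPELINE` list, and the helper functions are hypothetical names, and the copy count rests on the simplifying assumption that, without fast shared storage, every handoff between consecutive stages costs one full copy of the data set.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    infrastructure: str  # "x86" or "gpu", following the slide's split


# The six stages named on the slide, in order (hypothetical representation).
PIPELINE: List[Stage] = [
    Stage("collect", "x86"),
    Stage("extract", "x86"),
    Stage("transform", "x86"),
    Stage("tag", "x86"),
    Stage("debug", "gpu"),
    Stage("training", "gpu"),
]


def stages_on(infrastructure: str) -> List[str]:
    """Names of the stages that run on the given kind of infrastructure."""
    return [s.name for s in PIPELINE if s.infrastructure == infrastructure]


def handoffs(pipeline: List[Stage]) -> List[Tuple[str, str]]:
    """(producer, consumer) pairs for each pair of consecutive stages."""
    return [(a.name, b.name) for a, b in zip(pipeline, pipeline[1:])]


def copies_needed(pipeline: List[Stage], shared_storage: bool) -> int:
    """Assumed cost model: one data copy per handoff unless storage is shared."""
    return 0 if shared_storage else len(handoffs(pipeline))


if __name__ == "__main__":
    print("x86 stages:", stages_on("x86"))
    print("GPU stages:", stages_on("gpu"))
    print("copies with local storage:", copies_needed(PIPELINE, shared_storage=False))
    print("copies with shared storage:", copies_needed(PIPELINE, shared_storage=True))
```

     Under that assumption, six stages imply five handoffs, which is one way to read the slide's warning about copying data between local storage at each stage.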

  6. Takeaways
     1. Data is the new oil: design a data factory
     2. Hire the best team + augment with trusted advisors from consultants and vendors
     3. The only constant is change (especially in the SW tool-chain); build a solid team and infrastructure to accommodate it

  7. THANK YOU
