1 / 21

Leveraging scriptable infrastructures, Towards a paradigm shift in software for data science

Karim Chine. Leveraging scriptable infrastructures, Towards a paradigm shift in software for data science. Cloud Era Ltd 14 June 2013 karim.chine@cloudera.co.uk. Outline. Introduction Fragmentation and friction in the Data Science arena

teneil
Download Presentation

Leveraging scriptable infrastructures, Towards a paradigm shift in software for data science

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Karim Chine Leveraging scriptable infrastructures, Towards a paradigm shift in software for data science Cloud Era Ltd 14 June 2013 karim.chine@cloudera.co.uk

  2. Outline • Introduction • Fragmentation and friction in the Data Science arena • The understated potential of Cloud scriptability • Elastic-R: Towards a universal platform for Data Science • Elastic-R: Design and Technologies Overview • Elastic-R: The scriptability toolset • Demo • Conclusion

  3. Introduction “The next information revolution is well underway. But it is not happening where information scientists, information executives, and the information industry in general are looking for it. It is not a revolution in technology, machinery, techniques, software, or speed. It is a revolution in concepts.” Peter Drucker. Forbes ASAP, August 24, 1998

  4. Introduction • Science and the 4th Paradigm • The e-Research Dancing Bears Experimental Science Theoretical Science Computational Science e-Science / Data-intensive Science The townspeople gather to see the wondrous sight as the massive, lumbering beast shambles and shuffles from paw to paw. The bear is really a terrible dancer, and the wonder isn't that the bear dances well but that the bear dances at all.

  5. Fragmentation and friction in the Data Science Arena www.scipy.org www.python.org www.scilab.org • Open-source (GPL) software environment for statistical computing and graphics • Lingua franca of data analysis. • Repositories of contributed R packages related to a variety of problem domains in life sciences, social sciences, finance, econometrics, chemo metrics, etc. are growing at an exponential rate. • R is Super Glue www.sagemath.org www.wolfram.com www.mathworks.com office.microsoft.com www.spss.com www.sas.com http://root.cern.ch

  6. The understated potential of Cloud scriptability • 3D printers are becoming a common place: • Creating three-dimensional solid object of virtually any shape from a digital model. • Scripting the physical world • Sharing physical reality on Facebook is now as easy as sharing your holiday pictures

  7. Elastic-R: Towards a universal platform for Data Science Computational Components R packages, Wrapped C,C++,Fortran code, Python modules, MatlabToolkits… Open source or commercial ComputationalResources Clusters, grids, private or public clouds Free or pay-per-use ComputationalGUIs HTML5 and Desktop Workbench Built-in views/Plugins /Collaborative views Open source or commercial Computational Storage Local, NFS, FTP, Amazon S3, EBS Computational Scripts R / Python / Matlab / Groovy ComputationalAPIs Java / SOAP / REST, Stateless and stateful GeneratedComputational Web Services Stateful or stateless, mappingof R objects/functions

  8. Elastic-R: Towards a universal platform for Data Science

  9. Elastic-R: Towards a universal platform for Data Science Public Clouds Private Cloud

  10. Elastic-R: Design and Technologies Overview Your data ishere You are here

  11. Elastic-R: Design and Technologies Overview • Remote Java/R Processes • Events-driven Remote Objects/Engines • R, Python, Mathematica, Matlab, Scilab, ... • Collaborative Spreadsheets • Collaborative Scientific Graphics Canvas • Collaborative Dashboard with collaborative widgets

  12. Elastic-R: Design and Technologies Overview Elastic-R AMI 1 R 2.10 BioC 2.5 Elastic-R AMI 2 R 2.9 BioC 2.3 Elastic-R AMI 2 R 2.9 BioC 2.3 Elastic-R AMI 3 R 2.8 BioC 2.0 Elastic-R EBS 4 Data Set VVV Elastic-R Amazon Machine Images Elastic-R EBS 1 Data Set XXX Elastic-R EBS 4 Data Set VVV Elastic-R.org Eastic-R AMI 2 R 2.9 BioC 2.3 Elastic-R EBS 3 Data Set ZZZ Elastic-R EBS 2 Data Set YYY Elastic-R EBS 4 Data Set VVV Amazon Elastic Block Stores

  13. Modes of access • Individuals with AWS accounts • Standard AMIs (Amazon Machine Images): paid-per-use to Amazon. For Academic use and trial purposes • Paid AMI: Paid-per-use software model. For business users • Individuals without AWS accounts • Trial tokens, purchased tokens, tokens granted by other users. • Resources (data science engines) shared by other users • Individual subscribtions • Companies/Educational & Research Institutions • Dedicated Platform and AMIs on an Amazon VPC (Virtual Private Cloud). Paid via subscribtion

  14. Elastic-R: The scriptability toolset Command Line API Web Console SDK

  15. Elastic-R: The scriptability toolset Command Line API Web Console SDK

  16. Elastic-R: The scriptability toolset rws.war Script / globals.r square function(x) {return(x^2) } typeInfo(square)SimultaneousTypeSpecification( TypedSignature(x = "numeric"), returnType = "numeric")‏ + mapping.jar + pooling framework + R Java Bridge + JAX-WS - Servlets - Generated artifacts WS generator Script / rjmap.xml Deploy <rj> <publish> <functions> <function name="square" forWeb="true"/> </functions> </publish> <scripts> <initScript name="globals.r" embed="true"/> </scripts> </rj> public static void main(String[] args) throws Exception { RGlobalEnvFunctionWeb g=new RGlobalEnvFunctionWebServiceLocator().getrGlobalEnvFunctionWebPort(); RNumeric x=new RNumeric(); x.setValue(new Double[]{6.0}); System.out.println(g.square(x).getValue()[0]); } Eclipse Web Service Client Generator

  17. Elastic-R: The scriptability toolset • API: Soap and Restful Web Services • Data Analysis Engines control • Data AnalysisEngines Management (Life cycle, etc.) • Virtual appliances/artifacts management • Platform administartion • SDKs: Java, R, Microsoft Office (Vba) • Command line: elasticR package • Html5 Workbench • Elastic-R artifacts management interface • Engines control interface

  18. Demo • Register to Elastic-R academic and trial portal (www.elastic-r.org) • Create Data Science Engines using trial tokens • Work with R, Python and Scientific Spreadsheets in the browser • Share Data Science Engine and Collaborate • Connect to the remote Data Science Engine from within a local R session, push and pull data, execute commands and show impact on spreadsheets and dashboards

  19. Conclusion • Elastic -R unlocks the potential of the cloud for Data Scientists and analytical applications developers and improves dramatically their productivity • The Data Science IT Factory chain, from resources acquisition to services and applications design and publication becomes: • Fully scriptable in R • Underthe direct control of the Data Scientist and the developer • Fully traceable and reproducible • Elastic-R can seamlessly extend any existing application or service and can be used as a collaboration backend • Openness, security and reliability are among the key capabilities delivered

  20. What to do Next • Register to Elastic-R and try the HTML 5 Workbench and the collaboration capabilities • Download the R package elasticR and use it to access the cloud from local R sessions • Download the Java SDK and create your first analytical application using AWS and advanced tools for programming with data. • Get in touch with me to explore potential partnerships and collaborations

  21. Contact details • Karim Chine • karim.chine@cloudera.co.uk

More Related