Revolution Analytics Certification Talk, DublinR User Group, 12th May 2015

wilkinsb

Presentation Transcript


  1. Revolution Analytics Certification Talk, DublinR User Group, 12th May 2015

  2. Revolution Analytics R Platform

  3. Revolution Analytics R Platform
     - RevoR: performance-enhanced R interpreter, multi-core processing
     - ConnectR: high-speed connectors to third-party systems (SAS, Teradata, Hadoop)
     - DistributedR: distributed computing framework
     - DevelopR: visual step-in debugger and IDE for R
     - DeployR: web services (JSON + XML), SDKs for Java, JavaScript and .NET
     - ScaleR: data preparation, descriptive statistics, correlation and covariance matrices, predictive modelling

  4. Certification trivia
     - title: “Revolution R Enterprise Certified Specialist”
     - $200 examination fee
     - exam booked online at http://www.kryteriononline.com
     - three possible test center locations in Dublin (New Horizons, SureSkills, The Exam Centre/Leopardstown)
     - 90 minutes
     - 60 questions
     - 70% passing score

  5. http://datacamp.com

  6. http://datacamp.com

  7. Amazon AWS

  8. Help on RevoScaleR functions: http://www.rdocumentation.org/packages/RevoScaleR/
     • rxAddInheritance RxAvroData RxAvroData-class RxAzureBurst RxAzureBurst-class rxBTrees rxCancelJob rxChiSquaredTest rxCleanup rxCompareContexts rxCompressXdf RxComputeContext RxComputeContext-class rxCovCor rxCovRegression rxCrossTabs rxCube rxDataFrameToXdf RxDataSource RxDataSource-class rxDataStep rxDForest rxDForestUtils RxDistributedHpa-class rxDistributeJob rxDTree rxDTreeBestCp rxElemArg rxExec rxExecuteSQLDDL rxExpression rxFactors RxFileData-class RxFileSystem rxFindFileInPath RxForeachDoPar RxForeachDoPar-class rxFormula rxGetAvailableNodes rxGetEnableThreadPool rxGetInfoXdf rxGetJobInfo rxGetJobOutput rxGetJobResults rxGetJobs rxGetNodeInfo rxGetNodes rxGetVarInfoXdf rxGetVarNames rxGLM rxHadoopCommand RxHadoopMR RxHadoopMR-class rxHdfsConnect RxHdfsFileSystem rxHistogram RxHPCServer RxHPCServer-class rxImport rxImportToXdf RxInTeradata RxInTeradata-class rxKmeans rxLaunchClusterTaskManager rxLinePlot rxLinMod RxLocalParallel RxLocalParallel-class RxLocalSeq RxLocalSeq-class rxLocateFile rxLogit rxLorenz RxLsfCluster RxLsfCluster-class rxMakeRNodeNames rxMarginals rxMergeXdf rxMultiTest RxNativeFileSystem rxNew RxOdbcData RxOdbcData-class rxOpen-methods rxOptions rxPairwiseCrosstab rxPingNodes rxPredict rxPredict.rxDForest rxPredict.rxDTree rxQuantile rxReadXdf rxRemoteCall rxRemoteGetId rxRemoteHadoopMRCall rxResultsDF rxRiskRatio rxRng rxRoc RxSasData RxSasData-class rxSetComputeContext rxSetFileSystem rxSetInfo rxSetVarInfoXdf rxSortXdf rxSplitXdf RxSpssData RxSpssData-class rxStepControl rxSummary RxTeradata RxTeradata-class rxTeradataSql RxTextData RxTextData-class rxTextToXdf rxTransform rxTweedie rxWaitForJob RxXdfData RxXdfData-class rxXdfFileName rxXdfToDataFrame rxXdfToText

  9. What you need to know for the exam: workspace management
     • search()
     • ls()
     • rm()
     • save(), load()
     • write.table(), read.table()
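
A minimal base R sketch of the workspace functions listed above. The object names (`scores`, `tmp_obj`) and the files written to `tempdir()` are illustrative assumptions, not part of the exam material.

```r
# Inspect the search path and the global environment
print(search())              # attached packages/environments
scores  <- data.frame(id = 1:3, score = c(90, 75, 82))
tmp_obj <- "temporary"
print(ls())                  # names of objects in the workspace

rm(tmp_obj)                  # delete a single object
print(exists("tmp_obj"))     # FALSE

# save()/load(): binary .RData round trip
rdata_file <- file.path(tempdir(), "scores.RData")
save(scores, file = rdata_file)
rm(scores)
loaded <- load(rdata_file)   # load() returns the names of the restored objects
print(loaded)                # "scores"

# write.table()/read.table(): plain-text round trip
txt_file <- file.path(tempdir(), "scores.txt")
write.table(scores, txt_file, sep = "\t", row.names = FALSE)
scores_back <- read.table(txt_file, header = TRUE, sep = "\t")
print(scores_back)
```

Note that `read.table()` re-infers column types, so whole-number scores come back as integers rather than doubles.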

  10. What you need to know for the exam: operations on data structures
     • x = -2:7; x[-4:-5] (negative indexing)
     • x = -2:7; sum(x[x < 1]) (boolean indexing)
     • replicating values, filling up data structures: array(3:5, 1:3)[1, , 2]
     • use of tapply/sapply/apply
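
The expressions on this slide can be checked directly in base R. The vectors passed to `tapply`, `sapply` and `apply` are made-up illustrations. Note that `-4:-5` parses as `c(-4, -5)` because unary minus binds tighter than `:`.

```r
x <- -2:7                          # -2 -1 0 1 2 3 4 5 6 7

dropped <- x[-4:-5]                # negative indexing: drop elements 4 and 5
print(dropped)                     # -2 -1 0 3 4 5 6 7

neg_sum <- sum(x[x < 1])           # boolean indexing: -2 + -1 + 0
print(neg_sum)                     # -3

# 3:5 is recycled to fill a 1 x 2 x 3 array (6 cells: 3 4 5 3 4 5)
a <- array(3:5, dim = 1:3)
slice <- a[1, , 2]                 # row 1, all columns, layer 2
print(slice)                       # 5 3

# apply family
t_res  <- tapply(c(10, 20, 30, 40), c("a", "b", "a", "b"), sum)  # sum per group
s_res  <- sapply(1:3, function(i) i^2)                           # 1 4 9
r_sums <- apply(matrix(1:6, nrow = 2), 1, sum)                   # row sums: 9 12
print(t_res)                       # a = 40, b = 60
```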

  11. What you need to know for the exam: RevoScaleR XDF file format
     XDF (External Data Frame):
     • binary format
     • loads directly into memory
     • data stored in chunks
     • new rows and columns can be added to the file without rewriting the entire file
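
RevoScaleR and the XDF format are not part of base R, so the following is only an analogy for chunk-wise processing: a hypothetical CSV file is read four rows at a time and only running totals are kept in memory. It does not show how XDF is actually implemented.

```r
# Hypothetical illustration: compute a mean chunk by chunk
csv_file <- file.path(tempdir(), "values.csv")
write.csv(data.frame(v = 1:10), csv_file, row.names = FALSE)

chunk_rows <- 4
con <- file(csv_file, open = "r")
invisible(readLines(con, n = 1))             # skip the header line
total <- 0
n <- 0
repeat {
  lines <- readLines(con, n = chunk_rows)    # at most 4 rows per chunk
  if (length(lines) == 0) break              # end of file
  chunk <- as.numeric(lines)
  total <- total + sum(chunk)                # only the running totals stay in memory
  n <- n + length(chunk)
}
close(con)
chunk_mean <- total / n
print(chunk_mean)                            # 5.5
```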

  12. What you need to know for the exam: importing and exporting data
     • What does rxImport(inData, outFile = "abc.xdf", ...) return?
     • RxOdbcData(sqlQuery, table, connectionString)
     • rxTextToXdf()

  13. What you need to know for the exam: summary statistics
     • rxGetInfo()
     • rxHistogram()
     • rxSummary(): expect three or four questions on rxSummary in combination with rxFormula
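
As a rough base R analogue (not the RevoScaleR functions themselves), `str()` describes the variables, `summary()` gives descriptive statistics and `hist(plot = FALSE)` returns bin counts. The `people` data frame is made up for illustration.

```r
people <- data.frame(age = c(23, 35, 41, 52, 29, 38),
                     sex = factor(c("M", "F", "F", "M", "F", "M")))

str(people)                          # variable names, types and first values
stats <- summary(people$age)         # min, quartiles, mean, max
print(stats)

# Bin counts without drawing the plot
h <- hist(people$age, breaks = c(20, 30, 40, 50, 60), plot = FALSE)
print(h$counts)                      # 2 2 1 1
```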

  14. What you need to know for the exam: using formulas for descriptive statistics
     rxSummary(formula = ~ F(age):sex, data = censusWorkers)
     You may need to know how to build formulas:
     • ~ separates the response from the predictor variables
     • + separates predictor variables
     • : denotes an interaction between predictor variables
     • F(x) treats a numeric variable x as categorical
     • N(x) is the opposite of F(x): it treats a categorical variable as numeric
     • * adds the main effects and all their interactions, e.g. a*b expands to a + b + a:b
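
Base R formulas use the same `~`, `+`, `:` and `*` operators, so `terms()` can show how they expand. `F()` and `N()` are RevoScaleR-specific; in base R, `factor()` plays roughly the role of `F()`. The small `census` data frame is a hypothetical stand-in for `censusWorkers`, and `table()` is only a loose analogue of the counts `rxSummary` reports for a categorical interaction.

```r
# How formula operators expand into model terms
labels_star  <- attr(terms(y ~ a * b), "term.labels")
labels_colon <- attr(terms(y ~ a:b), "term.labels")
labels_three <- attr(terms(~ a * b * c), "term.labels")
print(labels_star)     # "a" "b" "a:b"
print(labels_colon)    # "a:b"
print(labels_three)    # "a" "b" "c" "a:b" "a:c" "b:c" "a:b:c"

# Treat numeric age as categorical (like F(age)) and cross it with sex
census <- data.frame(age = c(30, 30, 40, 40, 40),
                     sex = c("M", "F", "M", "M", "F"))
counts <- table(age = factor(census$age), sex = census$sex)
print(counts)
```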

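  15. What you need to know for the exam: data transformations
     rxDataStep(inData,
                returnTransformObjects = TRUE,
                transformObjects = list(a = a, b = b, c = c),
                transformFunc = someCustomFunction,
                transformVars = c("x1", "x2"))
     - remember you're processing a possibly large data set, one chunk at a time
     - there are special requirements on how to write custom transformation functions: the function receives each chunk as a named list of variables and must return a list, so it cannot rely on seeing the whole data set at once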
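
To illustrate why chunk-wise transformation functions are constrained, here is a base R simulation. `run_chunked_transform` is a hypothetical driver, not the RevoScaleR API: it calls a transform function once per chunk with a named list of the selected variables. The constant `overall_mean` stands in for a value you would pass via `transformObjects`, because a single chunk cannot compute a whole-data-set mean on its own.

```r
# Hypothetical chunk driver (NOT RevoScaleR): one call of transform_func per chunk
run_chunked_transform <- function(data, transform_func, transform_vars, chunk_size) {
  chunk_id <- ceiling(seq_len(nrow(data)) / chunk_size)
  pieces <- lapply(split(data, chunk_id), function(chunk) {
    data_list <- as.list(chunk[transform_vars])   # one chunk as a list of vectors
    out <- transform_func(data_list)              # must return a named list
    cbind(chunk, as.data.frame(out))
  })
  do.call(rbind, unname(pieces))
}

# Precomputed constant, playing the role of transformObjects = list(overall_mean = 5)
overall_mean <- 5

# Custom transform: works only with the variables of the current chunk
center_x1 <- function(data_list) {
  data_list$x1_centered <- data_list$x1 - overall_mean
  data_list["x1_centered"]                        # return only the new variable
}

dat <- data.frame(x1 = c(2, 4, 6, 8), x2 = c(1, 1, 2, 2))
res <- run_chunked_transform(dat, center_x1, transform_vars = c("x1", "x2"),
                             chunk_size = 3)
print(res$x1_centered)                            # -3 -1 1 3
```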

  16. What you need to know for the exam: machine learning
     • rxKmeans: k-means clustering
     • rxDTree: decision trees
     • rxLinMod: linear models
     • rxLogit: logistic regression
     • rxGlm: generalized linear models
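
Base R offers in-memory counterparts to some of these models, which can help when practising the concepts without a RevoScaleR installation. In this sketch, the `centers` argument of `kmeans()` plays a role similar to `numClusters` in rxKmeans; the simulated data are assumptions for illustration only.

```r
set.seed(42)

# k-means on two well-separated simulated clusters
pts <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),    # 20 points around (0, 0)
             matrix(rnorm(40, mean = 10), ncol = 2))   # 20 points around (10, 10)
km <- kmeans(pts, centers = 2, nstart = 10)            # several random starts
cluster_sizes <- sort(as.vector(table(km$cluster)))
print(cluster_sizes)                                   # 20 20

# Linear model on simulated data with true intercept 3 and slope 2
d <- data.frame(x = 1:20)
d$y <- 3 + 2 * d$x + rnorm(20, sd = 0.5)
lin <- lm(y ~ x, data = d)
slope <- coef(lin)[["x"]]
print(coef(lin))                                       # close to 3 and 2
```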

  17. What you need to know for the exam: model fitting
     • What about fitted values and residuals? What would you look for when examining residuals (zero mean, heteroscedasticity, normal distribution, etc.)?
     • Predictive modelling: for each type of model, know its essential parameters (maxDepth or cp for decision trees, numClusters for k-means, family for GLM).
     • Example question: a model is defined as rxGlm(y ~ x, family = binomial(link = "logit")). What can be assumed about the discrete/continuous nature of the variables and their relationship? Linear? log(y) ~ x? x is categorical? y is binary?
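
A base R sketch of the residual checks mentioned above, using `lm()` and `glm()` rather than the RevoScaleR functions; the simulated data are assumptions. With a binomial family and logit link, the response is binary (0/1) and its log-odds are modelled as linear in the predictor, which may itself be continuous.

```r
set.seed(1)
x <- runif(100, 0, 10)
y <- 1 + 0.5 * x + rnorm(100)
fit <- lm(y ~ x)
res <- residuals(fit)

# Zero mean: holds (up to rounding) for least squares with an intercept
res_mean <- mean(res)
print(res_mean)

# Normality of residuals: Shapiro-Wilk test
print(shapiro.test(res)$p.value)

# Rough heteroscedasticity check: does |residual| grow with the fitted value?
print(cor(abs(res), fitted(fit)))

# Binary response, logit link: P(y_bin = 1) = plogis(b0 + b1 * x)
y_bin <- rbinom(100, size = 1, prob = plogis(-2 + 0.5 * x))
logit_fit <- glm(y_bin ~ x, family = binomial(link = "logit"))
p_hat <- fitted(logit_fit)          # fitted probabilities between 0 and 1
print(range(p_hat))
```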

  19. What you need to know for the exam: miscellaneous questions
     • Which functions can you use together with rxCrossTabs to test the independence of variables? (rxFisherTest, rxKendallCor, rxChiSquaredTest)
     • What does the rxCor function return? (Pearson's correlation matrix)
     • What graphics subsystem does rxLinePlot use underneath? (lattice? ggplot2? googleVis? base graphics?)
     • Two questions on principal component analysis (splits variables into? independent? dependent? asymptotic? normally distributed?)
     • Two other questions on covariance/correlation: cov(x, y) = cor(x, y) * sd(x) * sd(y)
     • Which operations are not supported for on-the-fly response variable transformations with rxSummary: F(y), N(y), rowSelection, transforms = list(...)?
     • Which functions can you use to obtain contingency tables? (rxSummary? rxCube? rxCrossTabs?)
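
The covariance/correlation identity, a chi-squared test of independence on a contingency table and the uncorrelatedness of principal components can all be checked in base R. This uses `cov`, `cor`, `table`, `chisq.test` and `prcomp`, not the RevoScaleR equivalents, and the simulated variables are assumptions.

```r
set.seed(7)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

# cov(x, y) = cor(x, y) * sd(x) * sd(y)
lhs <- cov(x, y)
rhs <- cor(x, y) * sd(x) * sd(y)
print(c(lhs, rhs))

# Contingency table and chi-squared test of independence
smoker <- factor(sample(c("yes", "no"), 200, replace = TRUE))
sex    <- factor(sample(c("F", "M"), 200, replace = TRUE))
tab <- table(smoker, sex)
chi <- chisq.test(tab)
print(chi$p.value)

# PCA: the principal component scores are mutually uncorrelated
pca <- prcomp(cbind(x, y), scale. = TRUE)
score_cor <- cor(pca$x)[1, 2]
print(score_cor)                    # approximately 0
```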