About me

  • Educational background – Applied Econometrics

  • 4 years statistical modelling experience

  • R experience – 2 years

  • Currently Senior Analyst at Deloitte

  • Hobby – rock climbing, data mining competitions

  • Why? - Early retirement

  • Current interest – Text analytics


About me

Topic: The benefits of R from a data mining competitor’s point of view and from the point of view of an employee at Deloitte

  • Work: professional and pragmatic

  • Home: the playful scientist


Agenda

  • Quick introduction to R

  • What I use R for

  • R at work

    • Introduction to Deloitte

    • Frequently used tools

    • Some of the work we do using R

    • Examples

    • Challenges: Data Storage

    • Challenges: Standardisation

    • How Deloitte is addressing this issue

  • R at home

    • Some of the work I do using R, at home

    • Flexibility and convenience

    • Examples

    • Prototyping and experimenting

    • Examples

  • Questions

  • Essential R packages for everyday use


    Quick introduction to R

    • “A statistical software created by statisticians, for statisticians”

    • Personally, I use R for data analysis and statistical modelling

    • Unique features worth noting:

      • Open source: free, with an active community where help is easy to find

      • Handles mathematical computations and matrix operations natively

      • Thousands of packages, implementations of almost any algorithm
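
    As a small illustration of the point about mathematical and matrix operations, the sketch below uses only base R. The numbers are made up for the example.

```r
# Vectorised arithmetic: operations apply element-wise without explicit loops
x <- c(1, 2, 3, 4)
x_squared <- x^2

# Matrices are first-class objects
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
b <- c(3, 5)

A_times_b <- A %*% b      # matrix-vector product
solution  <- solve(A, b)  # solves A %*% z = b for z
A_inverse <- solve(A)     # matrix inverse
```

    No package needs to be loaded for any of this, which is part of why analysis can start quickly.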


    Introduction to R: Thousands of packages, implementations of almost any algorithm

    • Packages: ggplot2, EBImage, randomForest, etc.

    • N = 500+


    • R at work


    Introduction to Deloitte

    • We help clients capture, manage and analyse data to solve important business problems and make informed decisions

    • A holistic process of data mining


    Introduction to Deloitte: Typical activity involved in a project at Deloitte

    But not everything is R

    [Figure: level of activity along the project time line, with curves labelled Initiating processes, Planning processes, Data loading, Data preparation, Modelling and Closing processes]

    • 20% - 40% of time is spent on modelling


    Frequently used tools

    • Geospatial analytics - Tactician

    • Segmentation - self-organising maps

    • SQL Server

    • Modelling

    • Visualisation


    Some of the work we do using R

    • In Deloitte

      • Statistical analysis and predictive modelling

      • Time series analysis

      • Social network analysis

      • Data visualisation

      • Text analytics (NEW!)


    Examples: Time Series

    [Figure: time series plot of y (retail activity?) against time (days), showing actual values, fitted values and a dashed estimate]

    R package: forecast
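
    A plot like the one on this slide could be produced along the following lines. This is only a sketch: the daily series is simulated because the retail data behind the slide is not available, and the use of `auto.arima` is an assumption, since the slide names the forecast package but not the model.

```r
library(forecast)

# Simulated daily series with a trend and a weekly pattern (hypothetical data)
set.seed(42)
days <- 1:140
y <- ts(100 + 0.2 * days + 10 * sin(2 * pi * days / 7) + rnorm(140, sd = 3),
        frequency = 7)

fit <- auto.arima(y)          # assumed model choice, selected automatically
fc  <- forecast(fit, h = 14)  # estimate the next 14 days

plot(fc, xlab = "Time (weeks)", ylab = "y")  # actual values plus the forecast
lines(fitted(fit), lty = 2)                  # in-sample fitted values, dashed
```

    Because the series is declared with `frequency = 7`, the time axis is measured in weeks rather than days.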


    Challenges: Data Storage

    • We have a dedicated tool to store and clean data – SQL

    • R holds data in memory, so large data sets can exceed the memory available

    Error: cannot allocate vector of size 2097151 Kb
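
    The error above is a memory allocation failure: 2097151 Kb is roughly 2 GB requested for a single vector. Below is a rough sketch of two habits that help, using base R only: estimate the size of an object before creating it, and summarise a large file in chunks instead of loading it whole. The file, the column name `amount`, the helper `estimate_mb` and the chunk size are all made up for the illustration.

```r
# A numeric (double) vector needs about 8 bytes per element
estimate_mb <- function(n_rows, n_cols) n_rows * n_cols * 8 / 1024^2
estimate_mb(1e6, 10)   # approximate size in MB of a 1,000,000 x 10 numeric table

# Hypothetical example file: one numeric column called "amount"
path <- tempfile(fileext = ".csv")
write.csv(data.frame(amount = 1:1000), path, row.names = FALSE)

# Sum the column in chunks of 100 rows so only one chunk is in memory at a time
con <- file(path, open = "r")
invisible(readLines(con, n = 1))         # skip the header line
total <- 0
repeat {
  lines <- readLines(con, n = 100)
  if (length(lines) == 0) break
  total <- total + sum(as.numeric(lines))
}
close(con)
total
```

    Where the data already sits in a SQL database, as on this slide, the same idea can be applied by aggregating in SQL and pulling only the summarised result into R.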


    Challenges: Standardisation

    • “You’re not the only one using it”

      One of the reasons why commercial tools are preferred over R

    • Transferable skills across the team

    • Reliability of packages

    • Standardised functions and procedures


    How Deloitte is addressing this issue

    • Creating standardised process:

    R package: RODBC


    How Deloitte is addressing this issue

    • Creating standardised functions:

        library(ggplot2)

        # Density plot for the subject variable; col is a column number or name
        DensityPlot <- function(dataset, col) {
          ds <- data.frame(value = data.frame(dataset)[, col])
          ggplot(data = ds, aes(x = value)) + geom_density(kernel = "biweight")
        }

        # Usage: DensityPlot(dataset, <column number>)

    • Retrieving data from the database (RODBC):

        library(RODBC)

        conn <- odbcDriverConnect("driver=SQL Server; database=DataBaseName; server=servername;")
        query <- "SELECT * FROM TableName"
        df <- sqlQuery(conn, query)
        odbcClose(conn)  # release the connection when done

    R package: RODBC
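
    One way the two ideas above could be combined is a shared helper that opens the connection, runs a query and always closes the connection again. This is a sketch, not Deloitte's actual function: `run_query` and its arguments are hypothetical names, and the connection string is the placeholder from the slide.

```r
# Hypothetical team-wide helper built on RODBC
run_query <- function(query,
                      conn_string = "driver=SQL Server; database=DataBaseName; server=servername;") {
  conn <- RODBC::odbcDriverConnect(conn_string)
  if (!inherits(conn, "RODBC")) stop("could not connect to the database")
  on.exit(RODBC::odbcClose(conn))   # close the connection even if the query fails
  RODBC::sqlQuery(conn, query, stringsAsFactors = FALSE)
}

# Usage (needs a reachable SQL Server and the RODBC package):
# df <- run_query("SELECT * FROM TableName")
```

    Keeping the connection details in one place means each analyst only writes the query, which is the kind of standardisation the slide describes.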


    • R at home


    Some of the work I do using R, at home

    • At home (data mining competitions)

      • Statistical analysis and predictive modelling

      • Time series analysis

      • Social network analysis

      • Data visualisation

      • Text analytics

      • Image analysis

      • (I mainly use R)

    • In Deloitte

      • Statistical analysis and predictive modelling

      • Time series analysis

      • Social network analysis

      • Data visualisation

      • Text analytics (NEW!)

      • (We don’t just use R)


    Flexibility and convenience

    • R is one of the easier programming languages to pick up

    • Dive into the analysis quickly


    Examples

    • Image analysis

    R package: EBImage


    Examples

    • Image analysis

    R package: EBImage


    Prototyping and experimenting

    • Access to the latest, most innovative techniques

    • Great for prototyping new algorithms


    Examples: Text analytics

    R package: twitteR


    Examples: Word cloud of twitter feeds

    R package: wordcloud
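
    The word frequencies behind a cloud like this can be computed in a few lines. A minimal sketch, assuming the tweets are already available as a character vector: the three example tweets and the short stop-word list are invented, and the final plotting step runs only if the wordcloud package is installed.

```r
# Invented example tweets (hypothetical data)
tweets <- c("R makes text analytics fun",
            "Text analytics with R and twitter",
            "Data mining competitions in R")

# Lower-case, strip punctuation, split on whitespace
words <- unlist(strsplit(tolower(gsub("[[:punct:]]", "", tweets)), "\\s+"))

# Drop a few very common words (a tiny, hand-picked stop list)
words <- words[!words %in% c("and", "in", "with", "the", "a")]

# Count occurrences, most frequent first
freq <- sort(table(words), decreasing = TRUE)

# Draw the cloud: word size is proportional to frequency
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(freq), as.numeric(freq), min.freq = 1)
}
```

    For real tweets, the tm package listed later in the deck offers more thorough cleaning, such as proper stop-word lists and stemming.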


    Examples: Text analytics

    What are the common themes that are being tweeted by Time magazine?


    [Figure: tweets assigned to classes A, B, C and D, with the top words associated with each classification]

    R package: ggplot2


    Classification results


    Questions?


    Essential R packages for everyday use

    • Essential

      • ggplot2

      • reshape

      • RODBC

      • randomForest

      • rpart

    • Nice to have

      • caret

      • forecast

      • tm

