R statistics programme and who are you

R: Statistics? Programme? and Who are You?. -- An ABC introduction to R. Presented by Guohui Ding R&D, SIBS, CAS 8 Sept, 2004. For Fudan University. Main Topics Today . What is R? How to administrate R? How does R work? How to apply R for statistical problem?

Presentation Transcript
R: Statistics? Programme? and Who are You?

-- An ABC introduction to R

Presented by

Guohui Ding

R&D, SIBS, CAS

8 Sept, 2004

For Fudan University


Main Topics Today

  • What is R?

  • How to administrate R?

  • How does R work?

  • How to apply R to statistical problems?

  • How to program your R function?

  • ………


What is R?

A brief history of R


The legend of R

  • R started in the early 1990’s as a project by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, intended to provide a statistical environment in their teaching lab. The lab had Macintosh computers, for which no suitable commercial environment was available.

Ross Ihaka

Robert Gentleman


R’s Parents(1)

  • The S language

    • S: an interactive environment for data analysis developed at Bell Laboratories since 1976

    • Exclusively licensed by AT&T/Lucent to Insightful Corporation, Seattle WA. Product name: “S-plus”.

My father is S and my mother is Scheme, so why is my name "R"?

You can learn more from:

http://cm.bell-labs.com/cm/ms/departments/sia/S/history.html


R's Parents (2)

  • The Scheme language

    Scheme is a statically scoped and properly tail-recursive dialect of the Lisp programming language invented by Guy Lewis Steele Jr. and Gerald Jay Sussman.

    Learn more: http://swiss.csail.mit.edu/projects/scheme/

  • Scheme's underlying semantics + S's syntax = R

  • "We have named our language R, in part to acknowledge the influence of S and in part to celebrate our own efforts."

    -- Ihaka R. & Gentleman R., 1996


    R Now

    • Since mid-1997 there has been a core group who can modify the R source code CVS archive.

    • The R package system

      CRAN (the Comprehensive R Archive Network)

    http://www.r-project.org


    The characters of R

    • R is "GNU S": a language and environment for data manipulation, calculation and graphical display.

      • That is, R is free software (or open source software). (Here, "free" refers to freedom, not price, although R is free in that sense as well.)

    • The core of R is an interpreted computer language.

      • A mosaic of procedure-based programming and object-oriented programming

      • Good interface to procedures written in C, C++, FORTRAN and other languages

      • A flexible data exchange mechanism for accessing relational databases: ODBC, PostgreSQL, MySQL and so on.

    -- A negotiation between a thief and a robber


    R and Statistics

    • Most packages deal with statistics and data analysis.

    • Powerful statistical graphics.

    • Interoperates well with other statistical software.

    • Most R users are statistical experts. You can learn modern analysis methods from them by email.

    • You can implement it yourself when you come across something nobody has done before.


    Install and administrate R

    Focus on Windows(MS)


    How do I get R?

    • The informational web site http://www.r-project.org/

    • CRAN - the Comprehensive R Archive Network.

      • The primary site is http://cran.r-project.org/. Mirror sites are available for many countries.

      • CRAN sites have binary distributions for Windows 95, 98, ME, NT4, 2000 and XP on Intel, for the Macintosh (System 8.6 to 9.1 and MacOS X), and for several Linux distributions.

    • New releases occur frequently

      • about every 3 months. Be prepared to re-install frequently.

    • You can also get it from your friends, teachers, etc.

    Download it!

    It is about 20.6M in size.

    Using Precompiled Binary Distributions


    Installing R

    • Double-click "rw1091.exe". That is all: it installs like any other standard Windows program.


    R Console/RGui in Windows(MS)

    Graphics box

    Menu

    Icons

    Command box



    Several concepts in Administrating R

    • Workspace

      • xxx.RData

    • History

      • xxx.Rhistory

    • Package

    • Object

    • Session

    • Console

    Run your R code

    Load/save workspace

    Load/save History

    Change your working directory


    Add a new package

    • Commands:

      • library(xxx) load an installed package from the library

      • detach(package:xxx) detach a package

    • All of this can be done in the GUI (except detach())

    Load a local package

    Install packages from the internet or from local files

    Update the local packages from the internet
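A minimal sketch of the command-line equivalents of these menu actions. The package `tools` is used here only because it ships with R and is not attached by default; `"xxx"` is a placeholder, as on the slide, and the two calls that need network access are left commented out.

```r
# install.packages("xxx")   # install a package from CRAN (needs internet)
# update.packages()         # update installed packages from CRAN

library(tools)                                  # attach an installed package
attached <- "package:tools" %in% search()       # is it on the search path now?
print(attached)

detach("package:tools")                         # detach it again
still_attached <- "package:tools" %in% search()
print(still_attached)
```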


    Packages in R Environment

    • Basic packages

      • "package:methods" "package:stats" "package:graphics" "package:utils" "package:base"

    • Recommended packages

      • grid; lattice; e1071…

    • Contributed packages (more than 366 packages nowadays)

      • ……

    You can see which packages are currently loaded with the command search().


    Don’t lose your way!

    • Three useful system commands

      • getwd() Get Working Directory

      • setwd() Set Working Directory

      • list.files() List the Files in a Directory/Folder
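The three commands can be tried together in a scratch directory. This is only a sketch: the file name `example.txt` is made up for illustration, and `tempdir()` is used so nothing is written into your own folders.

```r
old_dir <- getwd()               # remember the current working directory
setwd(tempdir())                 # move to R's per-session scratch directory

file.create("example.txt")       # create an empty file (hypothetical name)
files <- list.files()            # list the files in the working directory
print("example.txt" %in% files)

setwd(old_dir)                   # go back to where we started
```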


    Show the Demonstrations of the Packages/Functions

    • Commands

      • demo() Demonstrations of R Functionality

      • example() Run an Examples Section from the Online Help


    Getting Help

    • Several commands

      • help.start()

      • help() or ?

      • help.search()

      • apropos()

    • Internet searching

      • I like it very much. It seems omnipotent.
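A short sketch of how these help commands are typically called. The search terms are arbitrary examples; the calls that open a browser or a help viewer are wrapped in `interactive()` so the snippet also runs as a script.

```r
hits <- apropos("^read\\.")      # objects whose names start with "read."
print(head(hits))

if (interactive()) {
  help.start()                   # open the HTML help system in a browser
  help("mean")                   # same as ?mean
  help.search("linear model")    # search the help pages by keyword
  example("mean")                # run the Examples section of a help page
}
```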


    Quit R

    • Command

      • q() Terminate an R Session


    How does R work?

    Basic R Structure and data manipulation


    Basic R working flow (Object orientation)

    [Figure: basic R working flow, from R for Beginners, Emmanuel Paradis]


    Object orientation

    • Object: a collection of atomic variables and/or other objects that belong together

    • Parlance:

      • class: the “abstract” definition of it

      • object: a concrete instance

      • method: other word for ‘function’

      • slot: a component of an object
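The four terms map directly onto R's formal (S4) classes from the `methods` package. The sketch below is only an illustration; the class `Person`, its slots, and the generic `describe` are invented names.

```r
library(methods)

# class: the abstract definition, with two slots
setClass("Person", representation(name = "character", age = "numeric"))

# object: a concrete instance of the class
p <- new("Person", name = "Ada", age = 36)

# slot: a component of the object, accessed with @
print(p@name)

# method: a function that is dispatched on the class of its argument
setGeneric("describe", function(obj) standardGeneric("describe"))
setMethod("describe", "Person", function(obj) {
  paste(obj@name, "is", obj@age, "years old")
})
print(describe(p))
```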


    Types of Data in R

    • The basic data object is a vector of elements of type:

      • numeric numbers - either floating point or integer

      • character each element is a character string

      • logical each element is TRUE or FALSE

      • list elements can be any type of object, including other lists

    • Components of the S language, such as functions, are also vectors.

    • Any vector can include the missing data marker NA as an element.

    • All vectors have a length and a mode. The functions length and mode return this information as does the str function.

    • A structure consists of a data object plus additional information. Matrices (or arrays, in general) and time series are examples of structures.
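A small sketch of these vector types, with made-up values, showing `mode`, `length`, `NA` and `str` at work.

```r
num <- c(1.5, 2, NA)                 # numeric vector with a missing value
chr <- c("a", "b")                   # character vector
lgl <- c(TRUE, FALSE, NA)            # logical vector
lst <- list(1, "a", TRUE, list(2))   # a list can hold any object, even lists

print(c(mode(num), mode(chr), mode(lgl), mode(lst)))
print(length(num))
print(is.na(num))                    # which elements are missing?
str(lst)                             # compact display of the structure

m <- matrix(1:6, nrow = 2)           # a structure: a vector plus a dim attribute
print(attributes(m))
```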



    Vectors, Matrices and Arrays

    • Command:

      • array(data = NA, dim = length(data), dimnames = NULL)

      • matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
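A brief sketch of both constructors with arbitrary data. Note that R fills column by column unless `byrow = TRUE` is given.

```r
# 2 x 3 matrix filled row by row
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(m)

# 2 x 3 x 4 array, filled in column-major order
a <- array(1:24, dim = c(2, 3, 4))
print(dim(a))
print(a[1, 1, 2])    # first element of the second 2 x 3 slice
```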


    Lists

    • List vs. Vector

      • list: an ordered collection of data of arbitrary types.

      • vector: an ordered collection of data of the same type.

      • Typically, vector elements are accessed by their index (an integer), list elements by their name (a character string). But both types support both access methods.
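Both access styles can be sketched on a named vector and a named list; the names and values are invented for illustration.

```r
v <- c(a = 1, b = 2, c = 3)          # vector: all elements of the same type
print(v[2])                          # by index
print(v["b"])                        # by name

lst <- list(label = "x", values = 1:3, flag = TRUE)   # list: mixed types
print(lst$label)                     # by name
print(lst[["values"]])               # by name, with [[ ]]
print(lst[[3]])                      # by index
```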


    Factors

    • Factors: classification variables

    • If the levels of a factor are numeric (e.g. the treatments are labelled "1", "2", and "3") it is important to ensure that the data are actually stored as a factor and not as numeric data. Always check this by using summary.
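The difference shows up immediately in `summary`. A sketch with invented treatment labels:

```r
treatment <- c(1, 2, 3, 1, 2, 3)     # treatment labels stored as numbers

print(summary(treatment))            # numeric summary: min, quartiles, mean, max

f <- factor(treatment)               # convert to a classification variable
print(summary(f))                    # factor summary: counts per level
print(levels(f))
```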


    Data frames

    • data frame: represents the typical data table that researchers come up with, like a spreadsheet.

      • It is a rectangular table with rows and columns; data within each column has the same type (e.g. number, text, logical), but different columns may have different types. (It is actually a list.)
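A minimal sketch with made-up columns, showing that each column keeps its own type and that the data frame is itself a list.

```r
df <- data.frame(
  id      = 1:3,                     # integer column
  name    = c("a", "b", "c"),        # character column
  treated = c(TRUE, FALSE, TRUE),    # logical column
  stringsAsFactors = FALSE           # keep text as character, not factor
)

str(df)                              # one line per column with its type
print(is.list(df))                   # a data frame is a list of columns
print(sapply(df, class))
```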


    Subsetting

    Individual elements of a vector, matrix, array or data frame are accessed with “[ ]” by specifying their index, or their name
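A sketch of "[ ]" on a vector, a matrix and a data frame, by index, by name and by logical condition. All object names and values are invented.

```r
v <- c(a = 10, b = 20, c = 30)
print(v[2])                          # by index
print(v["c"])                        # by name
print(v[v > 15])                     # by logical condition

m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("x", "y", "z")))
print(m[2, 3])                       # row 2, column 3
print(m["r1", "y"])                  # by row and column name

df <- data.frame(id = 1:3, score = c(90, 80, 70))
print(df[2, "score"])                # one cell
high <- df[df$score > 75, ]          # rows that satisfy a condition
print(high)
```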


    Using R on Windows(MS)

    Basic statistical analysis by R


    Data Input

    • From the keyboard one by one

      • c( ); scan( )

    • From the file

      • read.table(); read.csv(); read.csv2(); read.dta(); read.spss(); …

    • By a spreadsheet

      • data.entry()

      • edit()

      • fix()

      • ……
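A sketch of keyboard and file input. The two small files are written to temporary locations first so the example is self-contained; their contents are invented. The spreadsheet-style editors are interactive and therefore only mentioned in a comment.

```r
x <- c(1.2, 3.4, 5.6)                          # typed in directly with c()

csv_file <- tempfile(fileext = ".csv")         # a small comma-separated file
writeLines(c("id,score", "1,90", "2,80", "3,70"), csv_file)
dat <- read.csv(csv_file)
print(dat)

tab_file <- tempfile(fileext = ".txt")         # a small tab-delimited file
writeLines(c("id\tscore", "1\t90", "2\t80"), tab_file)
dat2 <- read.table(tab_file, header = TRUE, sep = "\t")
print(dat2)

# Interactive alternatives: scan(), data.entry(x), dat <- edit(dat), fix(dat)
```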


    Data Edit

    • Commands

      • edit()

      • fix()

    Tip: edit() can open a Notepad-style editor in RGui!


    Data Description

    • Commands

      • summary()

      • mean()

      • sd()

      • hist()

      • boxplot()

      • ……
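A sketch of these commands on a small invented sample. When run as a script rather than in RGui, the two plots go to R's default graphics file.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)       # a small made-up sample

print(summary(x))                    # min, quartiles, mean, max
m <- mean(x)                         # arithmetic mean
s <- sd(x)                           # sample standard deviation (n - 1 divisor)
print(c(mean = m, sd = s))

hist(x, main = "Histogram of x")     # distribution of the values
boxplot(x, main = "Boxplot of x")    # median, quartiles and outliers
```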



    Four Useful Prefixes for Probability Distribution Functions

    • dxxx for the density

    • pxxx for the CDF

    • qxxx for the quantile function

    • rxxx for simulation (random deviates)

    Repeated calls to rxxx give different values, because by default the seed is set by the system. You can set the seed yourself with set.seed().
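For the normal distribution, "xxx" is `norm`. The sketch below uses the standard normal and an arbitrary seed value of 123.

```r
d <- dnorm(0)            # density at 0, equal to 1 / sqrt(2 * pi)
p <- pnorm(0)            # CDF at 0: P(X <= 0)
q <- qnorm(0.975)        # quantile: the value with 97.5% of the mass below it
print(c(d = d, p = p, q = q))

r1 <- rnorm(5)           # two unseeded draws differ
r2 <- rnorm(5)
print(identical(r1, r2))

set.seed(123)            # fixing the seed makes a draw reproducible
s1 <- rnorm(5)
set.seed(123)
s2 <- rnorm(5)
print(identical(s1, s2))
```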


    Statistical Inference

    • Commands

      • qxxx () for the quantile function

      • t.test()

      • wilcox.test() (package stats)

      • kruskal.test() (package stats)

      • var.test(); shapiro.test(); qqnorm(); qqline()

      • ……
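A sketch of these tests on two simulated samples. The data, the seed and the group means are assumptions made purely for illustration.

```r
set.seed(1)
a <- rnorm(20, mean = 0)             # simulated sample 1
b <- rnorm(20, mean = 1)             # simulated sample 2

crit <- qt(0.975, df = 19)           # critical value of a t distribution
tt <- t.test(a, b)                   # two-sample t test
wt <- wilcox.test(a, b)              # rank-based alternative
vt <- var.test(a, b)                 # F test for equality of variances
sw <- shapiro.test(a)                # normality test

print(c(t = tt$p.value, wilcoxon = wt$p.value,
        variance = vt$p.value, shapiro = sw$p.value))

qqnorm(a)                            # normal Q-Q plot
qqline(a)                            # reference line through the quartiles
```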


    Analysis of variance and Regression Analysis

    • Commands

      • anova()

      • lm()

      • ……
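A sketch of a simple linear regression followed by its analysis-of-variance table, using the `cars` data set that ships with R (stopping distance against speed). The choice of data set is ours, not the presentation's.

```r
fit <- lm(dist ~ speed, data = cars) # regress stopping distance on speed
print(coef(fit))                     # intercept and slope
print(summary(fit))                  # estimates, standard errors, R-squared

tab <- anova(fit)                    # ANOVA table for the fitted model
print(tab)
```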


    Experiment Design

    • Commands

      • sample()

      • power.t.test()

      • ……
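A sketch of both commands: random assignment of ten hypothetical subjects to two groups, and a sample-size calculation for a two-sample t test. The effect size, standard deviation and power are assumed values.

```r
set.seed(42)
subjects <- paste0("S", 1:10)                 # hypothetical subject labels
treated  <- sample(subjects, size = 5)        # randomly pick the treated group
control  <- setdiff(subjects, treated)        # everyone else is a control
print(treated)
print(control)

# Sample size per group to detect a difference of 1 SD with 80% power
pw <- power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.8)
print(pw)
n_per_group <- ceiling(pw$n)                  # round up to whole subjects
```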


    Save Object/Data

    • Every R object can be stored into and restored from a file with the commands "save" and "load".

      > save(x, file="x.Rdata")

      > load("x.Rdata")

    • Rectangular tables can be exported and imported as tab-delimited text files.

      > write.table(x, file="x.txt", sep="\t")
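A round-trip sketch of both approaches. Temporary files stand in for "x.Rdata" and "x.txt", and the contents of `x` are invented.

```r
x <- data.frame(id = 1:3, score = c(90.5, 80.25, 70))

rdata_file <- tempfile(fileext = ".RData")
save(x, file = rdata_file)           # store the object in binary form
rm(x)                                # remove it from the workspace
load(rdata_file)                     # restore it under its original name
print(x)

txt_file <- tempfile(fileext = ".txt")
write.table(x, file = txt_file, sep = "\t", row.names = FALSE)
y <- read.table(txt_file, header = TRUE, sep = "\t")   # read the table back
print(y)
```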



    A Friendly R Environment -- Rcmdr

    If you don’t like a command line environment, package Rcmdr may be a good choice!


    R programming (.R)

    Program your own R code


    Control Flow

    • if(cond) expr

    • if(cond) cons.expr else alt.expr

    • for(var in seq) expr

    • while(cond) expr

    • repeat expr

    • break

    • next
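A sketch that exercises the main constructs once each; the variable names and values are arbitrary.

```r
x <- 7
if (x %% 2 == 0) {                   # if ... else
  parity <- "even"
} else {
  parity <- "odd"
}

total <- 0
for (i in 1:5) {                     # for: iterate over a sequence
  total <- total + i
}

n <- 0
while (2^n < 100) {                  # while: loop as long as the condition holds
  n <- n + 1
}

count <- 0
repeat {                             # repeat: loop until an explicit break
  count <- count + 1
  if (count >= 3) break
}

print(c(parity = parity, total = total, n = n, count = count))
```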


    Loops

    • The main loop construct in R is for. The commonest use, as in C and other languages, is to count from 1 to n.

      • for (i in 1:n) {

        ## do something

        }


    Leaving loops

    • The break and next commands allow the flow of a loop to be altered

      • break jumps out of the loop

      • next jumps to the next iteration of the loop
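A small sketch that uses both commands in one loop: it collects the squares of odd numbers and stops once a square exceeds 50. The task itself is invented for illustration.

```r
odd_squares <- c()
for (i in 1:20) {
  if (i %% 2 == 0) next              # skip even numbers
  if (i^2 > 50) break                # leave the loop entirely
  odd_squares <- c(odd_squares, i^2)
}
print(odd_squares)
```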


    Avoiding Iteration

    • The canonical bad R program looks like this

        ## multiply two vectors
        for (i in 1:n) {
          d[i] <- a[i] * b[i]
        }

        ## compute the inner product
        s <- 0
        for (i in 1:n) {
          s <- s + d[i]
        }

    • The right way to do this is

        s <- sum(a * b)

    • apply(); lapply(); sapply()
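A runnable sketch that checks the loop and the vectorised version against each other and shows one call to each of the apply functions. The input vectors are made up.

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30, 40)
n <- length(a)

d <- numeric(n)                      # loop version
s_loop <- 0
for (i in 1:n) {
  d[i] <- a[i] * b[i]
  s_loop <- s_loop + d[i]
}

s_vec <- sum(a * b)                  # vectorised version
print(c(loop = s_loop, vectorised = s_vec))

m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)                          # over rows of a matrix
lens     <- lapply(list(a = 1:3, b = letters), length) # over a list, returns a list
squares  <- sapply(1:5, function(i) i^2)              # like lapply, simplified to a vector
print(row_sums)
print(squares)
```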


    Write R function

    A function definition looks like

    median <- function(x, na.rm = FALSE)

    {

    …lots of code...

    ## a return value

    }
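The elided body is not reproduced here. As a stand-in, the following is a hypothetical `my_median` with the same argument list; it is a sketch for illustration, not R's own implementation of `median`.

```r
my_median <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]                # drop missing values on request
  } else if (any(is.na(x))) {
    return(NA_real_)                 # otherwise a missing value propagates
  }
  n <- length(x)
  if (n == 0) {
    return(NA_real_)
  }
  s <- sort(x)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                   # odd length: the middle value
  } else {
    mean(s[n / 2 + 0:1])             # even length: mean of the two middle values
  }                                  # the last evaluated value is returned
}

print(my_median(c(3, 1, 2)))
print(my_median(c(1, NA, 3), na.rm = TRUE))
```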


    More

    • Packages

    • Objects and methods

    • Debugging and optimisation

    • Connecting to other packages

    • Interfaces to other programming languages or databases

    R++? ++R!


    Some Resources

    • A Course (The ppt is showed with R Development Core Group)

      • http://faculty.washington.edu/tlumley/Rcourse/

    • A Paper (citing R in a publication)

      • Ihaka R. & Gentleman R. 1996. R: a language for data analysis and graphics. Journal of Computational and Graphical Statistics 5: 299–314.

    • Two URLs

      • http://www.r-project.org

      • http://www.ats.ucla.edu/stat/

    • Several Books

      • Using R for Data Analysis and Graphics—An Introduction. J.H. Maindonald

      • An Introduction to R. The R Development Core Team

      • simpleR –Using R for Introductory Statistics. John Verzani

      • R for Beginners. Emmanuel Paradis

      • The R Reference Manual Base Package. The R Development Core Team


    Acknowledgements

    PhD. Qi Liu

    Prof. Gang Pei

    Prof. Yixue Li

    Prof. Naiqing Zhao

    Everyone Here

    Any Questions?

