## Bare-Bones R


Citation and Usage

This set of PowerPoint slides is keyed to Bare-Bones R: A Brief Introductory Guide, by Thomas P. Hogan, SAGE Publications, 2010.

All are welcome to use and/or adapt the slides without seeking further permission but with the usual professional acknowledgment of source.

Part 1: Base R

1-1 What Is R?

- A computer language, oriented toward statistical applications
- Relatively new
- Growing rapidly in use

1-2 R’s Ups and Downs

- Plusses
- Completely free, just download from the Internet
- Many add-on packages for specialized uses
- Open source
- Minuses
- Obscure terms, intimidating manuals, odd symbols, inelegant output (except graphics)

1-3 Getting Started: Loading R

- Have an Internet connection
- Go to http://cran.r-project.org/
- On the R for Windows screen, click “base”
- Find and click the download link for R
- Click Run, OK, or Next on every screen
- End up with an R icon on the desktop

Downloading Base R [Figs 1.1 – 1.4]

- Click on Windows
- Then in next screen, click on “base”
- Then screens for Run, OK, or Next
- And finally “Finish”, which puts the R icon on the desktop

What You Should Have When Clicking the R Icon: Rgui and R Console, Ending with the R Prompt (>) [Fig 1.5]

The R prompt (>)

- > This is the “R prompt.” It says R is ready to take your command.

1-4 Using R as Calculator

- Enter these after the prompt and observe the output

>2+3

>2^3+(5)

>6/2+(8+5)

>2 ^ 3 + (5)
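As a quick check, the calculator lines can also be stored in variables and printed together. The names a, b, c1, d and avg are only illustrative; c1 is used instead of c so the c() function is not masked. The last line is the averaging example from the next slide.

```r
# R applies the usual precedence: ^ first, then * and /, then + and -
a   <- 2 + 3            # 5
b   <- 2^3 + (5)        # 8 + 5 = 13
c1  <- 6/2 + (8 + 5)    # 3 + 13 = 16
d   <- 2 ^ 3 + (5)      # spaces are ignored, so this equals b
avg <- (22 + 34 + 18 + 29 + 36)/5   # average of five scores: 27.8
print(c(a, b, c1, d, avg))
```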

More as Calculator

- You can copy and paste, but don’t include the >
- Use # after a command to add a note; R ignores everything from # to the end of the line, e.g.

> (22+ 34+ 18+ 29+ 36)/5 #Calculating the average, aka mean

- R as calculator: Not very useful

1-5 Creating a Data Set

- > Scores = c(22, 34, 18, 29, 36)

c means “concatenate” (combine) in R; in plain English, “treat these values as one data set”

- Now do:

>Scores

R will print the data set

Important Rules

- We created a variable
- Variable names are case sensitive
- No blanks in name (can use _ or . to join words, but not -)
- Start with a letter (upper or lower case)
- Can use <- instead of =

Another variable

- Create SCORES, using <-

> SCORES<-c(122, 134, 118, 129, 124)

- NB: SCORES is different from Scores

Check with

>SCORES

>Scores

Non-numeric Data

- Enclose in quotes, single or double
- Separate entries with comma
- Example:

> names = c("Mary", "Tom", "Ed", "Dan", "Meg")

(Type straight quotes; curly “smart” quotes pasted from a word processor cause an error.)
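A small sketch of the variables created so far, showing that Scores and SCORES are two separate objects and that quoted entries become character data. class() and ls() are standard base R functions; nothing here is specific to the book's files.

```r
Scores <- c(22, 34, 18, 29, 36)        # numeric data
SCORES <- c(122, 134, 118, 129, 124)   # a different object: names are case sensitive
names  <- c("Mary", "Tom", "Ed", "Dan", "Meg")   # non-numeric data, straight quotes

class(Scores)      # "numeric"
class(names)       # "character"
length(names)      # 5 entries
ls()               # both "Scores" and "SCORES" appear in the workspace
```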

Saving Stuff

- To exit: click the X or type quit ( )
- R asks whether to save the workspace image
- Do what you want: Yes or No
- Do Yes,
- then re-open R: Scores & names are still there

Special Note on Saving

- Previous slide assumes you control the computer
- If not, use File, Save Workspace, name file, click Save
- Works much like saving a file in Microsoft
- To retrieve, do File, Load Workspace, find file, click Open
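The same save and retrieve steps can be typed instead of clicked. A minimal sketch, assuming save.image() and load() as the console counterparts of File, Save Workspace and File, Load Workspace; the file name MyWork.RData is hypothetical and is placed in a temporary folder here.

```r
Scores <- c(22, 34, 18, 29, 36)
ws_file <- file.path(tempdir(), "MyWork.RData")   # hypothetical file name

save.image(file = ws_file)   # save every object in the workspace to the file
rm(Scores)                   # remove Scores to show that load() restores it
load(ws_file)                # read the saved objects back in
print(Scores)
```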

1-6 Using R Functions: Simple Stuff

- Commands for mean, sd, summary

(NB: function names case sensitive)

- mean(Scores)
- sd(Scores)
- summary(Scores)
- Command for correlation
- cor(Scores,SCORES)
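Here are the four functions run on the two example variables. The mean values in the comments follow directly from the data (139/5 and 627/5); sd() uses the n - 1 denominator, and cor() gives the Pearson correlation by default.

```r
Scores <- c(22, 34, 18, 29, 36)
SCORES <- c(122, 134, 118, 129, 124)

mean(Scores)          # 27.8
mean(SCORES)          # 125.4
sd(Scores)            # sample standard deviation
summary(Scores)       # min, quartiles, median (29), mean, max
cor(Scores, SCORES)   # Pearson correlation between the two variables
```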

R functions

- A zillion of ‘em
- R’s big strength, most common use
- To look one up:
- Help, then R functions (text)
- Enter name of a function (e.g., sd)
- Yields lots (!) of information

1-7 Reading in Larger Data Sets

- In Excel, enter (or download) the SATGPA20 file
- Save as .xls
- Then save as Text (tab delimited) file
- Will have .txt extension

… Larger Data Sets: The read.table Command

- Now read into R like this:

>SATGPA20R=read.table("E:/R/SATGPA20.txt", header =T)

- Need the exact path, in quotes (use / rather than \ between folders)
- header = T
- T or TRUE, F or FALSE
- Use T if the first line of the file holds the variable names, F if it starts with data

The file.choose ( ) command

- At > enter file.choose ( )
- Accesses your system’s files, much like Open in Microsoft
- Find the file, click on it
- R prints the exact path in R Console
- Can copy and paste into read.table
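A self-contained sketch of read.table() with header = TRUE. Because the SATGPA20 file is not available here, it first writes a tiny tab-delimited file with hypothetical columns and values to a temporary folder, then reads it back. The file.choose() line is commented out because it opens a dialog.

```r
# Hypothetical three-row file standing in for SATGPA20.txt
path <- file.path(tempdir(), "demo.txt")
writeLines(c("ID\tSATV\tGPA",
             "1\t520\t3.10",
             "2\t610\t3.45",
             "3\t480\t2.90"), path)

# header = TRUE because the first line holds the variable names
demo <- read.table(path, header = TRUE)
str(demo)

# Interactively, file.choose() returns the full path of the file you click:
# demo <- read.table(file.choose(), header = TRUE)
```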

The data.frame Command

- Let’s create these 3 variables with c

> IQ = c(110, 95, 140, 89, 102)

> CS = c(59, 40, 62, 40, 55)

> WQ = c(2, 4, 5, 1, 3)

- Then put them together with:

>All_Data = data.frame(IQ, CS, WQ)

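- Check with:

>colMeans(All_Data)

(Older versions of R accepted mean(All_Data); current versions return NA with a warning, so use colMeans.)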
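In current R versions, mean() works on a single variable but returns NA (with a warning) when handed a whole data frame. Two base R alternatives that give one result per column are colMeans() and sapply(); the means in the comment are the column totals divided by 5.

```r
IQ <- c(110, 95, 140, 89, 102)
CS <- c(59, 40, 62, 40, 55)
WQ <- c(2, 4, 5, 1, 3)
All_Data <- data.frame(IQ, CS, WQ)

colMeans(All_Data)       # IQ 107.2, CS 51.2, WQ 3.0
sapply(All_Data, sd)     # applies sd() to each column in turn
```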

1-8 Getting Help

- >help(sd)
- >example(sd)
- On the R Console menu: Help, then R functions (text); enter the function name, click OK

Reminder: function names are case sensitive
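A few other built-in helpers are handy alongside help() and example(). This sketch only calls the two that print short text; the help-page commands are left as comments because they open a help window in an interactive session.

```r
args(sd)            # shows the arguments sd() accepts
apropos("mean")     # names of available objects that contain "mean"
# ?sd                                 # same as help(sd)
# example(sd)                         # runs the examples on the sd help page
# help.search("standard deviation")   # searches help pages by topic
```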

R’s “function” terms

R language: function(arguments)

Plain English: Do this (to this)

or Do this (to this, with these conditions)

1-9 Dealing with Missing Data

- NB: It’s a pain in R!
- Key items
- In data, enter NA for a missing value
- In (most) commands, use na.rm=T

Examples for missing data

>Data=c(2,4,6,NA,10)

>mean(Data, na.rm=T)

- Add to the SATGPA20 file

21 1 NA NA NA 3.14

23 2 1 NA NA 2.86

and so on, then save the result as a new file, SATGPA25R

- Then

>colMeans(SATGPA25R, na.rm=T)

- Note the exception for the cor function: it has no na.rm; use use="complete.obs" instead
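A short sketch of both rules. The first two lines repeat the Data example; the x and y vectors are hypothetical and only show that use = "complete.obs" makes cor() drop every pair with a missing value.

```r
Data <- c(2, 4, 6, NA, 10)
mean(Data)                  # NA: one missing value makes the result missing
mean(Data, na.rm = TRUE)    # 5.5, the mean of the four observed values

# Hypothetical variables with missing values in different positions
x <- c(1, 2, 3, NA, 5)
y <- c(2, 4, 7, 8, NA)
cor(x, y, use = "complete.obs")   # uses only the pairs (1,2), (2,4), (3,7)
```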

1-10 Using R Functions: Hypothesis tests

- Be sure the SATGPA25R variables are available, using attach(SATGPA25R) if needed
- Then, to test male vs. female on SATM:

>t.test(SATM~SEX) # note tilde~

- Examples of changing defaults:

>t.test(SATM~SEX, var.equal=TRUE, conf.level=0.99)
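An alternative to attach() is the data = argument, which tells t.test() where to find SATM and SEX. The small data frame below is hypothetical, not the SATGPA25R file; without var.equal = TRUE, R runs the Welch version of the test.

```r
# Hypothetical scores for two groups of five
sat <- data.frame(
  SATM = c(520, 580, 610, 490, 560, 600, 530, 640, 500, 570),
  SEX  = factor(rep(c(1, 2), each = 5))
)

res_welch  <- t.test(SATM ~ SEX, data = sat)     # default: Welch test
res_pooled <- t.test(SATM ~ SEX, data = sat,
                     var.equal = TRUE, conf.level = 0.99)
res_welch$p.value
res_pooled$conf.int     # 99% interval for the difference in means
```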

Hypothesis tests: Chi-square

- Using SEX and State variables in SATGPA25R
- chisq.test (SEX, State)
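chisq.test() builds the two-way table itself when given two variables. The sketch uses hypothetical codes for 40 people (20 per SEX code, three State codes) and checks that passing the table directly gives the same statistic.

```r
# Hypothetical codes: SEX 1 or 2, State 1, 2 or 3
SEX   <- rep(c(1, 2), each = 20)
State <- c(rep(1, 12), rep(2, 5), rep(3, 3),
           rep(1, 6),  rep(2, 6), rep(3, 8))

table(SEX, State)           # the 2 x 3 table the test is based on
chisq.test(SEX, State)      # same result as chisq.test(table(SEX, State))
```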

1-11 R Functions for Commonly Used Statistics

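| Function | Calculates this |
|---|---|
| mean ( ) | mean |
| median ( ) | median |
| mode ( ) | NOT the statistical mode: it reports an object's storage type (e.g., "numeric") |
| sd ( ) | standard deviation |
| range ( ) | range (gives the minimum and maximum, not their difference) |
| IQR ( ) | interquartile range |
| min ( ) | minimum value |
| max ( ) | maximum value |
| cor ( ) | correlation |
| quantile ( ) | percentile |
| t.test ( ) | t-test |
| chisq.test ( ) | chi-square test |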

NB1: See notes in text for details

NB2: R contains many more functions
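A quick sketch of the less obvious entries, on a hypothetical set of seven scores. Base R has no function for the statistical mode, so the line using table() and which.max() is one common workaround, not a built-in.

```r
x <- c(3, 7, 7, 2, 9, 7, 4)     # hypothetical scores

mode(x)                         # "numeric": the storage type, not the most common value
names(which.max(table(x)))      # "7": the most frequent value
median(x)                       # 7
range(x)                        # 2 9
IQR(x)                          # interquartile range
quantile(x, probs = c(0.25, 0.5, 0.75))   # 25th, 50th and 75th percentiles
```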

1-12 Two Commands for Managing Your Files

> ls ( )

Will list the objects (data sets, variables, functions) in your current workspace

> rm (object)

Insert the object's name; this will remove it from the workspace

NB: R has many such commands
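A sketch showing that ls() and rm() work on workspace objects rather than files on disk. The object name temp is hypothetical.

```r
Scores <- c(22, 34, 18, 29, 36)
temp   <- 1:3          # a throwaway object (hypothetical name)
ls()                   # lists "Scores" and "temp" among the workspace objects
rm(temp)               # removes temp from the workspace
ls()                   # temp no longer appears
```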

1-13 R Graphics

- R graphs: good, simple
- Let’s start with hist and boxplot, using the SATGPA25R variables (attach the file first)

>hist(SATM)

>boxplot(SATM)

>boxplot(SATV, SATM)

- The R Graphics window opens; minimize it to get back to the R Console
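hist() and boxplot() also return the numbers behind the picture, which can be useful for checking a graph. The SATM and SATV values below are hypothetical stand-ins for the SATGPA25R variables.

```r
# Hypothetical scores standing in for the SATGPA25R variables
SATM <- c(520, 580, 610, 490, 560, 600, 530, 640, 500, 570)
SATV <- c(540, 500, 620, 470, 590, 560, 510, 600, 480, 550)

h <- hist(SATM)             # draws the histogram, returns bins and counts
b <- boxplot(SATV, SATM)    # side-by-side boxplots, returns five-number summaries
h$counts                    # how many scores fall in each bar
b$stats                     # one column per variable
```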

More Graphics: plot

- Create these variables

>RS=c(12,14,16,18,25)

>MS=c(10,8,16,12,20)

- Now do this:

>plot(RS, MS)

Line of Best Fit

- Do these for the RS and MS variables:

> lm(MS~RS) # lm means linear model

> res=lm(MS~RS) # res stores the fitted model (the regression results)

> abline(res) # read as ‘a-b’ line
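The object created by lm() holds the whole fitted model; the residuals are just one part of it. For these five pairs the least-squares line works out to MS = -0.74 + 0.82 × RS (the sum of cross-products of deviations is 82 and the sum of squared RS deviations is 100).

```r
RS <- c(12, 14, 16, 18, 25)
MS <- c(10, 8, 16, 12, 20)

res <- lm(MS ~ RS)    # fit the line of best fit
coef(res)             # intercept -0.74, slope 0.82
residuals(res)        # residuals are one component of res

plot(RS, MS)
abline(res)           # draws y = a + b*x using the fitted a and b
```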

Controlling Your Graphics: A Brief Look

- R has many (often obscure) ways for controlling graphics; we’ll look at a few
- Basically, we’ll change “defaults”

Examples (try each one):

- Limits (ranges) for X and Y axes

>plot(RS, MS, xlim = c(5,25), ylim = c(5,25))

Controlling Graphs: More Examples

- Plot characters:

>plot(RS, MS, pch=3)

- Line widths

>plot(RS, MS, pch=3, lwd=5)

- Axis labels

>plot(RS, MS, xlab = "Reading Score", ylab = "Math Score")

- You can put them all together in one command
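Putting the options together: axis limits, plot character, line width, and axis labels are all arguments to the same plot() call, separated by commas.

```r
RS <- c(12, 14, 16, 18, 25)
MS <- c(10, 8, 16, 12, 20)

plot(RS, MS,
     xlim = c(5, 25), ylim = c(5, 25),      # axis ranges
     pch = 3, lwd = 5,                      # plus-sign points, thick lines
     xlab = "Reading Score", ylab = "Math Score")
```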

Part 2: R Commander

2-1 What Is R Commander?

- Point and click version of R
- Uses (and prints) base R commands
- Loading: Easy – it’s just a package
- See next slide

Loading Rcmdr

- On the R Gui top menu bar, click Packages, then Install package(s). Pick a nearby CRAN mirror site, click OK. From the list of packages, scroll to Rcmdr, highlight it, click OK

- After it loads, do these:
- Check with: >library ( )
- Activate with: >library (Rcmdr)
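The menu steps also have console equivalents. A minimal sketch: install.packages() does the download (it needs the Internet, so that line is commented out), and installed.packages() shows whether a package is already on the computer. library(Rcmdr) is left as a comment because it opens the R Commander window.

```r
# Console equivalent of Packages, Install package(s)
# install.packages("Rcmdr")

installed <- rownames(installed.packages())
"Rcmdr" %in% installed     # TRUE once Rcmdr has been installed
"stats" %in% installed     # packages that come with R are always listed

# library(Rcmdr)           # activates R Commander
```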

Rcmdr’s extra packages

- Scary message when first activating Rcmdr, asking to install extra packages it needs:
- Just click Yes – and take a break

The R Commander Window

- You get the R Commander window, with
- Script window (incl. Submit button)
- Output window
- Message window

2-2 R Commander Windows and Menus

- File
- Edit
- Data **
- Statistics **
- Graphs **
- Models
- Distributions
- Tools
- Help

** Most important for us

Our Lesser Used Menus

- File [Table 2.1]
- Much like in Microsoft
- Manage files
- Edit [Table 2.2]
- Much like in Microsoft
- Can do with right click of mouse

Our Lesser Used Menus (cont)

- Models: mostly more advanced stats

- Distributions
- Tools
- Load packages
- Options – change output defaults
- Help
- Searchable index
- R Commander manual

2-3 The Data Menu (very important): Submenus for creating/getting data sets

- New data set – create new data set
- Load data set – only for existing .rda data
- Import data – import from various file types
- Data in packages – not important for us

Data Menu (cont.) (Submenus for managing data sets)

- Active data set
- Do stuff with current data set
- Manage variables in active data set
- Do stuff with variables in current data set

New data set [Fig. 2.3]

- Click on it, brings up spreadsheet
- Name it SampleData

New data set (cont)

- Enter these data:

| var1 | var2 | var3 |
|---|---|---|
| 2 | 1 | 5 |
| 5 | 4 | 7 |
| 3 | 7 | 8 |
| 6 | 8 | 9 |
| 9 | 2 | 9 |

- Then close the window with the X
- Note: SampleData now shows as the Active Data Set

Now Try These

- View active data set
- Edit active data set
- In Script window, type*
- colMeans(SampleData)
- sapply(SampleData, sd)
- mean(var1) [gives error message]
- attach(SampleData)
- mean(var1)

* When typing, do not include the >, and do click Submit
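The same data set can be built and checked with typed commands. This sketch assumes colMeans() and sapply() in place of mean() and sd() on the whole data set, since current R versions no longer accept a data frame there; attach() must be lowercase because R is case sensitive.

```r
# Typed equivalent of entering SampleData in the spreadsheet
SampleData <- data.frame(var1 = c(2, 5, 3, 6, 9),
                         var2 = c(1, 4, 7, 8, 2),
                         var3 = c(5, 7, 8, 9, 9))

colMeans(SampleData)       # var1 5.0, var2 4.4, var3 7.6
sapply(SampleData, sd)     # standard deviation of each column

attach(SampleData)         # makes var1, var2, var3 visible by name
mean(var1)                 # 5
detach(SampleData)         # undo attach() when finished
```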

Changing “var” names

- Data - Manage variables in active data set - Rename variables
- Change names to Rater1, Rater2, Rater3
- Then check with

colMeans(SampleData)

attach(SampleData) (again, so the new names are picked up)

mean(Rater1)

Compute new variable

- Data - Manage variables in active data set - Compute new variable
- Give the new variable a name; call it Total
- In ‘Expression to compute’, enter Rater1+Rater2+Rater3
- Check with
- View data set
- colMeans(SampleData)
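Renaming and computing a variable can also be done with typed commands. This is a sketch of base R lines with roughly the same effect as the two dialogs; the exact commands R Commander prints in its Script window may differ. The Total values follow from adding each row.

```r
SampleData <- data.frame(var1 = c(2, 5, 3, 6, 9),
                         var2 = c(1, 4, 7, 8, 2),
                         var3 = c(5, 7, 8, 9, 9))

# Rename variables
names(SampleData) <- c("Rater1", "Rater2", "Rater3")

# Compute new variable: Total = Rater1 + Rater2 + Rater3
SampleData$Total <- with(SampleData, Rater1 + Rater2 + Rater3)

SampleData$Total        # 8 16 18 23 20
colMeans(SampleData)    # mean of Total is 17
```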

Import data (very important submenu)

- Allows importing from
- .txt file
- SPSS file
- Excel file
- Several others
- Try it with a .txt file
- (must already exist; try with SATGPA25.txt)

Convert Numeric Variables to Factors

- Recall types of scales (esp. nominal)
- Rcmdr assumes numeric
- To convert to nominal (factor)
- Data, then Manage variables in active data set, and Convert numeric variables to factors. Highlight the variable you want to convert, click OK. In the next window, give labels to the levels of the variable.
- Try with SEX and State in SATGPA25R
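The conversion corresponds to base R's factor() function. The codes and labels below are hypothetical (1 = Male, 2 = Female); check the actual coding of SEX in SATGPA25R before labeling.

```r
SEX <- c(1, 2, 2, 1, 2)     # hypothetical numeric codes
SEX <- factor(SEX, levels = c(1, 2), labels = c("Male", "Female"))

levels(SEX)    # "Male" "Female"
table(SEX)     # Male 2, Female 3
```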

2-4 The Statistics Menu

- Obviously very important
- Most are fairly clear to use
- Some go beyond intro stats
- Some surprises on what’s where
- We’ll just sample some of them
- Make SATGPA25R the active data set

Statistics: Summaries (Try each of these with SATGPA25R in Data set, observe output)

- Active data set (see next slide)
- Numerical summaries (see next slide)
- Frequency distributions
- Summaries
- Count missing observations
- Table of statistics
- Correlation matrix
- Correlation test
- Shapiro-Wilk test of normality

Getting started on Stat menu

- Statistics - Summaries - Active data set
- Statistics - Summaries - Numerical summaries
- Etc. with others

Statistics: Means (Try t-test, ANOVA)

- Single-sample t-test
- Independent samples t-test (TRY*)
- Paired t-test
- One-way ANOVA (TRY*)
- Multi-way ANOVA

* With SATGPA25R
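A base R counterpart of the One-way ANOVA dialog is aov() with a formula. The scores and group labels below are hypothetical, not the SATGPA25R values.

```r
# Hypothetical SATM scores for three State groups
sat <- data.frame(
  SATM  = c(520, 580, 610, 490, 560, 600, 530, 640, 500),
  State = factor(rep(c("A", "B", "C"), each = 3))
)

fit <- aov(SATM ~ State, data = sat)   # one-way ANOVA
summary(fit)                           # F test for differences among group means
```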

Two-Way Table (chi-square) [Fig 2.9]

- Statistics - Contingency tables - Two-Way table

2-5 The Graphics Menu

- All pretty intuitive (if you know the graph)
- Try with SATGPA25R
- Pie: State
- Histogram: SATM
- Boxplot by group: SATM by SEX
- Scatterplot: GPA from SATV

Changing Graphs Appearance

- Rcmdr’s Graphs menu uses default settings
- Change them by editing the command in the Script window
- Use the options given earlier (xlim, pch, lwd, xlab, etc.)
- Many ways to do it; not terribly intuitive
- See example on next slide

Changing Graphs Defaults: Example

- Histogram of GPA (with defaults):

Hist(SATGPA25R$GPA, scale="frequency", breaks="Sturges", col="darkgray")

[copy, paste, change, Submit]

Hist(SATGPA25R$GPA, scale="frequency", breaks=4, col="black", lwd=3)

2-6 The Distributions Menu: Two Quick Examples

- Distributions - Continuous distributions - Normal distribution - Normal probabilities [insert -1.5]
- Distributions - Continuous distributions - t distribution - t probabilities [insert 1.71, df 28]
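The two dialogs correspond to base R's pnorm() and pt(). Both return the lower-tail area by default; lower.tail = FALSE gives the upper tail.

```r
pnorm(-1.5)                             # area below z = -1.5, about 0.0668
pt(1.71, df = 28)                       # area below t = 1.71 with 28 df
pt(1.71, df = 28, lower.tail = FALSE)   # area above it, just under 0.05
```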

Part 3: Some Other Stuff

Supplementary, Not Essential, Brief

- 3-1 A Few Other Ways to Enter Data
- 3-2 Exporting R Results
- 3-3 Bonus: Build Your Own Functions
- 3-4 An Example of an Add-on Package
- 3-5 Keeping Up to Date
- 3-6 Going Further: Selected References

3-1 A Few Other Ways to Enter Data

- From Word, a few rules
- One space between entries
- NA for missing data
- Save as Plain text (.txt)
- Access with read.table

From Word: Example

- Sample data

| Age | Pop | Looks |
|---|---|---|
| 18 | 5 | 65 |
| 20 | 1 | 13 |
| 21 | 6 | 34 |
| NA | 9 | 60 |
| 21 | 7 | 98 |

- Save as APL.txt on E drive, folder R
- Read in as:

>APL = read.table("E:/R/APL.txt", header=T)

Checking from Word

- Do these:
- >APL
- >colMeans (APL)
- >mean (Pop) [gives error]
- >attach (APL)
- >mean (Pop)
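To try these checks without creating APL.txt, read.table() can read the same five rows from text. This sketch uses colMeans() for the whole data set and the $ operator as an alternative to attach(); the means follow from the column totals.

```r
# Same rows as the Word example, read from text instead of E:/R/APL.txt
APL <- read.table(text = c("Age Pop Looks",
                           "18 5 65",
                           "20 1 13",
                           "21 6 34",
                           "NA 9 60",
                           "21 7 98"), header = TRUE)

colMeans(APL)                 # Age is NA because of the missing value
colMeans(APL, na.rm = TRUE)   # Age 20, Pop 5.6, Looks 54
mean(APL$Pop)                 # $ reaches one column without attach()
```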

From SPSS file

- Be sure you have the foreign package
- Check with: > library ( ) [if needed, install it]
- Activate with: > library (foreign)
- Have an SPSS file FinalData, which we’ll put into FinalR, using read.spss and to.data.frame like this

>FinalR = read.spss('E:/Project/FinalData.sav', to.data.frame = T)

3-2 Exporting R Results

- For most intro applications, you’ll be content with output on R Console or Rcmdr Output window
- You can copy and paste into Word. Hint: use a monospaced font (e.g., Courier New) for better alignment
- Can also save to a variety of formats from Base R or Rcmdr

Exporting Stats from Base R

- Stats to an Excel file
- R object = function(data set)
- MYMEANS = colMeans (SATGPA20R)
- Save MYMEANS as a .csv file

Then

- write.csv(MYMEANS, file="exact path")
- write.csv(MYMEANS, file="E:/R/MYMEANS.csv")
- Can open MYMEANS.csv with Excel
- Can read it back into R with read.csv("E:/R/MYMEANS.csv")
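A round-trip sketch of the export steps. The three-row data set is hypothetical, and a temporary folder stands in for E:/R so the example runs anywhere. Note that read.csv() takes the file path, not the object name.

```r
# Hypothetical data standing in for SATGPA20R
SATGPA <- data.frame(SATV = c(520, 610, 480),
                     SATM = c(540, 590, 500),
                     GPA  = c(3.10, 3.45, 2.90))
MYMEANS <- colMeans(SATGPA)

out <- file.path(tempdir(), "MYMEANS.csv")   # stands in for "E:/R/MYMEANS.csv"
write.csv(MYMEANS, file = out)               # Excel can open this file

back <- read.csv(out)    # variable names in one column, means in the other
back
```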

Exporting Graphs from Base R

- Easy in the R Graphics window, and works the same for base R and Rcmdr

- Right click on the graph
- Copy as metafile (and paste wherever)
- Save as metafile (and save wherever)
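Graphs can also be written straight to a file with a command instead of the right-click menu. A minimal sketch using the pdf() device; the file name is hypothetical. On Windows, win.metafile() opens a metafile device and is used the same way.

```r
RS <- c(12, 14, 16, 18, 25)
MS <- c(10, 8, 16, 12, 20)

out <- file.path(tempdir(), "scatter.pdf")   # hypothetical file name
pdf(out, width = 6, height = 5)              # open a PDF file device (inches)
plot(RS, MS, xlab = "Reading Score", ylab = "Math Score")
abline(lm(MS ~ RS))
dev.off()                                    # close the device so the file is written
```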

Exporting from R Commander

- Easy, works much like in Word
- After running a stat,
- Go to File menu, Save output as, give file a name and destination, click Save
- Note file saved as a .txt file
- Saving graphs: Same as from Base R

3-3 Bonus: Build Your Own Functions

- You can custom-make a function and save it for future use
- Example: function to get mean of a data set + 2 times its SD

> weirdstat = function(x) mean(x) + (2*sd(x))

- Now try:

>weirdstat(GPA)

- Functions get saved in the workspace like data sets, and their names are case sensitive
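A sketch of weirdstat on a hypothetical GPA vector, plus a variant (the name weirdstat2 is hypothetical) that passes na.rm through so missing values can be dropped, following the earlier rule for missing data.

```r
weirdstat <- function(x) mean(x) + (2 * sd(x))

GPA <- c(2.8, 3.1, 3.5, 3.9, 2.6)    # hypothetical GPAs
weirdstat(GPA)

# Variant with an na.rm argument (hypothetical name)
weirdstat2 <- function(x, na.rm = FALSE) {
  mean(x, na.rm = na.rm) + 2 * sd(x, na.rm = na.rm)
}
weirdstat2(c(GPA, NA), na.rm = TRUE)   # same value as weirdstat(GPA)
```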

3-4 An Example of an Add-on Package

- Getting Info about Packages (need Internet)
- Take it slowly
- Go to Task Views at http://cran.r-project.org/
- Gives categories of packages (23 when this guide was written)
- Click on link for a category
- Package names: usually cryptic, often obscure
- To see what’s in a package:
- Click on its link
- Look at its Reference Manual

Installing an Add-on Package

- Follow usual steps for download
- Be sure to activate with >library(pkg)
- Try it: download the psychometric package

Using an Add-on Package

- Basically a collection of functions
- Examples with psychometric package
- r.nil(r, n)
- rdif.nul(r1, r2, n1, n2)

3-5 Keeping Up to Date

- All parts of R (base, Commander, add-on packages) periodically updated
- Check the CRAN site for updates
- Update by downloading the new version (needs an Internet connection)

3-6 Selected References

- Key URLs
- R home: http://www.r-project.org/
- Download: http://cran.r-project.org/
- For many other introductions to R:

http://cran.r-project.org/other-docs.html

References (cont)

- Some ‘Official’ books – online as pdfs
- Fox, J. (2005). Getting started with the R Commander
- R Development Core Team (2009). R Data Import/Export version 2.9.0.
- Venables, W. N., Smith, D. M., & the R Development Core Team (2009). An introduction to R. Notes on R: A programming environment for data analysis and graphics version 2.9.0.

References (cont)

- Some other books
- Dalgaard, P. (2008). Introductory statistics with R (2nd ed.). New York: Springer.
- Everitt, B. S., & Hothorn, T. (2006). A handbook of statistical analyses using R. Boca Raton, FL: Chapman and Hall.
- Murrell, P. (2005). R graphics. Boca Raton, FL: Chapman and Hall.

To cite use of R

- To cite the use of R for statistical work, R documentation recommends the following:

R Development Core Team (2009). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

- Get the latest citation by typing citation ( ) at the > prompt in the R Console.
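citation() returns an object that can be printed as text or converted to BibTeX for a reference manager. A short sketch; packages have their own entries, e.g. citation("foreign"), left as a comment here.

```r
cit <- citation()    # citation for R itself
print(cit)           # plain-text version, ready to paste
toBibtex(cit)        # BibTeX version
# citation("foreign")   # citation for an add-on package
```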
