This guide provides a comprehensive introduction to exploratory descriptive data analysis using S-Plus. It covers essential commands for initiating S-Plus in various environments (Unix and MS-Windows), basic arithmetic operations, assignment structures, concatenation techniques, and sequence and replication commands. Additionally, it explores index brackets for data manipulation, the structure of matrices and frames, and methods for reading data. Ideal for beginners, this resource equips users with foundational skills to analyze data effectively within S-Plus.
Introduction to Exploratory Descriptive Data Analysis in S-Plus
Jagdish S. Gangolly
State University of New York at Albany

S-Plus in Unix & MS-Windows
To start S-Plus in Solaris/CDE:
• Create a directory, say, s: mkdir s
• Go to that directory: cd s
• Initialise it as a new S-Plus chapter: splus CHAPTER
• Start S-Plus: splus

S-Plus in Unix & MS-Windows
• To invoke a graphics window: motif()
• To invoke the help system (Java based): help.start()
• To quit the S-Plus shell: q() or Ctrl-D
• The S-Plus prompt is >

Simple Structures I: Arithmetic Operators
• The arithmetic operators are *, /, +, and -.
• Avoid ambiguity by using parentheses, e.g., (7+2)*3, since 7+2*3 = 13 and not 27.
• Multiplication and division are evaluated before addition and subtraction. Raising to a power (^ or **) takes precedence over everything else.

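The precedence rules above can be checked at the prompt. This is a minimal sketch in plain S syntax; the variable names a to e are only illustrative, and the code is assumed to behave the same way in S-Plus and in R.

```r
# Operator precedence in the S language
a <- 7 + 2 * 3     # multiplication is done first: 13
b <- (7 + 2) * 3   # parentheses force the addition first: 27
c2 <- 2 * 3^2      # the power is evaluated before the product: 18
d <- (2 * 3)^2     # parentheses change the order: 36
e <- -2^2          # the power is applied before the unary minus: -4
```

When in doubt, add parentheses; they cost nothing and make the intended order explicit.
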
Simple Structures II: Assignments
• Assignments: x <- 3 or 3 -> x or x_3 or x=3
• Using the underscore or the equals sign for assignment is not a good idea.
• To see the value of a variable x: x or print(x)
• To remove a variable x: rm(x)

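A short session using the arrow forms of assignment might look like the sketch below. Names in S are case-sensitive, so x and X are different objects; the variable y is introduced here only for illustration.

```r
x <- 3      # preferred form of assignment
3 -> y      # the right-pointing arrow assigns to the name on its right
print(x)    # show the value of x
z <- x + y  # variables can be used in expressions: 6
rm(y)       # remove y; x and z are still available
```
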
Simple Structures III: Concatenation
• Concatenation is used to create vectors of any length:
> x <- c(1.5, 2, 2.5)
> x
1.5 2.0 2.5
> x^2
2.25 4.00 6.25
• c can be used with any type of data.

Simple Structures IV: Sequence
• Sequence command: seq(lower, upper, increment)
Some examples:
seq(1,35,5): 1 6 11 16 21 26 31
seq(5,15,1.5): 5.0 6.5 8.0 9.5 11.0 12.5 14.0
seq(50,25,-5): 50 45 40 35 30 25

Simple Structures V: Replicate
• Replicate command: generates data that follow a regular pattern.
Some examples:
rep(8,5): 8 8 8 8 8
rep("8",5): "8" "8" "8" "8" "8"
rep(c(0,"ab"),2): "0" "ab" "0" "ab"
rep(1:4, 1:4): 1 2 2 3 3 3 4 4 4 4
rep(1:3, rep(2,3)): 1 1 2 2 3 3
rep(c(1,8,7), length=5): 1 8 7 1 8

Simple Structures VI: Expressions
> x <- seq(2,10,2)
> y <- 1:5
> z <- ((3*x^2+2*y)/((x+y)*(x-y)))^(0.5)
> x
2 4 6 8 10
> y
1 2 3 4 5
> z
2.160247 2.081666 2.054805 2.041241 2.033060

Simple Structures VI: Logical Operators
• < Less than
• > Greater than
• <= Less than or equal to
• >= Greater than or equal to
• == Equal to
• != Not equal to

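Applied to a vector, these operators compare element by element and return a vector of logical values. A small sketch, reusing the numbers from the concatenation slide (the result names gt, eq, ne and le are illustrative only):

```r
x <- c(1.5, 2, 2.5)
gt <- x > 2    # FALSE FALSE TRUE
eq <- x == 2   # FALSE TRUE  FALSE
ne <- x != 2   # TRUE  FALSE TRUE
le <- x <= 2   # TRUE  TRUE  FALSE
```

Note that equality is tested with a double equals sign; a single one would be read as an assignment.
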
Simple Structures VII: Index Brackets
Square brackets are used to index vectors and matrices.
> x <- seq(0,20,10)
> x[2]
10
> x[5]
NA
> x[c(1,3)]
0 20
> x[-1]
10 20

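Index brackets also accept a logical vector, and they can appear on the left of an assignment to replace elements. The sketch below goes slightly beyond the slide and is offered as an illustration; the names big and last are hypothetical.

```r
x <- seq(0, 20, 10)   # 0 10 20
big <- x[x > 5]       # keep the elements where the condition holds: 10 20
last <- x[-c(1, 2)]   # negative indices drop elements 1 and 2: 20
x[2] <- 15            # replace the second element; x is now 0 15 20
```
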
Data Manipulation I: Frames & matrices I
• Matrices: two-dimensional vectors (they have row and column indices)
• Arrays: the general data structure in S-Plus
  • Zero-dimensional: scalar
  • One-dimensional: vector
  • Two-dimensional: matrix
  • Three- to eight-dimensional: arrays
• The data in a matrix must all be of the same datatype (usually a numeric datatype)

Data Manipulation I: Frames & matrices II
• The columns in data frames can be of different datatypes
• Lists: the most general datatype in S-Plus

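To make the contrast with matrices concrete, here is a minimal sketch of a data frame with one text column and one numeric column, and of a list holding components of different types and lengths. The object names df and lst and the component names are hypothetical; the numbers are borrowed from the country example later in these slides.

```r
# A data frame: columns may differ in datatype
df <- data.frame(country = c("austria", "france", "germany"),
                 gdp = c(227, 1534, 2365))
df$gdp            # a single column is extracted with $

# A list: components may differ in both type and length
lst <- list(name = "austria", values = c(227, 8, 1.3), member = TRUE)
lst$values[2]     # second element of the "values" component: 8
```
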
Data Manipulation I: Matrices I
Reading data
• S-Plus is very finicky about the format of input data
• To read a table: read.table("filename")
  • The first column must be row names
  • The first row must be column names
  • The top left cell must be empty
  • Spaces and tabs are the default column delimiters
• See the example in /db4/teach/acc522/fasb103.txt and play around with it.

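The layout described above can be tried out on a small file. The sketch below writes a three-row table to a scratch file and reads it back; the scratch file from tempfile() and the table contents are assumptions made for illustration and are unrelated to the course data set.

```r
# Write a small table in the layout read.table expects:
# the header row has one entry fewer than the data rows
# (the "empty top left cell"), so the first column becomes row names.
f <- tempfile()   # hypothetical scratch file
cat("gdp pop inflation\n",
    "austria 227 8 1.3\n",
    "france 1534 58 1.2\n",
    "germany 2365 82 1.8\n",
    file = f, sep = "")

tab <- read.table(f)   # space-delimited by default
tab                    # print the resulting data frame
tab["france", "pop"]   # look up a value by row and column name
unlink(f)              # delete the scratch file
```
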
Data Manipulation I: Matrices II
• read.table and as.matrix():
x <- read.table("filename")
as.matrix(x)
• Enter data directly: matrix(data, nrow, ncol, byrow=F)
Example: x <- matrix(1:6, nrow=2, byrow=T)
• dim(x): 2 3
• dimnames(x): NULL

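The byrow argument decides whether the data fill the matrix row by row or column by column. A minimal sketch of the difference (the name y is introduced here for illustration):

```r
x <- matrix(1:6, nrow = 2, byrow = TRUE)  # filled by rows:    1 2 3 / 4 5 6
y <- matrix(1:6, nrow = 2)                # filled by columns: 1 3 5 / 2 4 6
dim(x)        # 2 3
x[2, 1]       # 4
y[2, 1]       # 2
```
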
Data Manipulation I: Matrices III
• Elements of matrices are accessed by specifying the row and column indices.
Example:
data <- c(227,8,1.3,1534,58,1.2,2365,82,1.8)
countries <- c("austria", "france", "germany")
variables <- c("gdp", "pop", "inflation")
country.data <- matrix(data,nrow=3,byrow=T)
dimnames(country.data) <- list(countries,variables)
country.data[1:2,2:3]: pop and inflation of austria & france

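Once a matrix has dimnames, its elements can also be picked out by name instead of by position. A short sketch using the same data as the example above:

```r
data <- c(227, 8, 1.3, 1534, 58, 1.2, 2365, 82, 1.8)
country.data <- matrix(data, nrow = 3, byrow = TRUE)
dimnames(country.data) <- list(c("austria", "france", "germany"),
                               c("gdp", "pop", "inflation"))

country.data["germany", "gdp"]   # a single element by name: 2365
country.data[, "pop"]            # an empty row index selects the whole column
sub <- country.data[1:2, 2:3]    # pop and inflation of austria and france
```
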
S-Plus Graphics I
• To open a graphics window: motif()
• You can adjust the color scheme and print options through the drop-down menu on the motif window.
• To plot two variables x and y: plot(x,y)
Example (sine curve): plot(1:100, sin(1:100/10))