Tables,
This presentation is the property of its rightful owner.
Sponsored Links
1 / 25

Tables, Long data, Frequency weighting, String data PowerPoint PPT Presentation


  • 44 Views
  • Uploaded on
  • Presentation posted in: General

Tables, Long data, Frequency weighting, String data. Key points from our first session:. Where are you? cd Navigate to a directory cd Change from your current directory cd. What files are in the current directory? dir. How do you store commands?do file

Download Presentation

Tables, Long data, Frequency weighting, String data

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Tables long data frequency weighting string data

Tables,

Long data,

Frequency weighting,

String data


Tables long data frequency weighting string data

Key points from our first session:

Where are you?cd

Navigate to a directorycd

Change from your current directorycd

What files are in the current directory?dir

How do you store commands?do file

Where is your do file?with the data

How do you create a do file?do editor

How do you save a do file?do editor … save

Name your variables to clarify their purpose

Label the values of your categorical variables to clarify their significance

Use the use command with the clear option to load a dataset

Use the save command with the replace option to save a dataset


Tables long data frequency weighting string data

Key points from our first session (cont.):

Use the data editor to check your variables

Use the do-file editor to edit, or run do files

Data editor

Do-file editor

Variables

Passed commands

Results


Tables long data frequency weighting string data

Do-file editor with a small do file opened and displayed


Tables long data frequency weighting string data

Tables

Stata supports the following table display commands

table,

tabdisp,

tabstat,

tabsum,

tabulate

…. to capture trends and patterns in data

We will again explore each of these interactively


Tables long data frequency weighting string data

USE tabulate (tab) to display

simple summaries,

relative cell counts,

associative measures between row and column variables

USE table to display

statistics concerning variables in a table and

to gain control of the table layout, for publication purposes

USE tabstat to display

the (same comprehensive array of) statistics of

(several) variables in a table and

to gain control of the table layout, for publication purposes,

when only one classifying (categorical) variable is

considered

USE tabdisp to display

your own calculations, and observations, in a table

in which you do not want any special treatment of the

numbers


Tables long data frequency weighting string data

Rosner’s dataset, ‘fev.dat’ relates to forced expiration volumes of children of

different age, sex, height, to smoking exposure. We’ll ‘use’ the data, ‘label’

the variables, and make a ‘table’ … what’s in this table?:

use fev, clear

summarize

label def slabel 1 Male 0 Female

label def mlabel 1 Smoker 0 "Non-smoker"

label val smoke mlabel

label val sex slabel

table smoke sex, c(mean age p50 age) row col format(%7.2f)

Unfortunately age is an interval variable so we’ll make a categorical variable

from it and explore the consequences also using tables.

xtile q_age=age, n(5)

table sex q_age, c(m age p50 age freq) format(%7.2f)

table q_age smoke sex, format(%7.2f) row col scol

We’ll now see what ‘tabstat’ can provide for us with this data

tabstat age, by(sex) s(mean p50 range sd sem) format(%7.2f)

Is there an association between age-group, and smoking exposure?

tabulate smoke q_age, chi ex row col


Tables long data frequency weighting string data

Output from some of the Table commands

. xtile q_age=age, n(5)

. table sex q_age, c(m age p50 age freq) format(%7.2f)

---------------------------------------------

| 5 quantiles of age

sex | 1 2 3 4 5

----------+----------------------------------

Female | 6.83 9.00 10.57 12.00 14.42

| 7.00 9.00 11.00 12.00 14.00

| 111 44 79 29 55

|

Male | 6.77 9.00 10.49 12.00 14.68

| 7.00 9.00 10.00 12.00 14.00

| 104 50 92 28 62

---------------------------------------------

. tabstat age, by(sex) s(mean p50 range sd sem) format(%7.2f)

Summary for variables: age

by categories of: sex

sex | mean p50 range sd se(mean)

-------+--------------------------------------------------

Female | 9.84 10.00 16.00 2.93 0.16

Male | 10.01 10.00 16.00 2.98 0.16

-------+--------------------------------------------------

Total | 9.93 10.00 16.00 2.95 0.12

----------------------------------------------------------


Tables long data frequency weighting string data

. table q_age smoke sex, format(%7.2f) row col scol

------------------------------------------------------------------------------------

5 | sex and smoke

quantiles | ------------- Female ------------- -------------- Male --------------

of age | Non-smoker Smoker Total Non-smoker Smoker Total

----------+-------------------------------------------------------------------------

1 | 111 111 104 104

2 | 44 44 49 1 50

3 | 69 10 79 88 4 92

4 | 25 4 29 25 3 28

5 | 30 25 55 44 18 62

|

Total | 279 39 318 310 26 336

------------------------------------------------------------------------------------

----------------------------------------------

5 | sex and smoke

quantiles | -------------- Total -------------

of age | Non-smoker Smoker Total

----------+-----------------------------------

1 | 215 215

... 2 | 93 1 94

3 | 157 14 171

4 | 50 7 57

5 | 74 43 117

|

Total | 589 65 654

----------------------------------------------


Tables long data frequency weighting string data

Long data … and Wide data

Long data:

Involves multiple instances (data rows) of each subject

Is a natural organization of data for statistical analysis

Involves variables with a single purpose

Is usually an inefficient organization for data entry

Wide data:

Is compact

Involves a single instance (data row) of each subject

Is very efficient for data entry

Can easily exposes patterns in the data at a glance

Is NOT a natural organization for data analysis

It is extremely easy to convert long data to wide data and vice versa.

This will be the topic of a future class


Tables long data frequency weighting string data

. use "long data", clear

. describe

Contains data from long data.dta

obs: 532

vars: 5 27 Jan 2004 17:04

size: 9,044 (99.9% of memory free)

----------------------------------------------------------------

storage display value

variable name type format label variable label

----------------------------------------------------------------

time int %8.0g

subject int %8.0g

day byte %8.0g Day

glucose float %9.0g Glucose

insulin float %9.0g Insulin

----------------------------------------------------------------

Sorted by:

. table subject day

----------------------

| Day

subject | -10 28

----------+-----------

2474 | 14 14

2511 | 14 14

2542 | 14 14

2636 | 14 14

2638 | 14 14

2681 | 14 14

2687 | 14 14

2744 | 14 14

2805 | 14 14

2818 | 14 14

2822 | 14 14

2832 | 14 14

2885 | 14 14

2904 | 14 14

2927 | 14 14

2935 | 14 14

2939 | 14 14

2976 | 14 14

2994 | 14 14

----------------------

Here we ‘describe’ a long

dataset involving the application

of the FSIGT on a number of

subjects; each subject received

2 diets (treatment)

Tables are probably the best way

of describing, or understanding

the nature of a long dataset

Here it appears that there are

NOT the same number of

replicates across treatments

by subjects


Tables long data frequency weighting string data

collapse -ing the data to means provides a convenient way of getting the

data into a more regular form

collapse glucose insulin, by(subject time day)

Now, using tables and graphs we can review patterns in our long data

table subject day

scatter glucose time, c(L L) s(i i) sort(subject time) by(day)

In order to widen our data with reshape we can’t have –ve times

recode time –15=1500

recode day -10=10

To make the long data wide we perform 2 reshapes, the first appends

treatment to the variable names, and the second appends time to the

name.

reshape wide glucose insulin, i(time subject) j(day)

reshape wide gluc* ins*, i(subject) j(time)


Tables long data frequency weighting string data

glucose1030 float %9.0g 30 glucose10

insulin1030 float %9.0g 30 insulin10

glucose2830 float %9.0g 30 glucose28

insulin2830 float %9.0g 30 insulin28

glucose1045 float %9.0g 45 glucose10

insulin1045 float %9.0g 45 insulin10

glucose2845 float %9.0g 45 glucose28

insulin2845 float %9.0g 45 insulin28

glucose1060 float %9.0g 60 glucose10

insulin1060 float %9.0g 60 insulin10

glucose2860 float %9.0g 60 glucose28

insulin2860 float %9.0g 60 insulin28

glucose1090 float %9.0g 90 glucose10

insulin1090 float %9.0g 90 insulin10

glucose2890 float %9.0g 90 glucose28

insulin2890 float %9.0g 90 insulin28

glucose10120 float %9.0g 120 glucose10

insulin10120 float %9.0g 120 insulin10

glucose28120 float %9.0g 120 glucose28

insulin28120 float %9.0g 120 insulin28

glucose10150 float %9.0g 150 glucose10

insulin10150 float %9.0g 150 insulin10

glucose28150 float %9.0g 150 glucose28

insulin28150 float %9.0g 150 insulin28

glucose10180 float %9.0g 180 glucose10

insulin10180 float %9.0g 180 insulin10

glucose28180 float %9.0g 180 glucose28

insulin28180 float %9.0g 180 insulin28

glucose101500 float %9.0g 1500 glucose10

insulin101500 float %9.0g 1500 insulin10

glucose281500 float %9.0g 1500 glucose28

insulin281500 float %9.0g 1500 insulin28

------------------------------------------------------------

Sorted by: subject

Now our wide data has the following

characteristics

. describe

Contains data

obs: 19

vars: 57

size: 4,370 (99.9% of memory free)

------------------------------------------------------------

storage display value

variable name type format label variable label

------------------------------------------------------------

subject int %8.0g

glucose100 float %9.0g 0 glucose10

insulin100 float %9.0g 0 insulin10

glucose280 float %9.0g 0 glucose28

insulin280 float %9.0g 0 insulin28

glucose105 float %9.0g 5 glucose10

insulin105 float %9.0g 5 insulin10

glucose285 float %9.0g 5 glucose28

insulin285 float %9.0g 5 insulin28

glucose1010 float %9.0g 10 glucose10

insulin1010 float %9.0g 10 insulin10

glucose2810 float %9.0g 10 glucose28

insulin2810 float %9.0g 10 insulin28

glucose1015 float %9.0g 15 glucose10

insulin1015 float %9.0g 15 insulin10

glucose2815 float %9.0g 15 glucose28

insulin2815 float %9.0g 15 insulin28

glucose1020 float %9.0g 20 glucose10

insulin1020 float %9.0g 20 insulin10

glucose2820 float %9.0g 20 glucose28

insulin2820 float %9.0g 20 insulin28

glucose1025 float %9.0g 25 glucose10

insulin1025 float %9.0g 25 insulin10

glucose2825 float %9.0g 25 glucose28

insulin2825 float %9.0g 25 insulin28


Tables long data frequency weighting string data

Description of the long data organization

. describe

Contains data

obs: 532

vars: 5 27 Jan 2004 17:04

size: 9,044 (99.9% of memory free)

---------------------------------------------------------------

Description of the wide data organization

Contains data

obs: 19

vars: 57

size: 4,370 (99.9% of memory free)

---------------------------------------------------------------


Tables long data frequency weighting string data

Another Rosner dataset relates to the use of two blood pressure lowering drugs

Nifedipine and Propanolol in a heart rate lowering experiment, see Rosner

(dataset descriptions)

. use nifed, clear

. describe

Contains data from nifed.dta

obs: 34

vars: 10 26 Nov 1997 17:13

size: 748 (99.9% of memory free)

-------------------------------------------------------------------

storage display value

variable name type format label variable label

-------------------------------------------------------------------

id byte %9.0g

grp str1 %9s

bhr int %9.0g

hr1 int %9.0g

hr2 int %9.0g

hr3 int %9.0g

bsp int %9.0g

sbp1 int %9.0g

sbp2 int %9.0g

sbp3 int %9.0g

-------------------------------------------------------------------

. list

+-------------------------------------------------------------+

| id grp bhr hr1 hr2 hr3 bsp sbp1 sbp2 sbp3 |

|-------------------------------------------------------------|

1. | 1 P 60 70 64 128 110 120 . . |

2. | 2 N 52 64 98 180 156 160 140 . |

3. | 3 P 100 94 190 140 . . . . |

4. | 4 N 84 88 96 112 136 126 122 110 |

5. | 5 P 56 70 61 64 230 150 130 150 |

|-------------------------------------------------------------|

6. | 6 P 105 120 142 150 . . . . |

....

Here we use, describe,

and list the data in Rosner’s

nifed dataset.


Tables long data frequency weighting string data

. rename bhr hr0

. rename bsp sbp0

. reshape long sbp hr, i(id grp) j(rep)

(note: j = 0 1 2 3)

Data wide -> long

----------------------------------------------------

Number of obs. 34 -> 136

Number of variables 10 -> 5

j variable (4 values) -> rep

xij variables:

sbp0 sbp1 ... sbp3 -> sbp

hr0 hr1 ... hr3 -> hr

----------------------------------------------------

Following the sequence of commands

shown here we create a long version

of this data

Here we describe the data, now in long form

. describe

Contains data

obs: 136

vars: 5

size: 1,496 (99.9% of memory free)

------------------------------------------------------------

storage display value

variable name type format label variable label

------------------------------------------------------------

id byte %9.0g

grp str1 %9s

rep byte %9.0g

hr int %9.0g

sbp int %9.0g

------------------------------------------------------------


Tables long data frequency weighting string data

P

P

N

P

P

N

N

N

#d ;

collapse hr sbp, by(rep grp);

scatter sbp hr rep, c(L L) mlabel(grp grp)

sort(grp rep) clw(thick thick) clp(dash dot)

title("Nifedipine - Propanalol Study")

xtitle(Replication Number) ylabel(,angle(0));

Here we illustrate the drug

effects with application wrt

heart rate and blood pressure

Nifedipine - Propanalol Study

160

140

N

N

120

P

100

P

80

P

N

P

N

0

1

2

3

Replication Number

(mean) sbp

(mean) hr


Tables long data frequency weighting string data

Frequency weighting

. use "CHD Example", clear

. label define blabel 0 "< 160" 1 ">=160"

. label define clabel 0 " No CHD" 1 CHD

. label value Blood_pressure blabel

. label value CHD clabel

. label var count Count

. list

+---------------------------+

| Blood_~e CHD count |

|---------------------------|

1. | >=160 CHD 95 |

2. | < 160 CHD 173 |

3. | >=160 No CHD 201 |

4. | < 160 No CHD 894 |

+---------------------------+

. Summarize Blood_pressure CHD

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

Blood_pres~e | 4 .5 .5773503 0 1

CHD | 4 .5 .5773503 0 1

. summarize Blood_pressure CHD [fwe=count]

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

Blood_pres~e | 1363 .217168 .4124693 0 1

CHD | 1363 .1966251 .3975923 0 1

Why do summarize and frequency

weighted summarize produce

different results?


Tables long data frequency weighting string data

Frequency weighting ….

Unweighted:

mean x = Sum of [x’s] /Number of [x’s]

for CHD= 2/4

= 0.5

Weighted:

mean x = Sum of [x’s*frequencies] /Sum of [frequencies]

for CHD= [1*95 + 1*173 + 0*201 + 0*894]/[95 + 173 + 201 + 894]

= . di [95 + 173]/[95 + 173 + 201 + 894]

= 0.19662509

What would we expect for Blood_pressure


Tables long data frequency weighting string data

Frequency weighting

In mathematical formalism we write

Here

xi is the ith value of x, and

fi is the frequency that the ith value of x occurs in the dataset


Tables long data frequency weighting string data

String data

Strings are sequences of letters and numbers

“men”, “women”, “survived”, “lost to follow-up”, “Jan:27:2004”

A sequence of numbers can be a string depending on the circumstances

“1.23”, “-200.99”, “2”

No numerical manipulation can be performed on a string

It is quite simple to:

Recognize stringscodebook, describe

Convert strings to numericsencode, real, destring

Convert numerics to stringsdecode, string

Use strings in a numeric sensexi


Tables long data frequency weighting string data

Create a string form of CHD, dCHD, and

examine the new variable

. decode CHD, gen(dCHD)

. summarize dCHD

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

dCHD | 0

. summarize dCHD [fwe=count]

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

dCHD | 0

. describe dCHD

storage display value

variable name type format label variable label

----------------------------------------------------------------------

dCHD str7 %9s

. codebook dCHD

----------------------------------------------------------------------

dCHD

----------------------------------------------------------------------

type: string (str7)

unique values: 2 missing "": 0/4

tabulation: Freq. Value

2 "No CHD"

2 "CHD"

warning: variable has leading and embedded blanks


Tables long data frequency weighting string data

The ‘xi’ modifier creates a unique series of indicator variables which, in a

sense, allow the categoric variable to be ‘numerically’ manipulated

. xi: summarize i.dCHD [fwe=count]

i.dCHD _IdCHD_1-2 (_IdCHD_1 for dCHD==No CHD omitted)

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

_IdCHD_2 | 1363 .1966251 .3975923 0 1

. list

+------------------------------------------------+

| Blood_~e CHD count dCHD _IdCHD_2 |

|------------------------------------------------|

1. | >=160 CHD 95 CHD 1 |

2. | < 160 CHD 173 CHD 1 |

3. | >=160 No CHD 201 No CHD 0 |

4. | < 160 No CHD 894 No CHD 0 |

+------------------------------------------------+


Tables long data frequency weighting string data

Using the ‘char’ utility we can control the ‘referent’ state of a categoric variable

. char dCHD [omit] "CHD"

. char list

dCHD[omit]: CHD

. list

+------------------------------------------------+

| Blood_~e CHD count dCHD _IdCHD_2 |

|------------------------------------------------|

1. | >=160 CHD 95 CHD 1 |

2. | < 160 CHD 173 CHD 1 |

3. | >=160 No CHD 201 No CHD 0 |

4. | < 160 No CHD 894 No CHD 0 |

+------------------------------------------------+

. xi: summarize i.dCHD [fwe=count]

i.dCHD _IdCHD_1-2 (_IdCHD_2 for dCHD==CHD omitted)

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

_IdCHD_1 | 1363 .8033749 .3975923 0 1

Note that we now computer the complimentary probability with the

frequency weighted summarize


Tables long data frequency weighting string data

To restore the omitted state of a categoric variable to its default state

. char list

_IdCHD_2

. char dCHD [omit]

. char list

_IdCHD_1


  • Login