1 / 45

Uexplore/Dexter Tutorial

Uexplore/Dexter Tutorial. Part 2 More Advanced Topics And Exercises Rev. 5-16-07, jgb. Generate a Report Showing …. Population change (estimated) for Missouri cities (places) from 2000 to 2004. Show change and pct change sorted by change in population, descending.

mkang
Download Presentation

Uexplore/Dexter Tutorial

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Uexplore/Dexter Tutorial Part 2 More Advanced Topics And Exercises Rev. 5-16-07, jgb

  2. Generate a Report Showing … • Population change (estimated) for Missouri cities (places) from 2000 to 2004. • Show change and pct change sorted by change in population, descending. • Only show cities that had growth of at least 100 people and 5% over the period. • Use variable labels (rather than names) as column headings in the report.

  3. Navigate to popests Filetype • Pop Estimates is a major category with (currently) 4 filetypes. By far the most important of these is popests. • Choose the Current version of popests (as opposed to the older estimates in popests2).

  4. Finding the Relevant Dataset • The Census Bureau does estimates at various levels, including nation, state, county and “subcounty”. • The latter includes cities (“places”) and other sub-county governmental units. • Find the relevant dataset by scanning the Datasets.html page in the popests dir.

  5. The Winner is dataset mosc04

  6. Determine SumLev Value for Filter • Turns out that this set has some non subcounty data as well (i.e. state & county summaries). • We want complete places – level 162. • Level 157 would give us place-within-county.

  7. Choose Output Format(s)

  8. Create the Filter • First row selects complete-place summaries. • 2nd row says only places with growth of 100 or more persons. • 3rd row says the change must be at least 5%.

  9. Choose Columns

  10. Section IV Important This Time

  11. Use An Option in New Sec V

  12. HTML Output

  13. Summary Log for this Query

  14. The Dexter Saved Query File

  15. Dexter Query Files • Are simple text files used to encapsulate a query (i.e. save all the specs so that the query can be rerun.) • Written to a temporary file which goes away within 48 hrs. • For now, most users cannot replay a saved query. • But authorized MCDC personnel can use these to create public queries.

  16. Invoking a Saved Query • Can be done via a URL with a parm spec as we see in the current example. • The name of a stored query can also be entered on the Dexter query form near the bottom of the page. • For techies who care: Dexter looks for the file named &query.txt in the Queries subdirectory of the &path data directory.

  17. Saved Queries • Are relatively new to Dexter and not yet fully implemented. • Have good potential for creating “virtual” data products. You save the query file that generates the report (and/or csv file) rather than the files. • We are experimenting with adding “run-time parms” and generating query front-ends to allow customizing the query. • For example, think of turning the growing cities query into one where you could specify the state rather than having it always be Missouri.

  18. Saved Queries and xsamples • We are experimenting with a new kind of documentation for using Dexter. • Sample dexter queries are documented and stored in shtml files in an xsamples dir. • These sample pages include links to let you view and/or invoke the saved query file. • See these at http://mcdc.missouri.edu/xsamples/

  19. V. Advanced Options • A new section on the Dexter input form, targeted at more sophisticated user who wants more control over output. • As with all Dexter sections, click on section header to see online documentation. • Easily ignored. Pretend it’s not there if you want. • Many of the new features are things we want to do when we build public queries.

  20. Advanced Features Example • We will use several advanced features here including data aggregation. • The dataset we shall access is the latest (thru 2004) county level estimates with components of change since 2000 for the entire country. • We have added cbsa (metropolitan and micropolitan area) codes to this dataset.

  21. Use Datasets.html Page in popests

  22. Query Specifications • We have a dataset that has county-level data but also has CBSA (core based statistical area) codes identifying the metro area. • We want to aggregate (sum up) the pop data to get cbsa-level summaries. • To further complicate matters, we have a variable, cbsatype, that tells us whether it is a Metro or Micro (-politan) area. We not only want summaries for each cbsa within state, but we also want a summary for all the cbsa’s of a type (metro/micro) within each state. • The states of interest are Illinois, Kansas and Mo. • We want HTML output in a custom style.

  23. Define Filter • County level summaries only. • Code of 99999 indicates not in a CBSA; exclude. • Choose states using the postal abbreviations.

  24. Select Columns

  25. Titles and Footnotes

  26. The Really Hard Part

  27. Aggregation Specs • Aggby: stab cbsatype cbsa indicates that you want Dexter to combine all rows that have the same value for these 3 category variables, summing all the numeric variables from these rows. • Agglvl: 2 indicates that you want summaries for the rightmost 2 aggby variables. A summary will be generated for all stab/cbsatype combinations, regardless of the value of cbsa.

  28. More Aggregation Specs • Grand Totals? – No means you do not want Dexter to add a summary row at the end of the file with totals for all rows in the entire dataset. • Means or Percents – you specify here cols. that cannot be just summed up. They have to be specially processed using something called a weighted average.

  29. Aggregation Specs 3 • Weights for Means/Pcts – this is a list of columns to correspond with the list specified just above for Means or Percents. • We are saying here that we want the program to weight the value of pctchang using the value of pop00c. Each pctchang value is multiplied by pop00c (“weighted”) prior to aggregation. • During the agg step the weighted values are summed. • In a post-agg step the sum of the weighted values is divided by the sum of the weights.

  30. Just Know That … • Whenever you are aggregating and you have a column/variable that is a percentage, you need to specify it in your Means or Percents list, and the corresponding col/var to use in the “Weights for ..” list is a variable containing the value of which it is the percentage. (We call this the “universe variable”). • E.g. if the variable is PctAsian (Asians as a pct of total persons) then the weight variable is TotPop – the total persons.

  31. If you misspell a Variable Name • When entering variable names in any of the boxes in Section V be extra careful to spell the name exactly. • Also be certain that you select the variable in Section III – you cannot aggregated by State if you have not selected State as 1 or the variables to keep. • You will not get a specific error message about this but instead it will just say that ther were no observations selected.

  32. Variables to Drop • Not important (99% of the time). • The program generates 2 extra variables, named _lvl_ and _nag_ , that occasionally may be useful. • _lvl_ indicates the summary level (in our example it would have value 1 or 2). • _nag_ keeps a count of how many rows/observations were used to form the output summary row/observation.

  33. Advanced Report Formatting • We check the option to use variable labels as column headers in the report. Not very advanced, but it did not fit elsewhere. • By variables for report allows specifying one of more variables that are listed on a separate “by line” instead of as a column. • ID variables for report (not specified here) are variables listed at the far left of each row to identify the observation (instead of having “Obs”, the observation #, used.)

  34. Style to Use for html/pdf Output • You get to pick from a menu of 14 or so. Those followed by ** are recommended. • The default (sasweb) is minimal blue & white. • In this example we chose brick, one of our favorites. • Names are not very mnemonic; you just have to try them to see what they look like.

  35. Exercise 1 • Access 2000 census long-form (sample) data for census tracts in state of Nebraska. • Create a csv file where each record corresponds to a census tract and the variables/columns tell us what metro area and county the tract is in and reports the total population and the number and percent of persons who were poor.

  36. Exercise 2 • Access filetype stf903x2 (under 1990 census data). • Create a file (sas, dbf, or Excel – whichever you prefer) that has the number of hispanics and pct hispanics for all census tracts in Greene and Christian counties (MO). Use 2000 tract geography. • Do similar query using filetype sf32000x to get comparable data for 2000. • (Not part of exercise, but we hope you will have tools to merge these 2 results).

  37. Exercises 3 • Access the beareis filetype. • Pull data for all counties in the new Jefferson City metro area (CBSA). • The id variables for your output should be county, LineCd and LineCdMeaning. • The numeric variables here are a time series; select data for 1993 and 2003. • In the Advanced Report Formatting section specify county as a by variable and linecd as an ID variable. Select electronics as the style.

  38. Exercise 4 • In the beareis filetype we have datasets that report on total transfer payments. Locate the dataset with this data for Mo. • Print a report showing total transfer payments for the state of Missouri for each of the most recently-available 10 years. • Extra credit option: specify that you want all the trf variables displayed using a dollar12. format.

  39. Exercise 5 • Navigate to the filetype georefunder the Geography/GIS major category. • Access the mocogeos dataset. (Missouri county geocodes). • Do a plain text report showing all counties in Mo, their FIPS codes & names along with the DED-Region, RPC and dot (MoDOT) region in which they are contained. • Extra credit opt: the format code $rpcname. Can be used to display names for rpc codes. Specify that you want to see rpc names instead of codes in your report.

  40. Exercise 6 • Find the 25 wealthiest counties in the United States per the 2000 census, using Median Household Income as the measure of wealth. • Print them out in descending order (highest income first) – just the top 25. • Hint: run Dexter twice, the 2nd time applying a filter based on medhhinc.

  41. Exercise 7 • Filetype sf12000x contains the standard extract of data from Summary File 1, 2000 census. • As a bonus, we added the 1990 pop for the corresponding geography to this otherwise all-2k dataset. • Access the block level dataset for Mo and generate a tab-delimited file showing the 1990 & 2000 pops along with change & % change for all blocks in Adair county.

  42. Exercise 8 • The MCDC has done many custom geographic aggregations of 2k census data -- including data for school districts and school districts/counties. • In filetype sf32000x, access dataset moschlcos. • Print a report showing total pop, rural pop and % rural pop for all the districts within St. Charles county.

  43. Want More Exercises? • Take the MCDC Trivia challenge at http://mcdc.missouri.edu/trivia/popests1.shtml • Has ten rather challenging exercises, all involving data in the popests filetype (current population estimates). • Answers included (sort of).

  44. Thank You As usual, questions and comments are encouraged. E-mail preferred: blodgettj@missouri.edu

More Related