
Creating and Using Attribute Databases



Presentation Transcript


  1. Creating and Using Attribute Databases
     In this lesson you will learn:
     • the concept of the attribute database as a table
     • database elements: variables, observations, data, labels, data dictionary, aliases, indexes
     • data types and formats
     • basic database operations
     • attribute queries
     • attribute statistics
     • attribute data graphs

  2. The attribute database as a table

  3. The attribute database as a table

  4. Database elements Portion of the data dictionary for the Illinois Historic Tornado database.

  5. Creating the database
     Steps in creating the attribute database:
     • identify the attributes to be captured
     • create an attribute column for each attribute; label each column
     • specify the data type for each attribute
     • specify validation rules for each attribute
     • specify the data format for each attribute
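The steps above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name, column names, validation rules, and sample rows are invented for the example and are not taken from the Illinois Historic Tornado database.

```python
import sqlite3


def create_tornado_table(conn):
    """Create one attribute table: labelled columns, data types, validation rules."""
    conn.execute(
        """
        CREATE TABLE tornado (
            event_id   INTEGER PRIMARY KEY,                      -- integer identifier
            county     TEXT NOT NULL,                            -- text string
            event_date TEXT NOT NULL
                       CHECK (event_date LIKE '____-__-__'),     -- format rule: YYYY-MM-DD
            f_scale    INTEGER CHECK (f_scale BETWEEN 0 AND 5),  -- validation rule
            path_km    REAL CHECK (path_km >= 0)                 -- decimal, non-negative
        )
        """
    )


conn = sqlite3.connect(":memory:")
create_tornado_table(conn)

# Hypothetical observation that satisfies every rule.
conn.execute("INSERT INTO tornado VALUES (1, 'Example', '2000-01-01', 2, 12.5)")

# Hypothetical observation that breaks the f_scale rule; the database refuses it.
try:
    conn.execute("INSERT INTO tornado VALUES (2, 'Example', '2000-01-02', 9, 3.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM tornado").fetchone()[0]
print(rejected, row_count)
```

Declaring the rules in the table definition means invalid values are stopped at data entry rather than found later during verification.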

  6. Data types

  7. Data formats

  8. Common database “exchange” formats Tabular database formats

  9. Tabular database formats
     • Comma-delimited text (filename.csv)
     • Tab-delimited text (filename.txt)
     • Fixed-width text (filename.txt)
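As a rough illustration of how the three text formats differ, the sketch below reads the same small table from each of them using only the Python standard library. The column names, the sample values, and the fixed-width layout (FIXED_LAYOUT) are assumptions made up for the example.

```python
import csv
import io

# The same invented two-row table in each exchange format.
ROWS = [("1", "Adams", "2"), ("2", "Boone", "0")]
CSV_TEXT = "id,county,f_scale\n" + "".join(",".join(r) + "\n" for r in ROWS)
TAB_TEXT = "id\tcounty\tf_scale\n" + "".join("\t".join(r) + "\n" for r in ROWS)
# Fixed-width text has no delimiters: each field is padded to a set width (2, 10, 1).
FIXED_TEXT = "".join(f"{i:<2}{c:<10}{f:<1}\n" for i, c, f in ROWS)
# Assumed layout: (column name, start offset, end offset).
FIXED_LAYOUT = [("id", 0, 2), ("county", 2, 12), ("f_scale", 12, 13)]


def read_delimited(text, delimiter):
    """Read comma- or tab-delimited text whose first line holds the column labels."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter=delimiter)]


def read_fixed_width(text, layout):
    """Read fixed-width text by slicing each line at the offsets given in layout."""
    return [
        {name: line[start:end].strip() for name, start, end in layout}
        for line in text.splitlines()
    ]


from_csv = read_delimited(CSV_TEXT, ",")
from_tab = read_delimited(TAB_TEXT, "\t")
from_fixed = read_fixed_width(FIXED_TEXT, FIXED_LAYOUT)
print(from_csv == from_tab == from_fixed)
```

Note that the fixed-width file carries no column labels, so the layout has to be supplied separately, which is one reason a data dictionary should travel with the data.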

  10. Basic database operations
      Data maintenance
      • data entry & editing
      • sorts
      • queries
      • data statistics
      • data graphs
      Data segmentation
      Data verification

  11. Basic database operations: data maintenance
      • add/delete observations
      • add/delete attribute fields
      • edit data
      • spell check (text & memo fields)
      • find/replace
      • re-enter data
      • append new observations
      • restructure attribute data
      • calculate new field based on existing fields
      • modify format
      • change data type

  12. Basic database operations: sorts Single-column sort Multi-column sort

  13. Basic database operations: hierarchical sorts
      Hierarchical sort: column 1 (ascending); column 2 (ascending); column 3 (descending)
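A hierarchical sort like the one described on this slide can be sketched in Python by sorting on the lowest-priority column first and relying on the fact that Python's sort is stable. The records below are invented purely to show the ordering.

```python
from operator import itemgetter

# Invented records: (column 1, column 2, column 3).
records = [
    ("B", 2, 10),
    ("A", 1, 5),
    ("A", 1, 9),
    ("B", 1, 7),
    ("A", 2, 3),
]


def hierarchical_sort(rows):
    """Sort by column 1 ascending, then column 2 ascending, then column 3 descending.

    Each later pass keeps the order set by the earlier passes for rows that tie,
    so the passes run from lowest priority to highest priority.
    """
    rows = sorted(rows, key=itemgetter(2), reverse=True)  # column 3, descending
    rows = sorted(rows, key=itemgetter(1))                # column 2, ascending
    rows = sorted(rows, key=itemgetter(0))                # column 1, ascending
    return rows


sorted_records = hierarchical_sort(records)
for row in sorted_records:
    print(row)
```

Sorting in separate passes makes it easy to mix ascending and descending columns, including text columns that cannot simply be negated.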

  14. Simple attribute queries

  15. Simple attribute queries
      Figure: land-use percentage, by city ward (legend: residential; transportation & utilities; commercial; parks & open space; industrial)

  16. Compound attribute queries The contingency table view of compound attributes
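One way to picture the contingency table view is to count observations for every combination of two attribute conditions. The sketch below does this with collections.Counter; the horse records and the two conditions (black or not, 5 years or older or not) are invented for illustration.

```python
from collections import Counter

# Invented observations: (name, color, age in years).
HORSES = [
    ("Ace", "black", 7),
    ("Bay", "brown", 3),
    ("Cole", "black", 4),
    ("Dot", "grey", 9),
    ("Eve", "black", 5),
]


def contingency_table(rows):
    """Count horses in each (is black, is 5 years or older) cell."""
    return Counter((color == "black", age >= 5) for _name, color, age in rows)


table = contingency_table(HORSES)
for is_black in (True, False):
    for is_older in (True, False):
        print(f"black={is_black!s:<5} age>=5={is_older!s:<5} count={table[(is_black, is_older)]}")
```

Each cell of the table corresponds to one compound query, for example the cell (True, True) holds the horses that are black AND 5 years of age or older.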

  17. Multi-attribute queries

  18. Multi-attribute queries
      Figure labels: the set of all horses; the set of Black horses; the set of horses ≥ 5 yrs old; “NOT Black” horses
      Operator | Set action | Logic outcome
      NOT | set complement | Logical negation of the operand: true where the operand is false.
      AND | intersection of two sets | True if both operands are true, false otherwise.
      OR | union of two sets | True if either the 1st or the 2nd operand is true, or if both are true. False if both operands are false.
      XOR | union less intersection | True if exactly one of the two operands is true. False if both are true or both are false.
      Compound statements are written in the form: operand-1 LOGICAL OPERATOR operand-2; e.g., horse = black AND horse = 5 years of age or older
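The correspondence between logical operators and set actions can be tried out directly with Python sets. This is a small sketch using invented horse records; the names and ages are not from the presentation.

```python
# Invented observations: name -> (color, age in years).
HORSES = {
    "Ace": ("black", 7),
    "Bay": ("brown", 3),
    "Cole": ("black", 4),
    "Dot": ("grey", 9),
    "Eve": ("black", 5),
}

all_horses = set(HORSES)
black = {name for name, (color, _age) in HORSES.items() if color == "black"}
older = {name for name, (_color, age) in HORSES.items() if age >= 5}

not_black = all_horses - black   # NOT: set complement within the set of all horses
black_and_older = black & older  # AND: intersection of the two sets
black_or_older = black | older   # OR:  union of the two sets
black_xor_older = black ^ older  # XOR: union less intersection

print(sorted(not_black))
print(sorted(black_and_older))
print(sorted(black_or_older))
print(sorted(black_xor_older))
```

NOT only makes sense relative to the set of all horses, which is why the complement is computed against all_horses rather than on its own.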

  19. Data statistics

  20. Measures of central tendency
      • Median: center point of a data distribution
      • 50% of the observations have a data value < the median and 50% have a data value > the median
      • Mean: the average data value = 1/n × Σ (all data values)
      • the mean equals the median when the data are symmetrically distributed about the mean (for example, a unimodal symmetric distribution); in skewed data the two generally differ
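A quick way to see the difference between the two measures is to compute both for a symmetric and a skewed set of values with Python's statistics module. The numbers below are invented for the illustration.

```python
import statistics

# Invented data values: one symmetric set, one with a single extreme value.
symmetric = [2, 4, 6, 8, 10]
skewed = [2, 4, 6, 8, 100]

symmetric_mean = statistics.mean(symmetric)      # 1/n × sum of all data values
symmetric_median = statistics.median(symmetric)  # center point of the sorted values
skewed_mean = statistics.mean(skewed)
skewed_median = statistics.median(skewed)

print("symmetric:", symmetric_mean, symmetric_median)
print("skewed:   ", skewed_mean, skewed_median)
```

In the skewed set the single extreme value pulls the mean well above the median, which is the kind of gap that can flag a data entry error during verification.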

  21. Measures of dispersion
      • Range: the span, or extent, of data values
      • range = maximum data value – minimum data value
      • Variance: average squared distance of all observations from the mean
      • Standard Deviation: the square root of the variance, interpreted as the typical distance of observations from the mean
      • for a Normal distribution, approximately 68% of all data values will lie within one standard deviation of the mean and 95.4% within 2 standard deviations of the mean
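The definitions above can be checked on a small invented data set. The sketch uses the population forms from Python's statistics module (pvariance and pstdev), which divide by n and so match "average squared distance"; statistics.variance and statistics.stdev divide by n - 1 instead and would give slightly larger values.

```python
import statistics

# Invented data values with mean 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(values) - min(values)   # maximum data value - minimum data value
mean = statistics.mean(values)
variance = statistics.pvariance(values)  # average squared distance from the mean
std_dev = statistics.pstdev(values)      # square root of the variance

# Share of observations lying within one standard deviation of the mean.
within_one_sd = sum(1 for v in values if abs(v - mean) <= std_dev) / len(values)

print(data_range, variance, std_dev, within_one_sd)
```

The share within one standard deviation here is 75%, not 68%, because this small sample is not Normally distributed; the 68% and 95.4% figures describe the Normal case.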

  22. Data graphs

  23. Data graphs for visualizing the distribution of data
      • Box-whisker plot
      • Quantile-Quantile plot, with Normal distribution reference line
      • Density plot
      • Histogram with density plot (Normal distribution)

  24. Data graphs for visualizing data relations A bivariate scatterplot illustrating the relationship between soil Calcium and Cation Exchange Capacity in a northern Illinois soil.

  25. What you have learned
      In this lesson you learned:
      • Tabular databases are organized as tables, with rows as observations, columns as attributes, and the data or information contained inside the table. A database may also contain indexes, a data dictionary, and aliases.
      • The data dictionary is vital to the proper interpretation and use of data. It should contain a description of each attribute’s measurement scale, how it was measured, when and where it was collected, by whom, and for what purpose.
      • Database design includes: which attributes to capture and how they are labeled, what data type to use for each attribute, data validation rules, and data storage format.
      • Basic data types include text string or memo for text or qualitative information, and integer, decimal, and byte for numeric or quantitative information.
      • Tabular databases can be created in database, spreadsheet, statistical analysis and other software and exchanged in standard database, spreadsheet, ODBC, and formatted text file formats.
      • Nearly all database software has functional capabilities for data entry and editing, sorts, queries, data statistics, and data graphs.
      • Save a copy of your database before performing any maintenance or segmentation! Be especially careful with editing operations involving find/replace, and any operation that changes data formats or type.
      • Single- and multi-column sorts are useful for isolating more obvious data errors and as a starting point for segmenting the data into smaller databases, classifying observations, and creating indexes.
      • Query operations can take the form of find queries, filter queries or subset queries, of which only the last effects permanent change to the content of the database.
      • Compound queries utilize the logical operators NOT, AND, OR and XOR to join query operands.
      • Measures of central tendency, measures of dispersion, data distribution graphs, and scatterplots are often useful in data verification, but their greatest value is in data segmentation.
