# Lecture 3 Part 1.Important Database Concepts Part 2. Queries - PowerPoint PPT Presentation

Lecture 3 Part 1.Important Database Concepts Part 2. Queries

1 / 69
Lecture 3 Part 1.Important Database Concepts Part 2. Queries

## Lecture 3 Part 1.Important Database Concepts Part 2. Queries

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
##### Presentation Transcript

1. Lecture 3 Part 1.Important Database Concepts Part 2. Queries Lecture slides by Austin Troy, University of Vermont, © 2008, except where noted Lecture materials by Austin Troy (c) 2008, except where noted

2. How is Data Stored? • People use number system with base 10 • Each digit corresponds to 10 to some power • Hence a number with 3 digits has 103 or 1000 possibilities • Why are computer values so often in multiples of eight? • Because computers use a base 8 system of storing numbers and values • A byte is 8 “on-off switches” or bits • Each switch/bit represents a binary number; one byte is 28 or 256 possibilities Lecture materials by Austin Troy (c) 2008, except where noted

3. How do binary numbers translate to real numbers? • Switch combinations determine base ten number based on the formula: • N10= 2b-1+2b-2+…2b-b • Where b= number of bits storing the number • Hence the binary number • 111111112 = 27*1+ 26 *1+ 25 *1+ 24 *1+ 23 *1+ 22 *1+ 21 *1+ 20 *1 = 25510 • And the binary number • 111111102 = 27*1+ 26 *1+ 25 *1+ 24 *1+ 23 *1+ 22 *1+ 21 *1+ 20 *0 = 25410 Lecture materials by Austin Troy (c) 2008, except where noted

4. Another approach to coding numbers: ASCII (American Standard Computer Info Index) Based on Hexadecimal Numbering System • 4 bit or base sixteen (24) system for representing numbers • 0-9 =0-9 but 10-15= A,B,C,D,E,F • Each digit represents up to 16 instead of 10 • So, the first digit in a two digit number xy= (16*x)+y • Hence • 21h= (16*2) +1 = 3310 = 001000012 • B2h= (16*11) + 2 =17810 = 101100102 Lecture materials by Austin Troy (c) 2008 except where noted

5. ASCII system provides standardized method for coding alphanumeric characters, and uses byte of 8 bits for each symbol. Those characters include everything you see on your keyboard and then some • Standard character set is coded as hexadecimal numbers going from zero to FF (28). • Example: Letter ‘A’ is 41h = 6510=010000012 Lecture materials by Austin Troy (c) 2008, except where noted

6. Number of Possible Values is fn of number of bits • Number of possible values for a unit of data is an exponential function of the number of switches • 28=256 eight bit data • 216=65,536 sixteen bit data • 232= 4,294,967,296 thirty two bit data Lecture materials by Austin Troy (c) 2003, except where noted

7. Number of bits determines data types • Examples of Integer data types • Byte: 28 (0 to 255) • Short Integer: 216 (ranges from –32,767 to +32,767 without decimals, the sixteenth bit determines sign) • Long Integer: 232 (+/-2.147483e+09 ) Lecture materials by Austin Troy (c) 2003, except where noted

8. Floating point data types • In this case the number can have a decimal, but the number of places is variable • With this type of number the number of bits determines not just the number of possible magnitudes but also the level of precision of the decimal, represented as number of decimal places. • Fewer bits in FP numbers can lead to rounding errors • Two types of FP number • Single Precision: Often 232 • Double Precision: Usually double the bits of single precision (i.e. 264) Lecture materials by Austin Troy (c) 2003, except where noted

9. Other data types • Currency (type of number with specific behaviors) • Date (recognizes order in dates) • String (text) • When numbers are represented as text they have no numerical properties (e.g. zip codes) • Boolean (yes, no) • Object (e.g. pictures, bits of code, behaviors, multi-media, programs) Lecture materials by Austin Troy (c) 2003, except where noted

10. Three database models • Hierarchical • Network • Relational Lecture materials by Austin Troy (c) 2003, except where noted

11. Hierarchical Database Model A one-to-many method for storing data in a database that looks like a family tree with one root and a number of branches or subdivisions. Problem: linkages in the tables must be known before Groovy 70s TV Action shows Drama Sitcoms Welcome back Kotter WKRP Dukes of Hazzard Dallas Fantasy Island CHIPs Loni Anderson Larry Hagman Gabe Kaplan Tom Wopat Eric Estrada Larry Wilcox Ricardo Montalban John Travolta Lecture materials by Austin Troy (c) 2003, except where noted

12. Hierarchical Database Model • Example where this model works well: • plant and animal taxonomies • Soil classification • Works when: classes are totally mutually exclusive • Problem with this model: • Does not work when have entities that belong to several classes or do not have mutual exclusivity • Think about the problems with Windows Explorer • Example: classifying your music collection • You may create classes like rock, jazz, classical, Latin, with folders for artists nested within • However, an artist may do rock and Latin and jazz on the same album, or one song may be a combination Lecture materials by Austin Troy (c) 2003, except where noted

13. Networked Database Model A database design for storing information by linking all records that are related with a list of “pointers.” Problem: linkages in the tables must be known before. Not adaptable to change. Action shows Drama Sitcoms Three’s company Love Boat Dukes of Hazzard Dallas Fantasy Island CHIPs NBC CBS ABC Lecture materials by Austin Troy (c) 2003, except where noted

14. Relational (Tabular) Database Model • A design used in database systems in which relationships are created between one or more flat files or tables based on the idea • that each pair of tables has a field in common, or “key”. In a relational database, the records are generally different in each table • The advantages: each table can be prepared and maintained separately, tables can remain separate until a query requires connecting, or relating them, relationships can be one to one, one to many or many to one Lecture materials by Austin Troy (c) 2003, except where noted

15. Records are the unit that the data are specific to Fields, or columns, are attribute categories Cells are where individual values of a record for a field are stored fields Headings: are the labels for the columns records cells Lecture materials by Austin Troy (c) 2003, except where noted

16. Is a field that is common to two or more flat files; allows a query to be done across multiple tables or allows two tables to be joined Flat file: professor info Flat file: course info Lecture materials by Austin Troy (c) 2003, except where noted

17. Join Tables Based on the values of a field that can be found in both tables The name of the field does not have to be the same The data type has to be the same In this case we have a one to one join; here the key is unique Key A B 1 Key C 4 1 10 2 5 1 2 Key A B C 2 20 3 1 3 4 1 10 6 3 50 2 5 2 10 JOIN 3 3 6 50 Lecture materials by Austin Troy (c) 2003, except where noted

18. Join Tables In this case we have a one to many join; here the key is not unique Key A B 1 Key C 4 1 10 2 5 1 1 Key A B C 2 20 2 1 3 4 1 10 6 2 5 1 10 JOIN 2 3 6 20 Lecture materials by Austin Troy (c) 2003, except where noted

19. Relational (Tabular) Database Model: 70s TV example • Now we can have various flat files (tables) with different record types and with various attributes specific to each record Table 1- specific to actors Table 2- specific to shows *entirely guessed at- I am not responsible for mistaken TV trivia Lecture materials by Austin Troy (c) 2003, except where noted

20. Relational (Tabular) Database Model • This allows queries that go across tables, like which CBS lead actors were born before 1951? Answer: John Travolta and Larry Wilcox It does this by combining information from the two tables, using common key fields *entirely guessed at- I am not responsible for mistaken TV trivia Lecture materials by Austin Troy (c) 2003, except where noted

21. Relational (Tabular) Database Model • Object-relational databases can contain other objects as well, like images, video clips, executable files, sounds, links Lecture materials by Austin Troy (c) 2003, except where noted

22. Relational Database: another example: property lot info One-to-one relationship Lecture materials by Austin Troy (c) 2003, except where noted

23. One-to-many relationship In this case, several people co-own the same lot, so no longer one lot, one person Lecture materials by Austin Troy (c) 2003, except where noted

24. Assuming each owner owned several parcels, we would structure the database differently One-to-many relationship Note: this table includes data pertinent only to Flores’ ownership of these properties Lecture materials by Austin Troy (c) 2003, except where noted

25. Example Here’s an example of a chart showing the relationships between flat files in a sample relational database for food suppliers* in Microsoft Access * This comes from an MS ACCESS sample database Lecture materials by Austin Troy (c) 2003, except where noted

26. Lecture materials by Austin Troy (c) 2003, except where noted * This comes from an MS ACCESS sample database

27. A real time RDBMS allows for realtime linking and embedding of tables based on common fields Here we see all the orders for product ID 3; there is no need to include product ID in that sub-table Lecture materials by Austin Troy (c) 2003, except where noted * This comes from an MS ACCESS sample database

28. Part 2. Queries Lecture materials by Austin Troy (c) 2003, except where noted

29. Queries • This is how we ask questions of the data • To ask queries, we use mathematical operators, like =, >, < • To ask queries on multiple criteria, we use logical operators, like AND and OR • Queries can simply select records or perform more advanced operations with those selections, such as make new tables, or summarize values by averages Lecture materials by Austin Troy (c) 2003, except where noted

30. Queries in Arc GIS • Arc GIS queries only select (highlight) records • When a record is selected, so is its corresponding feature • To summarize selected values, use the “statistics” function or “summarize” tool • To create new values based on a query, use the “calculate” tool. Lecture materials by Austin Troy (c) 2003, except where noted

31. Queries Here’s an example of a simple query in Arc GIS PRICE > 250000. This highlights all records (houses) in the specified layer with a sales price greater that \$250,000 Lecture materials by Austin Troy (c) 2003, except where noted

32. Queries That results in the following selection on the map Lecture materials by Austin Troy (c) 2003, except where noted

33. Queries And it also selects the corresponding records in the attribute table Lecture materials by Austin Troy (c) 2003, except where noted

34. Queries Here’s an example with a polygon layer; I’m querying for census tracts over 8000 people in population. Lecture materials by Austin Troy (c) 2003, except where noted

35. Queries: multiple criteria Now let’s add a criteria; let’s say we’re looking for big population tracts (>8,000) with a high rate of population change (> 3% annual). Note the use of the AND operator. Note also that a subset of the last selection was selected Lecture materials by Austin Troy (c) 2003, except where noted

36. Queries: Select From Set vs. New Set We did the previous selection by clicking Using the “create a new selection” method. We could have done the same thing by doing the first query (pop>8000), clicking “Apply,” then, without clearing that selection, typing in a new query for the second condition (popchng97 > 3) and choosing the “Select From Selection” method instead Lecture materials by Austin Troy (c) 2003, except where noted

37. Three query methods in Arc GIS New Selection: Creates a new query from scratch Add to Current Selection: Used when there is already a group of records/features selection; it is equivalent to the OR operator and widens the selection by introducing a criterion that is equivalent to the first one Select from Current Selection: Used when there is already a of records/features selection; selects a subset from the originally selected set; equivalent to the AND operator Lecture materials by Austin Troy (c) 2003, except where noted

38. Queries: OR operator Here’s a query where we use the OR operator to select either tracts greater than 8000 population OR with a growth rate greater than 3%; results in many more records selected; can also do the same thing by doing one query using “new selection” then another, using “add to current selection” Lecture materials by Austin Troy (c) 2003, except where noted

39. Queries: Strings Queries can also be made on text strings, but it is imperative to put the values in quotes. Here we query for both BLM and Parks and Rec land. Lecture materials by Austin Troy (c) 2003, except where noted

40. Queries: Strings and numbers String and number queries can be combined. For example, let’s say we’re looking for land for a suburban park and our criteria are that we need areas whose land use is classed as agricultural and that are bigger than 500,000 square feet. Lecture materials by Austin Troy (c) 2003, except where noted

41. Queries: Strings and numbers Results in: Lecture materials by Austin Troy (c) 2003, except where noted

42. Queries: Strings and numbers Whereas if our query asks for agricultural land use without the area criterion, we get: Lecture materials by Austin Troy (c) 2003, except where noted

43. So what can Arc GIS do with queries? • A query selects records; once selected you can: • Look at the selection • Requery the selection • Do stats on the selection • Create new fields that recategorize the selection by an an attribute field • Create new fields by doing calculations across several fields • Create a shapefile from the selection Lecture materials by Austin Troy (c) 2003, except where noted

44. Examples Let’s query high unemployment census tracts in LA Lecture materials by Austin Troy (c) 2003, except where noted

45. Now let’s do “statistics” to determine the population in those areas. Answer: almost 5 million people live in tracts with 6%+ unemployment (see Sum). We can also see that there are 844 tracts meeting that description (see Count) Right click on the heading to get this menu Lecture materials by Austin Troy (c) 2003, except where noted

46. Another thing we can do is convert the selection to a either a new shapefile or geodatabase feature class Right click and then click Data>>export data Lecture materials by Austin Troy (c) 2003, except where noted

47. Now, let’s say we wanted to prioritize inner city areas for urban redevelopment projects: • Let’s query based on unemployment and home value • Based on these we’ll create a new field that classes all tracts into High, Medium and Low priority areas • Tracts with median home value < \$100,000 and un-employment > 12% are “High” Lecture materials by Austin Troy (c) 2003, except where noted

48. To reclassify, we create a new field, “priority”, activate the field heading and use the field calculator to set all selected records to “high” Note: we must uses quotes with a text field Lecture materials by Austin Troy (c) 2003, except where noted

49. Now we would set criteria for “medium” and “low” based on unemployment and home value. These would probably be more complex queries because we’re querying for records, say, between 8 and 12% unemployment and between \$100,000 and \$150,000 median value. Note: AND is used three times, with two parenthetical clauses Lecture materials by Austin Troy (c) 2003, except where noted

50. Now, for the third class our task is easier—we just select everything that has not been selected yet. To do this we query for “priority”= ‘’ where those two marks after the equals sign are single quote marks. By putting empty quote marks, you’re querying for records with no values in them for that field. Now you’d set all those fields equal to “low.” Lecture materials by Austin Troy (c) 2003, except where noted