
Overzicht Informatica, Lecture 9 – November 1

Computer Science: An Overview, Edition 7, by J. Glenn Brookshear. Chapter 8 (now chap. 9, 2nd part): abstractions of the actual data organization on mass storage.





Presentation Transcript


  1. Overzicht Informatica, Lecture 9 – November 1 • Computer Science: An Overview, Edition 7 • J. Glenn Brookshear

  2. Chapter 8 (now chap. 9, 2nd part): File Structures • Abstractions of the actual data organization on mass storage • Again: differences between conceptual and actual data organization

  3. 8.1: Files, Directories & the Operating System • OS storage structure: • conceptual hierarchy of directories and files (figure: directory tree with files)

  4. 8.1: Files: Conceptual vs. Actual View • View at OS-level is conceptual • actual storage may differ significantly!

  5. 8.2: Sequential Files • To ‘remember’ where data resides on disk, the OS maintains a list of sectors for each file • Result: sequential view of scattered set of data
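The sector list described on slide 5 can be sketched roughly as follows. This is a toy model, not how any particular operating system works: the disk is assumed to be a plain dictionary, and the names `disk`, `sector_list`, and `read_sequential` are hypothetical.

```python
# Toy model: the "disk" maps sector numbers to chunks of bytes.
# The data of one file is scattered over non-adjacent sectors.
disk = {
    7: b"Hello, ",
    2: b"sequen",
    19: b"tial ",
    4: b"world",
}

# Per-file list of sectors kept by the (hypothetical) operating system,
# in the order in which the sectors make up the file.
sector_list = [7, 2, 19, 4]


def read_sequential(disk, sector_list):
    """Yield the file's bytes one at a time, in logical order."""
    for sector in sector_list:
        for byte in disk[sector]:
            yield byte


contents = bytes(read_sequential(disk, sector_list))
print(contents.decode("ascii"))  # Hello, sequential world
```

The caller only sees a stream of bytes; the scattering over sectors 7, 2, 19, and 4 stays hidden behind the sector list.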

  6. 8.2: Text Files • Sequential file consisting of a long string of encoded characters (e.g. ASCII code) • But: the character string is still interpreted by the word processor! (figures: file in “Notepad”; same file in “MS Word”)

  7. 8.2: Text files & Markup Languages (e.g. HTML)

  8. 8.2: From actual storage to conceptual view • actual storage -> (assembly by operating system) -> sequential view (sequential buffer) -> (interpretation by application program) -> conceptual view

  9. 8.2: Data Conversion • When programming: note that data transfer to/from a file may involve data conversion, e.g., from two’s complement notation to ASCII • So: again it’s about the interpretation of data
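A small sketch of the conversion mentioned on slide 9, assuming a 16-bit big-endian two's complement value; the bit width, the byte order, and the function names are choices made for this illustration only.

```python
def twos_complement_to_int(raw: bytes) -> int:
    """Interpret raw bytes as a big-endian two's complement integer."""
    return int.from_bytes(raw, byteorder="big", signed=True)


def int_to_ascii(value: int) -> bytes:
    """Encode the integer as ASCII decimal digits, as a text file would store it."""
    return str(value).encode("ascii")


raw = bytes([0xFF, 0xE7])            # 16-bit two's complement pattern for -25
value = twos_complement_to_int(raw)
text = int_to_ascii(value)

print(value)       # -25
print(text)        # b'-25'
print(text.hex())  # 2d3235
```

The same number takes two bytes in memory (`ff e7`) and three bytes in the text file (`2d 32 35`); only the interpretation differs.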

  10. 8.3: Quick File Access • Disadvantage of sequential files: • no quick access to particular file data • Two techniques to overcome this problem: • (1) Indexing or (2) Hashing • Indexing: the indexed file has an index of keys, which is loaded into main memory when the file is opened
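Indexing as outlined on slide 10 might look like the sketch below. It assumes fixed-length records in an in-memory file and an index held as a dictionary from key to byte offset; `RECORD_SIZE`, `build_file`, `lookup`, and the sample records are all made up for the example.

```python
import io

RECORD_SIZE = 16  # fixed record length in bytes (an assumption for this sketch)


def build_file(records):
    """Write fixed-length records to an in-memory 'file' and build its index."""
    data = io.BytesIO()
    index = {}
    for key, payload in records:
        index[key] = data.tell()  # remember where this record starts
        data.write(payload.encode("ascii").ljust(RECORD_SIZE))
    return data, index


def lookup(data, index, key):
    """Jump straight to the record via the in-memory index, without scanning."""
    data.seek(index[key])
    return data.read(RECORD_SIZE).decode("ascii").rstrip()


data, index = build_file([
    (1042, "Jansen"),
    (2077, "de Vries"),
    (3001, "Bakker"),
])

print(index)                      # {1042: 0, 2077: 16, 3001: 32}
print(lookup(data, index, 2077))  # de Vries
```

The index costs extra storage and one extra step (key to offset, then offset to record), which is the trade-off hashing tries to avoid.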

  11. Exercise: Chapter 8 - Problem 10 • Why is a ‘patient identification number’ a better choice for a key field than the last name of each patient? • If the key is unique: • an additional sequential search is never required • A patient’s last name is not always unique

  12. 8.3: Inverted Files • Variation on (single) indexing: inverted file

  13. 8.4: Hashing • Disadvantage of indexing is… the index • requires extra space + includes 1 extra indirection • Solution: ‘hashing’ • finds position in file using a key value (as in indexing)… • … simply by identifying location directly from the key • How? • define a set of ‘buckets’ & a ‘hash function’ that converts keys to bucket numbers (figure: key value -> hash function -> bucket number 0 … N)

  14. 8.4: Hash Function: Example • If storage space is divided into 40 buckets and the hash function is division (bucket number = remainder of key / 40): • key values 14, 54, & 94 all map onto the same bucket (collision)
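The example on slide 14 can be checked with a few lines of Python. Taking the remainder as the bucket number is how "division" is read here; the function name `hash_by_division` and the extra key 25 are not from the slides.

```python
NUM_BUCKETS = 40


def hash_by_division(key: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Bucket number = remainder of key divided by the number of buckets."""
    return key % num_buckets


buckets = {}
for key in (14, 54, 94, 25):
    buckets.setdefault(hash_by_division(key), []).append(key)

print(buckets)  # {14: [14, 54, 94], 25: [25]}
```

Keys 14, 54, and 94 all leave remainder 14 when divided by 40, so they collide in bucket 14, while 25 lands alone in bucket 25.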

  15. 8.4: Key field value can be anything

  16. 8.4: Handling Bucket Overflow • When bucket sizes are fixed: • buckets can fill up and overflow • One solution: • designate a special overflow storage area (not fixed in size!)
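One way to picture the overflow area from slide 16, as a sketch only: buckets hold at most `BUCKET_CAPACITY` records, and anything beyond that goes to a single shared list. The class name `HashFile`, the capacity of 2, and the sample records are assumptions for this illustration.

```python
NUM_BUCKETS = 40
BUCKET_CAPACITY = 2  # fixed bucket size (an assumption for this sketch)


class HashFile:
    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]
        self.overflow = []  # special overflow area, not fixed in size

    def insert(self, key, record):
        bucket = self.buckets[key % NUM_BUCKETS]
        if len(bucket) < BUCKET_CAPACITY:
            bucket.append((key, record))
        else:
            self.overflow.append((key, record))  # bucket is full

    def find(self, key):
        # Search the bucket first, then fall back to the overflow area.
        for k, record in self.buckets[key % NUM_BUCKETS]:
            if k == key:
                return record
        for k, record in self.overflow:
            if k == key:
                return record
        return None


f = HashFile()
for key in (14, 54, 94):
    f.insert(key, f"record-{key}")

print(len(f.buckets[14]))  # 2
print(f.overflow)          # [(94, 'record-94')]
print(f.find(94))          # record-94
```

Looking up key 94 needs a search of bucket 14 followed by a search of the overflow area, which is the extra cost that collisions introduce.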

  17. Exercise: Chapter 8 - Problem 22 • If we use division as a hash function and have 23 buckets, in which bucket should we search to find the record whose key is interpreted as the integer value 101? • Division: 101 / 23 = 4, remainder 9 • bucket number: 9

  18. Exercise: Chapter 8 - Problem 16 • a) What advantage does an indexed file have over a hash file? • b) What advantage does a hash file have over an indexed file? • Answer a) When the key is unique: the index points directly to the required data, while hashing often requires an additional (sequential) bucket search (incl. bucket overflow). • Answer b) No additional index file storage is required.

  19. Chapter 8 - File Structures: Conclusions • File Structures: • abstractions of actual data organization on mass storage • Changes of ‘view’: • actual storage -> sequential view by OS -> conceptual view presented to user • Quick access to particular file data by • (1) indexing (many forms) • (2) hashing (requires no index, but requires bucket search!)

  20. Chapter 9: Database Structures • (Large) integrated collections of data that can be accessed quickly • Combination of data structures (chap. 7) and file structures (chap. 8)

  21. 9.1: Historical Perspective • Originally: departments of large organizations stored all data separately in flat files • Problems: redundancy & inconsistencies

  22. 9.1: Integrated Database System • Better approach: integrate all data in a single system, to be accessed by all departments

  23. 9.1: Disadvantages of Data Integration • Disadvantages: • Control of access to sensitive data?! • For example: the personnel department has no business with personal data stored by the company doctor! • Misinterpretation of integrated data • A supermarket database says that a customer buys a lot of medicine. What does this mean? What if this customer applies for a job at the supermarket chain? • What about the right to hold/collect/interpret data? • Does a credit card company have the right to use/sell data about people's purchasing behavior?

  24. 9.2: Conceptual Database Layers • Compare with the operating system: actual data storage is presented as data seen in terms of a sequential view

  25. 9.3: The Relational Model • Relational Model • shows data as being stored in rectangular tables, called relations • a row in a relation is called a ‘tuple’ • a column in a relation is called an ‘attribute’

  26. 9.3: Issues of Relational Design • So, relations make up a relational database… • … but this is not so straightforward: • Problem: more than one concept combined in single relation

  27. 9.3: Redesign by extraction of 3 concepts • Any information is obtained by combining information from multiple relations

  28. 9.3: Example: • Finding all departments in which employee 23Y34 has worked:

  29. 9.3: Relational Operations • Extracting information from a relational database by way of relational operations • Most important ones: • (1) extract tuples (rows) : SELECT • (2) extract attributes (columns) : PROJECT • (3) combine relations : JOIN • Such operations on relations produce other relations • so: they can be used in combination, to create complex database requests (or ‘queries’)

  30. 9.3: The SELECT operation

  31. 9.3: The PROJECT operation

  32. 9.3: The JOIN operation
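The three relational operations can be sketched in plain Python, with a relation modeled as a list of dictionaries. This is an illustration under that assumption, not how a database management system implements them; the function signatures and the `EMPLOYEE` and `ASSIGNMENT` sample data are made up.

```python
def select(relation, predicate):
    """SELECT: keep the tuples (rows) for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """PROJECT: keep only the named attributes (columns), dropping duplicate rows."""
    result = []
    for row in relation:
        reduced = {name: row[name] for name in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


def join(left, left_name, right, right_name, predicate):
    """JOIN: pair every left row with every right row, keep pairs passing predicate.

    Attribute names are prefixed with the relation name, e.g. 'X.W'.
    """
    result = []
    for l_row in left:
        for r_row in right:
            combined = {f"{left_name}.{k}": v for k, v in l_row.items()}
            combined.update({f"{right_name}.{k}": v for k, v in r_row.items()})
            if predicate(combined):
                result.append(combined)
    return result


# Made-up sample data for illustration only.
EMPLOYEE = [
    {"EmplId": "23Y34", "Name": "Smith"},
    {"EmplId": "34Y70", "Name": "Clark"},
]
ASSIGNMENT = [
    {"EmplId": "23Y34", "Dept": "Sales"},
    {"EmplId": "23Y34", "Dept": "Personnel"},
    {"EmplId": "34Y70", "Dept": "Sales"},
]

# Each operation returns a relation, so the operations can be chained.
joined = join(EMPLOYEE, "EMPLOYEE", ASSIGNMENT, "ASSIGNMENT",
              lambda row: row["EMPLOYEE.EmplId"] == row["ASSIGNMENT.EmplId"])
selected = select(joined, lambda row: row["EMPLOYEE.EmplId"] == "23Y34")
departments = project(selected, ["ASSIGNMENT.Dept"])
print(departments)  # [{'ASSIGNMENT.Dept': 'Sales'}, {'ASSIGNMENT.Dept': 'Personnel'}]
```

Because every result is again a list of dictionaries, a query is simply a composition of these three functions.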

  33. Exercise: Chapter 9 - Problem 10 • X relation (U, V, W): (A, Z, 5), (B, D, 3), (C, Q, 5) • Y relation (R, S): (3, J), (4, K) • RESULT := PROJECT W from X • RESULT := SELECT from X where W=5 • RESULT := PROJECT S from Y • RESULT := JOIN X and Y where X.W > Y.R • RESULT of the JOIN (X.U, X.V, X.W, Y.R, Y.S): (A, Z, 5, 3, J), (A, Z, 5, 4, K), (C, Q, 5, 3, J), (C, Q, 5, 4, K)
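The four statements of Problem 10 can be replayed with list comprehensions over tuples, as a quick check of the JOIN result on the slide. The variable names are made up, and the sketch removes duplicates in PROJECT, following the set-based definition of a relation; the slides do not say whether the textbook answer does the same.

```python
X = [("A", "Z", 5), ("B", "D", 3), ("C", "Q", 5)]  # attributes U, V, W
Y = [(3, "J"), (4, "K")]                           # attributes R, S

# PROJECT W from X (duplicates removed in this sketch)
project_w = sorted({w for (_, _, w) in X})

# SELECT from X where W = 5
select_w5 = [row for row in X if row[2] == 5]

# PROJECT S from Y
project_s = [s for (_, s) in Y]

# JOIN X and Y where X.W > Y.R
join_result = [x + y for x in X for y in Y if x[2] > y[0]]

print(project_w)    # [3, 5]
print(select_w5)    # [('A', 'Z', 5), ('C', 'Q', 5)]
print(project_s)    # ['J', 'K']
for row in join_result:
    print(row)
```

The row (B, D, 3) drops out of the JOIN because 3 is greater than neither 3 nor 4, which leaves the four rows listed on the slide.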

  34. Exercise: Chapter 9 - Problem 11 • MANUFACTURER relation (CompanyName, PartName, Cost): (Company X, Bolt 2Z, .03), (Company X, Nut V5, .01), (Company Y, Bolt 2X, .02), (Company Y, Nut V5, .01), (Company Y, Bolt 2Z, .04), (Company Z, Nut V5, .01) • PART relation (PartName, Weight): (Bolt 2X, 1), (Bolt 2Z, 1.5), (Nut V5, 0.5) • a) Which companies make Bolt 2Z? • NEW := SELECT from MANUFACTURER where PartName = Bolt 2Z • RESULT := PROJECT CompanyName from NEW

  35. Exercise: Chapter 9 - Problem 11 (PART and MANUFACTURER relations as on slide 34) • b) Obtain a list of the parts (+ cost) made by Company X • NEW := SELECT from MANUFACTURER where CompanyName = Company X • RESULT := PROJECT PartName, Cost from NEW

  36. Exercise: Chapter 9 - Problem 11 (PART and MANUFACTURER relations as on slide 34) • c) Which companies make a part with weight 1? • NEW1 := JOIN MANUFACTURER and PART where MANUFACTURER.PartName = PART.PartName • NEW2 := SELECT from NEW1 where PART.Weight = 1 • RESULT := PROJECT MANUFACTURER.CompanyName from NEW2

  37. Exercise: Chapter 9 - Problem 11 (PART and MANUFACTURER relations as on slide 34) • c) Which companies make a part with weight 1? (alternative: select first, then join) • NEW1 := SELECT from PART where Weight = 1 • NEW2 := JOIN MANUFACTURER and NEW1 where MANUFACTURER.PartName = NEW1.PartName • RESULT := PROJECT MANUFACTURER.CompanyName from NEW2

  38. Chapter 9 - Database Structures: Conclusions • Database Structures: • (large) integrated collections of data that can be accessed quickly • Database Management System • provides high-level view of actual data storage (database model) • Relational Model most often used • relational operations: SELECT, PROJECT, JOIN, … • high-level language for database access: SQL
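To give an impression of SQL as the high-level access language, the queries of Chapter 9 - Problem 11 can be phrased against an in-memory SQLite database using Python's standard `sqlite3` module. The table definitions and column types are assumptions; the data is copied from the exercise slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Manufacturer (CompanyName TEXT, PartName TEXT, Cost REAL);
    CREATE TABLE Part (PartName TEXT, Weight REAL);
""")
conn.executemany("INSERT INTO Manufacturer VALUES (?, ?, ?)", [
    ("Company X", "Bolt 2Z", 0.03),
    ("Company X", "Nut V5", 0.01),
    ("Company Y", "Bolt 2X", 0.02),
    ("Company Y", "Nut V5", 0.01),
    ("Company Y", "Bolt 2Z", 0.04),
    ("Company Z", "Nut V5", 0.01),
])
conn.executemany("INSERT INTO Part VALUES (?, ?)", [
    ("Bolt 2X", 1), ("Bolt 2Z", 1.5), ("Nut V5", 0.5),
])

# a) Which companies make Bolt 2Z?  (SELECT rows, then PROJECT CompanyName)
a = conn.execute(
    "SELECT CompanyName FROM Manufacturer WHERE PartName = ? ORDER BY CompanyName",
    ("Bolt 2Z",),
).fetchall()

# b) Parts (+ cost) made by Company X.
b = conn.execute(
    "SELECT PartName, Cost FROM Manufacturer WHERE CompanyName = ? ORDER BY PartName",
    ("Company X",),
).fetchall()

# c) Which companies make a part with weight 1?  (JOIN, SELECT, PROJECT)
c = conn.execute("""
    SELECT DISTINCT m.CompanyName
    FROM Manufacturer AS m
    JOIN Part AS p ON m.PartName = p.PartName
    WHERE p.Weight = 1
    ORDER BY m.CompanyName
""").fetchall()

print(a)  # [('Company X',), ('Company Y',)]
print(b)  # [('Bolt 2Z', 0.03), ('Nut V5', 0.01)]
print(c)  # [('Company Y',)]
conn.close()
```

In SQL a single statement combines the steps: the `WHERE` clause plays the role of SELECT, the column list after the SQL keyword `SELECT` plays the role of PROJECT, and `JOIN ... ON` corresponds to JOIN.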

  39. Overzicht Informatica – Exam (1) • Most important sections (edition 7) & keywords: • Ch. 0 - 1, 3, 4: abstraction / algorithm • Ch. 1 - 1, 2, 3, 4, 5, 6, 7: bits / data storage & representation (ASCII, etc.) / Boolean operations / flip-flops / memory types and characteristics / number systems (binary, hexadecimal, etc.) / overflow & truncation errors • Ch. 2 - 1, 2, 3, 4, 6: CPU architecture / machine language & instructions / program execution / machine cycle / alternative architectures • Ch. 3 - 1, 2, 3, 4: operating systems / batch processing / time-sharing / multitasking / OS components / process vs. program / competition • Ch. 4 - 1, 2, 3, 4, 5, 6: algorithm (formal) / primitives / pseudocode / syntax / semantics / iteration / loop control / recursion / efficiency

  40. Overzicht Informatica – Exam (2) • Most important sections (edition 7) & keywords: • Ch. 5 - 1, 2, 3, 4, 5: generations: 1st, 2nd, 3rd / assembly language / compilers / machine independence / paradigms / imperative / object-oriented / programming concepts / procedures / parameters / call by value/reference • Ch. 6 - 1, 2, 3: software life cycle / development phase / modularity / coupling / cohesion / documentation / complexity measure for software • Ch. 7 - 1, (2-5): data structures / abstraction / static vs. dynamic / pointers / (arrays, lists, stacks, queues, etc.) • Ch. 8 - 1, 2, 3, 4: files / sequential / text / indexed / hashing • Ch. 9 - 1, 2, 3: databases vs. flat files / relations / tuples / attributes / relational operations: SELECT, PROJECT, JOIN

  41. Overzicht Informatica – Exam (3) • Not exam material: • Ch. 3.5 - 3.7 (edition 7): Networks • Ch. 4 (edition 8): Networking and the Internet • Ch. 10 (editions 7 & 8): Artificial Intelligence • Ch. 11 (editions 7 & 8): Theory of Computation • Good luck!
