Organizing and modelling data

Organizing and modelling data. From the Information Technology Group, www.wageningenur.nl/inf. Sjoukje Osinga and Gert Jan Hofstede. Teacher Course Data Management, INF-21306. Why manage data? The organization that loses its memory loses its life. Data to manage are everywhere!


Presentation Transcript


  1. Organizing and modelling data
     From Information Technology Group, www.wageningenur.nl/inf
     Sjoukje Osinga, Gert Jan Hofstede
     Teacher Course Data Management, INF-21306

  2. Why manage data?
     • The organization that loses its memory loses its life
     • Data to manage are everywhere!
     • Experimental data, model inputs, model outputs…
     • … but can all this be managed?
     • most of it just grows unmanaged
     • some of it is managed with spreadsheets or databases

  3. Data Management
     Course topics:
     • The place of data management
     • Why manage data?
     • What is a database?
     • Database design (week 2)
     • Advanced SQL (week 3)
     • Architectures (week 4)
     • Managing (week 5)
     • … and some additional topics

  4. The place of data management
     (Diagram labels: Project mgmt, Data mgmt, IT mgmt; callout: “This course”)
     • Manage:
       • personnel, finance, equipment, information
     • In an organization’s information system you have
       • People
       • Procedures
       • Data sets
       • Software
       • Hardware

  5. Why model data?
     • Research:
       • results are hidden in piles of paper
       • data files lack documentation
       • costly or impossible to use existing data
     • Management:
       • redundancy leads to errors
       • data structures are stable over time
     • A good design saves programming cost

  6. Database design in research
     Have a research question
     • Try out;
     • Think and rethink;
     • Design ‘real’ data model;
     • Collect data;
     • Query & interpret data
     (Write article)

  7. What is a database?
     • Theoretically:
       • a coherent collection of data
       • searchable as one whole
       • by many people
     • In practice:
       • a collection of related two-dimensional tables
       • rows are “things”
       • columns are “attributes”
       • special software (a “DBMS”) is needed

  8. A database table (figure labels: column, row)

  9. Database: more than tables. The fact that one employee can be another’s boss: a one-to-many relationship
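The boss relationship on this slide can be sketched as a table that refers to itself. This is a minimal illustration using Python's built-in sqlite3 module; the table name `employee`, its columns, and the sample names are assumptions made for the example, not taken from the slides.

```python
import sqlite3

# Hypothetical table and column names, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        boss_id INTEGER REFERENCES employee(emp_id)  -- NULL for the top boss
    )
""")
conn.executemany(
    "INSERT INTO employee (emp_id, name, boss_id) VALUES (?, ?, ?)",
    [(1, "Alice", None), (2, "Bob", 1), (3, "Carol", 1)],
)

# One boss, many subordinates: join the table to itself.
subordinates = conn.execute("""
    SELECT boss.name, emp.name
    FROM employee AS emp
    JOIN employee AS boss ON emp.boss_id = boss.emp_id
    ORDER BY emp.emp_id
""").fetchall()
print(subordinates)
```

Each employee row stores at most one `boss_id`, while the same boss can appear in many rows, which is what makes the relationship one-to-many.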

  10. Two tables: the usual case of one-to-many

  11. The same tables: data structure (metadata)

  12. Databases only work if they... (This course)
      • are actually stored in the computer (procedures)
      • can be accessed (availability)
      • can be understood (metadata)
      • are the right data to look at (design)
      • are properly looked at (query)
      • …

  13. Problems
      • redundancy
      • poor control of data, compared to money, machines, personnel
      • poor interface
      • gap with real-world needs
      • no integration
      (any examples known to you?)

  14. Solutions?
      • Improving the organization
        • people
        • procedures (among others, information management)
        • communication
      • Improving technology
        • among others, databases

  15. What is database design?
      • Finding out
        • which facts you wish the db to remember
        • about which things (→ entity types, tables)
        • what data to keep about those things (→ attributes)
        • how the facts link the data together (→ relationships)
      • Not
        • process, data flow
        • experimental design

  16. Database and world
      • “A field has one or more facets”
        • what counts as a field, or facet?
        • who says so?
      • Agreeing on definitions is a prerequisite!
      • E.g. Mars orbiter: inches vs metric... Rice, Bhutan

  17. Data modelling exercise You are in charge of designing a database to find out which teachers give which lectures where and when in your course programme. This is the main ‘fact type’ you need to store. Find out which entities are important. Find key attributes. Draw an Entity-Relationship diagram to show the structure.

  18. Possible course data model
      E-R diagram (legend: according to Hofstede)
      Relationship labels: occurs as, includes, is scheduled in, delivers
      What if several rooms per course-instance?
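One possible way to turn such a model into tables is sketched below with Python's sqlite3 module. The diagram itself is not part of the transcript, so every table name, column name, and sample row here is an assumption for illustration, not the authors' model.

```python
import sqlite3

# All names and sample rows are hypothetical; the slide's diagram is not reproduced here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teacher (teacher_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE room    (room_id    INTEGER PRIMARY KEY, label TEXT NOT NULL);
    -- The main fact type: a teacher delivers a lecture of a course in a room at a time.
    CREATE TABLE lecture (
        lecture_id INTEGER PRIMARY KEY,
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        teacher_id INTEGER NOT NULL REFERENCES teacher(teacher_id),
        room_id    INTEGER NOT NULL REFERENCES room(room_id),
        starts_at  TEXT NOT NULL
    );
    INSERT INTO teacher VALUES (1, 'Teacher A'), (2, 'Teacher B');
    INSERT INTO course  VALUES (1, 'Data Management');
    INSERT INTO room    VALUES (1, 'Room 101'), (2, 'Room 202');
    INSERT INTO lecture VALUES
        (1, 1, 1, 1, '2024-01-08 09:00'),
        (2, 1, 2, 2, '2024-01-10 13:30');
""")

# Which teachers give which lectures where and when?
schedule = conn.execute("""
    SELECT t.name, c.title, r.label, l.starts_at
    FROM lecture AS l
    JOIN teacher AS t ON t.teacher_id = l.teacher_id
    JOIN course  AS c ON c.course_id  = l.course_id
    JOIN room    AS r ON r.room_id    = l.room_id
    ORDER BY l.starts_at
""").fetchall()
for row in schedule:
    print(row)
```

With a single `room_id` column, each lecture has exactly one room. Allowing several rooms per lecture would need a separate linking table between `lecture` and `room`.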

  19. Datatype or data value? (cf. p. 184)
      E.g. measurement data:
      (a) POINT (x, y, date1, a, b, date2, a, b, c)
      or
      (b) POINT (x, y)
          MEASUREMENT (x, y, date, a, b, c)
      or
      (c) POINT (x, y)
          MEASUREMENT (x, y, date, type, value)
      Where are a, b, and c?
      See: Hofstede (2002), Databases modelleren, bouwen en gebruiken

  20. Law of no escape from trouble
      • There is no escaping from choosing.
      • Data types (columns) vs data (rows): design issue!
        (a) efficient, but no measurements can be added
        (b) measurements can be added, but no new measurement types
        (c) flexible and extensible, but not efficient
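The difference between designs (b) and (c) shows up when a new kind of measurement appears. The sketch below uses Python's sqlite3 module; the table names `measurement_wide` and `measurement_long`, the new type `d`, and all values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Design (b): one column per measurement type (hypothetical table name).
    CREATE TABLE measurement_wide (x REAL, y REAL, date TEXT, a REAL, b REAL, c REAL);
    -- Design (c): one row per measured value (hypothetical table name).
    CREATE TABLE measurement_long (x REAL, y REAL, date TEXT, type TEXT, value REAL);
""")
conn.execute(
    "INSERT INTO measurement_wide VALUES (1.0, 2.0, '2024-01-01', 0.1, 0.2, 0.3)"
)
conn.executemany(
    "INSERT INTO measurement_long VALUES (1.0, 2.0, '2024-01-01', ?, ?)",
    [("a", 0.1), ("b", 0.2), ("c", 0.3)],
)

# A new measurement type 'd' arrives:
# design (b) needs a schema change, design (c) only needs another row.
conn.execute("ALTER TABLE measurement_wide ADD COLUMN d REAL")
conn.execute("INSERT INTO measurement_long VALUES (1.0, 2.0, '2024-01-01', 'd', 0.4)")

wide_columns = [row[1] for row in conn.execute("PRAGMA table_info(measurement_wide)")]
long_types = [row[0] for row in conn.execute(
    "SELECT DISTINCT type FROM measurement_long ORDER BY type"
)]
print(wide_columns)
print(long_types)
```

In design (c) the measurement type is data, so it can grow without touching the structure, at the cost of more rows and more work to read one point's values side by side.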

  21. SQL, Structured Query Language (Ch 10)
      (Diagram labels: three user programs, data dictionary, two storage units)
      • One language for all 3 levels of database architecture:
        • regulate user level (grant, revoke)
        • create data (create, drop, alter)
        • regulate storage (create index, tablespace…)
        • see data or metadata (select)
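Several of these statement kinds can be tried out from Python with the built-in sqlite3 module, as in the sketch below. The `sample` table and its contents are assumptions for illustration. SQLite is an embedded, single-file database and does not implement `grant` or `revoke`, so the user level is left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create data structures (hypothetical table and column names).
conn.execute("CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, site TEXT, ph REAL)")

# Regulate storage: an index changes how rows are found, not what they mean.
conn.execute("CREATE INDEX idx_sample_site ON sample (site)")

conn.executemany(
    "INSERT INTO sample (site, ph) VALUES (?, ?)",
    [("north", 6.8), ("south", 7.2)],
)

# See data ...
data = conn.execute("SELECT site, ph FROM sample ORDER BY site").fetchall()

# ... or metadata: SQLite keeps its data dictionary in the sqlite_master table.
metadata = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()

print(data)
print(metadata)
```

The same `select` syntax reads both the user's table and the system catalogue, which is the point of having one language for all levels.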

  22. SQL
      • format of a select statement (‘query’):
        select <what you want>
        from <where it is stored>
        [ where <some conditions apply> ];
      • e.g. select itemname from qsale;
      • also used from within Java, PHP…
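The slide's example query can also be issued from a host language. The sketch below uses Python's sqlite3 module rather than Java or PHP; only `qsale` and `itemname` come from the slide, while the columns `saleno` and `saleqty` and the sample rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# saleno and saleqty are hypothetical columns added so the where clause has something to test.
conn.execute(
    "CREATE TABLE qsale (saleno INTEGER PRIMARY KEY, itemname TEXT, saleqty INTEGER)"
)
conn.executemany(
    "INSERT INTO qsale (itemname, saleqty) VALUES (?, ?)",
    [("Compass", 2), ("Tent", 1), ("Map", 5)],
)

# select <what you want> from <where it is stored>
all_items = [row[0] for row in conn.execute(
    "SELECT itemname FROM qsale ORDER BY saleno"
)]

# ... with the optional where clause; the ? placeholder keeps the value out of the SQL text.
min_qty = 2
big_sales = [row[0] for row in conn.execute(
    "SELECT itemname FROM qsale WHERE saleqty >= ? ORDER BY saleno", (min_qty,)
)]

print(all_items)
print(big_sales)
```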

  23. Other issues / good practices
      • Always save structured data also in a raw format that can be ‘read’ without the software. So not only .xls, but also .csv
      • This is harder for data in databases: save all tables also as .txt when possible
      • When all else fails, you can still import the text version into new structures with new software
      • Can another person use + understand your work?
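Saving every table as plain text can be automated. The sketch below does this for an SQLite database using only Python's standard library; the function name `export_tables_to_csv` and the sample `point` table are assumptions for illustration, and other database systems would need their own catalogue query.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def export_tables_to_csv(conn, out_dir):
    """Write every user table in the database to <out_dir>/<table>.csv."""
    out_dir = Path(out_dir)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    written = []
    for table in tables:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        path = out_dir / f"{table}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            # Header row: the column names are the minimum metadata to keep.
            writer.writerow([col[0] for col in cursor.description])
            writer.writerows(cursor)
        written.append(path)
    return written


# Small demonstration with a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE point (x REAL, y REAL)")
conn.execute("INSERT INTO point VALUES (1.0, 2.0)")
out_dir = tempfile.mkdtemp()
paths = export_tables_to_csv(conn, out_dir)
print(paths[0].read_text(encoding="utf-8"))
```

A CSV file keeps values and column names but not keys, types, or relationships, so a short text description of the structure is still worth storing next to the files.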

  24. “A butterfly’s wing can change the world”
      When you run a simulation model:
      • Simulation software changes
      • Hardware changes (e.g. 16 → 32 bits)
      • Especially relevant for random generators
      Can you still reproduce your results?
      • Short term: always store results + model version
      • Longer term: save random seed (and algorithm)
      • Forever: impossible.
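Storing the seed together with the results might look like the sketch below. The function `run_simulation` is a stand-in for a real model, and the seed value and version string are made up for the example.

```python
import random
import sys


def run_simulation(seed, steps=5):
    """Stand-in for a real model: draws `steps` random numbers from a seeded generator."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.random() for _ in range(steps)]


seed = 12345  # arbitrary example value
first_run = run_simulation(seed)
second_run = run_simulation(seed)

# What to store next to the results so that a rerun can be checked later.
record = {
    "seed": seed,
    "model_version": "0.1-hypothetical",
    "python_version": sys.version.split()[0],
    "generator": "random.Random (Mersenne Twister)",
    "results": first_run,
}

print(first_run == second_run)  # same seed, same software: identical results
print(record["seed"], record["generator"])
```

Recording the generator and software version alongside the seed matters because a seed only reproduces results for the same algorithm.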

  25. “Communicate with the future”! (EU-funded project called ‘Shaman’)
