
Information and Security Analytics Lecture #1 Unit #1: Data Management: Overview

Dr. Bhavani Thuraisingham. May 27, 2010.


Presentation Transcript


  1. Information and Security Analytics Lecture #1 Unit #1: Data Management: Overview Dr. Bhavani Thuraisingham May 27, 2010

  2. Objective of the Unit • This unit provides an overview of the developments in data management. It also provides an overview of data management, information management and knowledge management and illustrates a framework • Reference: Data Management Systems: Evolution and Interoperation, Thuraisingham, CRC Press, 1997

  3. Outline of the Unit • What is Data Management? • Developments in Data Management • Current Status and Trends • Note on Data Administration • Data management, Information management, and Knowledge Management

  4. What is Data Management? • One proposal: Data Management = Database System Management + Data Administration • Includes data analysis, data administration, database administration, auditing, data modeling, database system development, database application development • The tutorial will focus mainly on database system aspects of data management

  5. Developments in Database Systems • Network, hierarchical database systems • Relational database systems, transaction processing, distributed database systems • Heterogeneous database integration, migrating legacy databases • Next generation database systems: object-oriented, deductive, - - - • Data warehousing, data mining, multimedia database systems, Internet database

  6. Current Status • Multimedia Database Systems • Database Systems • Sensor Database Systems • Data Warehousing Systems • Heterogeneous Database Systems • Data Mining Systems • Limited integration between the different types of systems • Often Stovepiped by Technology

  7. Vision for Database Management

  8. Some Outstanding Problems Integration Integration Migrating Multimedia Real-time Heterogeneous with other Legacy Database Database Database Technologies Applications Management Management Integration • Data • Quality of • Distributed • Semantic • Modernization model service processing heterogeneity • Enterprise • Index • Operating • Mass • Inferencing modeling strategies system storage • Transaction • Schema • Synchronization services • Information processing transformation • Data • Transaction management • Integrity manipulation processing • Knowledge • Security • Active management databases

  9. Some Current Trends in Data Management • Heterogeneous database integration • Query, transactions, semantics, security and integrity • Migrating legacy databases • Fine-grained encapsulation, distributed objects • Multimedia databases • Query, model, quality-of-service, index • Data Warehousing • Building a warehouse, query • Data Mining • Multimedia databases, web data mining • Data management for collaboration • Architecture, transactions • Web databases and digital libraries • Query, transactions, index, security

  10. Interoperability of Heterogeneous Database Systems

  11. Note on Data Administration • Identifying the data • Data may be in files, paper, databases, etc. • Analyzing the data • Is the data of good quality? • Is the data complete? • Data standardization • Should one standardize all the data elements and metadata? • Repositories for handling semantic heterogeneity? • Data modeling • Structure the data, model the data and the processes

  12. Data, Information and Knowledge Management • Data Management • Data: stored in databases, files or some media • Data management includes modeling, storing, retrieving and analyzing the data • Information Management • Information is what is obtained by making sense out of the data, e.g., data with context • Information management is about modeling, storing, retrieving and analyzing the information • Knowledge Management • Knowledge is what is obtained when the information is understood; it enables one to take actions • Knowledge management is about utilizing the knowledge to improve the business of an organization

  13. Data, Information and Knowledge Management: Alternative View: MITRE Model 1999/2000 • Layers as listed on the slide: Decision Support; Knowledge Management; Information Management; Data Management; Communication, Network, Operating System, Middleware

  14. Information and Security Analytics Lecture #1 Unit #2: Database Systems Dr. Bhavani Thuraisingham May 27, 2010

  15. Objective of the Unit • This unit will provide an overview of the concepts and developments in database systems • Reference: Data Management Systems: Evolution and Interoperation, Thuraisingham, CRC Press, 1997

  16. Outline of the Unit • Concepts in database systems • Types of database systems

  17. Concepts in Database Systems • Definition of a Database system • Early systems • Metadata • Architectural Issues • Schema, Functional • DBMS Design Issues • Other Issues • Database design, Administration

  18. Database System • Consists of database, hardware, Database Management System (DBMS), and users • Database is the repository for persistent data • Hardware consists of secondary storage volumes, processors, and main memory • DBMS handles all users’ access to the database • Users include application programmers, end users, and the Database Administrator (DBA) • Need: reduce redundancy, avoid inconsistency, share data, enforce standards, apply security restrictions, maintain integrity, balance conflicting requirements • We have used the definition of a database management system given in C. J. Date’s book (Addison Wesley, 1990)

  19. An Example Database System Adapted from C. J. Date, Addison Wesley, 1990

  20. Early systems: Hierarchical and Network Database Systems • Diagram: Network Data Model and Hierarchical Data Model, each drawn over SUPPLIERS, PARTS and SUPPLIES

  21. Metadata • Metadata describes the data in the database • Example: Database D consists of a relation EMP with attributes SS#, Name, and Salary • Metadatabase stores the metadata • Could be physically stored with the database • Metadatabase may also store constraints and administrative information • Metadata is also referred to as the schema or data dictionary
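
The metadata idea on slide 21 can be sketched with Python's standard-library sqlite3 module. This is only an illustration under stated assumptions: SQLite as the DBMS, the column name SSN standing in for SS#, and the column types are all choices made here, not part of the lecture. The snippet creates the relation EMP and then reads its description back from the system catalog, which plays the role of the metadatabase.

```python
import sqlite3

# In-memory database standing in for "Database D" (assumption: SQLite).
conn = sqlite3.connect(":memory:")

# SSN replaces SS# because '#' is not valid in an unquoted identifier;
# the column types are illustrative assumptions.
conn.execute("CREATE TABLE EMP (SSN INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER)")

# sqlite_master is SQLite's catalog: it stores the schema (metadata).
tables = [row[0] for row in
          conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info yields one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(EMP)")]

print(tables)
print(columns)
```

Here the metadata is physically stored with the database, which is one of the options the slide mentions.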

  22. Three-level Schema Architecture: Details • Users A1, A2, A3: External Model A (External Schema A) • Users B1, B2: External Model B (External Schema B) • External/Conceptual Mapping A and External/Conceptual Mapping B • Conceptual Model (Conceptual Schema) • Conceptual/Internal Mapping • Internal Model (Internal Schema) • Stored Database

  23. Functional Architecture • Data Management: User Interface Manager; Schema (Data Dictionary) Manager (metadata); Security/Integrity Manager; Query Manager; Transaction Manager • Storage Management: File Manager; Disk Manager

  24. DBMS Design Issues • Query Processing • Optimization techniques • Transaction Management • Techniques for concurrency control and recovery • Metadata Management • Techniques for querying and updating the metadatabase • Security/Integrity Maintenance • Techniques for processing integrity constraints and enforcing access control rules • Storage management • Access methods and index strategies for efficient access to the database

  25. Other Issues • Database design • Generally a two-step process • Semantic data model to capture the entities of the application and the relationships between the entities • Generate the conceptual schema; theory of normal forms for relational databases • Research on object-oriented approaches for database design • Database Administration • Creating and deleting databases; backup and recovery, enforcing policies, auditing, etc.

  26. Types of Database Systems • Relational Database Systems • Object Database Systems • Deductive Database Systems • Other • Real-time, Secure, Parallel, Scientific, Temporal, Wireless, Functional, Entity-Relationship, Sensor/Stream Database Systems, etc.

  27. Relational Database: Informal Overview • Collection of tables, also called relations • Each table has one or more columns, also called attributes • Each table has zero or more rows, also called tuples • Elements of a row take values from a pool of legal values • The values of one or more columns in a row uniquely identify the row. These columns form an identifier (also called a key) • One identifier is designated as the unique identifier (also called the primary key) • Relational databases are queried using a language called SQL (Structured Query Language)

  28. Relational Database: Example • Relation S (S#, SNAME, STATUS, CITY): (S1, Smith, 20, London), (S2, Jones, 10, Paris), (S3, Blake, 30, Paris), (S4, Clark, 20, London), (S5, Adams, 30, Athens) • Relation P (P#, PNAME, COLOR, WEIGHT, CITY): (P1, Nut, Red, 12, London), (P2, Bolt, Green, 17, Paris), (P3, Screw, Blue, 17, Rome), (P4, Screw, Red, 14, London), (P5, Cam, Blue, 12, Paris), (P6, Cog, Red, 19, London) • Relation SP (S#, P#, QTY): (S1, P1, 300), (S1, P2, 200), (S1, P3, 400), (S1, P4, 200), (S1, P5, 100), (S1, P6, 100), (S2, P1, 300), (S2, P2, 400), (S3, P2, 200), (S4, P2, 200), (S4, P4, 300), (S4, P5, 400)

  29. SQL: Data Manipulation • Select, Update, Delete, Insert • Examples: SELECT S.S#, S.STATUS FROM S WHERE S.CITY = 'Paris'; SELECT * FROM S; SELECT S.*, P.* FROM S, P WHERE S.CITY = P.CITY; UPDATE P SET COLOR = 'Yellow', WEIGHT = WEIGHT + 5, CITY = NULL WHERE P# = 'P2'
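
The statements on slide 29 can be tried against the S and P relations of slide 28. The following is a minimal sketch, assuming SQLite via Python's standard-library sqlite3 module; the column names SNO and PNO are substitutes chosen here for S# and P# (which would need quoting), and the column types are likewise assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SNO / PNO stand in for S# / P# (assumption made for this sketch).
conn.execute("CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT)")
conn.execute("CREATE TABLE P (PNO TEXT PRIMARY KEY, PNAME TEXT, COLOR TEXT, WEIGHT INTEGER, CITY TEXT)")

# Rows copied from slide 28.
conn.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
])
conn.executemany("INSERT INTO P VALUES (?, ?, ?, ?, ?)", [
    ("P1", "Nut", "Red", 12, "London"), ("P2", "Bolt", "Green", 17, "Paris"),
    ("P3", "Screw", "Blue", 17, "Rome"), ("P4", "Screw", "Red", 14, "London"),
    ("P5", "Cam", "Blue", 12, "Paris"), ("P6", "Cog", "Red", 19, "London"),
])

# Selection with a quoted string literal.
paris_suppliers = conn.execute(
    "SELECT SNO, STATUS FROM S WHERE CITY = 'Paris' ORDER BY SNO").fetchall()

# Join of suppliers and parts located in the same city.
colocated = conn.execute(
    "SELECT S.*, P.* FROM S, P WHERE S.CITY = P.CITY").fetchall()

# Update with comma-separated assignments.
conn.execute(
    "UPDATE P SET COLOR = 'Yellow', WEIGHT = WEIGHT + 5, CITY = NULL WHERE PNO = 'P2'")
p2 = conn.execute("SELECT * FROM P WHERE PNO = 'P2'").fetchone()

print(paris_suppliers)
print(len(colocated))
print(p2)
```

The join is evaluated before the update, so part P2 still counts as located in Paris when suppliers and parts are matched.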

  30. Features of Object-Oriented Database Systems Suitable for Advanced Applications • Objects (support for large and variable sized data blocks) • Class hierarchy (reusability) • Instance variables, composite and complex objects (complex data structures) • Methods, and message passing (object encapsulation) • Pointer swizzling (performance) • Tighter integration with programming languages (application program support) • Special mechanisms for long transactions and concurrency control, multimedia information management, schema management, versions management, storage management

  31. Concepts in Object Database Systems • Objects: every entity is an object • Example: Book, Film, Employee, Car • Class • Objects with common attributes are grouped into a class • Attributes or Instance Variables • Properties of an object class inherited by the object instances • Class Hierarchy • Parent-Child class hierarchy • Composite objects • Book object with paragraphs, sections, etc. • Methods • Functions associated with a class

  32. Example Class Hierarchy • Document Class: attributes ID, Name, Author, Publisher; instances D1, D2 • Method1: Print-doc-att(ID) • Method2: Print-doc(ID) • Journal Subclass: Volume #; instance J1 • Book Subclass: # of Chapters; instance B1

  33. Example Composite Object • Composite Document Object • Section 1 Object, Section 2 Object • Paragraph 1 Object, Paragraph 2 Object
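
The class hierarchy of slide 32 and the composite object of slide 33 can be sketched in Python. This is an illustration only, not an object database: the field names, the sample values for B1, and the placement of both paragraphs inside Section 1 are assumptions made here, since the slides do not state them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    """Parent class: attributes shared by every document."""
    doc_id: str
    name: str
    author: str
    publisher: str

    def print_doc_att(self) -> str:
        # Method associated with the class; subclasses inherit it.
        return f"{self.doc_id}: {self.name} by {self.author} ({self.publisher})"


@dataclass
class Journal(Document):
    volume: int = 0          # extra attribute of the Journal subclass


@dataclass
class Book(Document):
    num_chapters: int = 0    # extra attribute of the Book subclass


@dataclass
class Paragraph:
    text: str


@dataclass
class Section:
    paragraphs: List[Paragraph] = field(default_factory=list)


@dataclass
class CompositeDocument:
    """Composite object: a document built from section objects."""
    sections: List[Section] = field(default_factory=list)

    def paragraph_count(self) -> int:
        return sum(len(s.paragraphs) for s in self.sections)


# Hypothetical instance values, for illustration only.
b1 = Book("B1", "Sample Book", "A. Author", "Sample Press", num_chapters=12)

# Assumption: both paragraphs belong to Section 1; Section 2 is empty.
composite = CompositeDocument(sections=[
    Section([Paragraph("Paragraph 1"), Paragraph("Paragraph 2")]),
    Section(),
])

print(b1.print_doc_att())
print(composite.paragraph_count())
```

Because Book is a subclass of Document, b1 reuses print_doc_att without redefining it, which is the reusability benefit attributed to class hierarchies on slide 30.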

  34. Deductive Database Systems • Database systems augmented with inference engines to deduce new data from existing data and rules • Example • Rule: parent of a parent is a grandparent • Data: John is Jane’s parent; Jane is Robert’s parent • From the above, infer John is Robert’s grandparent • Loose and tight coupling architectures between the database system and inference engine
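
The grandparent example on slide 34 can be sketched as a single rule applied to a set of stored facts. The representation below (facts as Python tuples, the rule as a function) is an assumption chosen for brevity; a real deductive database would use an inference engine coupled to the DBMS.

```python
# Stored data: (parent, child) facts from the slide.
parent_facts = {("John", "Jane"), ("Jane", "Robert")}


def grandparents(facts):
    """Rule: a parent of a parent is a grandparent.

    Returns the set of derived (grandparent, grandchild) pairs.
    """
    return {
        (a, c)
        for (a, b) in facts
        for (b2, c) in facts
        if b == b2
    }


derived = grandparents(parent_facts)
print(derived)
```

The derived pair is new data: it is not stored anywhere, only inferred from the existing facts and the rule.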

  35. Current Status • Database Systems is a mature technology; numerous products and prototypes • Much work followed in distributed and heterogeneous databases • Current directions include web database management as well as data management support for novel applications including E-commerce, Bioinformatics and Geoinformatics • Work still continues on developing new kinds of database systems including stream/sensor database systems

  36. Information and Security Analytics Lecture #1 Unit #3: Distributed and Heterogeneous Database Systems Dr. Bhavani Thuraisingham May 27, 2010

  37. Objective of the Unit • This unit provides an overview of concepts in distributed and heterogeneous databases. In particular, definitions and functions are discussed • Reference: • Data Management Systems: Evolution and Interoperation, Thuraisingham, CRC Press, 1997 • Heterogeneous Information Exchange and Organizational Hubs, Kluwer, 2002, Editors: Bestougeff, Dubois and Thuraisingham

  38. Outline of the Unit • Distributed Database Systems • Architecture, Data Distribution, Functions • Heterogeneous Database Integration • Federated Database Management • Client-Server Database Management • Migrating Legacy Databases • Current Status and Directions

  39. A Definition of a Distributed Database System • A collection of database systems connected via a network • The software that is responsible for interconnection is a Distributed Database Management System (DDBMS) • Each DBMS executes local applications and should be involved in at least one global application (Ceri and Pelagatti) • Homogeneous environment

  40. Architecture • Site 1: Distributed Processor 1, DBMS 1, Database 1 • Site 2: Distributed Processor 2, DBMS 2, Database 2 • Site 3: Distributed Processor 3, DBMS 3, Database 3 • The sites are connected by a Communication Network

  41. Distributed Processor • Network Interface • Distributed Query/Update Processor • Distributed Transaction Manager • Integrity/Security Manager • Distributed Metadata Management • Local DBMS Interface

  42. Data Distribution • Site 1, EMP1 (SS#, Name, Salary, D#): (1, John, 20, 10), (2, Paul, 30, 20), (3, James, 40, 20), (4, Jill, 50, 20), (5, Mary, 60, 10), (6, Jane, 70, 20) • Site 1, DEPT1 (D#, Dname, MGR): (10, C.Sci., Jane), (30, English, David), (40, French, Peter) • Site 2, EMP2 (SS#, Name, Salary, D#): (9, Mathew, 70, 50), (7, David, 80, 30), (8, Peter, 90, 40) • Site 2, DEPT2 (D#, Dname, MGR): (50, Math, John), (20, Physics, Paul)

  43. Distributed Database Functions • Distributed Query Processing • Optimization techniques across the databases • Distributed Transaction Management • Techniques for distributed concurrency control and recovery • Distributed Metadata Management • Techniques for managing the distributed metadata • Distributed Security/Integrity Maintenance • Techniques for processing integrity constraints and enforcing access control rules across the databases

  44. Query Processing Example (Concluded) • DQP (Distributed Query Processor): the DQPs communicate over the network, in front of DBMS 1, DBMS 2 and DBMS 3 • Fragments shown: EMP1 (20), EMP3 (50), DEPT3 (30), EMP2 (30), DEPT2 (20), EMP1 (20) • Query at site 1: Join EMP and DEPT on D# • Move EMP2 to site 3; merge EMP1, EMP2, EMP3 to form EMP • Move DEPT2 to site 3; merge DEPT2 and DEPT3 to form DEPT • Join EMP and DEPT; move result to site 1
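
The merge-then-join strategy of slide 44 can be sketched in plain Python. As a simplification, the sketch uses the two-site fragments EMP1, EMP2, DEPT1 and DEPT2 as read from slide 42 rather than the three-site layout of slide 44, and it ignores network transfer entirely; the helper names merge and join_on_dno are hypothetical.

```python
# Fragments as read from slide 42: EMP rows are (SS#, Name, Salary, D#),
# DEPT rows are (D#, Dname, MGR).
emp1 = [(1, "John", 20, 10), (2, "Paul", 30, 20), (3, "James", 40, 20),
        (4, "Jill", 50, 20), (5, "Mary", 60, 10), (6, "Jane", 70, 20)]
emp2 = [(9, "Mathew", 70, 50), (7, "David", 80, 30), (8, "Peter", 90, 40)]
dept1 = [(10, "C.Sci.", "Jane"), (30, "English", "David"), (40, "French", "Peter")]
dept2 = [(50, "Math", "John"), (20, "Physics", "Paul")]


def merge(*fragments):
    """Union of horizontal fragments, i.e. the 'merge' step at one site."""
    return [row for fragment in fragments for row in fragment]


def join_on_dno(emp, dept):
    """Join EMP and DEPT on D# using a hash lookup on the DEPT side."""
    dept_by_dno = {d[0]: d for d in dept}
    return [e + dept_by_dno[e[3]][1:] for e in emp if e[3] in dept_by_dno]


emp = merge(emp1, emp2)       # after the remote EMP fragment has been moved
dept = merge(dept1, dept2)    # after the remote DEPT fragment has been moved
result = join_on_dno(emp, dept)

for row in result:
    print(row)
```

In a real distributed query processor the choice of the site where fragments are merged would be driven by fragment sizes and transfer costs, which this sketch does not model.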

  45. Transaction Processing Example • DTM (Distributed Transaction Manager) is responsible for executing the distributed transaction • Issues: concurrency control, recovery, data replication • Site 1: Coordinator for transaction Tj • Sites 2, 3 and 4: Participants executing subtransactions Tj2, Tj3 and Tj4 • Two-phase commit: the coordinator queries the participants whether they are ready to commit; if all participants agree, then the coordinator sends a request for the participants to commit
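
The two-phase commit exchange described on slide 45 can be sketched as follows. This is a minimal in-process illustration: the Participant class and the two_phase_commit function are hypothetical names, and there is no networking, logging, timeout handling or recovery. The abort branch is standard two-phase commit behaviour that the slide leaves implicit.

```python
class Participant:
    """A site executing one subtransaction of the distributed transaction."""

    def __init__(self, name, ready=True):
        self.name = name
        self.ready = ready      # whether this site can commit its subtransaction
        self.state = "active"

    def prepare(self):
        # Phase 1: vote on whether this participant is ready to commit.
        return self.ready

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: collect votes, then commit or abort everywhere."""
    # Phase 1: ask every participant whether it is ready to commit.
    votes = [p.prepare() for p in participants]

    # Phase 2: commit only if all participants agreed; otherwise abort all.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


sites_ok = [Participant("Site 2"), Participant("Site 3"), Participant("Site 4")]
outcome_ok = two_phase_commit(sites_ok)

sites_fail = [Participant("Site 2"), Participant("Site 3", ready=False), Participant("Site 4")]
outcome_fail = two_phase_commit(sites_fail)

print(outcome_ok, outcome_fail)
```

The key property shown is atomicity: either every participant commits or every participant aborts.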

  46. Interoperability of Heterogeneous Database Systems • Database System A (Relational), Database System B (Object-Oriented) and Database System C (Legacy), connected via a network • Transparent access to heterogeneous databases for both users and application programs • Query, transaction processing

  47. Technical Issues on the Interoperability of Heterogeneous Database Systems • Heterogeneity with respect to data models, schema, query processing, query languages, transaction management, semantics, integrity, and security policies • Interoperability based on client-server architectures • Federated database management • Collection of cooperating, autonomous, and possibly heterogeneous component database systems, each belonging to one or more federations

  48. Different Data Models • Nodes A, B, C and D on a network, each with its own database • Data models in use: Network Model, Object-Oriented Model, Relational Model, Hierarchical Model • Developments: tools for interoperability; commercial products • Challenges: global data model

  49. Schema Integration and Transformation: An Approach • External Schemas I, II and III • Global Schema: integrate the generic schemas • Generic schemas describing the relational, network, hierarchical and object-oriented databases • Schemas describing the relational, network, hierarchical and object-oriented databases • Challenges: selecting an appropriate generic representation; maintaining consistency during transformations; schema evolution

  50. Semantic Heterogeneity • Semantic heterogeneity occurs when there is a disagreement about the meaning or interpretation of the same data • Example: the databases at Node A and Node B both hold Object O; one interprets it as a passenger ship, the other as a submarine • Challenges: standard definitions; repositories
