
CSU Data Stewardship Committee

CSU Data Stewardship Committee. Kickoff Meeting April 4, 2003. What is Data Stewardship?




Presentation Transcript


  1. CSU Data Stewardship Committee Kickoff Meeting April 4, 2003

  2. What is Data Stewardship? • “Data stewardship is the process of managing information necessary to support program and financial managers, and assuring data captured and reported is accurate, accessible, timely, and useable for decision-making and activity monitoring.” • U.S. Department of the Interior

  3. What is Data Stewardship? (cont’d) • “Data Stewardship has, as its main objective, the management of the corporation's data assets in order to improve their reusability, accessibility, and quality. It is the Data Stewards' responsibility to approve business naming standards, develop consistent data definitions, determine data aliases, develop standard calculations and derivations, document the business rules of the corporation, monitor the quality of the data in the data warehouse, define security requirements, and so forth….” • Claudia Imhoff, Ph.D., President, Intelligent Solutions, Inc.

  4. What is Data Stewardship? (cont’d) • “Stewardship programs focus on improving data quality, reducing data duplication, formalizing accountability for data, and improving business and IT productivity.  An effective Data Stewardship program will rapidly improve the ROI from data warehousing and business intelligence efforts, application integration efforts, ERP, CRM, content and knowledge management, and EAI efforts.” • Robert Seiner, Publisher, The Data Administration Newsletter

  5. Why do we need data stewardship? • Consider the costs of poor data quality • Incorrect enrolled student counts • Incorrect flexibly scheduled course section counts • Incorrect alumni data • Why reinvent the wheel – and differently every time, at that! • Reports that claim to show the same information, but with different results • Leads to decisions based on information that is incorrect or improperly understood

  6. Data Stewardship Committee charge • The charge of the Data Stewardship Committee (DSC) is to define, validate, organize and protect data assets, thus enabling areas throughout the University to make decisions based upon high-quality, easily usable information

  7. Creating our common vision • What products should we develop? • Data marts • Data quality metrics • Metadata repository/data dictionary • Other?

  8. Creating our common vision (cont’d) • What services should we provide? • Change control • Other?

  9. What is a data mart? • “…the restriction of the data warehouse to a single business process or to a group of related business processes targeted toward a particular business group.” • Ralph Kimball, Ph.D., CEO Ralph Kimball Associates • “A data mart is a subject-specific collection of organizational data which can be used for analytical purposes relating to specific business questions or functions. A data mart contains only that data which is needed to respond to the specified business questions.” • David Fuller

  10. What is a data mart? (cont’d) • Data marts are usually derived by taking many tables and “flattening” them into a few tables • Data marts are easier to query and report from
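
The "flattening" described on this slide can be sketched with Python's built-in sqlite3 module. This is an illustration only: the table and column names (student, course, enrollment, mart_enrollment) and the sample rows are assumptions made for the example, not CSU's actual schema.

```python
import sqlite3

def build_flat_enrollment(conn):
    """Join three normalized tables into one wide, report-friendly table.

    All table names, column names, and rows are hypothetical.
    """
    conn.executescript("""
        CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE enrollment (student_id INTEGER, course_id INTEGER, term TEXT);

        INSERT INTO student VALUES (1, 'Jane Doe'), (2, 'John Roe');
        INSERT INTO course VALUES (10, 'Statistics'), (20, 'Biology');
        INSERT INTO enrollment VALUES
            (1, 10, '2003SP'), (1, 20, '2003SP'), (2, 10, '2003SP');

        -- The "data mart": one flat table, no joins needed at report time.
        CREATE TABLE mart_enrollment AS
            SELECT e.term, s.student_id, s.name, c.course_id, c.title
            FROM enrollment e
            JOIN student s ON s.student_id = e.student_id
            JOIN course c ON c.course_id = e.course_id;
    """)
    return conn

conn = build_flat_enrollment(sqlite3.connect(":memory:"))

# A report query against the flat table needs no knowledge of the joins.
counts = conn.execute(
    "SELECT title, COUNT(*) FROM mart_enrollment GROUP BY title ORDER BY title"
).fetchall()
print(counts)
```

The point of the sketch is the last query: a report writer can count enrollments per course from a single table instead of re-deriving the joins in every report.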

  11. What are data quality metrics? • “…there is no meaningful concept of data quality in the real world; it is only as a by-product of the deficiencies of abstracting and representing reality that data quality arises as an issue at all.” • Matt Duckham, Dept. of Computer Science, University of Keele, UK • Metrics are ways to measure data quality • How many values in a column are valid (internal consistency)? • What state is ‘ZZ’? • How many values across columns are consistent (external consistency)? • Why does ‘Ms.’ Jane Doe have a gender of ‘Male’? • Metrics show data quality improving or worsening over time • We are developing a data quality architecture (DQA) for use with a variety of data sources
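
The two kinds of metric on this slide can be sketched as simple rates. This is a minimal illustration, not the committee's DQA: the list of valid states, the title-to-gender mapping, and the sample rows are all assumptions made for the example.

```python
# Hypothetical reference data; a real check would use full, approved lists.
VALID_STATES = {"CO", "WY", "NM"}
TITLE_GENDER = {"Ms.": "Female", "Mrs.": "Female", "Mr.": "Male"}

def validity_rate(rows, column, valid_values):
    """Share of rows whose value in `column` is in the allowed set
    (the slide's 'internal consistency' check, e.g. state 'ZZ')."""
    if not rows:
        return 1.0
    ok = sum(1 for row in rows if row[column] in valid_values)
    return ok / len(rows)

def title_gender_consistency(rows):
    """Share of rows whose title agrees with the recorded gender
    (the slide's 'external consistency' check). Titles that imply
    no gender, such as 'Dr.', are left out of the denominator."""
    checked = [row for row in rows if row["title"] in TITLE_GENDER]
    if not checked:
        return 1.0
    ok = sum(1 for row in checked if TITLE_GENDER[row["title"]] == row["gender"])
    return ok / len(checked)

rows = [
    {"title": "Ms.", "gender": "Female", "state": "CO"},
    {"title": "Ms.", "gender": "Male", "state": "CO"},   # title/gender conflict
    {"title": "Mr.", "gender": "Male", "state": "ZZ"},   # invalid state
    {"title": "Dr.", "gender": "Female", "state": "WY"},
]

state_validity = validity_rate(rows, "state", VALID_STATES)
gender_consistency = title_gender_consistency(rows)
print(state_validity, gender_consistency)
```

Recording these rates each time the data is loaded gives the trend the slide mentions: quality improving or worsening over time.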

  12. What is a metadata repository? • “Meta data is all physical data and knowledge-containing information about the business and technical processes, and data, used by a corporation…. While meta data repositories perform all of the functions of a data dictionary, their scope is far greater.” • David Marco, President, Enterprise Warehousing Solutions • What features might a metadata repository have?

  13. What is a metadata repository? (cont’d) • Definitions of columns and tables. • The ability to determine which tables contain a given column, or a column with a given description – e.g., which tables contain “Academic Sub-Plan”. • The ability to “query the queries” – e.g., find all existing queries with “IPEDS” in their description.

  14. What is a metadata repository? (cont’d) • The ability to determine which queries reference a given column and/or specific values of that column – e.g., which queries use employee status “L” as one of their criteria? • The ability to determine the path (menu group > panel group > panel) to follow to reach a particular panel – e.g., how do I get to the “Application Data” panel? • The ability to determine which columns from which tables appear on a given panel, and vice versa – e.g., From which table and column is “Program Action” populated in panel X, and conversely, which panels, in addition to panel X, are populated by ACAD_PROG?
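
Several of the lookups listed on slides 13 and 14 can be sketched with a small in-memory catalog. This is an assumption-laden illustration: ACAD_PROG, "Academic Sub-Plan", "Program Action", and "IPEDS" come from the slides, but every other table, column, and query name below is hypothetical, and a real repository would be populated from the system's own catalog tables.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    table: str
    name: str
    description: str

@dataclass(frozen=True)
class Query:
    name: str
    description: str
    columns: tuple  # (table, column) pairs the query references

# Hypothetical catalog contents.
COLUMNS = [
    Column("ACAD_PROG", "PROG_ACTION", "Program Action"),
    Column("ACAD_SUBPLAN", "ACAD_SUB_PLAN", "Academic Sub-Plan"),
    Column("STDNT_ENRL", "ACAD_SUB_PLAN", "Academic Sub-Plan"),
]
QUERIES = [
    Query("IPEDS_HEADCOUNT", "IPEDS fall enrollment headcount",
          (("STDNT_ENRL", "ACAD_SUB_PLAN"),)),
    Query("PROG_ACTIONS", "Program actions by term",
          (("ACAD_PROG", "PROG_ACTION"),)),
]

def tables_with_description(text):
    """Which tables contain a column whose description mentions `text`?"""
    needle = text.lower()
    return sorted({c.table for c in COLUMNS if needle in c.description.lower()})

def queries_described_as(text):
    """'Query the queries': find queries whose description mentions `text`."""
    needle = text.lower()
    return sorted(q.name for q in QUERIES if needle in q.description.lower())

def queries_using(table, column):
    """Which queries reference a given table column?"""
    return sorted(q.name for q in QUERIES if (table, column) in q.columns)

print(tables_with_description("Academic Sub-Plan"))
print(queries_described_as("IPEDS"))
print(queries_using("ACAD_PROG", "PROG_ACTION"))
```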

  15. What is a metadata repository? (cont’d) • A metadata repository can capture data definitions, not create them • Definitions more detailed than those already stored somewhere must be provided by subject matter experts (SMEs) • The metadata repository can provide a framework for the systematic capture and publication of this metadata

  16. What is change control? • In this context, it means controlling certain changes to the data • Adding new values to critical columns • Any other changes that can impact reporting • These changes should be brought to the attention of this committee before they are made • Data users can assess and discuss the impact • The changes can be published before they are made • Users can modify reports as needed before they risk publishing incorrect information • But how do we define critical data?
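
One way to notice the first kind of change, a new value appearing in a critical column, is to compare observed values against an approved list. The sketch below assumes such a list exists; employee status "L" appears in the slides, while the other status codes and the sample data are hypothetical.

```python
# Hypothetical approved value lists for critical columns.
APPROVED_VALUES = {
    "employee_status": {"A", "L", "T"},
}

def unapproved_values(column, observed):
    """Return the values seen in `column` that are not on its approved list."""
    return sorted(set(observed) - APPROVED_VALUES[column])

observed_statuses = ["A", "L", "X", "A", "T", "X"]
new_values = unapproved_values("employee_status", observed_statuses)
print(new_values)
```

A non-empty result is a prompt to bring the change to the committee, so that report owners can assess the impact before reports that depend on the column are run.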

  17. Identifying Critical Data • Data stewardship over all CSU data is not cost-effective • The prerequisite to developing data marts, implementing a data quality architecture, developing a metadata repository/data dictionary, and instituting change control over our data is identification of the critical data over which we will maintain stewardship

  18. University Data • “University data are institutional assets and are held by the university to support its fundamental instructional, research, and public service missions.” • Arizona State University

  19. University Data (cont’d) • “UNIVERSITY INFORMATION -- A data element is considered UNIVERSITY INFORMATION if it provides support to and meets the needs of units of the University. Examples of UNIVERSITY INFORMATION include, but are not limited to, many of the elements supporting financial management, student curricula, payroll, personnel management, and capital equipment inventory. Data may be considered UNIVERSITY INFORMATION if it satisfies one or more of the following criteria: A. It is used for planning, managing, reporting, or auditing a major administrative function; B. It is referenced or used by an organizational unit to conduct University business; C. It is included in an official University administrative report; D. It is used to derive an element that meets the criteria above. … Data that may be managed locally may yet have significant impact if it is used in a manner that can impact University operations….” • Georgia State University

  20. University Data (cont’d) • No one owns University data but the University • We may be data stewards, custodians, users, producers, etc., but we are not owners of the data

  21. Tasks • Identifying University Data • Which columns in which tables (or which fields on which panels) do we need to • Define? • Use in reporting / analysis / decision-making? • Quality assure? • Exercise change control over? • What are our relative priorities? • Data marts • Data quality • Data dictionary • Other products?

  22. Future possibilities • Statistical analysis of data • Data mining • The “diapers and beer” discovery • Registration patterns • Student attrition patterns • Prerequisites • Better data structure • Better data quality

  23. Thank You!
