1 / 72

Chaos To Order

Chaos To Order. CAIR 2017. Integrating National Clearinghouse Data into a Data Warehouse. Office of Institutional Effectiveness Humboldt State University. Data Warehouse: Strategic Data Repository - SDR. Presenters:

jadams
Download Presentation

Chaos To Order

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chaos To Order CAIR2017 Integrating National Clearinghouse Data into a Data Warehouse

  2. Office of Institutional EffectivenessHumboldt State University Data Warehouse: Strategic Data Repository - SDR Presenters: Ward Headstrom, Data Scientist Ronda Stemach, Data Administrator Michael Le, Research Associateward.headstrom@humboldt.eduronda.stemach@humboldt.edumichael.le@humboldt.edu

  3. National Student Clearinghouse Who are They? 1 CAIR2017

  4. National Student Clearinghouse Who are They? 1 • Nonprofit founded in 1993 • Educational Reporting • Verification and Research Services • 3600+ post-secondary institutions (93%) CAIR2017

  5. National Student Clearinghouse Who are They? 1 How do they provide Value? 2 CAIR2017

  6. National Student Clearinghouse How do they provide Value? 2 CAIR2017

  7. National Student Clearinghouse Who are They? 1 How do they provide Value? 2 What questions can I answer? 3 CAIR2017

  8. National Student Clearinghouse What happens to students admitted to my University but ultimately do not attend? How many of my students who leave before graduating get a degree from another institution? What is the migration of our local high school students who choose to continue to college? What types of degrees do HSU students earn from other institutions? CAIR2017

  9. NSC Summary Report CAIR2017

  10. What happens to students admitted to my University but ultimately do not attend? CAIR2017

  11. CAIR2017

  12. CAIR2017

  13. CAIR2017

  14. CAIR2017

  15. CAIR2017

  16. CAIR2017

  17. CAIR2017

  18. CAIR2017

  19. CAIR2017

  20. CAIR2017

  21. CAIR2017

  22. CAIR2017

  23. CAIR2017

  24. Overview of Getting Student Tracker Data • Create and Upload Data Request File to NSC • Download Student Tracker data files • Add student tracker data to database • Aggregate data into a dimension that connects to one or more of our data models CAIR2017

  25. You will need • an FTP account (shared with others on your campus) • an Individual account (assigned by campus admin)

  26. Student Tracker Request File Types Our Request Files • SE - student enrollment • DA - declined admission • both accept names, birthdates, starting dates, and IDs • SE - all students who have left the university ever, using the starting date at the end the student’s last semester (62k). • DA - all undergrads who applied during the last 10 years (177k). CAIR2017

  27. Upload File Format • Fixed-width text file or tab delimited text file (Excel) • Requires a header and footer descriptive row • File layout specifications provided CAIR2017

  28. Uploading the File to FTP CAIR2017

  29. Confirmation Receipt CAIR2017

  30. Warnings? Errors? CAIR2017

  31. File Validation CAIR2017

  32. when data is ready Email Notification CAIR2017

  33. FTP text file to UNIX server Define external text file Copy data to indexed Oracle table Move the raw data into our Oracle database CAIR2017

  34. External table definition create table clearinghouse_deg ( credential varchar(85), lev varchar(2), levname varchar(25) ) organization external ( type oracle_loader default directory ujob_prod_data_dir access parameters ( records delimited by newline skip 1 fields terminated by ' ' ) location ('CREDENTIAL_LEVEL_LOOKUP_TABLE.txt') );

  35. Moving data to Oracle table truncate table clearinghousedeg_sdr; insert into clearinghousedeg_sdr select credential, lev, substr(levname,1,length(levname)-1) levname from clearinghouse_deg / CAIR2017

  36. NSC Detail Data CAIR2017

  37. Challenges posed by detail data • Missing data - students not found • Extra data - questionable enrollment records • Incorrect school codes • Confusing degree information • Many rows per student (average=9, max=216) SE - 306K records, DA - >1M records CAIR2017

  38. NSC Control Report CAIR2017

  39. students not found Missing Data • FERPA blocking or partial blocking • Schools not reporting (122 with over 1,000 students) • No match - proprietary algorithm only has name and birthdate to match on • Old data - NSC founded in 1993 • 372 schools gave degrees without prior enrollment • 193 students received degrees from HSU without enrollment • Reported enrollment data rejected because of duplicate SSN • ?? Some other reason ?? 1.3% of students we reported were not found. 1.5% of degrees we reported were missing (only .2% FERPA)

  40. Extra enrollment records • Enrollment at multiple institutions reported at the same time. L 04-JAN-10 19-MAR-10 UNIVERSITY OF OREGON F 19-JAN-10 14-MAY-10 HUMBOLDT STATE UNIVERSITY • Short term enrollment • Extended education enrollment • Enrollment reported and then withdrawn. • Student exchange - reported by home school CAIR2017

  41. Incorrect School Codes • Same school with different code in NSC data 041271-00 UNIVERSITY OF CALIFORNIA - MERCED 001313-00 UNIVERSITY OF CALIFORNIA-MERCED • Duplicate codes in CSU cross-walk table 001059 C30000040 Tuskegee U 001059 C30000047 Lawson ST Comm Col • Codes missing from the CSU cross-walk table 1 out of 7 schools missing from COSAR 04 8% of applicants go to a school with missing code CAIR2017

  42. Confusing Degree Information • 1,767 credential values in our DA detail data • downloadable file of credentials from NSC contains over 86k different values and “levels” ... ACHELOR BD Bachelors Degree ACHELOR APPLIED SCIENCE BD Bachelors Degree ACHELOR ARTS & SCIENCE BD Bachelors Degree ACHELOR HEALTH SCIENCE BD Bachelors Degree ACHELOR OF APPLIED SCIENC BD Bachelors Degree ACHELOR OF ARCHITECTURE BD Bachelors Degree ... • still over 900 credentials in DA missing from NSC file CAIR2017

  43. Credential levels from NSC LEV LEVNAME COUNT(*) AD Associate Degree 11784 BD Bachelors Degree 54064 CR Credential 452 DP Doctoral-Professional 233 DR Doctoral-Research 111 MD Masters Degree 4088 PC PostBach Certificate 16 PD Postsecondary Diploma 39 UC Certificate 4840 (missing) 929 CAIR2017

  44. My Strategy for getting degree data • put NSC credential file in our data warehouse • create view that outer joins detail data to NSC credentials • make a guess at the degree level from the degree name • use my guess when credential code cannot be looked up CAIR2017

  45. Function Definition create or replace function nscdeg (flevvarchar,fdeg varchar,f2yr varchar) return varchar2 as -- return NSC degree level if it exists or a guess from the degree name if it doesn't result varchar(6); begin if flev is null then return case when fdeg like 'A%' then '3-AA' when fdeg like 'B%' then '4-BA' when fdeg like 'M%' and fdeg not like 'MAT*' then '5-MA' when fdeg like '%DOC%' or fdeg like 'PH%' then '6-PHD' when f2yr='2' and void(fdeg) is not null then '3-AA' when void(fdeg)='N' then '2-CERT' else '0-NONE' end; else select decode(flev,'UC','2-CERT','AD','3-AA','PD','3-AA','BD','4-BA','CR','4-CRED', 'MD','5-MA','DP','6-PHD','DR','6-PHD','4-??') into result from dual; return result; end if; end; / CAIR2017

  46. View Definition create or replace view clearinghouseDA_s1 as -- did not attend clearinghouse records with base as (select rpad(date2term(search_dt)||id_,13) nscda_key, substr(college_code,1,6) colcode, substr(college_code,1,6)||college_state||jc_or_4year||college_typecolinfo, id_ id,id_ emplid, rpad(date2term(search_dt),4) term_app, case when void(begin_dt)='N' then rpad(date2term(begin_dt),4) end term, case when void(grad_dt)='N' then rpad(date2gradterm(grad_dt),4) end term_grad, substr(nscdeg(c.lev,d.degree,d.jc_or_4year),1,6) deg, rank() over (partition by id_,search_dt order by begin_dt) rnk, d.*,c.lev,c.levname from sa.clearinghouseda_sdr d, sa.clearinghousedeg_sdr c where d.degree=c.credential(+)) select b.*,c.*, case when seqno=1 then college_name end firstcol0 from base b,dv_college c where b.colcode=c.college_key(+) / CAIR2017

  47. Maximum Degrees Earned 0-NONE 98,362 2-CERT 56 3-AA 7,104 4-BA 48,460 4-CRED 392 5-MA 3,993 6-PHD 344 CAIR2017

  48. Take Away With 3,600+ schools reporting enrollment and degrees over a period of 25 years and inadequate data validation, there is bound to be bad and missing data. Use NSC data with a grain of salt. It is not completely accurate, but still can provide some useful insights. CAIR2017

  49. Addressing Volume of Data Group data by student and application term A few of the aggregate fields we derive: BA_COL – college where student first earned a Bachelors FIRSTCOL – first college a student attended MAXDEGREE – highest degree earned TERM_BA – term when student earned first bachelors TERM_FIRSTCOL – term when student first attended college CAIR2017

More Related