
CSE 232A Graduate Database Systems

This review discussion covers the main issues in integrating databases: information extraction, schema alignment, entity matching, and data cleaning. It compares schema alignment approaches (GAV, LAV, manual) and asks which attribute of a sample dataset needs cleaning. It also covers conflicts, recoverability, and serializability of transaction schedules.





Presentation Transcript


  1. Review Discussion 2. CSE 232A Graduate Database Systems. Arun Kumar

  2. Review Question [figure: tables labeled Movies, Movies, Directors]: You are tasked with integrating the above two databases. Which of the following issues need not be tackled? (A) Information extraction (B) Schema alignment (C) Entity matching (D) Data cleaning

  3. Review Question: Perform information extraction (IE) on this piece of text to populate the given relation schema without any errors. Text: "CSE 232A was taught by Arun Kumar in Fall 2018. It was taught by Yannis Papakonstantinou in Spring that year. Victor Vianu did not teach CSE 232A in Fall 2018 but rather CSE 132A." Relation: CourseInfo. What type of IE is the above? Closed-world? Closed? Open?
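
A minimal rule-based sketch of the extraction asked for on this slide. The CourseInfo attributes are not included in the transcript, so the (course, instructor, term, year) layout below is an assumption, and the two regex rules are hand-written for exactly these three sentences, not a general IE method. The sketch mainly shows why error-free extraction needs to resolve "It" and "that year" and to handle the negated sentence.

```python
import re

TEXT = (
    "CSE 232A was taught by Arun Kumar in Fall 2018. "
    "It was taught by Yannis Papakonstantinou in Spring that year. "
    "Victor Vianu did not teach CSE 232A in Fall 2018 but rather CSE 132A."
)

# Hypothetical token patterns, tuned to the slide's text only.
NAME = r"[A-Z][a-z]+ [A-Z][a-z]+"
COURSE = r"CSE \d+[A-Z]"
TERM = r"Fall|Spring"


def extract_course_info(text):
    """Return (course, instructor, term, year) tuples; assumed schema."""
    rows = []
    last_course, last_year = None, None  # context for "It" / "that year"
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        # Rule 1: "<course|It> was taught by <name> in <term> <year|that year>"
        m = re.match(
            rf"({COURSE}|It) was taught by ({NAME}) in ({TERM}) (\d{{4}}|that year)",
            sentence,
        )
        if m:
            course = last_course if m.group(1) == "It" else m.group(1)
            year = last_year if m.group(4) == "that year" else m.group(4)
            rows.append((course, m.group(2), m.group(3), year))
            last_course, last_year = course, year
            continue
        # Rule 2: "<name> did not teach <course> in <term> <year> but rather <course>"
        # Only the course after "but rather" is asserted as taught.
        m = re.match(
            rf"({NAME}) did not teach {COURSE} in ({TERM}) (\d{{4}}) but rather ({COURSE})",
            sentence,
        )
        if m:
            rows.append((m.group(4), m.group(1), m.group(2), m.group(3)))
    return rows


for row in extract_course_info(TEXT):
    print(row)
```

A naive pattern that ignores the negation would wrongly record Victor Vianu as teaching CSE 232A, which is the kind of error the question asks to avoid.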

  4. Review Question: Which of the following schema alignment approaches offer the most flexibility/ease of use for adding more data sources? (A) Global-As-View (GAV) (B) Local-As-View (LAV) (C) Manual schema alignment (D) All of the above

  5. Review Question [figure: S1, S2]: Which of the following is a mediated schema for LAV? (A) (B) (C) (D) [option contents not included in the transcript]

  6. Review Question: Which stage of a typical entity matching workflow helps avoid comparison of all pairs of records? (A) Blocking (B) Pairwise check with ML classifier (C) Clustering (D) None of the above
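
To make the "all pairs" cost concrete, here is a small sketch of blocking: records are grouped by a cheap key, and candidate pairs are generated only within each group. The movie records and the choice of release year as the blocking key are hypothetical, chosen only for illustration; they do not come from the slides.

```python
from collections import defaultdict
from itertools import combinations


def block_by_key(records, key_fn):
    """Group records into blocks that share the same blocking key."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key_fn(rec)].append(rec)
    return blocks


def candidate_pairs(blocks):
    """Yield record pairs from within each block only."""
    for members in blocks.values():
        yield from combinations(members, 2)


# Hypothetical toy records (not from the slides).
records = [
    {"id": 1, "title": "The Matrix", "year": 1999},
    {"id": 2, "title": "Matrix, The", "year": 1999},
    {"id": 3, "title": "Toy Story", "year": 1995},
    {"id": 4, "title": "Toy Story (1995)", "year": 1995},
    {"id": 5, "title": "Heat", "year": 1995},
]

blocks = block_by_key(records, key_fn=lambda r: r["year"])
pairs = [(a["id"], b["id"]) for a, b in candidate_pairs(blocks)]
all_pairs = list(combinations(records, 2))

print(len(all_pairs), "pairs without blocking")
print(len(pairs), "pairs with blocking:", pairs)
```

Only the surviving candidate pairs would then go to the more expensive pairwise check. The trade-off is that a poorly chosen key can place true matches in different blocks, so they are never compared.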

  7. Review Question: Which attribute in this dataset surely needs data cleaning? (A) FullName (B) Age (C) City (D) Country

  8. Review Question: Out of WW, WR, and RW(R) conflicts, which exact set is avoided by the READ COMMITTED isolation level of SQL? (A) WW (B) WW, WR (C) WW, WR, RW (D) WW, RW

  9. Review Question [schedule figure: columns T1, T2]: 3 data objects, 2 concurrent txns, and this schedule: Txn 1: R(A), W(A), R(B), W(B), Commit. Txn 2: R(B), R(C), W(C), W(B), Commit. Is it recoverable? Yes! No txn is reading dirty data anyway. So, the condition for recoverability is vacuously satisfied.
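
The recoverability argument can be checked mechanically. The sketch below uses the standard definition (a txn may commit only after every txn whose writes it read has committed). The exact interleaving is in the slide's figure and not in this transcript, so `SCHEDULE` is an assumed interleaving that is merely consistent with the slide's description; the checker also assumes no aborts occur.

```python
def is_recoverable(schedule):
    """Check recoverability of a schedule of (txn, op, obj) steps.

    op is "R", "W", or "C" (commit). Simplification: aborts are not modeled.
    """
    last_writer = {}   # object -> txn that most recently wrote it
    reads_from = {}    # txn -> set of other txns whose writes it read
    committed = set()
    for txn, op, obj in schedule:
        if op == "W":
            last_writer[obj] = txn
        elif op == "R":
            writer = last_writer.get(obj)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif op == "C":
            # Committing before a txn we read from has committed is unsafe.
            if any(w not in committed for w in reads_from.get(txn, ())):
                return False
            committed.add(txn)
    return True


# ASSUMED interleaving (the slide's figure is not in the transcript).
SCHEDULE = [
    ("T2", "R", "B"),
    ("T1", "R", "A"), ("T1", "W", "A"), ("T1", "R", "B"), ("T1", "W", "B"),
    ("T2", "R", "C"), ("T2", "W", "C"), ("T2", "W", "B"),
    ("T1", "C", None), ("T2", "C", None),
]

# A hypothetical counter-example: T2 reads T1's dirty write and commits first.
DIRTY_READ = [
    ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "C", None), ("T1", "C", None),
]

print(is_recoverable(SCHEDULE))
print(is_recoverable(DIRTY_READ))
```

In the assumed interleaving no read follows another txn's write of the same object, so `reads_from` stays empty and the check passes vacuously, as the slide says.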

  10. Review Question [schedule figure: columns T1, T2]: 3 data objects, 2 concurrent txns, and this schedule: Txn 1: R(A), W(A), R(B), W(B), Commit. Txn 2: R(B), R(C), W(C), W(B), Commit. Is it serializable? No! Not equivalent to either serial order. What conflicts exist? WW conflict: T2 overwrites T1's write of B based on an older read of B. No WR conflict; no RW(R) conflict.
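
One common way to test the "not equivalent to either serial order" claim is a precedence graph: add an edge Ti -> Tj whenever an operation of Ti precedes an operation of Tj on the same object and at least one of the two is a write, then look for a cycle. This is a sketch of that conflict-serializability test, not necessarily the method used in the lecture. Its notion of a conflicting pair is broader than the anomaly names (WW, WR, RW) used on the slide, and `SCHEDULE` is an assumed interleaving because the slide's figure is not in the transcript.

```python
def precedence_edges(schedule):
    """Return the set of edges (Ti, Tj) of the precedence graph."""
    edges = set()
    for i, (ti, op_i, obj_i) in enumerate(schedule):
        if obj_i is None:          # commit steps touch no object
            continue
        for tj, op_j, obj_j in schedule[i + 1:]:
            if ti != tj and obj_i == obj_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    visiting, done = set(), set()

    def dfs(node):
        visiting.add(node)
        for nxt in graph.get(node, ()):
            if nxt in visiting or (nxt not in done and dfs(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(n) for n in list(graph) if n not in done)


# ASSUMED interleaving (the slide's figure is not in the transcript).
SCHEDULE = [
    ("T2", "R", "B"),
    ("T1", "R", "A"), ("T1", "W", "A"), ("T1", "R", "B"), ("T1", "W", "B"),
    ("T2", "R", "C"), ("T2", "W", "C"), ("T2", "W", "B"),
    ("T1", "C", None), ("T2", "C", None),
]

edges = precedence_edges(SCHEDULE)
print(sorted(edges))
print("conflict-serializable:", not has_cycle(edges))
```

Under the assumed interleaving, T2's early read of B puts T2 before T1, while T1's write of B followed by T2's write of B puts T1 before T2, so the graph has a cycle and neither serial order is equivalent.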

  11. Review Question [schedule figure: columns T1, T2]: 3 data objects, 2 concurrent txns, and this schedule: Txn 1: R(A), W(A), R(B), W(B), Commit. Txn 2: R(B), R(C), W(C), W(B), Commit. Will using READ UNCOMMITTED level make it serializable? Yes! T2 will have to get a long X lock on B; so, T1's R/W of B will be made to wait till T2 commits; equivalent to the serial order T2 -> T1.
