This review discusses issues related to integrating databases, such as information extraction, schema alignment, entity matching, and data cleaning. It also explores different approaches for schema alignment and the need for data cleaning in a specific dataset. Additionally, it covers the conflicts and recoverability conditions in transaction schedules.
Review Discussion 2: CSE 232A Graduate Database Systems, Arun Kumar
Review Question
[Figure: two databases, with tables labeled Movies, Movies, and Directors]
You are tasked with integrating the above two databases. Which of the following issues need not be tackled?
A. Information extraction
B. Schema alignment
C. Entity matching
D. Data cleaning
Review Question
Perform information extraction (IE) on this piece of text to populate the given relation schema without any errors.
Text: "CSE 232A was taught by Arun Kumar in Fall 2018. It was taught by Yannis Papakonstantinou in Spring that year. Victor Vianu did not teach CSE 232A in Fall 2018 but rather CSE 132A."
Relation: CourseInfo
What type of IE is the above? Closed-world? Closed? Open?
Review Question
Which of the following schema alignment approaches offers the most flexibility/ease of use for adding more data sources?
A. Global-As-View (GAV)
B. Local-As-View (LAV)
C. Manual schema alignment
D. All of the above
Review Question
[Figure: source schemas S1 and S2]
Which of the following is a mediated schema for LAV?
A B C D
Review Question
Which stage of a typical entity matching workflow helps avoid comparison of all pairs of records?
A. Blocking
B. Pairwise check with ML classifier
C. Clustering
D. None of the above
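To make the idea of blocking concrete, here is a minimal sketch. The records, the `block_by_key` and `candidate_pairs` helpers, and the choice of blocking key (lowercased last name token) are all assumptions made for illustration; they are not from the slides. Records are grouped by a cheap key, and only records that share a key are passed on to the more expensive pairwise check.

```python
from collections import defaultdict
from itertools import combinations

def block_by_key(records, key_fn):
    """Group record ids by a cheap blocking key (hypothetical helper)."""
    blocks = defaultdict(list)
    for rec_id, rec in records.items():
        blocks[key_fn(rec)].append(rec_id)
    return blocks

def candidate_pairs(blocks):
    """Only records that fall in the same block become candidate pairs."""
    pairs = set()
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs

# Toy records, invented for this sketch.
records = {
    1: {"name": "Alice Smith"},
    2: {"name": "A. Smith"},
    3: {"name": "Bob Jones"},
    4: {"name": "Robert Jones"},
}

# Assumed blocking key: the lowercased last token of the name.
blocks = block_by_key(records, lambda r: r["name"].split()[-1].lower())
pairs = candidate_pairs(blocks)

all_pairs = list(combinations(sorted(records), 2))
print(len(all_pairs), "pairs without blocking;", len(pairs), "with blocking")
```

A coarser key keeps more true matches in the same block but leaves more pairs to check; a finer key prunes more pairs but risks separating true matches.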
Review Question
Which attribute in this dataset surely needs data cleaning?
A. FullName
B. Age
C. City
D. Country
Review Question
Out of WW, WR, and RW(R) conflicts, which exact set is avoided by the READ COMMITTED isolation level of SQL?
A. WW
B. WW, WR
C. WW, WR, RW
D. WW, RW
Review Question
3 data objects, 2 concurrent txns, and this schedule:
[Schedule figure: interleaving of T1 and T2]
Txn 1: R(A), W(A), R(B), W(B), Commit
Txn 2: R(B), R(C), W(C), W(B), Commit
Is it recoverable?
Yes! No txn is reading dirty data anyway. So, the condition for recoverability is vacuously satisfied.
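The recoverability condition can be sketched as a small check: whenever one txn reads a value written by another, the writer must commit before the reader commits. The interleaving in `assumed_schedule` below is an assumption, since the schedule figure is not reproduced here; it is one ordering consistent with the slide's description (T2 reads B before T1 writes it and writes B after). The helper names are hypothetical, and aborts are ignored for simplicity.

```python
def reads_from(schedule):
    """Return (reader, writer) pairs where reader reads a value last written by writer."""
    last_writer = {}
    deps = set()
    for txn, op, obj in schedule:
        if op == "W":
            last_writer[obj] = txn
        elif op == "R":
            writer = last_writer.get(obj)
            if writer is not None and writer != txn:
                deps.add((txn, writer))
    return deps

def is_recoverable(schedule):
    """Simplified check (no aborts): each writer commits before any txn that read from it."""
    commit_pos = {txn: i for i, (txn, op, _) in enumerate(schedule) if op == "C"}
    for reader, writer in reads_from(schedule):
        if reader in commit_pos:
            if writer not in commit_pos or commit_pos[writer] > commit_pos[reader]:
                return False
    return True

# ASSUMED interleaving; the original figure is not available.
assumed_schedule = [
    ("T2", "R", "B"),
    ("T1", "R", "A"), ("T1", "W", "A"), ("T1", "R", "B"), ("T1", "W", "B"),
    ("T2", "R", "C"), ("T2", "W", "C"), ("T2", "W", "B"),
    ("T1", "C", None), ("T2", "C", None),
]

# Contrasting toy schedule: T2 reads T1's uncommitted write and commits first.
dirty_commit = [
    ("T1", "W", "X"), ("T2", "R", "X"), ("T2", "C", None), ("T1", "C", None),
]

print(reads_from(assumed_schedule))      # no reads-from pairs
print(is_recoverable(assumed_schedule))  # vacuously recoverable
print(is_recoverable(dirty_commit))      # not recoverable
```

Under the assumed interleaving no txn reads another txn's write, so the loop in `is_recoverable` has nothing to check, which is the "vacuously satisfied" case.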
Review Question
3 data objects, 2 concurrent txns, and this schedule:
[Schedule figure: interleaving of T1 and T2]
Txn 1: R(A), W(A), R(B), W(B), Commit
Txn 2: R(B), R(C), W(C), W(B), Commit
Is it serializable?
No! Not equivalent to either serial order.
What conflicts exist?
WW conflict. T2 overwrites T1’s write of B based on an older read of B. No WR conflict; no RW(R) conflict.
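One standard way to test conflict serializability is a precedence graph: add an edge Ti -> Tj whenever an operation of Ti precedes a conflicting operation of Tj on the same object, and look for a cycle. The sketch below uses an assumed interleaving (the schedule figure is not reproduced here) that matches the slide's description of T2 reading B early and writing it late; the helper names are hypothetical. Note that the graph counts every read/write pair on the same object across txns, which is a broader notion than the WW, WR, and RW(R) anomalies named on the slide.

```python
from collections import defaultdict

def precedence_edges(schedule):
    """Edge (Ti, Tj) if an op of Ti precedes a conflicting op of Tj on the same object."""
    edges = set()
    for i, (ti, op_i, obj_i) in enumerate(schedule):
        for tj, op_j, obj_j in schedule[i + 1:]:
            if ti != tj and obj_i == obj_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the directed precedence graph."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
    state = {}

    def visit(u):
        state[u] = "active"
        for v in graph.get(u, ()):
            if state.get(v) == "active":
                return True
            if v not in state and visit(v):
                return True
        state[u] = "done"
        return False

    return any(visit(u) for u in list(graph) if u not in state)

# ASSUMED interleaving (reads and writes only); the original figure is not available.
assumed_schedule = [
    ("T2", "R", "B"),
    ("T1", "R", "A"), ("T1", "W", "A"), ("T1", "R", "B"), ("T1", "W", "B"),
    ("T2", "R", "C"), ("T2", "W", "C"), ("T2", "W", "B"),
]

# The serial order T1 then T2, for comparison.
serial_t1_t2 = [
    ("T1", "R", "A"), ("T1", "W", "A"), ("T1", "R", "B"), ("T1", "W", "B"),
    ("T2", "R", "B"), ("T2", "R", "C"), ("T2", "W", "C"), ("T2", "W", "B"),
]

edges = precedence_edges(assumed_schedule)
print(sorted(edges))
print("conflict serializable:", not has_cycle(edges))
```

Under the assumed interleaving the graph has edges in both directions between T1 and T2, so there is a cycle and the schedule is not conflict serializable, in line with the slide's answer.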
Review Question
3 data objects, 2 concurrent txns, and this schedule:
[Schedule figure: interleaving of T1 and T2]
Txn 1: R(A), W(A), R(B), W(B), Commit
Txn 2: R(B), R(C), W(C), W(B), Commit
Will using READ UNCOMMITTED level make it serializable?
Yes! T2 will have to get a long X lock on B; so, T1’s R/W of B will be made to wait till T2 commits; equivalent to T2 -> T1.