350 likes | 426 Views
Dive into the world of Relational Algebra, Predicates, and Joins through a doctoral lecture exploring various operations in data processing. Understand Projection, Selection, Renaming, Cartesian Product, Union, Difference, and more, and learn about the significance of algebra in database systems. Discover the efficiencies gained through combining these operations and how Joins elevate data manipulation. Unravel the complexities of relational operations and their applications in information science and engineering.
E N D
Predicates, Joins, and Algebra, Oh my! Matt Dube Doctoral Student, Spatial Information Science and Engineering
Wednesday’s Lecture • Projection (unary) • Horizontal reduction • Selection (unary) • Vertical reduction • Renaming (unary) • Making attributes kosher • Cartesian Product (binary or more) • Most expensive operator, consisting of all possible combinations • Union (binary or more) • Linker • Difference (binary or more) • What’s mine is only mine
Today • Relations as first order language predicates • Efficiencies in combination • Algebra • Join
Projection • Projection takes a tuple and reduces its number of attributes • dog(fido,dalmatian,bob). • Suppose we would like a table that stored only the owner’s name and the dog’s name. • petname(A,B):-dog(B,C,A). • This is a projection.
Selection • Selections act upon particular criteria • dog(fido,dalmatian,bob). • Suppose we want to return anyone who owns a dalmatian and what the name of the dog is • dalmatian(A,B):-dog(A,dalmatian,B). • This is a selection. Note the constant for the variable.
Renaming • Renaming changes the name of an attribute or a relation. • dog(fido,dalmatian,bob). • Suppose I now want to call all dogs hippos. • hippo(A,B,C):-dog(A,B,C). • This is a renaming. It makes more sense with an attribute if you consider attributes spilled out in binary listings.
Cartesian Product • Cartesian products bind two different relations together, maintaining all attributes • dog(fido,dalmatian,bob). • coffee(starbucks,brazil). • Suppose we want something to give us data about all coffee and dogs • javadog(A,B,C,D,E):-coffee(A,B),dog(C,D,E). • This is a Cartesian product. Note that it is an AND statement and needs all variables.
Union • Unions link different relations together in similar strucutres • dog(fido,dalmatian,bob). • cat(mittens,coon,matt). • Suppose we want to know all pets. Dogs and cats are pets. • pet(A,B,C):-dog(A,B,C) ∨cat(A,B,C). • This is a union of the dog relation and the cat relation.
Difference • Differences show what is in one set, but not the other • dog(fido,dalmatian,bob). • cat(fido,dalmatian,bob). • dog(rover,pitbull,joe). • Suppose I want to find dogs who don’t share qualities with a cat. • onlydog(A,B,C):-dog(A,B,C),cat(A,B,C),!. (fail it) • onlydog(A,B,C):-dog(A,B,C). • Only unique dogs are produced here.
Combination of Operations • All of our operators produce relations • Remember, some of the operators need to refine the output to have a relation output (think projection of a non-key) • Since all operators produce relations, this system of operators is a closed system • Closed systems take in inputs of a particular form (in our case relation) and output that form in return. • Since the system is closed, we can string together operators.
Equivalent Combinations • πa1,…,an (σA (R)) = σA ( πa1,…,an (R)) • σA (R – P) = σA (R) – σA (P) • σA∧B (R) = σA (σB (R)) = σB (σA (R)) • σA∨B (R) = σA (R) ∪ σB (R) • σA (R x P) = σB∧C∧D (R x P) = σD (σB(R) xσC(P))
Optimization • πa1,…,an (σA (R)) = σA ( πa1,…,an (R)) • σA (R – P) = σA (R) – σA (P) • σA∧B (R) = σA (σB (R)) = σB (σA (R)) • σA∨B (R) = σA (R) ∪ σB (R) • σA (R x P) = σB∧C∧D (R x P) = σD (σB(R) xσC(P)) Project, then select. Difference, then select. Union, then select. Break apart complex selections
Algebra • From the arabicAl-jabr, meaning “reunion” • Algebras are mathematical structures formed off of operators • You are familiar with an algebra from your studies in mathematics (+, -, *, /), which can be boiled down to simply + and * • Algebras are closed systems under their operators and also have identities for every operator (+ 0, - 0, * 1, / 1)
Relational Algebra • What is the identity form for each operator? • Projection • Project all of the attributes • Selection • Select the entire key structure • Renaming • Rename the attribute to the same name • Cartesian Product • Cross with an empty relation • Union • Union with a subset of the original relation • Difference • Difference with a mutually exclusive set
Why is an Algebra important? • Algebras (reunions) are what allows for operations to be strung together • An example of an important consequence: operational efficiency • You have seen the relational algebra equivalencies, but there is an easier example that you already know.
Here’s the test: • What are the key properties of an algebra? • Closed under operations • Every operation has an identity state • Why is an algebra important for a system? • Stringing together operators • Efficient processes (think the distributive property) • Operational equivalence
Joins • Joins are what we would call a higher level operation • Higher level operation? • Think of an exponent (successive multiplications of terms) • Why can we use higher level operations? (You already know the answer…) • Most important operator for some key reasons • Done by imposing a condition on a pair of attributes
Types of Joins • Inner Joins • Equi-Joins • Natural Joins • Outer Joins
Inner Join • Selection on a Cartesian Product • Selection involves some sort of criteria between a member of one relation and a member of the other (similar to the keys for the Cartesian product)
Inner Join A B σA.a2>B.b1 (A x B)
Inner Join A x B σA.a2>B.b1 (A x B)
Inner Join A x B σA.a2>B.b1 (A x B)
Inner Join A x B σA.a2>B.b1 (A x B)
Inner Join σA.a2>B.b1 (A x B)
Equi-Join • Another selection on a Cartesian product • As the name implies, this has to do with two of the attributes being equal
Equi-Join A B σA.a2=B.b1 (A x B)
Equi-Join A x B σA.a2=B.b1 (A x B)
Equi-Join σA.a2=B.b1 (A x B)
Natural Join • σ< A.attribute_1 = B.attribute_2 > (A x B) • attribute_1 and attribute_2 must have same name • only one column with this attribute shows up in the result (i.e., a projection follows) • Selection on a Cartesian product (Equi-Join) followed by a projection
Natural Join A B σA.yy=B.yy (A x B)
Natural Join A x B σA.a2=B.b1 (A x B)
Natural Join π<xx,yy,zz> (σA.a2=B.b1 (A x B))
Natural Join π<xx,yy,zz> (σA.a2=B.b1 (A x B))
Outer Joins • An outer join is a join which allows for NULL values to be involved • Three types • Left Outer Join • Everything in the left relation will be present, regardless if found in B or not • Right Outer Join • Everything in the right relation will be present, regardless if found in A or not • Full Outer Join • Both left and right outer joins
Summary • Converted operators into a first order language parallel • Showed how to combine operators for efficiency • Defined the concept of algebra • Showed the different types of joins