Intro An important property of query optimization is that we have no way of knowing how many I/Os a plan will cost until we execute that plan. see note 9
example background plan space overview 对physical properties的描述,关注的点在于不同过程对hash or sort的要求 Cost Estimation 讨论的假设 注意predicate之间独立,以及只是关注IO
量化的信息点,catalog selectivity formula side note:
等值join的理解见下图的bunny 🐰 histograms for selectivity estimation 但是这节课用的是等宽的直方图😏,然后所有的条件相互独立,selectivity根据直方图显示的频率计算得出
selectivity for join query $$ \Large R\Join_{p} \sigma_{q}(S) \equiv \sigma_{p}(R \times \sigma_{q}(S)) \equiv \sigma_{p \land q}(R \times S) $$ 所以 $ s = s_p s_q $注意叉积产生的size变化
Relational Algebra Intro Relational algebra 😵
algebra on sets —> mean no duplicates! operational description of transformations Relational Calculus 😵
basic of SQL 以上两者被证明等价并且使得SQL能够被编译–long story……
Relational Algebra Operators Unary Operators Selection (σ) : select rows that satisfy a condition Projection (π) : select specific columns 如果结果是multiset, 去重! 虽然real system不会自动去重 Rename (ρ) : rename a column Union (∪) : tuples in either r1 or r2 two input relations, must be compatible (same number of columns or “fields”, and have same data types corresponding to the same field) UNION in SQL Difference (−) : tuples in r1, but not in r2 same with union, both input relations must be compatible EXCEPT in SQL Cross Product (×) : cartesian product of two relations how many rows in the result?