/avatar.jpg

CS186-L12: Query Opt: Costs & Search

Intro An important property of query optimization is that we have no way of knowing how many I/Os a plan will cost until we execute that plan. see note 9 example background plan space overview 对physical properties的描述,关注的点在于不同过程对hash or sort的要求 Cost Estimation 讨论的假设 注意predicate之间独立,以及只是关注IO 量化的信息点,catalog selectivity formula side note: 等值join的理解见下图的bunny 🐰 histograms for selectivity estimation 但是这节课用的是等宽的直方图😏,然后所有的条件相互独立,selectivity根据直方图显示的频率计算得出 selectivity for join query $$ \Large R\Join_{p} \sigma_{q}(S) \equiv \sigma_{p}(R \times \sigma_{q}(S)) \equiv \sigma_{p \land q}(R \times S) $$ 所以 $ s = s_p s_q $注意叉积产生的size变化

CS186-L8: Relational Algebra

Relational Algebra Intro Relational algebra 😵 algebra on sets —> mean no duplicates! operational description of transformations Relational Calculus 😵 basic of SQL 以上两者被证明等价并且使得SQL能够被编译–long story…… Relational Algebra Operators Unary Operators Selection (σ) : select rows that satisfy a condition Projection (π) : select specific columns 如果结果是multiset, 去重! 虽然real system不会自动去重 Rename (ρ) : rename a column Union (∪) : tuples in either r1 or r2 two input relations, must be compatible (same number of columns or “fields”, and have same data types corresponding to the same field) UNION in SQL Difference (−) : tuples in r1, but not in r2 same with union, both input relations must be compatible EXCEPT in SQL Cross Product (×) : cartesian product of two relations how many rows in the result?

CS186-L7: Buffer Management

intro 可能需要处理的问题 dirty pages pages replacement state of buffer management Page replacement terminology a page in “use” : Page pin count if full. which should be replaced: page replacement policy request请求发出来,转接到buffer manager…… Page replacement policies LRU (Least Recently Used) 最近最少使用使用的是 时间 pin count == 0! Priority heap data structure can help! like $O(logN)$ CLOCK 一种近似LRU的算法,旨在不需要维护每个页面的访问时间戳,从而减少了额外的开销。 Clock policy将缓冲区中的页面视为一个循环列表,使用一个“时钟指针”来跟踪当前考虑替换的页面。每个页面都有一个“引用位”(ref bit),用于指示该页面是否被最近访问过。 工作流程 初始化:当缓冲区管理器启动时,时钟指针指向第一个未固定(unpinned)的页面,并将该页面的引用位设置为1,当页面被读入时。 页面替换: 当需要替换页面时,缓冲区管理器从时钟指针开始,遍历缓冲区中的页面。 对于每个页面,如果该页面的引用位为1,则将其引用位重置为0,并将时钟指针移动到下一个页面。 如果找到一个引用位为0的页面,则可以将其替换。此时,如果该页面是“脏页”(dirty page),则需要将其写回磁盘,然后读取新的页面并将其引用位设置为1。 LRU-Clock bad behavior Sequential Flooding! MRU (Most Recently Used) General case: SeqScan + MRU $B$ buffers $N>B$ pages in file Improvements for sequential scan: prefetch hybrid?