/avatar.jpg

CS186-L4: Disks, Buffers, Files II

cost model and analysis 基本量化指标定义 基本假设 single record insert and delete equality selection - exactly one match for heap files: insert always appends for sorted files: packed: files compacted after deletions sorted according to search key 以下是计算时间,类似渐进记法,但是有细节 😎 side note: $\times D$ 是简化了每次“操作”的时间,“操作”指的是“读”与“写” 考虑随机变量 操作次数 $N$ 及其 $\mathbb{E}(N)$ Equality Search对于sorted files: $$ \begin{equation} \begin{aligned} \mathbb{E}(N) &= \sum_{i=1}^{log_2B} i \times \frac{2^{i-1}}{B} \notag\ &= log_2B - \frac{B-1}{B}\ &\approx log_2B \end{aligned} \end{equation} $$

CS186-L3: Disk, Buffers, Files I

big picture sql client -> DBMS -> database 🤓 DBMS parsing & optimization 执行SQL语句时,DBMS需要解析SQL语句,并将其转换为执行计划。优化器会根据统计信息、查询模式、索引等因素,选择最优的执行计划。 relational operators 处理数据流 or 关系运算符? files and index management buffer management disk space management 事实上纵向还有两个模块:concurrency control和recovery。 省流:从RAM & DISK获取数据非常慢, 相对于CPU Disk 注意sector, disk head, 其中后者似乎只能单次读写 access time flash SSD 注意: read很快,随着数据变大,可以预测 write很慢,slower for random,写入放大 disk space management block level storage block: unit of transfer for disk read/write (64~128KB in 2018) page: a common synonym for block, in some contexts, it means in RAM

CS186-L2: SQLⅡ

怎么读懂SQL语句? FROM WHERE, to eliminate rows SELECT GROUP BY HAVING, to eliminate groups DISTINCT ORDER BY, LIMIT, OFFSET等等格式化输出 Join Queries cross product 生成所有的组合,然后过滤掉不符合条件的组合,但是低效 考虑以下sql,更加简洁 1 2 3 SELECT S.sid, sname, bid FROM Sailors AS S, Reserves AS R WHERE S.sid = R.sid self join and more aliases 1 2 3 4 SELECT x.sname, x.age, y.sname AS sname2, y.age AS age2 FROM Sailors AS x, Sailors AS y WHERE x.age > y.