cost model and analysis 基本量化指标定义 基本假设
single record insert and delete equality selection - exactly one match for heap files: insert always appends for sorted files: packed: files compacted after deletions sorted according to search key 以下是计算时间,类似渐进记法,但是有细节 😎 side note:
$\times D$ 是简化了每次“操作”的时间,“操作”指的是“读”与“写” 考虑随机变量 操作次数 $N$ 及其 $\mathbb{E}(N)$ Equality Search对于sorted files: $$ \begin{equation} \begin{aligned} \mathbb{E}(N) &= \sum_{i=1}^{log_2B} i \times \frac{2^{i-1}}{B} \notag\ &= log_2B - \frac{B-1}{B}\ &\approx log_2B \end{aligned} \end{equation} $$
big picture sql client -> DBMS -> database 🤓
DBMS parsing & optimization 执行SQL语句时,DBMS需要解析SQL语句,并将其转换为执行计划。优化器会根据统计信息、查询模式、索引等因素,选择最优的执行计划。 relational operators 处理数据流 or 关系运算符? files and index management buffer management disk space management 事实上纵向还有两个模块:concurrency control和recovery。 省流:从RAM & DISK获取数据非常慢, 相对于CPU
Disk 注意sector, disk head, 其中后者似乎只能单次读写 access time flash SSD 注意:
read很快,随着数据变大,可以预测
write很慢,slower for random,写入放大 disk space management block level storage block: unit of transfer for disk read/write (64~128KB in 2018)
page: a common synonym for block, in some contexts, it means in RAM
怎么读懂SQL语句? FROM WHERE, to eliminate rows SELECT GROUP BY HAVING, to eliminate groups DISTINCT ORDER BY, LIMIT, OFFSET等等格式化输出 Join Queries cross product 生成所有的组合,然后过滤掉不符合条件的组合,但是低效
考虑以下sql,更加简洁
1 2 3 SELECT S.sid, sname, bid FROM Sailors AS S, Reserves AS R WHERE S.sid = R.sid self join and more aliases 1 2 3 4 SELECT x.sname, x.age, y.sname AS sname2, y.age AS age2 FROM Sailors AS x, Sailors AS y WHERE x.age > y.