CS186-L4: Disks, Buffers, Files II

HHZZ published on 2024-08-11 included in CS186

cost model and analysis 基本量化指标定义基本假设 single record insert and delete equality selection - exactly one match for heap files: insert always appends for sorted files: packed: files compacted after deletions sorted according to search key 以下是计算时间，类似渐进记法，但是有细节 😎 side note: $\times D$ 是简化了每次“操作”的时间，“操作”指的是“读”与“写” 考虑随机变量操作次数 $N$ 及其 $\mathbb{E}(N)$ Equality Search对于sorted files: $$ \begin{equation} \begin{aligned} \mathbb{E}(N) &= \sum_{i=1}^{log_2B} i \times \frac{2^{i-1}}{B} \notag\ &= log_2B - \frac{B-1}{B}\ &\approx log_2B \end{aligned} \end{equation} $$

CS186-L3: Disk, Buffers, Files I

HHZZ published on 2024-08-11 included in CS186

big picture sql client -> DBMS -> database 🤓 DBMS parsing & optimization 执行SQL语句时，DBMS需要解析SQL语句，并将其转换为执行计划。优化器会根据统计信息、查询模式、索引等因素，选择最优的执行计划。 relational operators 处理数据流 or 关系运算符？ files and index management buffer management disk space management 事实上纵向还有两个模块：concurrency control和recovery。省流：从RAM & DISK获取数据非常慢，相对于CPU Disk 注意sector, disk head，其中后者似乎只能单次读写 access time flash SSD 注意： read很快，随着数据变大，可以预测 write很慢，slower for random，写入放大 disk space management block level storage block: unit of transfer for disk read/write (64~128KB in 2018) page: a common synonym for block, in some contexts, it means in RAM

CS186-L2: SQLⅡ

HHZZ published on 2024-07-22 included in CS186

怎么读懂SQL语句？ FROM WHERE, to eliminate rows SELECT GROUP BY HAVING, to eliminate groups DISTINCT ORDER BY, LIMIT, OFFSET等等格式化输出 Join Queries cross product 生成所有的组合，然后过滤掉不符合条件的组合，但是低效考虑以下sql，更加简洁 1 2 3 SELECT S.sid, sname, bid FROM Sailors AS S, Reserves AS R WHERE S.sid = R.sid self join and more aliases 1 2 3 4 SELECT x.sname, x.age, y.sname AS sname2, y.age AS age2 FROM Sailors AS x, Sailors AS y WHERE x.age > y.