CS186-L21: MapReduce and Spark

HHZZ published on 2024-08-14 included in CS186

Motivation: simply scaling up relational databases is challenging. MapReduce data and programming model. Target: Map phase: the map function keeps no state about intermediate results, so it can be parallelized easily. Reduce phase: for example, to count the occurrences of each word in the input data, the reduce function sums up the values that share the same key. Implementation of MapReduce: fault tolerance is achieved by writing intermediate results to disk…
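The word-count example can be sketched in a single process as follows. This is a minimal illustration, not Hadoop's or Spark's API: the names `map_fn`, `shuffle`, `reduce_fn` and `word_count` are hypothetical. It also assumes whitespace tokenization, and it runs the framework's shuffle (group by key) step locally.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_fn(document: str) -> Iterator[Tuple[str, int]]:
    # Stateless: each call sees only its own input split and emits (word, 1).
    for word in document.split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key (done by the framework in real MapReduce).
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    # Sum all counts emitted for the same word.
    return key, sum(values)

def word_count(documents: List[str]) -> Dict[str, int]:
    # Map calls are independent, so they could run on different machines.
    intermediate = [pair for doc in documents for pair in map_fn(doc)]
    grouped = shuffle(intermediate)
    return dict(reduce_fn(k, vs) for k, vs in grouped.items())

if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat"]))
```

Because `map_fn` holds no state between calls, any input split can be re-run on another worker if one fails. That is what makes writing intermediate results to disk sufficient for recovery.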

CS186-L16: DB Design: FDs and Normalization

HHZZ published on 2024-08-14 included in CS186

Functional dependencies: big picture. Definition: X -> Y means X determines Y, i.e., any two tuples that agree on X also agree on Y. X and Y can each be a single column or a set of columns. F+ denotes the set of all FDs implied by F. Terminology. Anomalies: FDs can be used to decompose a relation and thereby avoid redundancy. Armstrong's Axioms. Attribute closure: used to check whether X -> Y is in F+. BCNF and other normal forms. Basic normal forms: a normal form (NF) is a definition on the data model (a property a schema may satisfy)! Boyce-Codd Normal Form. Lossless join decompositions. Definition: a decomposition creates no new attributes and covers all the original attributes (it is not necessarily a split into non-overlapping sets).
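The attribute-closure check can be sketched as below. This is a minimal version, assuming single-letter attribute names so that an FD like AB -> C can be written `fd("AB", "C")`. The helpers `fd`, `closure` and `implies` are hypothetical names. The algorithm grows X+ by applying every FD whose left side is already contained in the result, until nothing changes. X -> Y is in F+ exactly when Y is a subset of X+.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]

def fd(lhs: str, rhs: str) -> FD:
    # Hypothetical notation: fd("AB", "C") encodes AB -> C, one char per attribute.
    return frozenset(lhs), frozenset(rhs)

def closure(attrs: Iterable[str], fds: Iterable[FD]) -> Set[str]:
    # Compute X+: repeatedly apply FDs whose left side is already in the result.
    result = set(attrs)
    rules: List[FD] = list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds: Iterable[FD], lhs: str, rhs: str) -> bool:
    # X -> Y is in F+ iff Y is a subset of X+.
    return set(rhs) <= closure(lhs, fds)

if __name__ == "__main__":
    F = [fd("A", "B"), fd("B", "C"), fd("CD", "E")]
    print(sorted(closure("AD", F)))  # ['A', 'B', 'C', 'D', 'E']
    print(implies(F, "A", "E"))      # False
```

The same closure also tests whether X is a superkey (X+ contains every attribute). That test is the building block of a BCNF check.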

CS186-L17: Recovery

HHZZ published on 2024-08-14 included in CS186

Need for atomicity and durability; SQL support for transactions. Strawman solution: the No Steal/Force policy. It does not seem like a good choice for recovery. First, it is not scalable in the buffer pool, because No Steal keeps every dirty page of an uncommitted transaction in memory. Second, if the system crashes in step 2a, inconsistencies will occur. STEAL / NO FORCE, UNDO and REDO. STEAL/NO FORCE. No Force: the problem is that the system may crash before a dirty page of a committed transaction is written to disk. Solution: flush as little as possible, in a convenient place, prior to commit.
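The following toy sketch, under stated assumptions, shows why STEAL needs UNDO and NO FORCE needs REDO. It is not ARIES: there are no LSNs, checkpoints, or compensation log records. The class `ToyDB` and its methods are hypothetical. The in-memory `log` list stands in for a log that is assumed already flushed to stable storage. Each update record stores the old value, for undo, and the new value, for redo. Recovery repeats history and then rolls back uncommitted transactions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class LogRecord:
    txn: str
    kind: str                   # "update" or "commit"
    key: Optional[str] = None
    old: Optional[int] = None   # before-image: needed to UNDO (because of STEAL)
    new: Optional[int] = None   # after-image: needed to REDO (because of NO FORCE)

@dataclass
class ToyDB:
    disk: Dict[str, int] = field(default_factory=dict)    # durable pages
    buffer: Dict[str, int] = field(default_factory=dict)  # volatile buffer pool
    log: List[LogRecord] = field(default_factory=list)    # assumed on stable storage

    def read(self, key: str) -> int:
        return self.buffer.get(key, self.disk.get(key, 0))

    def write(self, txn: str, key: str, value: int) -> None:
        # Log the change before the page can ever reach disk (write-ahead).
        self.log.append(LogRecord(txn, "update", key, self.read(key), value))
        self.buffer[key] = value

    def steal(self, key: str) -> None:
        # STEAL: the buffer manager flushes a dirty page, even if uncommitted.
        self.disk[key] = self.buffer[key]

    def commit(self, txn: str) -> None:
        # NO FORCE: only a small commit record is logged; data pages stay in memory.
        self.log.append(LogRecord(txn, "commit"))

    def crash_and_recover(self) -> None:
        self.buffer.clear()  # everything volatile is lost
        committed = {r.txn for r in self.log if r.kind == "commit"}
        # REDO: repeat history by reapplying every logged update in order.
        for r in self.log:
            if r.kind == "update":
                self.disk[r.key] = r.new
        # UNDO: roll back updates of uncommitted transactions, newest first.
        for r in reversed(self.log):
            if r.kind == "update" and r.txn not in committed:
                self.disk[r.key] = r.old

if __name__ == "__main__":
    db = ToyDB(disk={"A": 10, "B": 20})
    db.write("T1", "A", 11)
    db.commit("T1")          # A=11 lives only in the buffer (no force)
    db.write("T2", "B", 99)
    db.steal("B")            # uncommitted B=99 reaches disk (steal)
    db.crash_and_recover()
    print(db.disk)           # {'A': 11, 'B': 20}
```

After the crash, T1's committed update survives only because the log lets recovery redo it. T2's stolen page is rolled back using the logged old value.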