CS186-L18: Parallel Query Processing

HHZZ included in CS186

2024-08-14 187 words One minute

Intro to Parallelism

We will focus on the shared-nothing architecture here 😋

side note:

Round robin means that tuples are dealt out to the machines in turn, so each machine ends up with roughly the same amount of data, regardless of the values.
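A minimal sketch of round-robin partitioning, to make the side note concrete. The function name `round_robin_partition` and the list-of-lists representation of machines are assumptions for illustration, not something from the lecture.

```python
def round_robin_partition(tuples, n_machines):
    """Deal tuples out in turn: tuple i goes to machine i mod n_machines."""
    machines = [[] for _ in range(n_machines)]
    for i, t in enumerate(tuples):
        machines[i % n_machines].append(t)
    return machines


parts = round_robin_partition(list(range(10)), 3)
# Partition sizes differ by at most one, whatever the tuple values are.
print([len(p) for p in parts])
```

The placement ignores the tuple contents entirely, which is why the load is even but a lookup cannot tell which machine holds a given key.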

scan and merge

$\sigma_p$: a selection operator that can skip entire sites holding no matching tuples when the data is range or hash partitioned
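A small sketch of how a range selection can skip sites under range partitioning. The boundaries in `BOUNDS`, the three-site layout, and the helper names are all hypothetical, chosen only to illustrate the idea.

```python
from bisect import bisect_left

# Hypothetical layout: site 0 holds keys <= 10, site 1 holds keys in (10, 20],
# site 2 holds everything above 20.
BOUNDS = [10, 20]


def site_for_key(key):
    """Index of the site whose range contains key."""
    return bisect_left(BOUNDS, key)


def range_select(sites, lo, hi):
    """Scan only the sites whose range overlaps [lo, hi]; skip the rest."""
    scanned = list(range(site_for_key(lo), site_for_key(hi) + 1))
    matches = [k for s in scanned for k in sites[s] if lo <= k <= hi]
    return matches, scanned


sites = [[], [], []]
for k in [3, 7, 12, 18, 25, 30]:
    sites[site_for_key(k)].append(k)

matches, scanned = range_select(sites, 11, 19)
print(matches, scanned)
```

With hash partitioning the same trick only works for equality predicates, since a range of keys hashes to unrelated sites.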

If the data is partitioned on a function of the key, route the lookup only to the relevant nodes.

Otherwise, broadcast the lookup to all nodes.
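The lookup rule can be sketched as follows. `N_NODES`, `node_for` (a modulo stand-in for the partitioning function, assuming integer keys), and the `(key, row)` tuple layout are assumptions made for this example.

```python
N_NODES = 4  # hypothetical cluster size


def node_for(key):
    """Stand-in partitioning function (assumes integer keys)."""
    return key % N_NODES


def lookup(nodes, key, partitioned_on_key):
    """Return (matching rows, ids of the nodes that were probed)."""
    if partitioned_on_key:
        probe = [node_for(key)]          # route to the one relevant node
    else:
        probe = list(range(len(nodes)))  # broadcast to every node
    rows = [row for i in probe for k, row in nodes[i] if k == key]
    return rows, probe


nodes = [[] for _ in range(N_NODES)]
for k, row in [(1, "a"), (5, "b"), (6, "c")]:
    nodes[node_for(k)].append((k, row))

print(lookup(nodes, 5, True))
print(lookup(nodes, 5, False))
```

Both calls find the same row; the difference is only in how many nodes have to do work.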

If the data is partitioned on a function of the key, insert only at the relevant node.

Otherwise, insert at any node.

Inserting a unique key seems to work the same way.
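A rough sketch of the insert routing rule. The modulo partitioning function and the "pick the least-loaded node" policy for the unpartitioned case are assumptions of this example; the notes only say that any node will do.

```python
def insert(nodes, key, row, partitioned_on_key):
    """Insert (key, row) and return the id of the node that received it."""
    if partitioned_on_key:
        # Stand-in partitioning function (assumes integer keys).
        target = key % len(nodes)
    else:
        # Any node is valid; choosing the least-loaded one is just one option.
        target = min(range(len(nodes)), key=lambda i: len(nodes[i]))
    nodes[target].append((key, row))
    return target


cluster = [[], [], []]
first = insert(cluster, 7, "a", True)    # routed by key
second = insert(cluster, 7, "b", False)  # placed on a least-loaded node
print(first, second, cluster)
```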

Pass one is like the hashing above, but done twice: once for each relation being joined.

Pass two is a local Grace hash join on each node.
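The two passes can be sketched as below. This is a simplified illustration under stated assumptions: relations are lists of tuples, nodes are plain Python lists, and the per-node join is an in-memory build-and-probe standing in for a full Grace hash join (which would also partition to disk). All function names are made up for the example.

```python
from collections import defaultdict


def shuffle(relation, key_index, n_nodes):
    """Pass one: send each tuple to the node chosen by hashing its join key."""
    nodes = [[] for _ in range(n_nodes)]
    for t in relation:
        nodes[hash(t[key_index]) % n_nodes].append(t)
    return nodes


def local_hash_join(r_part, s_part, r_key, s_key):
    """Pass two stand-in: build a hash table on R's part, probe with S's part."""
    table = defaultdict(list)
    for r in r_part:
        table[r[r_key]].append(r)
    return [r + s for s in s_part for r in table.get(s[s_key], [])]


def parallel_hash_join(R, S, r_key, s_key, n_nodes):
    r_nodes = shuffle(R, r_key, n_nodes)  # shuffle once for R
    s_nodes = shuffle(S, s_key, n_nodes)  # and once more for S
    out = []
    for r_part, s_part in zip(r_nodes, s_nodes):
        # Matching keys hash to the same node, so each join is purely local.
        out.extend(local_hash_join(r_part, s_part, r_key, s_key))
    return out


R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
print(sorted(parallel_hash_join(R, S, 0, 0, n_nodes=3)))
```

Using the same hash function for both relations is what guarantees that tuples with equal join keys meet on the same node.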

This brings us back to the problem of splitting the data evenly.

Then, as above, read and partition the data twice for the join.

naive group by:
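One plausible reading of a naive parallel hash group-by, sketched for a SUM aggregate: aggregate locally on each node, shuffle the partial sums by group key, then merge them on the receiving node. The function names and the `(group, value)` tuple layout are assumptions for illustration.

```python
from collections import defaultdict


def local_sums(pairs):
    """Sum values per group within a single node."""
    sums = defaultdict(int)
    for group, value in pairs:
        sums[group] += value
    return dict(sums)


def parallel_group_by_sum(partitions):
    n = len(partitions)
    # Phase 1: each node aggregates its own tuples.
    partials = [local_sums(p) for p in partitions]
    # Phase 2: shuffle the partial sums so each group lands on exactly one node.
    inboxes = [[] for _ in range(n)]
    for partial in partials:
        for group, s in partial.items():
            inboxes[hash(group) % n].append((group, s))
    # Phase 3: each node merges the partial sums it received.
    result = {}
    for inbox in inboxes:
        result.update(local_sums(inbox))
    return result


partitions = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)], [("b", 5)]]
print(parallel_group_by_sum(partitions))
```

This works for SUM because partial sums can themselves be summed; an aggregate like AVG would need to ship a (sum, count) pair instead.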

Sorting and hashing can break the pipeline.

one is sorted/hashed

one is small