CS186-L6: Indices & B+ Tree Refinements
Contents
General Notes
issues to consider in any index structure (not just in B+ tree)
- query support: what class of queries can be supported?
- choice of search key
- affects how we write the query
- data entry storage
- affect performance of the index
- variable-length keys tricks
- affect performance of the index
- cost model for Index vs Heap vs Sorted File
query support
Indexes
- basic selection <key><op><constant> 诸如=,BETWEEN,>,<,>=,<=
- more selection
维度灾难😲
但是这节课我们只是关注1-d range search, equality, B+ tree
Search Key and Ordering
注意lexicographic!
以下给出了一个定义Composite Keys
,多列,前等,尾唯一range
注意对
Lexicographic Range
的强调
Data Entry Storage
intro
- the representation of data?
- itself or pointers to it?
- how data is stored?
- clustered or unclustered?
representation
alt. 1
index entry: (key, value)
alt. 2
index entry: (key, recordID), remember recordID is……
alt. 3
index entry: (key, list of recordIDs)
clustered vs unclustered index
clustered is more efficient for IOs 🤔, range search and supports “compression” 🤔
Variable-length keys tricks
- 重新定义 Occupancy Invariant (当不是用整数来index时候)
- get more index entries to shorten the tree (avoiding long-time IOs)
- prefix key compression (only in leaf level 🤔, slightly change the order of keys?)
- suffix key compression
- prefix key compression (only in leaf level 🤔, slightly change the order of keys?)
B+ Tree Costs
这里引入新的假设:
- store by ref (see in alt. 2)
- clustered index with 2/3 full heap file pages
- clustered -> heapfile is initially sorted
- fanout is larger ~
- assume static index
符号表达如下:
- : num of full data blocks (why full? recall previous lecture)
- : num of records per blocks
- : Average time to r/w disk block
- : avg internal node fanout
- : avg num of data entries per leaf
side note:
Scan all records
: 来自与占有率2/3,Equality Search
: !! 来自于从page中读取slot从而获得具体的index并且读取数值, 是搜索pageRange Search
: 应该是Insert
&Delete
: 应该是 , index 1,读取数值 1,改变数值 1,改变index 1
big-O notation: 😸
Summary
time-stamp: 01h42m07s