Contents

CS186-L6: Indices & B+ Tree Refinements

issues to consider in any index structure (not just in B+ tree)

  • query support: what class of queries can be supported?
  • choice of search key
    • affects how we write the query
  • data entry storage
    • affect performance of the index
  • variable-length keys tricks
    • affect performance of the index
  • cost model for Index vs Heap vs Sorted File
  • basic selection <key><op><constant> 诸如=,BETWEEN,>,<,>=,<=
  • more selection /databasel6/image.png 维度灾难😲

但是这节课我们只是关注1-d range search, equality, B+ tree

/databasel6/image-1.png 注意lexicographic!

以下给出了一个定义Composite Keys,多列,前等,尾唯一range /databasel6/image-2.png 注意对Lexicographic Range的强调

Data Entry Storage

  • the representation of data?
    • itself or pointers to it?
  • how data is stored?
    • clustered or unclustered?

index entry: (key, value) /databasel6/image-3.png

index entry: (key, recordID), remember recordID is…… /databasel6/image-4.png

index entry: (key, list of recordIDs) /databasel6/image-5.png

/databasel6/image-6.png clustered is more efficient for IOs 🤔, range search and supports “compression” 🤔 /databasel6/image-7.png

  • 重新定义 Occupancy Invariant (当不是用整数来index时候) /databasel6/image-8.png
  • get more index entries to shorten the tree (avoiding long-time IOs)
    • prefix key compression (only in leaf level 🤔, slightly change the order of keys?)
      • /databasel6/image-9.png
    • suffix key compression
      • /databasel6/image-10.png

这里引入新的假设:

  • store by ref (see in alt. 2)
  • clustered index with 2/3 full heap file pages
    • clustered -> heapfile is initially sorted
    • fanout is larger ~ O(Ref)O(Ref)
    • assume static index

符号表达如下:

  • B B : num of full data blocks (why full? recall previous lecture)
  • R R : num of records per blocks
  • D D : Average time to r/w disk block
  • F F : avg internal node fanout
  • E E : avg num of data entries per leaf

/databasel6/image-11.png side note:

  1. Scan all records: 3/23/2来自与占有率2/3, 23B=BB=32BBD=32BD\frac{2}{3}B’ = B \Rightarrow B’ = \frac{3}{2}B \Rightarrow B’D = \frac{3}{2}B D
  2. Equality Search: 121 \Rightarrow 2 !! 来自于从page中读取slot从而获得具体的index并且读取数值, logF(BR/E)log_F(BR/E) 是搜索page
  3. Range Search: 应该是 (logF(BR/E)+1+3pages)D(log_F(BR/E)+1+3*pages)*D/databasel6/image-14.png
  4. Insert&Delete: 应该是 (logF(BR/E)+4)D(log_F(BR/E)+4)*D, index 1,读取数值 1,改变数值 1,改变index 1

big-O notation: 😸 /databasel6/image-12.png

/databasel6/image-13.png

time-stamp: 01h42m07s