CPS216: Data-Intensive Computing Systems

CPS216: Data-intensive
Computing Systems
Operators for Data Access
(contd.)
Shivnath Babu
Insertion in a B-Tree
n=2
49
15 36
49
Insert: 62
n=2
49
15 36
49
62
Insert: 62
n=2
49
15 36
49
62
Insert: 50
49
15 36
n=2
62
49
50
62
Insert: 50
49
15 36
n=2
62
49
50
62
Insert: 75
49
15 36
n=2
62
49
50
62 75
Insert: 75
Insertion
Insertion
Insertion
10
Insertion
11
Insertion
12
Insertion
13
Insertion
14
Insertion
15
Insertion
16
Insertion
17
Insertion
18
Insertion: Primitives
Inserting into a leaf node

Splitting a leaf node
Splitting an internal node
Splitting root node
19
Inserting into a Leaf

Node
58
54 57 60 62
20

Node
58
54 57
60 62
21

Node
58
54 57 58 60 62
22
Splitting a Leaf Node

61
54 66
54 57 58 60 62
23

61
54 66
54 57 58 60 62
24

61
54 66
54 57 58
60 61 62
25

59
61
54 66
54 57 58
60 61 62
26

61
54 59 66
54 57 58
60 61 62
27
Splitting an Internal Node
21 99
59
40 54 66 74 84
[54, 59) [ 59, 66) [66,74)
21 99
59
40 54 66 74 84
[54, 59) [ 59, 66) [66,74)

66
21 99
[21,66)
40 54 59
[54, 59)
[66, 99)
74 84
[ 59, 66)
[66,74)
Splitting the Root
59
40 54 66 74 84
[54, 59) [ 59, 66) [66,74)
Splitting the Root
59
40 54 66 74 84
[54, 59) [ 59, 66) [66,74)
Splitting the Root

66
40 54 59
[54, 59)
74 84
[ 59, 66)
[66,74)
Deletion
34
Deletion
redistribute
35
Deletion
36
Deletion - II
37
Deletion - II
merge
Deletion - II
39
Deletion - II
40
Deletion - II
41
Deletion - II
Not needed
merge
42
Deletion - II
43
Deletion: Primitives
Delete key from a leaf

Redistribute keys between sibling
leaves
Merge a leaf into its sibling
Redistribute keys between two
sibling internal nodes
Merge an internal node into its
sibling
44
Merge Leaf into Sibling

72
54 58 64
67 85
68 72 75
45

72
54 58 64
67 85
68 75
46

72
67 85
54 58 64 68 75
47

72
85
54 58 64 68 75
48
Merge Internal Node into

Sibling
41 48 52
59
63 74
[52, 59)
[59,63)
49
Merge Internal Node into

Sibling
59
41 48 52 59 63
[52, 59)
[59,63)
50
B-Tree Roadmap
B-Tree
Recap
Insertion (recap)
Deletion
Construction
Efficiency
B-Tree variants
Hash-based Indexes
51
Question
How does insertion-based construction

perform?
52
B-Tree Construction
Sort
48 57 41 15 75 21 62 34 81 11 97 13
53
B-Tree Construction
11 13 15
21 34 41
48 57 62
75 81 97
11 13 15 21 34 41 48 57 62 75 81 97
Scan
B-Tree Construction
21 48 75
11 13 15
21 34 41
Scan
48 57 62
75 81 97
B-Tree Construction
hy is sort-based construction better than

insertion-based one?
56
Cost of B-Tree
Operations
Height of B-Tree: H
Assume no duplicates
Question: what is the random I/O
cost of:
Insertion:
Deletion:
Equality search:
Range Search:
57
Height of B-Tree
Number of keys: N
B-Tree parameter: n
log N
Height log
N =
n
log n
In practice: 2-3 levels
58
Question: How do you pick paramete

1. Ignore inserts and deletes
2. Optimize for equality searches
3. Assume no duplicates
59
Roadmap
B-Tree
B-Tree variants
Sparse Index
Duplicate Keys
Hash-based Indexes
60
Roadmap
B-Tree
B-Tree variants
Hash-based Indexes
Static Hash Table
Extensible Hash Table
Linear Hash Table
61
Hash-Based Indexes
Adaptations of main memory hash

tables
Support equality searches
No range searches
62
Indexing Problem (recap)

Index Keys
record pointer
a1
a2
A = val
ai
an
Main Memory Hash Table

buckets
key
h (key)
0
1
32
48
10
(null)
27
75
3
4
h (key) = key % 8 5
21
6
7
55
(null)
(null)
(null)
(null)
64
Adapting to disk
1 Hash Bucket = 1 Block

All keys that hash to bucket stored in
the block
Intuition: keys in a bucket usually
accessed together
No need for linked lists of keys
65
Adapting to Disk
How do we handle this?

66
Adapting to disk
1 Hash Bucket = 1 Block

All keys that hash to bucket stored in
the block
Intuition: keys in a bucket usually
accessed together
No need for linked lists of keys
but need linked list of blocks
(overflow blocks)
67
Adapting to Disk
68
Adapting to disk
Bucket Id Disk Address mapping

Contiguous blocks
Store mapping in main memory
Too large?
Dynamic Linear and Extensible
hash tables
69
Beware of claims that assume 1 I/O

for hash tables and 3 I/Os for B-Tree!!
70

CPS216: Data-Intensive Computing Systems

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

CPS216: Data-Intensive Computing Systems

Uploaded by

Copyright:

Available Formats

CPS216: Data-intensive

Inserting into a leaf node

Inserting into a Leaf

Inserting into a Leaf

Inserting into a Leaf

Splitting a Leaf Node

Splitting a Leaf Node

Splitting a Leaf Node

Splitting a Leaf Node

Splitting a Leaf Node

Splitting an Internal Node

[54, 59) [ 59, 66) [66,74)

Splitting an Internal Node

[54, 59) [ 59, 66) [66,74)

Splitting an Internal Node

Splitting the Root

[54, 59) [ 59, 66) [66,74)

Splitting the Root

[54, 59) [ 59, 66) [66,74)

Splitting the Root

Delete key from a leaf

Merge Leaf into Sibling

Merge Leaf into Sibling

Merge Leaf into Sibling

Merge Leaf into Sibling

Merge Internal Node into

Merge Internal Node into

How does insertion-based construction

hy is sort-based construction better than

Question: How do you pick paramete

Adaptations of main memory hash

Indexing Problem (recap)

Main Memory Hash Table

1 Hash Bucket = 1 Block

How do we handle this?

1 Hash Bucket = 1 Block

Bucket Id Disk Address mapping

Beware of claims that assume 1 I/O

You might also like