You are on page 1of 70

CPS216: Data-intensive

Computing Systems
Operators for Data Access
(contd.)
Shivnath Babu

Insertion in a B-Tree
n=2

49

15 36

49

Insert: 62

Insertion in a B-Tree
n=2

49

15 36

49

62

Insert: 62

Insertion in a B-Tree
n=2

49

15 36

49

62

Insert: 50

Insertion in a B-Tree
49

15 36

n=2

62

49

50

62

Insert: 50

Insertion in a B-Tree
49

15 36

n=2

62

49

50

62

Insert: 75

Insertion in a B-Tree
49

15 36

n=2

62

49

50

62 75

Insert: 75

Insertion

Insertion

Insertion

10

Insertion

11

Insertion

12

Insertion

13

Insertion

14

Insertion

15

Insertion

16

Insertion

17

Insertion

18

Insertion: Primitives

Inserting into a leaf node


Splitting a leaf node
Splitting an internal node
Splitting root node

19

Inserting into a Leaf


Node

58
54 57 60 62

20

Inserting into a Leaf


Node

58
54 57

60 62

21

Inserting into a Leaf


Node

58
54 57 58 60 62

22

Splitting a Leaf Node


61

54 66

54 57 58 60 62

23

Splitting a Leaf Node


61

54 66

54 57 58 60 62

24

Splitting a Leaf Node


61

54 66

54 57 58

60 61 62

25

Splitting a Leaf Node


59
61

54 66

54 57 58

60 61 62

26

Splitting a Leaf Node


61

54 59 66

54 57 58

60 61 62

27

Splitting an Internal Node

21 99

59
40 54 66 74 84

[54, 59) [ 59, 66) [66,74)

Splitting an Internal Node

21 99

59
40 54 66 74 84

[54, 59) [ 59, 66) [66,74)

Splitting an Internal Node


66

21 99

[21,66)
40 54 59

[54, 59)

[66, 99)
74 84

[ 59, 66)

[66,74)

Splitting the Root

59
40 54 66 74 84

[54, 59) [ 59, 66) [66,74)

Splitting the Root

59
40 54 66 74 84

[54, 59) [ 59, 66) [66,74)

Splitting the Root


66

40 54 59

[54, 59)

74 84

[ 59, 66)

[66,74)

Deletion

34

Deletion

redistribute

35

Deletion

36

Deletion - II

37

Deletion - II

merge

Deletion - II

39

Deletion - II

40

Deletion - II

41

Deletion - II
Not needed

merge

42

Deletion - II

43

Deletion: Primitives

Delete key from a leaf


Redistribute keys between sibling
leaves
Merge a leaf into its sibling
Redistribute keys between two
sibling internal nodes
Merge an internal node into its
sibling
44

Merge Leaf into Sibling


72

54 58 64

67 85

68 72 75

45

Merge Leaf into Sibling


72

54 58 64

67 85

68 75

46

Merge Leaf into Sibling


72

67 85

54 58 64 68 75

47

Merge Leaf into Sibling


72

85

54 58 64 68 75

48

Merge Internal Node into


Sibling

41 48 52

59

63 74

[52, 59)

[59,63)
49

Merge Internal Node into


Sibling

59

41 48 52 59 63

[52, 59)

[59,63)
50

B-Tree Roadmap

B-Tree
Recap
Insertion (recap)
Deletion
Construction
Efficiency

B-Tree variants
Hash-based Indexes
51

Question

How does insertion-based construction


perform?

52

B-Tree Construction

Sort

48 57 41 15 75 21 62 34 81 11 97 13
53

B-Tree Construction

11 13 15

21 34 41

48 57 62

75 81 97

11 13 15 21 34 41 48 57 62 75 81 97
Scan

B-Tree Construction
21 48 75

11 13 15

21 34 41

Scan

48 57 62

75 81 97

B-Tree Construction

hy is sort-based construction better than


insertion-based one?

56

Cost of B-Tree
Operations

Height of B-Tree: H
Assume no duplicates
Question: what is the random I/O
cost of:
Insertion:
Deletion:
Equality search:
Range Search:

57

Height of B-Tree

Number of keys: N
B-Tree parameter: n
log N
Height log
N =
n
log n
In practice: 2-3 levels

58

Question: How do you pick paramete


1. Ignore inserts and deletes
2. Optimize for equality searches
3. Assume no duplicates

59

Roadmap

B-Tree
B-Tree variants
Sparse Index
Duplicate Keys

Hash-based Indexes

60

Roadmap

B-Tree
B-Tree variants
Hash-based Indexes
Static Hash Table
Extensible Hash Table
Linear Hash Table

61

Hash-Based Indexes

Adaptations of main memory hash


tables
Support equality searches
No range searches

62

Indexing Problem (recap)


Index Keys

record pointer

a1
a2
A = val

ai
an

Main Memory Hash Table


buckets

key

h (key)

0
1

32

48

10

(null)

27

75

3
4

h (key) = key % 8 5

21

6
7

55

(null)

(null)

(null)
(null)
64

Adapting to disk

1 Hash Bucket = 1 Block


All keys that hash to bucket stored in
the block
Intuition: keys in a bucket usually
accessed together
No need for linked lists of keys

65

Adapting to Disk

How do we handle this?


66

Adapting to disk

1 Hash Bucket = 1 Block


All keys that hash to bucket stored in
the block
Intuition: keys in a bucket usually
accessed together
No need for linked lists of keys
but need linked list of blocks
(overflow blocks)

67

Adapting to Disk

68

Adapting to disk

Bucket Id Disk Address mapping


Contiguous blocks
Store mapping in main memory
Too large?
Dynamic Linear and Extensible
hash tables

69

Beware of claims that assume 1 I/O


for hash tables and 3 I/Os for B-Tree!!

70

You might also like