Basic Schema Design

Marc Schwering - marc.schwering@10gen.com twitter: @m4rcsch

Monday, 29 October 12

OVERVIEW

Relational

Terminology
RDBMS     MongoDB
Table     Collection
Row(s)    JSON Document
Index     Index
Join      Embedding & Linking

Schema-design criteria
How can we manipulate this data? Access patterns?

- Dynamic queries
- Secondary indexes
- Atomic updates
- Map Reduce
- Aggregation

Considerations

- Read / write ratio
- Types of updates
- Types of queries
- Data life-cycle
- No joins
- Document writes are atomic

Rich Document

Rich Document: embedding

Rich Document: linking
INTRODUCTION

A simple start
> post = {author: "Douglas A.",
          date: new Date(),
          title: "Per Anhalter durch die Galaxis",
          tags: ["42", "scifi"]}
> db.blogs.save(post)

Design documents that simply map to your application!

Find the document


> db.blogs.find()
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author: "Douglas A.",
  date: ISODate("2002-01-21T14:01:00.117Z"),
  title: "Per Anhalter durch die Galaxis",
  tags: [ "42", "scifi" ] }

Note: _id must be unique, but can be any value you like. If you don't supply one, a BSON ObjectId is used by default.
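To make the _id rule concrete, here is a small in-memory sketch in plain JavaScript (Node), not mongo shell code. `InMemoryCollection` and `makeId` are hypothetical helpers; `makeId` only imitates the 24-hex-character shape of an ObjectId and is not the real BSON algorithm. Because the sketch keys documents in a `Map`, only primitive `_id` values are compared correctly.

```javascript
// Hypothetical id generator: 8 hex chars of seconds + 16 hex chars of a counter.
// It only mimics the 24-hex-char look of an ObjectId, not its real layout.
let counter = 0;
function makeId() {
  const seconds = Math.floor(Date.now() / 1000).toString(16).padStart(8, "0");
  counter += 1;
  return seconds + counter.toString(16).padStart(16, "0");
}

// Hypothetical in-memory collection enforcing a unique _id.
class InMemoryCollection {
  constructor() {
    this.docs = new Map(); // _id -> document (primitive _id values only)
  }

  insert(doc) {
    const stored = { ...doc };
    if (stored._id === undefined) {
      stored._id = makeId(); // default id when the caller supplies none
    }
    if (this.docs.has(stored._id)) {
      throw new Error("duplicate key: _id " + stored._id);
    }
    this.docs.set(stored._id, stored);
    return stored._id;
  }
}

const blogs = new InMemoryCollection();
blogs.insert({ author: "Douglas A." });                    // gets a generated _id
blogs.insert({ _id: "hitchhiker", author: "Douglas A." }); // any unique value works
```

Inserting the same custom _id a second time throws, mirroring the duplicate-key error the server reports for the unique _id index.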

Add an index, find via index


Secondary index on "author"
// 1 means ascending, -1 means descending
> db.blogs.ensureIndex({author: 1})
> db.blogs.find({author: 'Douglas A.'})
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Douglas A.", date: ISODate("2002-01-21T14:01:00.117Z"), ... }

Examine the query plan


> db.blogs.find({author: "Douglas A."}).explain()
{
  "cursor" : "BtreeCursor author_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 5,
  "indexBounds" : { "author" : [ [ "Douglas A.", "Douglas A." ] ] }
}
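The `nscanned` value in the plan above reports, roughly, how many entries the query had to examine. The plain-JavaScript sketch below contrasts a full collection scan with an index lookup to show why the indexed query examines only one. The `Map` is a stand-in for the B-tree and the helpers (`collectionScan`, `buildAuthorIndex`, `indexLookup`) are hypothetical; only the counting idea carries over, not MongoDB's internals.

```javascript
// Sample documents (made up for illustration).
const blogs = [
  { _id: 1, author: "Douglas A." },
  { _id: 2, author: "Marvin" },
  { _id: 3, author: "Zaphod" },
];

// Collection scan: every document is examined.
function collectionScan(docs, author) {
  let nscanned = 0;
  const result = [];
  for (const doc of docs) {
    nscanned += 1;
    if (doc.author === author) result.push(doc);
  }
  return { n: result.length, nscanned };
}

// "Index" on author: author value -> documents with that value.
function buildAuthorIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc.author)) index.set(doc.author, []);
    index.get(doc.author).push(doc);
  }
  return index;
}

// Index lookup: only the entries stored under the requested key are examined.
function indexLookup(index, author) {
  const result = index.get(author) || [];
  return { n: result.length, nscanned: result.length };
}
```

With three documents, the scan examines all three while the lookup examines one, which is the difference a "BtreeCursor" plan reflects.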
Query operators
Conditional operators:
$ne, $in, $nin, $mod, $all, $size, $exists, $type, $lt, $lte, $gt, $gte, ...

// find posts with any tags:
> db.blogs.find( { tags: { $exists: true } } )

Regular expressions:
// posts where author starts with "d" (case-insensitive)
> db.blogs.find( { author : /^d/i } )

Counting:
// number of posts written by Douglas A.
> db.blogs.find( { author : "Douglas A." } ).count()
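The three queries above can be mimicked with `Array.prototype.filter` on an in-memory array, which helps pin down their semantics: `$exists: true` matches a field that is present even when its value is `null`, and the regex matches case-insensitively because of the `i` flag. This is a sketch of the matching rules in plain JavaScript only; no driver is involved and the sample documents are made up.

```javascript
const posts = [
  { author: "Douglas A.", tags: ["42", "scifi"] },
  { author: "douglas a.", tags: null }, // field present, value null
  { author: "Marvin" },                 // no tags field at all
];

// { tags: { $exists: true } }: the field is present, whatever its value.
const withTags = posts.filter((p) => Object.prototype.hasOwnProperty.call(p, "tags"));

// { author: /^d/i }: author is a string starting with "d" or "D".
const startsWithD = posts.filter((p) => typeof p.author === "string" && /^d/i.test(p.author));

// find({ author: "Douglas A." }).count(): exact, case-sensitive equality.
const douglasCount = posts.filter((p) => p.author === "Douglas A.").length;
```

Here `withTags` has two entries (including the `null` one), `startsWithD` has two, and `douglasCount` is 1 because plain equality is case-sensitive.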
Update operators:
$set, $inc, $push, ...

Extending the Schema


> new_comment = {author: "Marvin",
                 date: new Date(),
                 text: "Seht mich an. Ein Gehirn von der Größe eines Planeten!",
                 stars: 1}
// without {multi: true}, update() modifies only the first matching document
> db.blogs.update({author: "Douglas A."},
                  {"$push": {comments: new_comment},
                   "$inc": {comments_count: 1}})
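A minimal sketch of what `$set`, `$inc` and `$push` do to a single document, written as a hypothetical `applyUpdate` helper in plain JavaScript. It assumes top-level field names only (no dotted paths) and models just two of the server's type checks; its purpose is to show why the update above creates both the `comments` array and the `comments_count` counter on a post that had neither.

```javascript
// Hypothetical helper: apply $set, $inc and $push to a copy of `doc`.
// Top-level fields only; dotted paths and most server checks are not modelled.
function applyUpdate(doc, update) {
  const out = { ...doc }; // shallow copy; the input document is left untouched

  for (const [field, value] of Object.entries(update.$set || {})) {
    out[field] = value;
  }

  for (const [field, amount] of Object.entries(update.$inc || {})) {
    const current = out[field] === undefined ? 0 : out[field]; // missing field starts at 0
    if (typeof current !== "number") throw new Error(`$inc target "${field}" is not a number`);
    out[field] = current + amount;
  }

  for (const [field, value] of Object.entries(update.$push || {})) {
    const current = out[field] === undefined ? [] : out[field]; // missing array is created
    if (!Array.isArray(current)) throw new Error(`$push target "${field}" is not an array`);
    out[field] = [...current, value];
  }

  return out;
}
```

Applying `{ $push: { comments: c }, $inc: { comments_count: 1 } }` to a post without those fields yields a one-element `comments` array and `comments_count: 1`.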

Extending the Schema


{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Douglas A.",
  date : ISODate("2002-01-21T14:01:00.117Z"),
  title : "Per Anhalter durch die Galaxis",
  tags : [ "42", "scifi" ],
  comments : [{
      author : "Marvin",
      date : ISODate("2004-01-23T14:31:53.848Z"),
      text : "Seht mich an. Ein Gehirn von der Größe eines Planeten!",
      stars : 1 }],
  comments_count: 1 }

Extending the Schema


// create index on nested documents:
> db.blogs.ensureIndex({"comments.author": 1})
> db.blogs.find({"comments.author": "Marvin"})
// find last 5 posts:
> db.blogs.find().sort({date: -1}).limit(5)
// most commented post:
> db.blogs.find().sort({comments_count: -1}).limit(1)
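The nested-field match, sort and limit above can be sketched on an in-memory array in plain JavaScript. The point the sketch makes explicit: a dotted query such as `{"comments.author": "Marvin"}` matches a post if any element of its `comments` array matches. The sample data is made up for illustration.

```javascript
const posts = [
  { title: "A", date: new Date("2012-01-01"), comments: [{ author: "Marvin" }], comments_count: 1 },
  { title: "B", date: new Date("2012-03-01"), comments: [], comments_count: 0 },
  { title: "C", date: new Date("2012-02-01"), comments: [{ author: "Zaphod" }, { author: "Marvin" }], comments_count: 2 },
];

// {"comments.author": "Marvin"}: matches when ANY comment has that author.
const byMarvin = posts.filter((p) => (p.comments || []).some((c) => c.author === "Marvin"));

// sort({date: -1}).limit(5): newest first, at most five results.
const latest = [...posts].sort((a, b) => b.date - a.date).slice(0, 5);

// sort({comments_count: -1}).limit(1): the most commented post.
const mostCommented = [...posts].sort((a, b) => b.comments_count - a.comments_count)[0];
```

Posts A and C match the dotted query, `latest` is ordered B, C, A, and C is the most commented post.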

RELATIONS

One to Many
One-to-many relationships can specify:
- the degree of association between objects
- containment
- life-cycle

One to Many - Embedded


Embedded Array / Array Keys
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Douglas A.",
  date : ISODate("2002-01-21T14:01:00.117Z"),
  title : "Per Anhalter durch die Galaxis",
  tags : [ "42", "scifi" ],
  comments : [{
      author : "Marvin",
      date : ISODate("2004-01-23T14:31:53.848Z"),
      text : "Seht mich an. Ein Gehirn von der Größe eines Planeten!",
      stars : 1 }],
  comments_count: 1 }

One to Many - Normalized


// blogs collection
{ _id : 1000,
  author : "Douglas A.",
  date : ISODate("2002-01-21T14:01:00.117Z"),
  title : "Per Anhalter durch die Galaxis" }

// comments collection
{ _id : 1,
  post : 1000,
  author : "Marvin",
  date : ISODate("2004-01-23T14:31:53.848Z"),
  ... }

// findOne returns the document itself; find() would return a cursor
> post = db.blogs.findOne({title: "Per Anhalter durch die Galaxis"});
> db.comments.find({post: post._id});
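Because there are no joins, the normalized layout needs two queries: load the post with `findOne`, then fetch the comments whose `post` field holds that post's `_id`. An in-memory sketch of this application-level join in plain JavaScript, with made-up sample data; the local `findOne` helper is hypothetical and only mirrors the "first match or null" behaviour.

```javascript
const blogs = [{ _id: 1000, author: "Douglas A.", title: "Per Anhalter durch die Galaxis" }];
const comments = [
  { _id: 1, post: 1000, author: "Marvin" },
  { _id: 2, post: 1000, author: "Zaphod" },
  { _id: 3, post: 2000, author: "Marvin" }, // belongs to some other post
];

// Hypothetical helper: first matching document, or null when nothing matches.
function findOne(collection, predicate) {
  return collection.find(predicate) ?? null;
}

// Step 1: fetch the parent document.
const post = findOne(blogs, (b) => b.title === "Per Anhalter durch die Galaxis");

// Step 2: fetch the children that reference it.
const postComments = post ? comments.filter((c) => c.post === post._id) : [];
```

The cost of this flexibility is the extra round trip; with the embedded layout the comments arrive with the post.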
Many to Many
Example:

A post can be in many categories; a category can have many posts.

Many to Many
// blogs
{ _id: 10, title: "Per Anhalter durch die Galaxis", category_ids: [20, 30] }

// categories
{ _id: 20, name: "Buch", post_ids: [10, 11, 12] }
{ _id: 30, name: "Satire", post_ids: [10] }

// all categories for a given post
> db.categories.find({"post_ids": 10})

Alternative
// blogs
{ _id: 10, title: "Per Anhalter durch die Galaxis", category_ids: [20, 30] }

// categories
{ _id: 20, name: "Buch" }

// all blogs for a given category
> db.blogs.find({"category_ids": 20})

// all categories for a given post
> post = db.blogs.findOne({_id: some_id})
> db.categories.find({_id: {$in: post.category_ids}})
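With only `category_ids` stored on the post, each direction is still a single query: an equality match on an array field matches any element, and `$in` fetches the categories listed on a post. A plain-JavaScript sketch of both lookups with made-up data (post `11` is hypothetical):

```javascript
const blogs = [
  { _id: 10, title: "Per Anhalter durch die Galaxis", category_ids: [20, 30] },
  { _id: 11, title: "Post 11", category_ids: [20] }, // hypothetical second post
];
const categories = [
  { _id: 20, name: "Buch" },
  { _id: 30, name: "Satire" },
];

// {"category_ids": 20}: equality on an array field matches any element.
const blogsInBuch = blogs.filter((b) => b.category_ids.includes(20));

// {_id: {$in: post.category_ids}}: categories whose _id is in the post's list.
const post = blogs.find((b) => b._id === 10);
const postCategories = categories.filter((c) => post.category_ids.includes(c._id));
```

Dropping `post_ids` from the categories means only one side has to be updated when a post is (re)categorized.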

One to Many - patterns


Embedded Array / Array Keys: embed when the 'many' objects always appear with their parent.

Normalized: reference when you need more flexibility.

INHERITANCE

Inheritance

Single Table Inheritance - RDBMS


Shapes table
id  type    area  radius  d  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10               5       2

Single Table Inheritance - MongoDB


> db.shapes.find()
{ _id: "1", type: "circle", area: 3.14, radius: 1 }
{ _id: "2", type: "square", area: 4, d: 2 }
{ _id: "3", type: "rect", area: 10, length: 5, width: 2 }

// find shapes where radius > 0
> db.shapes.find({radius: {$gt: 0}})

// create sparse index
> db.shapes.ensureIndex({radius: 1}, {sparse: true})
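A sparse index only holds entries for documents that contain the indexed field, so in this collection only the circle is indexed. The plain-JavaScript sketch below models the index as a sorted array of `[radius, _id]` pairs; that representation is an assumption for illustration, not MongoDB's on-disk format.

```javascript
const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, d: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];

// Sparse "index": documents without a radius field get no entry at all.
const sparseRadiusIndex = shapes
  .filter((s) => Object.prototype.hasOwnProperty.call(s, "radius"))
  .map((s) => [s.radius, s._id])
  .sort((a, b) => a[0] - b[0]);

// {radius: {$gt: 0}} answered from the index entries alone.
const idsWithPositiveRadius = sparseRadiusIndex.filter(([r]) => r > 0).map(([, id]) => id);
```

In a single collection holding several shape types, this keeps the radius index as small as the number of circles rather than the number of shapes.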

TREES

Trees
Hierarchical information

Trees
// Embedded tree
{ comments : [{
    author : "Marvin",
    text : "...",
    replies : [{
        author : "Zaphod",
        text : "...",
        replies : []
    }]
}] }

+ PROs: Single document, performance, intuitive
- CONs: Hard to search, partial results, 16MB limit
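To illustrate the "hard to search" drawback: finding every comment by one author in an embedded tree means walking each `replies` array recursively, since a dotted path such as `comments.author` only reaches the first level. A plain-JavaScript sketch with a hypothetical `findByAuthor` helper and made-up data:

```javascript
const post = {
  comments: [
    {
      author: "Marvin",
      text: "...",
      replies: [
        { author: "Zaphod", text: "...", replies: [{ author: "Marvin", text: "...", replies: [] }] },
      ],
    },
  ],
};

// Hypothetical helper: depth-first walk collecting every node by `author`.
function findByAuthor(nodes, author, found = []) {
  for (const node of nodes) {
    if (node.author === author) found.push(node);
    findByAuthor(node.replies || [], author, found); // descend into the replies
  }
  return found;
}
```

`findByAuthor(post.comments, "Marvin")` returns two nodes, one of them two levels deep, which a single first-level query would miss.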
Array of Ancestors
// store all ancestors of a node
{ _id: "a" }
{ _id: "b", thread: [ "a" ], replyTo: "a" }
{ _id: "c", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "d", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "e", thread: [ "a" ], replyTo: "a" }
{ _id: "f", thread: [ "a", "e" ], replyTo: "e" }

// find all messages whose thread contains "b"
> db.msg_tree.find({"thread": "b"})

// find all direct replies to "b"
> db.msg_tree.find({"replyTo": "b"})

// find all ancestors of "f"
> threads = db.msg_tree.findOne({"_id": "f"}).thread
> db.msg_tree.find({"_id": {$in: threads}})
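The three ancestor queries translate directly to array operations. This plain-JavaScript sketch adds one hypothetical message `g` (a reply to `c`) so that the subtree query on `thread` and the direct-reply query on `replyTo` return different results.

```javascript
const msgTree = [
  { _id: "a" },
  { _id: "b", thread: ["a"], replyTo: "a" },
  { _id: "c", thread: ["a", "b"], replyTo: "b" },
  { _id: "d", thread: ["a", "b"], replyTo: "b" },
  { _id: "e", thread: ["a"], replyTo: "a" },
  { _id: "f", thread: ["a", "e"], replyTo: "e" },
  { _id: "g", thread: ["a", "b", "c"], replyTo: "c" }, // hypothetical extra message
];

// {"thread": "b"}: every message with "b" among its ancestors (b's whole subtree).
const underB = msgTree.filter((m) => (m.thread || []).includes("b")).map((m) => m._id);

// {"replyTo": "b"}: only the direct replies to "b".
const repliesToB = msgTree.filter((m) => m.replyTo === "b").map((m) => m._id);

// Ancestors of "f": read its thread array, then fetch those ids ($in).
const ancestorIds = msgTree.find((m) => m._id === "f").thread;
const ancestorsOfF = msgTree.filter((m) => ancestorIds.includes(m._id)).map((m) => m._id);
```

`underB` is c, d, g while `repliesToB` is only c, d; `ancestorsOfF` is a, e.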

QUEUES

Queue
Requirements
- See jobs waiting and jobs in progress
- Ensure that each job is started once and only once

// queue document
{ in_progress: false, priority: 1, message: "Rich documents FTW!", ... }

// find highest-priority job and mark it as in progress
// (returns the document as it was before the update unless new: true is passed)
> job = db.jobs.findAndModify({
    query: {in_progress: false},
    sort: {priority: -1},
    update: {$set: {in_progress: true, started: new Date()}}
  })
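A single-process sketch of the selection logic in the `findAndModify` call: take the highest-priority job that is not in progress, set `in_progress` and `started`, and return it (the server returns the updated document only when `new: true` is passed). `claimNextJob` is a hypothetical helper in plain JavaScript; the once-only guarantee across many workers comes from `findAndModify` being atomic on the server, which this single-threaded sketch cannot demonstrate.

```javascript
// Hypothetical helper mirroring query/sort/update of the findAndModify call.
function claimNextJob(jobs, now = new Date()) {
  let best = null;
  for (const job of jobs) {
    // query: {in_progress: false}, sort: {priority: -1}
    if (!job.in_progress && (best === null || job.priority > best.priority)) {
      best = job;
    }
  }
  if (best === null) return null; // nothing left to claim
  best.in_progress = true;        // $set: {in_progress: true, ...}
  best.started = now;             // ... started: new Date()
  return best;                    // updated document, as with new: true
}
```

Calling it repeatedly on the same array hands out jobs in descending priority and returns `null` once every job is in progress.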
Queue
The job document after the update (in_progress updated, started added):
{ in_progress: true, priority: 1, started: ISODate("2011-09-18T09:56:06.298Z"), ... }

CONCLUSION

Watch out for...


- Careless indexing
- Large, deeply nested documents
- One-size-fits-all collections
- One collection per user

Bottom line
- Focus on how your application uses the data
- Anticipate document and collection growth
- Take advantage of MongoDB's flexibility and features

download at mongodb.org

We're hiring!
Marc Schwering Email : marc.schwering@10gen.com Twitter : m4rcsch

conferences, appearances, and meetups


http://www.10gen.com/events
