MongoDB Berlin Schema Design

Schema Design
Basic schema modeling in MongoDB

Alvin Richards Technical Director, EMEA alvin@10gen.com @jonnyeight
Topics
- Schema design is easy!
- Data as Objects in code
- Common patterns
  - Single table inheritance
  - One-to-Many & Many-to-Many
  - Buckets
  - Trees
  - Queues
  - Inventory
So today's example will use...
Terminology
RDBMS            MongoDB
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking
Partition        Shard
Partition Key    Shard Key
Schema Design Relational Database
Schema Design MongoDB
[Diagram: embedding and linking]
Design Session
Design documents that simply map to your application
> post = {author: "Herg",
          date: ISODate("2011-09-18T09:56:06.298Z"),
          text: "Destination Moon",
          tags: ["comic", "adventure"]}
> db.posts.save(post)
Find the document

> db.posts.find()
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author: "Herg",
  date: ISODate("2011-09-18T09:56:06.298Z"),
  text: "Destination Moon",
  tags: [ "comic", "adventure" ] }

Notes:
- ID must be unique, but can be anything you'd like
- MongoDB will generate a default ID if one is not supplied
Add an index, find via index

Secondary index for author
// 1 means ascending, -1 means descending
> db.posts.ensureIndex({author: 1})
> db.posts.find({author: 'Herg'})
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  date: ISODate("2011-09-18T09:56:06.298Z"),
  author: "Herg",
  ... }
Examine the query plan

// the index was created on db.posts, so explain the query on that collection
> db.posts.find({author: "Herg"}).explain()
{ "cursor" : "BtreeCursor author_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 5,
  "indexBounds" : { "author" : [ [ "Herg", "Herg" ] ] } }
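Why nscanned stays at 1 can be sketched without a database. The block below is plain JavaScript, not the MongoDB driver: a hash map stands in for the B-tree index, and the function names and the extra sample posts are hypothetical.

```javascript
// In-memory sketch only: contrasts a full scan with a lookup through an index.
// Sample data beyond the "Destination Moon" post is made up for illustration.
const posts = [
  { _id: 1, author: "Herg", text: "Destination Moon" },
  { _id: 2, author: "Kyle", text: "sample post" },
  { _id: 3, author: "Herg", text: "another sample post" },
];

// Full scan: every document is examined, so nscanned equals the collection size.
function scanByAuthor(docs, author) {
  let nscanned = 0;
  const hits = docs.filter((d) => {
    nscanned += 1;
    return d.author === author;
  });
  return { hits, nscanned };
}

// Index: author -> documents, built once (the role ensureIndex({author: 1}) plays).
function buildAuthorIndex(docs) {
  const index = new Map();
  for (const d of docs) {
    if (!index.has(d.author)) index.set(d.author, []);
    index.get(d.author).push(d);
  }
  return index;
}

// Indexed lookup: only the matching entries are touched.
function findByAuthor(index, author) {
  const hits = index.get(author) || [];
  return { hits, nscanned: hits.length };
}

const authorIndex = buildAuthorIndex(posts);
const scan = scanByAuthor(posts, "Kyle");
const indexed = findByAuthor(authorIndex, "Kyle");
console.log(scan.nscanned, indexed.nscanned); // 3 1
```

Both calls return the same single post; only the amount of work differs, which is what the explain() output reports.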
Query operators
Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, $lt, $lte, $gt, $gte, ...
// find posts with any tags
> db.posts.find({tags: {$exists: true}})
Query operators

Regular expressions:
// posts where author starts with h
> db.posts.find({author: /^h/i })
Counting:
// number of posts written by Herg
> db.posts.find({author: "Herg"}).count()
Extending the Schema

> new_comment = {author: "Kyle", date: new Date(), text: "great book"}
> db.posts.update(
    {text: "Destination Moon"},
    {"$push": {comments: new_comment},
     "$inc": {comments_count: 1}})

> db.posts.find({_id: ObjectId("4c4ba5c0672c685e5e8aabf3")})
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Herg",
  date : ISODate("2011-09-18T09:56:06.298Z"),
  text : "Destination Moon",
  tags : [ "comic", "adventure" ],
  comments : [
    { author : "Kyle",
      date : ISODate("2011-09-19T09:56:06.298Z"),
      text : "great book" }
  ],
  comments_count: 1 }
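What the combined $push and $inc do to the document can be mimicked in a few lines. This is an in-memory sketch in plain JavaScript, not driver code; applyUpdate is a hypothetical helper that only understands these two operators.

```javascript
// In-memory sketch only: mimics the effect of {$push, $inc} on one document.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // work on a copy of the plain data
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (!Array.isArray(result[field])) result[field] = []; // $push creates a missing array
    result[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount; // $inc creates a missing counter
  }
  return result;
}

const post = { author: "Herg", text: "Destination Moon", tags: ["comic", "adventure"] };
const newComment = { author: "Kyle", text: "great book" };

const updated = applyUpdate(post, {
  $push: { comments: newComment },
  $inc: { comments_count: 1 },
});
console.log(updated.comments.length, updated.comments_count); // 1 1
```

Neither comments nor comments_count existed before the update; the schema was extended by the write itself, with no migration step.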

// create index on nested documents:
> db.posts.ensureIndex({"comments.author": 1})
> db.posts.find({"comments.author": "Kyle"})
// find last 5 posts:
> db.posts.find().sort({date: -1}).limit(5)
// most commented post:
> db.posts.find().sort({comments_count: -1}).limit(1)
When sorting, check if you need an index
Use MongoDB with your language

10gen Supported Drivers
- Ruby, Python, Perl, PHP, Javascript
- Java, C/C++, C#, Scala
- Erlang, Haskell
Object Data Mappers
- Morphia - Java
- Mongoid, MongoMapper - Ruby
- MongoEngine - Python
Community Drivers
- F#, Smalltalk, Clojure, Go, Groovy
Using your schema - using Java Driver

// Get a connection to the database, then the collection
// (the collection name "posts" is assumed here, to match the shell examples)
DB db = new Mongo().getDB("blogs");
DBCollection coll = db.getCollection("posts");
// Create the object
Map<String, Object> obj = new HashMap<String, Object>();
obj.put("author", "Herg");
obj.put("text", "Destination Moon");
obj.put("date", new Date());
// Insert the object into MongoDB
coll.insert(new BasicDBObject(obj));
Using your schema - using Morphia mapper

// Use Morphia annotations
@Entity
class Blog {
  @Id String author;
  @Indexed Date date;
  String text;
}
Using your schema - using Morphia

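// Create the data store (the "blogs" database name is assumed from the Java driver example)
Datastore ds = new Morphia().createDatastore(new Mongo(), "blogs");
// Create the object (assumes Blog declares a matching constructor)
Blog entry = new Blog("Herg", new Date(), "Destination Moon");
// Insert object into MongoDB
ds.save(entry);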
Common Patterns
Inheritance
Single Table Inheritance RDBMS

shapes table

id  type    area  radius  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10            5       2
Single Table Inheritance MongoDB

> db.shapes.find()
{ _id: "1", type: "circle", area: 3.14, radius: 1}
{ _id: "2", type: "square", area: 4, length: 2}
{ _id: "3", type: "rect", area: 10, length: 5, width: 2}
missing values not stored!

// find shapes where radius > 0
> db.shapes.find({radius: {$gt: 0}})
// create index
> db.shapes.ensureIndex({radius: 1}, {sparse: true})
index only values present!
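The effect of sparse: true can be pictured with the three shape documents. This is an in-memory sketch in plain JavaScript; buildSparseIndex is a hypothetical helper, not a MongoDB API.

```javascript
// In-memory sketch only: a sparse index holds entries just for documents
// that actually have the indexed field.
const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, length: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];

function buildSparseIndex(docs, field) {
  return docs
    .filter((d) => field in d)                    // skip documents without the field
    .map((d) => ({ key: d[field], _id: d._id }))  // index entry: key plus document id
    .sort((a, b) => a.key - b.key);               // keep entries ordered by key
}

const radiusIndex = buildSparseIndex(shapes, "radius");
const areaIndex = buildSparseIndex(shapes, "area");
console.log(radiusIndex.length, areaIndex.length); // 1 3
```

Only the circle appears in the radius index, so the index stays small even when most shapes have no radius.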
One to Many
One to Many relationships can specify
- degree of association between objects
- containment
- life-cycle
One to Many
- Embedded Array
- $slice operator to return subset of comments
- some queries harder, e.g. find latest comments across all blogs
blogs: {
  author : "Herg",
  date : ISODate("2011-09-18T09:56:06.298Z"),
  comments : [
    { author : "Kyle",
      date : ISODate("2011-09-19T09:56:06.298Z"),
      text : "great book" }
  ]}
One to Many
- Normalized (2 collections)
- most flexible
- more queries
blogs: {
  _id: 1000,
  author: "Herg",
  date: ISODate("2011-09-18T09:56:06.298Z"),
  comments: [ {comment : 1} ]}
comments : {
  _id : 1,
  blog: 1000,
  author : "Kyle",
  date : ISODate("2011-09-19T09:56:06.298Z")}
// findOne returns the document itself, so blog._id is available
> blog = db.blogs.findOne({text: "Destination Moon"});
> db.comments.find({blog: blog._id});
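The "more queries" cost of the normalized layout shows up as an application-side join. A minimal in-memory sketch in plain JavaScript follows; the arrays stand in for the two collections, and loadBlogWithComments plus the second comment are hypothetical.

```javascript
// In-memory sketch only: two arrays stand in for the blogs and comments collections.
const blogs = [{ _id: 1000, author: "Herg", text: "Destination Moon" }];
const comments = [
  { _id: 1, blog: 1000, author: "Kyle" },
  { _id: 2, blog: 2000, author: "Joe" }, // belongs to a different (hypothetical) blog
];

function loadBlogWithComments(text) {
  // Query 1: fetch the blog document.
  const blog = blogs.find((b) => b.text === text);
  if (!blog) return null;
  // Query 2: fetch the comments that link back to it.
  const blogComments = comments.filter((c) => c.blog === blog._id);
  return { ...blog, comments: blogComments };
}

const loaded = loadBlogWithComments("Destination Moon");
console.log(loaded.comments.map((c) => c.author)); // only Kyle's comment
```

With embedding the same result comes back from one read; with linking each comment can be queried, indexed and updated on its own.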
Linking versus Embedding

When should I embed? When should I link?
Activity Stream - Embedded

// users - one doc per user with all tweets
{ _id: "alvin",
  email: "alvin@10gen.com",
  tweets: [
    { user: "bob",
      tweet: "20111209-1231",
      text: "Best Tweet Ever!" }
  ] }
Activity Stream - Linking

// users - one doc per user
{ _id: "alvin",
  email: "alvin@10gen.com" }
// tweets - one doc per user per tweet
{ user: "bob",
  tweet: "20111209-1231",
  text: "Best Tweet Ever!" }
Embedding
- Great for read performance
  - One seek to load entire object
  - One roundtrip to database
- Writes can be slow if adding to objects all the time
- Should you embed tweets?
Activity Stream - Buckets

// tweets : one doc per user per day
{ _id: "alvin-20111209",
  email: "alvin@10gen.com",
  tweets: [
    { user: "Bob",
      tweet: "20111209-1231",
      text: "Best Tweet Ever!" },
    { author: "Joe",
      date: "May 27 2011",
      text: "Stuck in traffic (again)" }
  ] }
Adding a Tweet
tweet = { user: "Bob",
          tweet: "20111209-1231",
          text: "Best Tweet Ever!" }
db.tweets.update(
  { _id : "alvin-20111209" },
  { $push : { tweets : tweet } } );
Deleting a Tweet
db.tweets.update(
  { _id: "alvin-20111209" },
  { $pull: { tweets: { tweet: "20111209-1231" } } } )
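The bucket pattern hinges on deriving the bucket _id from the user and the day. Below is an in-memory sketch in plain JavaScript, not driver code: bucketId, addTweet and deleteTweet are hypothetical helpers, and creating a missing bucket on first write is an assumption (the slides do not say how a new day's document gets created).

```javascript
// In-memory sketch only: a Map stands in for the tweets collection.
const buckets = new Map();

// Build the "user-YYYYMMDD" key used as the bucket _id.
function bucketId(user, date) {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, "0");
  const d = String(date.getUTCDate()).padStart(2, "0");
  return `${user}-${y}${m}${d}`;
}

// Append a tweet to the day's bucket (the role $push plays).
function addTweet(owner, date, tweet) {
  const id = bucketId(owner, date);
  if (!buckets.has(id)) buckets.set(id, { _id: id, tweets: [] }); // assumed: create on first write
  buckets.get(id).tweets.push(tweet);
  return id;
}

// Remove every array element whose tweet key matches (the role $pull plays).
function deleteTweet(id, tweetKey) {
  const bucket = buckets.get(id);
  if (bucket) bucket.tweets = bucket.tweets.filter((t) => t.tweet !== tweetKey);
}

const day = new Date(Date.UTC(2011, 11, 9)); // 9 December 2011
const id = addTweet("alvin", day, { user: "Bob", tweet: "20111209-1231", text: "Best Tweet Ever!" });
console.log(id); // alvin-20111209
```

One document per user per day keeps each document bounded in size while still returning a whole day of activity in a single read.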
Many - Many
Example:
- Product can be in many categories
- Category can have many products
products: {
  _id: 10,
  name: "Destination Moon",
  category_ids: [ 20, 30 ] }
categories: {
  _id: 20,
  name: "adventure",
  product_ids: [ 10, 11, 12 ] }
// _id 30 (not 21) so that it matches the product's category_ids
categories: {
  _id: 30,
  name: "movie",
  product_ids: [ 10 ] }
// All categories for a given product
> db.categories.find({product_ids: 10})
Alternative

categories: {
  _id: 20,
  name: "adventure"}
// All products for a given category
> db.products.find({category_ids: 20})
// All categories for a given product
> product = db.products.findOne({_id : some_id})
> db.categories.find({_id : {$in : product.category_ids}})
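With the ids stored on the product side only, both directions of the relationship are still reachable. A minimal in-memory sketch in plain JavaScript; the helper names, product 11 and category 30 are hypothetical sample data.

```javascript
// In-memory sketch only: arrays stand in for the products and categories collections.
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "sample product", category_ids: [20] }, // hypothetical
];
const categories = [
  { _id: 20, name: "adventure" },
  { _id: 30, name: "sample category" }, // hypothetical
];

// One query: match a single value against the array field.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// Two queries: load the product, then fetch categories whose _id is in its list ($in).
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory(20).map((p) => p._id));     // products 10 and 11
console.log(categoriesOfProduct(10).map((c) => c.name));   // both categories
```

The trade-off is one extra query in one direction, in exchange for updating a single document when a product changes category.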
Trees
Hierarchical information
Trees
Full Tree in Document
{ comments: [
    { author: "Kyle", text: ...,
      replies: [
        { author: "James", text: ..., replies: [] }
      ] }
  ] }
Pros: Single Document, Performance, Intuitive
Cons: Hard to search, Partial Results, 16MB limit

Array of Ancestors
        A
       / \
      B   E
     / \   \
    C   D   F

- Store all Ancestors of a node
{ _id: "a" }
{ _id: "b", thread: [ "a" ], replyTo: "a" }
{ _id: "c", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "d", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "e", thread: [ "a" ], replyTo: "a" }
{ _id: "f", thread: [ "a", "e" ], replyTo: "e" }
// find all threads where "b" is in
> db.msg_tree.find({thread: "b"})
// find replies to "e"
> db.msg_tree.find({replyTo: "e"})
// find history of "f"
> threads = db.msg_tree.findOne( {_id: "f"} ).thread
> db.msg_tree.find( { _id: { $in : threads } } )
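How the thread array gets filled in when a reply is written can be sketched as follows. This is plain in-memory JavaScript, not driver code; reply, descendantsOf and historyOf are hypothetical helpers, and giving the root an empty thread array is an assumption (the root document above has no thread field).

```javascript
// In-memory sketch only: a Map stands in for the msg_tree collection.
const msgTree = new Map([["a", { _id: "a", thread: [] }]]); // root, assumed empty thread

// A new node's ancestors are its parent's ancestors plus the parent itself.
function reply(id, parentId) {
  const parent = msgTree.get(parentId);
  if (!parent) throw new Error(`unknown parent ${parentId}`);
  const node = { _id: id, thread: [...parent.thread, parentId], replyTo: parentId };
  msgTree.set(id, node);
  return node;
}

reply("b", "a");
reply("c", "b");
reply("d", "b");
reply("e", "a");
reply("f", "e");

// Everything below a node: any document whose thread contains it.
const descendantsOf = (id) =>
  [...msgTree.values()].filter((n) => n.thread.includes(id)).map((n) => n._id);

// The path from the root down to a node's parent.
const historyOf = (id) => msgTree.get(id).thread;

console.log(descendantsOf("b")); // c and d
console.log(historyOf("f"));     // a then e
```

The write does a little extra work (copying the parent's array) so that subtree and history reads become single lookups.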
Trees as Paths
Store hierarchy as a path expression
- Separate each node by a delimiter, e.g. /
- Use text search to find parts of a tree
{ comments: [
    { author: "Kyle", text: "initial post", path: "" },
    { author: "Jim", text: "Jim's comment", path: "jim" },
    { author: "Kyle", text: "Kyle's reply to Jim", path : "jim/kyle"}
  ] }
// Find the conversations Jim was part of
// (path lives inside the comments array, so use dot notation)
> db.posts.find({"comments.path": /^jim/})
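The prefix match can be tried out on the comment array itself. This in-memory sketch in plain JavaScript uses a hypothetical subtree helper and one made-up extra comment to show a caveat: a bare /^jim/ prefix would also match a path such as "jimmy", so the sketch matches on the delimiter as well.

```javascript
// In-memory sketch only: the array stands in for the embedded comments.
const comments = [
  { author: "Kyle", text: "initial post", path: "" },
  { author: "Jim", text: "Jim's comment", path: "jim" },
  { author: "Kyle", text: "Kyle's reply to Jim", path: "jim/kyle" },
  { author: "Jimmy", text: "unrelated comment", path: "jimmy" }, // hypothetical
];

// A comment is in the subtree if its path is the prefix itself
// or continues past the prefix with the "/" delimiter.
function subtree(docs, prefix) {
  return docs.filter((c) => c.path === prefix || c.path.startsWith(prefix + "/"));
}

// The plain regular expression from the slide, for comparison.
const looseMatches = comments.filter((c) => /^jim/.test(c.path));

console.log(subtree(comments, "jim").length, looseMatches.length); // 2 3
```

Anchoring on the delimiter keeps sibling names that share a prefix out of the result.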
Queue
- Need to maintain order and state
- Ensure that updates are atomic
db.jobs.save( { inprogress: false, priority: 1, ... });
// find highest priority job and mark as in-progress
job = db.jobs.findAndModify({
  query: {inprogress: false},
  sort: {priority: -1},
  update: {$set: {inprogress: true, started: new Date()}},
  new: true})
Queue
{ inprogress: true,                               // updated
  priority: 1,
  started: ISODate("2011-09-18T09:56:06.298Z")    // added
  ... }
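The find-and-mark step can be modelled in memory to see which job each worker would receive. This is plain JavaScript, not driver code; claimNextJob and the sample jobs are hypothetical. Single-threaded JavaScript cannot interleave the two steps, which is the guarantee findAndModify provides on the server.

```javascript
// In-memory sketch only: the array stands in for the jobs collection.
const jobs = [
  { _id: 1, inprogress: false, priority: 1 },
  { _id: 2, inprogress: false, priority: 5 },
  { _id: 3, inprogress: true, priority: 9 }, // already claimed by someone else
];

function claimNextJob(queue, now = new Date()) {
  // query + sort: idle jobs, highest priority first (filter returns a new array)
  const candidates = queue
    .filter((j) => !j.inprogress)
    .sort((a, b) => b.priority - a.priority);
  const job = candidates[0];
  if (!job) return null; // nothing left to claim
  // update: mark as in progress and record the start time
  job.inprogress = true;
  job.started = now;
  return job; // like new: true, the modified document is returned
}

const first = claimNextJob(jobs);
const second = claimNextJob(jobs);
const third = claimNextJob(jobs);
console.log(first._id, second._id, third); // 2 1 null
```

Because selecting and marking happen as one step, two workers never receive the same job.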
Inventory
- User has a number of "votes" they can use
- A finite stock that you can "sell"
- A resource that can be "provisioned"
Inventory
// Number of votes and who user voted for
{ _id: "alvin", votes: 42, voted_for: [] }
// Subtract a vote and add the blog voted for
db.user.update(
  { _id: "alvin",
    votes: { $gt: 0 },
    voted_for: { $ne: "Destination Moon" } },
  { "$push": { voted_for: "Destination Moon" },
    "$inc": { votes: -1 } })
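The update only applies when its query conditions still hold, which is what protects the inventory. A minimal in-memory sketch in plain JavaScript; castVote and the second blog title are hypothetical, and the starting count of 2 is chosen to keep the example short.

```javascript
// In-memory sketch only: guard and modification are checked and applied together,
// mirroring a query document combined with $push and $inc.
function castVote(user, blog) {
  const allowed = user.votes > 0 && !user.voted_for.includes(blog); // the query part
  if (!allowed) return false; // no document matched, nothing changes
  user.voted_for.push(blog);  // $push
  user.votes -= 1;            // $inc: -1
  return true;
}

const alvin = { _id: "alvin", votes: 2, voted_for: [] };

const results = [
  castVote(alvin, "Destination Moon"), // accepted
  castVote(alvin, "Destination Moon"), // rejected: already voted for this blog
  castVote(alvin, "sample blog"),      // accepted, uses the last vote
  castVote(alvin, "another blog"),     // rejected: no votes left
];
console.log(results, alvin.votes); // true, false, true, false and 0 votes left
```

Putting the stock check in the query rather than in application code means the count can never go below zero, even with concurrent writers.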
Summary
- Schema design is different in MongoDB
- Basic data design principles stay the same
- Focus on how the application manipulates data
- Rapidly evolve schema to meet your requirements
- Enjoy your new freedom, use it wisely :-)
download at mongodb.org
alvin@10gen.com
conferences, appearances, and meetups

http://www.10gen.com/events
http://bit.ly/mongo>
Facebook | Twitter | LinkedIn

@mongodb
http://linkd.in/joinmongo
