
Schema Chalk Talk

Alvin Richards
alvin@10gen.com

Wednesday, May 25, 2011 1


Topics

Common patterns
• Single table inheritance
• One-to-Many & Many-to-Many
• Trees
• Queues



So why model data?

http://www.flickr.com/photos/42304632@N00/493639870/



A brief history of normalization
• 1970 E. F. Codd introduces 1st Normal Form (1NF)
• 1971 E. F. Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
• 1974 Codd & Boyce define Boyce-Codd Normal Form (BCNF)
• 2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)

Goals:
• Avoid anomalies when inserting, updating or deleting
• Minimize redesign when extending the schema
• Make the model informative to users
• Avoid bias towards a particular style of query

* Source: Wikipedia

Relational made normalized
data look like this



Document databases make
normalized data look like this



Terminology

RDBMS          MongoDB
Table          Collection
Row(s)         JSON Document
Index          Index
Join           Embedding & Linking
Partition      Shard
Partition Key  Shard Key



DB Considerations
How can we manipulate this data?
• Dynamic Queries
• Secondary Indexes
• Atomic Updates
• Map Reduce

Access Patterns?
• Read / Write Ratio
• Types of updates
• Types of queries
• Data life-cycle

Considerations
• No Joins
• Document writes are atomic



Inheritance



Single Table Inheritance - RDBMS

shapes table
id  type    area  radius  d  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10               5       2


Single Table Inheritance
> db.shapes.find()
{ _id: "1", type: "circle", area: 3.14, radius: 1 }
{ _id: "2", type: "square", area: 4, d: 2 }
{ _id: "3", type: "rect", area: 10, length: 5, width: 2 }

// find shapes where radius > 0
> db.shapes.find({radius: {$gt: 0}})

// create index
> db.shapes.ensureIndex({radius: 1})



Single Table Inheritance

Considerations
• Simple to query across sub-types
• Indexes on specialized values will be small
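
As a rough illustration of querying across sub-types, the sketch below mimics the radius query in plain JavaScript over an in-memory copy of the shapes documents. The helper findWhereGt is hypothetical; it only approximates $gt matching (documents without the field never match) and is not how MongoDB evaluates queries.

```javascript
// In-memory copy of the shapes documents from the slides.
const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, d: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];

// Hypothetical helper approximating find({field: {$gt: bound}}):
// documents that lack the field are skipped.
function findWhereGt(docs, field, bound) {
  return docs.filter(
    (doc) => typeof doc[field] === "number" && doc[field] > bound
  );
}

// A query on a sub-type field returns only that sub-type.
const withRadius = findWhereGt(shapes, "radius", 0);

// A query on a shared field spans every sub-type.
const bigShapes = findWhereGt(shapes, "area", 3.5);

console.log(withRadius.map((s) => s.type));
console.log(bigShapes.map((s) => s.type));
```

Only the circle carries a radius, so the first query never needs to know which types exist; the second query treats all shapes uniformly through the shared area field.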



One to Many
One to Many relationships can specify
• degree of association between objects
• containment
• life-cycle



One to Many
- Embedded Array / Array Keys
  - $slice operator to return a subset of the array
  - some queries are harder,
    e.g. find the latest comments across all documents

blogs: {
    author: "Hergé",
    date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
    comments: [
        {
            author: "Kyle",
            date: "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
            text: "great book"
        }
    ]}

One to Many
- Embedded tree
  - Single document
  - Natural
  - Hard to query

blogs: {
    author: "Hergé",
    date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
    comments: [
        {
            author: "Kyle",
            date: "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
            text: "great book",
            replies: [ { author: "James", ... } ]
        }
    ]}



One to Many
- Normalized (2 collections)
  - most flexible
  - more queries

blogs: {
    author: "Hergé",
    date: "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
    comments: [
        { comment: ObjectId("1") }
    ]}

comments: { _id: ObjectId("1"),
            author: "James",
            date: "Sat Jul 24 2010 20:51:03 ..." }
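
To make the embedded-versus-normalized trade-off concrete, here is a small in-memory sketch in plain JavaScript. The helper lastComments imitates what a $slice projection with a negative count would return, and loadComments imitates the extra query a normalized layout needs. Both helpers are hypothetical, ids are shortened to strings, and the comments beyond the first are made-up sample data.

```javascript
// Embedded layout: comments live inside the blog document.
const embeddedBlog = {
  author: "Hergé",
  comments: [
    { author: "Kyle", text: "great book" },
    { author: "James", text: "agreed" },
    { author: "Jim", text: "me too" },
  ],
};

// Hypothetical helper: the last n comments, similar in spirit to
// projecting with {comments: {$slice: -n}}.
function lastComments(blog, n) {
  return blog.comments.slice(-n);
}

// Normalized layout: the blog stores references, comments live elsewhere.
const normalizedBlog = {
  author: "Hergé",
  comments: [{ comment: "1" }, { comment: "2" }],
};
const commentsCollection = [
  { _id: "1", author: "Kyle", text: "great book" },
  { _id: "2", author: "James", text: "agreed" },
  { _id: "3", author: "Jim", text: "comment on another blog" },
];

// Hypothetical helper: the second lookup that the normalized layout requires.
function loadComments(blog, collection) {
  const ids = new Set(blog.comments.map((ref) => ref.comment));
  return collection.filter((c) => ids.has(c._id));
}

const latestTwo = lastComments(embeddedBlog, 2);
const resolved = loadComments(normalizedBlog, commentsCollection);

console.log(latestTwo.map((c) => c.author));
console.log(resolved.map((c) => c.author));
```

The embedded read is a single document fetch; the normalized read costs a second pass, but the comments collection can be queried on its own, for example for the latest comments across all blogs.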



One to Many - patterns

- Embedded Array / Array Keys
- Embedded tree
- Normalized



Many - Many
Example:

- Product can be in many categories


- Category can have many products


Many - Many
products:
    { _id: ObjectId("10"),
      name: "Destination Moon",
      category_ids: [ ObjectId("20"),
                      ObjectId("30") ] }

categories:
    { _id: ObjectId("20"),
      name: "adventure",
      product_ids: [ ObjectId("10"),
                     ObjectId("11"),
                     ObjectId("12") ] }

// All categories for a given product
> db.categories.find({product_ids: ObjectId("10")})
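
The query above relies on a scalar matching an array field when the array contains it. A minimal in-memory sketch of that behaviour in plain JavaScript follows; ids are shortened to strings, the second category ("space") is made-up sample data, and categoriesForProduct is a hypothetical helper, not a driver call.

```javascript
// Both sides hold references to the other, as in the documents above.
const products = [
  { _id: "10", name: "Destination Moon", category_ids: ["20", "30"] },
];
const categories = [
  { _id: "20", name: "adventure", product_ids: ["10", "11", "12"] },
  { _id: "30", name: "space", product_ids: ["10"] },
];

// Approximates db.categories.find({product_ids: id}): a document matches
// when its product_ids array contains the given id.
function categoriesForProduct(productId) {
  return categories.filter((c) => c.product_ids.includes(productId));
}

const namesForProduct10 = categoriesForProduct("10").map((c) => c.name);
const namesForProduct11 = categoriesForProduct("11").map((c) => c.name);

console.log(namesForProduct10);
console.log(namesForProduct11);
```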


Alternative
products:
    { _id: ObjectId("10"),
      name: "Destination Moon",
      category_ids: [ ObjectId("20"),
                      ObjectId("30") ] }

categories:
    { _id: ObjectId("20"),
      name: "adventure" }

// All products for a given category
> db.products.find({category_ids: ObjectId("20")})

// All categories for a given product
> product = db.products.findOne({_id: some_id})
> db.categories.find({_id: {$in: product.category_ids}})
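
With references kept on one side only, one direction is a single query and the other takes two steps. The sketch below imitates both in plain JavaScript over in-memory arrays. The helper names are hypothetical, ids are shortened to strings, and the second product and second category are made-up sample data.

```javascript
// Only products hold references; categories carry no product list.
const products = [
  { _id: "10", name: "Destination Moon", category_ids: ["20", "30"] },
  { _id: "11", name: "Explorers on the Moon", category_ids: ["20"] },
];
const categories = [
  { _id: "20", name: "adventure" },
  { _id: "30", name: "space" },
];

// Approximates db.products.find({category_ids: id}): one step.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// Approximates the two-step lookup: findOne on products,
// then an $in match on categories.
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

const inAdventure = productsInCategory("20").map((p) => p._id);
const ofProduct10 = categoriesOfProduct("10").map((c) => c.name);

console.log(inAdventure);
console.log(ofProduct10);
```

The design choice here is to pay for one extra read in exchange for having a single place to update when a product changes categories.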



Embedding versus Linking

Embedding
• Simple data structure
• Limited to 16MB
• Larger documents
• How often do you update?
• Will the document grow and grow?
Linking
• More complex data structure
• Unlimited data size
• More, smaller documents
• What are the maintenance needs?

Trees
Full Tree in Document

{ comments: [
    { author: "Kyle", text: "...",
      replies: [
          { author: "James", text: "...",
            replies: [] }
      ] }
  ]
}

Pros: Single Document, Performance, Intuitive

Cons: Hard to search, Partial Results, 16MB limit
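
"Hard to search" can be made concrete: a match may sit at any depth, so application code ends up walking the tree itself. The sketch below is one way that walk might look in plain JavaScript; findByAuthor is a hypothetical helper and the document is the one shown above.

```javascript
// The full tree stored in a single document.
const post = {
  comments: [
    {
      author: "Kyle",
      text: "...",
      replies: [{ author: "James", text: "...", replies: [] }],
    },
  ],
};

// Depth-first walk over comments and their nested replies.
// Returns the index path to every comment written by the given author.
function findByAuthor(comments, author, path = []) {
  const hits = [];
  comments.forEach((comment, index) => {
    const here = [...path, index];
    if (comment.author === author) hits.push(here);
    hits.push(...findByAuthor(comment.replies || [], author, here));
  });
  return hits;
}

const jamesHits = findByAuthor(post.comments, "James");
const kyleHits = findByAuthor(post.comments, "Kyle");

console.log(jamesHits);
console.log(kyleHits);
```

Each hit is a list of array indexes from the top-level comments down to the match, which is what the application would need in order to address that reply again.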

   
Trees
Parent Links
- Each node is stored as a document
- Contains the id of the parent

Child Links
- Each node contains the ids of the children
- Can support graphs (multiple parents / children)
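
A minimal in-memory sketch of both link styles in plain JavaScript. The node ids and the helper names (childrenOf, ancestorsOf, parentsOf) are hypothetical and chosen only for illustration.

```javascript
// Parent links: each node stores the id of its parent (null for the root).
const parentLinked = [
  { _id: "a", parent: null },
  { _id: "b", parent: "a" },
  { _id: "c", parent: "b" },
  { _id: "e", parent: "a" },
];

// Children of a node: one filter, comparable to find({parent: id}).
function childrenOf(nodes, id) {
  return nodes.filter((n) => n.parent === id).map((n) => n._id);
}

// Ancestors of a node: one lookup per level, walking up to the root.
function ancestorsOf(nodes, id) {
  const byId = new Map(nodes.map((n) => [n._id, n]));
  const chain = [];
  let current = byId.get(id);
  while (current && current.parent !== null) {
    chain.push(current.parent);
    current = byId.get(current.parent);
  }
  return chain;
}

// Child links: each node stores the ids of its children, so a node may
// appear under more than one parent (a graph rather than a strict tree).
const childLinked = [
  { _id: "a", children: ["b", "e"] },
  { _id: "b", children: ["c"] },
  { _id: "e", children: ["c"] },
  { _id: "c", children: [] },
];

// Parents of a node: every node whose children array contains the id.
function parentsOf(nodes, id) {
  return nodes.filter((n) => n.children.includes(id)).map((n) => n._id);
}

const kidsOfA = childrenOf(parentLinked, "a");
const upFromC = ancestorsOf(parentLinked, "c");
const parentsOfC = parentsOf(childLinked, "c");

console.log(kidsOfA, upFromC, parentsOfC);
```

With parent links, finding children is one query but finding all ancestors costs one lookup per level; with child links, node "c" can sit under both "b" and "e".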


Array of Ancestors
- Store all Ancestors of a node
   { _id: "a" }
   { _id: "b", ancestors: [ "a" ], parent: "a" }
   { _id: "c", ancestors: [ "a", "b" ], parent: "b" }
   { _id: "d", ancestors: [ "a", "b" ], parent: "b" }
   { _id: "e", ancestors: [ "a" ], parent: "a" }
   { _id: "f", ancestors: [ "a", "e" ], parent: "e" }

// find all descendants of b:
> db.tree2.find({ancestors: 'b'})

// find all direct descendants of b:
> db.tree2.find({parent: 'b'})

// find all ancestors of f:
> ancestors = db.tree2.findOne({_id: 'f'}).ancestors
> db.tree2.find({_id: {$in: ancestors}})
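
The three queries can be checked against the sample documents with a small in-memory sketch in plain JavaScript. The helper functions are hypothetical stand-ins for the shell queries, not driver calls.

```javascript
// The tree2 documents from the slide.
const tree2 = [
  { _id: "a" },
  { _id: "b", ancestors: ["a"], parent: "a" },
  { _id: "c", ancestors: ["a", "b"], parent: "b" },
  { _id: "d", ancestors: ["a", "b"], parent: "b" },
  { _id: "e", ancestors: ["a"], parent: "a" },
  { _id: "f", ancestors: ["a", "e"], parent: "e" },
];

// Approximates find({ancestors: id}): array contains the id.
function descendantsOf(id) {
  return tree2
    .filter((n) => (n.ancestors || []).includes(id))
    .map((n) => n._id);
}

// Approximates find({parent: id}).
function directDescendantsOf(id) {
  return tree2.filter((n) => n.parent === id).map((n) => n._id);
}

// Approximates findOne({_id: id}).ancestors followed by an $in query.
function ancestorsOf(id) {
  const node = tree2.find((n) => n._id === id);
  const ids = (node && node.ancestors) || [];
  return tree2.filter((n) => ids.includes(n._id)).map((n) => n._id);
}

const belowB = descendantsOf("b");
const childrenOfB = directDescendantsOf("b");
const aboveF = ancestorsOf("f");

console.log(belowB, childrenOfB, aboveF);
```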

Trees as Paths
Store hierarchy as a path expression
- Separate each node by a delimiter, e.g. “/”
- Use text search (here, a regular expression) to find parts of a tree

{ comments: [
    { author: "Kyle", text: "initial post",
      path: "/" },
    { author: "Jim", text: "jim's comment",
      path: "/jim" },
    { author: "Kyle", text: "Kyle's reply to Jim",
      path: "/jim/kyle" } ] }

// Find the conversations Jim was part of
> db.posts.find({"comments.path": /^\/jim/i})
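
A minimal sketch of the prefix match in plain JavaScript, run over the comments array from the slide. The subtree helper and the "(/|$)" boundary are assumptions added here so that a sibling such as "/jimmy" is not picked up by the "/jim" prefix; the slide's own pattern is a plain prefix match.

```javascript
// The comments from the slide, each carrying its path.
const comments = [
  { author: "Kyle", text: "initial post", path: "/" },
  { author: "Jim", text: "jim's comment", path: "/jim" },
  { author: "Kyle", text: "Kyle's reply to Jim", path: "/jim/kyle" },
];

// Escape regex metacharacters so a node name is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: every comment at or below the given path prefix.
function subtree(docs, prefix) {
  const pattern = new RegExp("^" + escapeRegExp(prefix) + "(/|$)", "i");
  return docs.filter((doc) => pattern.test(doc.path));
}

const jimThread = subtree(comments, "/jim").map((c) => c.text);

console.log(jimThread);
```

Anchoring the pattern at the start of the path is what allows an index on the path field to help; an unanchored pattern would have to examine every path.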


Queue
• Need to maintain order and state
• Ensure that updates to the queue are atomic
     { inprogress: false,
       priority: 1,
       ...
     }

// find highest priority job and mark as in-progress
job = db.jobs.findAndModify({
          query:  {inprogress: false},
          sort:   {priority: -1},
          update: {$set: {inprogress: true,
                          started: new Date()}},
          new: true})
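
The claim step can be imitated in plain JavaScript to show the intended behaviour: each call hands out the highest-priority idle job and no job is handed out twice. claimNextJob is a hypothetical stand-in for findAndModify, and the job documents are made-up sample data. In this sketch the atomicity comes from JavaScript running the function to completion; in MongoDB it comes from the server applying the find and the update as one operation.

```javascript
// Sample queue: higher priority numbers are served first.
const jobs = [
  { _id: 1, inprogress: false, priority: 1 },
  { _id: 2, inprogress: false, priority: 5 },
  { _id: 3, inprogress: true, priority: 9 },
];

// Hypothetical stand-in for findAndModify with
// query {inprogress: false}, sort {priority: -1}, new: true.
function claimNextJob(queue, now = new Date()) {
  let best = null;
  for (const job of queue) {
    if (!job.inprogress && (best === null || job.priority > best.priority)) {
      best = job;
    }
  }
  if (best === null) return null; // nothing left to claim
  best.inprogress = true;
  best.started = now;
  return best; // the modified document, as with new: true
}

const first = claimNextJob(jobs);
const second = claimNextJob(jobs);
const third = claimNextJob(jobs);

console.log(first && first._id, second && second._id, third);
```

Job 3 is skipped because it is already in progress, and the third call finds nothing left to claim.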



download at mongodb.org

We’re Hiring!
alvin@10gen.com

conferences,  appearances,  and  meetups


http://www.10gen.com/events

Facebook: http://bit.ly/mongo>
Twitter: @mongodb
LinkedIn: http://linkd.in/joinmongo
