
Databases in Computer World
DBMS / RDBMS / NoSQL

By Himanshu Patel
1/23/17

DBMS vs RDBMS

No. | DBMS | RDBMS
1 | DBMS applications store data as files. | RDBMS applications store data in tabular form.
2 | In DBMS, data is generally stored in either a hierarchical form or a navigational form. | In RDBMS, the tables have an identifier called the primary key, and the data values are stored in the form of tables.
3 | Normalization is not present in DBMS. | Normalization is present in RDBMS.
4 | DBMS does not apply any security with regard to data manipulation. | RDBMS defines integrity constraints for the purpose of the ACID (Atomicity, Consistency, Isolation and Durability) properties.
5 | DBMS uses the file system to store data, so there is no relation between the tables. | In RDBMS, data values are stored in the form of tables, so a relationship between these data values is stored in the form of a table as well.
6 | DBMS has to provide some uniform methods to access the stored information. | RDBMS supports a tabular structure of the data and relationships between tables to access the stored information.
7 | DBMS does not support distributed databases. | RDBMS supports distributed databases.
8 | DBMS is meant for small organizations and deals with small amounts of data. It supports a single user. | RDBMS is designed to handle large amounts of data. It supports multiple users.
9 | Examples of DBMS are file systems, XML, etc. | Examples of RDBMS are MySQL, PostgreSQL, SQL Server, Oracle, etc.

SQL vs NoSQL

No. | SQL (Since 1970) | NoSQL (Since 2000)
1 | Relational database | Non-relational database
2 | Stores data in a table structure | Stores data in document stores, key-value pairs, wide column stores or graphs
3 | Vertically scalable | Horizontally scalable
4 | Adding a new property may require altering the schema | A new property can be added on the fly
5 | Good for structured data | Good for semi-structured data
6 | Strict schema | Flexible schema
7 | Supports ACID transactions | Supports BASE transactions
8 | Strong consistency supported | Consistency varies per solution; some solutions have tunable consistency
9 | Storage generally on one server | Storage on distributed servers
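
Row 4 of the comparison (adding a new property) can be illustrated with a minimal sketch. It assumes SQLite (from the Python standard library) as a stand-in for a relational database and a plain dictionary as a stand-in for a document store; the table, field names and values are invented for illustration.

```python
import sqlite3

# Relational side: the table has a fixed set of columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Asha')")

# Adding a new property means changing the schema first.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
conn.execute("UPDATE users SET email = 'asha@example.com' WHERE id = 1")
sql_row = conn.execute("SELECT id, name, email FROM users").fetchone()

# Document side (a dict standing in for a document store): each document
# carries its own fields, so a new property is just a new key.
documents = {1: {"name": "Asha"}}
documents[1]["email"] = "asha@example.com"
documents[2] = {"name": "Ravi", "phone": "555-0100"}  # different shape, same collection
```

In the relational case every row gains the new column, while in the document case only the documents that need the property carry it.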

ACID vs BASE

No. | ACID (relational) | BASE (NoSQL)
1 | Strong consistency | Weak consistency
2 | Isolation | Last write wins
3 | Transaction | Program managed
4 | Robust database | Simple database
5 | Simple code (SQL) | Complex code
6 | Available and consistent | Available and partition-tolerant
7 | Scale-up (limited) | Scale-out (unlimited)
8 | Shared (disk, memory, processor, etc.) | Nothing shared (parallelizable)
9 | Storage generally on one server | Storage on distributed servers
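
The contrast between a managed transaction and "last write wins" can be sketched as follows. This is only an illustration under stated assumptions: SQLite stands in for an ACID database, and the accounts table, the transfer helper and the last_write_wins function are hypothetical names, not part of any product discussed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """ACID style: both updates commit together, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected a negative balance
        return False


def last_write_wins(versions):
    """BASE style: among conflicting (timestamp, value) pairs, keep the newest."""
    return max(versions, key=lambda version: version[0])[1]


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

With the database the failed transfer leaves no partial update behind; with last write wins the application simply accepts the most recent value and older conflicting writes are discarded.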

Why NoSQL is better

It supports semi-structured data and volatile data
It does not require a fixed schema
Read/write throughput is very high
Horizontal scalability can be achieved easily
It supports big data in volumes of terabytes and petabytes
It provides good support for analytic tools on top of big data
It can be hosted on cheaper hardware
An in-memory caching option is available to increase the performance of queries
Faster development life cycles for developers


CAP Theorem


Database System

Number of systems per category, December 2016



Database Popularity

This chart shows the popularity of each category.


Database Model

Categories: Key-value stores, Column family, RDF stores, Multimodel databases, Document, Hierarchical

Databases listed: Amazon DynamoDB, Apache Jena, ArangoDB, MongoDB, Datomic, CouchDB, Bigtable, Redis, HBase, Riak, Hypertable, OrientDB, RethinkDB, Voldemort, Cassandra, FatDB, RavenDB, AlchemyDB, Terrastore, FoundationDB, Apache Accumulo, Sesame, BangDB, JasDB, KAI, RaptorDB, hamsterdb, djonDB, Tarantool, EJDB, Maxtable, densoDB, HyperDex, Couchbase, InterSystems Caché, GT.M

Comparison of Databases

Feature | MSSQL | MongoDB | Cassandra
Data storage model | Relational DBMS | Document-oriented | Wide column store
JOINs | Yes | No | No
Transaction | ACID | No | No
Data schema | Fixed | Dynamic | Flexible
Scalability | Vertical | Horizontal | Horizontal
Replication | Yes (depending on software edition) | Primary-Secondary | Yes
Query language | SQL query language | JSON query language | CQL
MapReduce | No | Yes | Yes
Triggers | Yes | No | Yes
Foreign keys | Yes | No | No
Concurrency | Yes | Yes | Yes
Company | Microsoft | MongoDB, Inc | Apache Software Foundation
Licence | Commercial | Open Source | Open Source
Implementation language | C++ | C++ | Java
OS support | Windows | Windows, Linux, OS X, Solaris | BSD, Linux, OS X, Windows
Drivers for programming languages | .NET, Java, PHP, Python, Ruby, Visual Basic | Actionscript, C, C#, C++, Clojure, ColdFusion, D, Dart, Delphi, Erlang, Go, Groovy, Haskell, Java, JavaScript, Lisp, Lua, MatLab, Perl, PHP, PowerShell, Prolog, Python, R, Ruby, Scala, Smalltalk | C#, C++, Clojure, Erlang, Go, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala

Key-Value Data Model

Capabilities
The simplest model, where each object is retrieved with a unique key, with values having no inherent model
Utilizes in-memory storage to provide fast access, with optional persistence
Other data models are built on top of this model to provide more complex objects

Applications
Applications requiring fast access to a large number of objects, such as caches or queues
Applications that require fast-changing data environments like mobile, gaming, online ads

Limitations
Cannot update a subset of a value
Does not provide querying
As the number of objects becomes large, generating unique keys could become complex

Databases: Berkeley DB, MemcacheDB, Redis, DynamoDB
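
The model above can be sketched in a few lines. This is a toy in-memory store written for illustration, not the API of any of the databases listed; the class name, key format and JSON payload are assumptions.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store; values are opaque to the store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", b'{"user": "asha", "cart": ["book"]}')

# The store cannot change part of a value: the application reads the whole
# value, modifies it, and writes the whole value back under the same key.
session = json.loads(store.get("session:42"))
session["cart"].append("pen")
store.put("session:42", json.dumps(session).encode())
```

The read-modify-write at the end is the "cannot update a subset of a value" limitation in practice, and the absence of any search method reflects the lack of querying.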

Document-oriented Data Model

Capabilities
Extension of the key-value model, where the value is a structured document
Documents can be highly complex, hierarchical data structures without requiring a pre-defined schema
Supports queries on structured documents
Search platforms are also document-oriented

Applications
Applications that need to manage a large variety of objects that differ in structure
Large product catalogs in e-commerce, customer profiles, content management applications

Limitations
No standard query syntax
Query performance is not linearly scalable
Join queries across collections are not efficient

Databases: MongoDB, CouchDB, Apache Solr, Elasticsearch
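
A minimal sketch of the idea, assuming a Python list of dictionaries as a stand-in for a collection. The find helper and the catalog contents are hypothetical and do not mirror the query syntax of any listed database.

```python
def find(collection, **criteria):
    """Return documents whose top-level fields equal all the given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Documents in one collection may differ in structure and may be nested.
catalog = [
    {"_id": 1, "type": "book", "title": "Databases", "author": {"name": "A. Writer"}},
    {"_id": 2, "type": "shirt", "size": "M", "colors": ["red", "blue"]},
    {"_id": 3, "type": "book", "title": "NoSQL", "pages": 210},
]

books = find(catalog, type="book")
medium_items = find(catalog, size="M")
```

Unlike the plain key-value case, the store can look inside the value, which is what makes queries on structured documents possible.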

Column-Oriented Data Model

Capabilities
Extension of the key-value model, where the value is a set of columns (column family)
A column can have multiple time-stamped versions
Columns can be generated at run-time, and not all rows need to have all columns

Applications
Storing a large number of time-stamped data like event logs, sensor data
Analytics that involve querying entire columns of data, such as trends or time series analytics

Limitations
No join queries or sub-queries
Limited support for aggregation
Ordering is done per partition, specified at table creation time

Databases: Cassandra, BigTable, HBase, Apache Accumulo
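
The capabilities above can be sketched with nested dictionaries. This toy ColumnFamily class is an assumption made for illustration (row key to column name to a list of time-stamped versions); it is not how any listed database stores data on disk.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column family: row key -> column name -> list of (timestamp, value)."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(list))

    def insert(self, row_key, column, value, timestamp):
        versions = self._rows[row_key][column]
        versions.append((timestamp, value))
        versions.sort(key=lambda version: version[0])  # keep versions ordered by timestamp

    def get(self, row_key, column):
        """Return the latest value, or None if the row has no such column."""
        versions = self._rows.get(row_key, {}).get(column)
        return versions[-1][1] if versions else None

    def columns(self, row_key):
        return sorted(self._rows.get(row_key, {}))


sensors = ColumnFamily()
sensors.insert("sensor-1", "temperature", 21.5, timestamp=100)
sensors.insert("sensor-1", "temperature", 22.0, timestamp=200)
sensors.insert("sensor-2", "humidity", 40, timestamp=150)  # a column sensor-1 does not have
```

Each row holds only the columns that were written to it, and a column keeps its older versions alongside the newest one.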

Graph-oriented Data Model

Capabilities
Models graphs consisting of nodes and edges, with properties (metadata) describing them
Implements very fast graph traversal operations
Also supports indexing of metadata to enable graph traversal combined with search queries

Applications
Applications that deal with objects with a large number of inter-relations
Applications like social networking friends-networks, hierarchical role-based permissions, complex decision trees, maps, network topologies

Limitations
Difficult to scale to large data sets for generic graphs
Giraph uses the Bulk Synchronous Parallel model to overcome some of the scalability limitations

Databases: Neo4J, OrientDB, Apache Giraph, AllegroGraph
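
A friends-network traversal of the kind mentioned above can be sketched with a breadth-first search over an adjacency list. The within_hops function and the sample graph are hypothetical; a graph database would expose this through its own query language.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Breadth-first traversal: nodes reachable from start in at most max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]
    return seen  # maps each reachable node to its distance in hops


friends = {
    "asha": ["ravi", "mei"],
    "ravi": ["asha", "tom"],
    "mei": ["asha"],
    "tom": ["ravi", "lena"],
    "lena": ["tom"],
}

friends_of_friends = within_hops(friends, "asha", 2)
```

The traversal only touches nodes near the starting point, which is why this kind of query stays fast even when the whole graph is large.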

Classification and comparison of NoSQL databases

Type | Performance | Scalability | Flexibility | Complexity | Functionality
Key-Value Stores | high | high | high | none | variable (none)
Column Stores | high | high | moderate | low | minimal
Document Stores | high | variable (high) | high | low | variable (low)
Graph Databases | variable | variable | high | high | graph theory
Relational Databases | variable | variable | low | moderate | relational algebra

Database ranking (Dec 2016)

Rank | DBMS | Database Model | Score
1 | Oracle | Relational DBMS | 1417.1
2 | MySQL | Relational DBMS | 1362.65
3 | Microsoft SQL Server | Relational DBMS | 1214.18
4 | MongoDB | Document store | 318.8
 | PostgreSQL | Relational DBMS | 318.69
 | DB2 | Relational DBMS | 180.56
 | Cassandra | Wide column store | 135.06
 | Redis | Key-value store | 109.54
11 | Elasticsearch | Search engine | 99.12
14 | Solr | Search engine | 66.57
15 | HBase | Wide column store | 58.19
17 | Splunk | Search engine | 53
21 | Neo4j | Graph DBMS | 36.45
22 | Couchbase | Document store | 29.3
23 | Memcached | Key-value store | 29.09
24 | Amazon DynamoDB | Document store | 28.98
27 | CouchDB | Document store | 22.18
32 | Riak KV | Key-value store | 10.88
33 | MarkLogic | Multi-model | 10.3
38 | Hazelcast | Key-value store | 7.53
39 | Sphinx | Search engine | 7.3
41 | Ehcache | Key-value store | 6.44
42 | OrientDB | Multi-model | 6.25
45 | InfluxDB | Time Series DBMS | 5.32
46 | RethinkDB | Document store | 5.23
47 | Titan | Graph DBMS | 5.12
55 | Adabas | Multivalue DBMS | 3.89
57 | Jackrabbit | Content store | 3.58

Source: http://db-engines.com/en/ranking

Database ranking (Dec 2016)

Rank | DBMS | Database Model | Score
58 | Accumulo | Wide column store | 3.43
67 | Google Search Appliance | Search engine | 2.73
68 | Virtuoso | Multi-model | 2.69
72 | Apache Drill | Multi-model | 2.53
73 | RRDtool | Time Series DBMS | 2.48
79 | ArangoDB | Multi-model | 2.15
80 | Caché | Object oriented DBMS | 2.08
85 | Graphite | Time Series DBMS | 1.9
86 | Jena | RDF store | 1.81
94 | Db4o | Object oriented DBMS | 1.59
96 | RDF4J | RDF store | 1.49
98 | OpenTSDB | Time Series DBMS | 1.47
99 | IMS | Navigational DBMS | 1.47
103 | Versant Object Database | Object oriented DBMS | 1.3
104 | Sedna | Native XML DBMS | 1.22
111 | D3 | Multivalue DBMS | 1.03
112 | Tamino | Native XML DBMS | 1.03
113 | jBASE | Multivalue DBMS | 1.02
116 | ObjectStore | Object oriented DBMS | 0.98
117 | Giraph | Graph DBMS | 0.95
119 | BaseX | Native XML DBMS | 0.92
122 | Matisse | Object oriented DBMS | 0.8
125 | Model 204 | Multivalue DBMS | 0.76
127 | Northgate Reality | Multivalue DBMS | 0.74
130 | Algebraix | RDF store | 0.7
138 | IDMS | Navigational DBMS | 0.62
140 | Druid | Time Series DBMS | 0.6
145 | Hypertable | Wide column store | 0.56

Source: http://db-engines.com/en/ranking

Database ranking (Dec 2016)

Rank | DBMS | Database Model | Score
172 | Google Cloud Bigtable | Wide column store | 0.34
186 | Event Store | Event Store | 0.26
192 | 4store | RDF store | 0.22
197 | eXist-db | Native XML DBMS | 0.2
199 | Redland | RDF store | 0.2
201 | InfiniteGraph | Graph DBMS | 0.19
203 | ModeShape | Content store | 0.18
214 | NEventStore | Event Store | 0.15
221 | Dgraph | Graph DBMS | 0.13

Source: http://db-engines.com/en/ranking

The History of Cassandra

Original author(s): Avinash Lakshman, Prashant Malik
Developer(s): Apache Software Foundation
Initial release: 2008
Stable release: 3.9 / Sep 29, 2016
Written in: Java
Operating system: Cross-platform
Website: cassandra.apache.org

Cassandra compared to HBase

In comparison to HBase, Cassandra supplies:
Higher performance
True continuous, "always on" availability with no single point of failure
Powerful and easy multi-data center / cloud availability zone support
A simpler architecture (masterless) with easier setup and fewer requirements
Easier development (SQL-like language with CQL, and more)

Cassandra Architecture
Cassandra is a distributed, decentralized, fault-tolerant, eventually consistent, linearly scalable, and column-oriented data store.
Virtual nodes

How Cassandra Works

There are a few data structures that you need to know. The MemTable is a hash table-like structure that stays in memory and contains the actual cell data. The SSTable is the on-disk version of a MemTable: when a MemTable is full, it is persisted to disk as an SSTable. The commit log is an append-only log of all the mutations that are sent to the Cassandra cluster. It lives on disk and is used to replay changes that have not yet been flushed to an SSTable. These three hold the core data.

Then there are the Bloom filter and the index. The Bloom filter is a probabilistic data structure. Both the Bloom filter and the index live in memory and contain information about the location of data in the SSTable. Each SSTable has one Bloom filter and one index associated with it. The Bloom filter helps Cassandra quickly detect which SSTables do not have the requested data, while the index helps find the exact location of the data in the SSTable file.
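
The write and read path described above can be sketched as a toy model. Everything here is a simplification made for illustration: the class names, the flush threshold, the SHA-256 based Bloom filter and the in-memory lists standing in for disk files are assumptions, not Cassandra's actual implementation.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, size=256, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Immutable, sorted snapshot of a memtable with its own Bloom filter and index."""

    def __init__(self, memtable):
        self.rows = sorted(memtable.items())
        self.index = {key: offset for offset, (key, _) in enumerate(self.rows)}
        self.bloom = BloomFilter()
        for key in memtable:
            self.bloom.add(key)

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None  # definitely not in this SSTable, skip the lookup
        offset = self.index.get(key)
        return None if offset is None else self.rows[offset][1]


class Node:
    def __init__(self, memtable_limit=2):
        self.commit_log, self.memtable, self.sstables = [], {}, []
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # append-only record of every mutation
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.sstables.append(SSTable(self.memtable))  # flush the full memtable
            self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            value = sstable.get(key)
            if value is not None:
                return value
        return None


node = Node()
node.write("a", 1)
node.write("b", 2)  # memtable is full and gets flushed to an SSTable
node.write("a", 3)  # newer value lives in the fresh memtable
```

Reads check the memtable first and fall back to SSTables, where the Bloom filter lets a table be skipped without consulting its index.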

Definition of Cassandra

Apache Cassandra is a
Distributed...
High performance...
Extremely scalable...
Fault tolerant (ie. no single point of failure)...
Post-relational database solution. Cassandra can serve both as a real-time datastore (the "system of record") for online/transactional applications and as a read-intensive database for business intelligence systems.

Architecture Overview
Cassandra was designed with the understanding that system/hardware failures can and do occur
Peer-to-peer, distributed system
All nodes are the same
Data partitioned among all nodes in the cluster
Custom data replication to ensure fault tolerance
Read/write-anywhere design
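
How data might be partitioned and replicated among identical nodes can be sketched with a simple hash ring. This is an illustrative model only: the Ring class, the SHA-256 token function, the single token per node and the node names are assumptions and do not reproduce Cassandra's partitioners or replication strategies.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (hash choice is an assumption)."""
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor=2):
        self.replication_factor = replication_factor
        self.tokens = sorted((token(node), node) for node in nodes)

    def replicas(self, row_key):
        """Walk clockwise from the key's token and take the next distinct nodes."""
        start = bisect.bisect_right(self.tokens, (token(row_key), ""))
        count = min(self.replication_factor, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c"], replication_factor=2)
owners = ring.replicas("user:42")
```

Because every node can compute the same owners for a key, any node can accept a read or write and forward it, which is the basis of the read/write-anywhere design.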

Architecture Overview
The schema used in Cassandra is mirrored after Google Bigtable. It is a row-oriented, column structure
A keyspace is akin to a database in the RDBMS world
A column family is similar to an RDBMS table but is more flexible/dynamic
A row in a column family is indexed by its key. Other columns may be indexed as well

No Single Point of Failure

All nodes are the same
Customized replication affords tunable data redundancy
Read/write from any node
Can replicate data among different physical data centre racks


Big Data Scalability


Capable of comfortably scaling to petabytes
New nodes = Linear performance increases
Add new nodes online


Easy Replication / Data Distribution

Transparently handled by Cassandra


Multi-data centre capable
Exploits all the benefits of Cloud computing
Able to do hybrid Cloud/On-premise setup

No Need for Caching Software

Peer-to-peer architecture removes the need for a special caching layer and the programming that goes with it
The database cluster uses the memory from all participating nodes to cache the data assigned to each node
No irregularities between a memory cache and the database are encountered

Tunable Data Consistency

Choose between strong and eventual consistency (from all nodes to any node responding) depending on the need
Can be done on a per-operation basis, and for both reads and writes
Handles multi-data center operations
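
The trade-off can be made concrete with the usual overlap rule: a read is guaranteed to see the latest write when the number of replicas read plus the number of replicas written exceeds the replication factor. The sketch below assumes only the consistency levels ONE, QUORUM and ALL; the function names are hypothetical.

```python
def required_acks(level, replication_factor):
    """Replicas that must respond for a consistency level (ONE, QUORUM, ALL)."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[level]


def is_strongly_consistent(write_level, read_level, replication_factor):
    """True when the read and write replica sets are forced to overlap."""
    w = required_acks(write_level, replication_factor)
    r = required_acks(read_level, replication_factor)
    return r + w > replication_factor
```

With a replication factor of 3, writing and reading at QUORUM overlaps on at least one replica, while writing and reading at ONE does not and gives eventual consistency.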

Flexible Schema
Dynamic schema design allows for much more flexible data storage than a rigid RDBMS
Handles structured, semi-structured, and unstructured data. Counters also supported
No offline/downtime for schema changes
Supports primary and secondary indexes

Data Compression
Uses Google's Snappy data compression algorithm
Compresses data on a per-column-family level
Internal tests at DataStax show up to 80%+ compression of raw data
No performance penalty (and some increases in overall performance due to less physical I/O)!

CQL Language
Very similar to RDBMS SQL syntax
Create objects via DDL (e.g. CREATE...)
Core DML commands supported: INSERT, UPDATE, DELETE
Query data with SELECT

Who's Using Cassandra

What is Amazon S3

S3 stands for Simple Storage Service.
It is storage for the Internet.
Provided via web service interfaces (REST and SOAP)
Based on the same infrastructure that Amazon uses for its global network of websites.

Functions & concepts of S3

Allows unlimited storage of objects (files), each containing from 1 byte to 5 gigabytes of data.
Objects consist of the raw object data and metadata.
Objects are stored and retrieved using a developer-assigned key.
Data are kept secure from unauthorised access through authentication mechanisms.
Objects can be made available to the public via the HTTP or BitTorrent protocol.

Functions & concepts of S3

All objects are stored in buckets.
A bucket is simply a container for objects. It is used to partition the namespace of objects at the highest level.
Buckets are similar to Internet domain names. They are accessed via bucketname.s3.amazonaws.com.
Each developer account has a limit of 100 buckets.
More information on buckets can be found at: http://docs.amazonwebservices.com/AmazonS3/2006-03-01/index.htm?UsingBucket.htm

Functions & concepts of S3

A key is the unique identifier for an object within a bucket.
A bucket and a key together uniquely identify each object in S3. Every object can be addressed through the bucket and key combination.
For example, if your bucket name is mybucket and the key is myhomepage.html, the URL for the object will be http://mybucket.s3.amazonaws.com/myhomepage.html
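
The bucket-plus-key addressing above can be sketched as a small helper. The object_url function is a hypothetical name; it only builds the URL form shown in the example and percent-encodes characters in the key that are not URL-safe.

```python
from urllib.parse import quote


def object_url(bucket, key):
    """Build the bucketname.s3.amazonaws.com style URL for a bucket/key pair."""
    return f"http://{bucket}.s3.amazonaws.com/{quote(key)}"


homepage = object_url("mybucket", "myhomepage.html")
```

Keys may contain slashes, so a key such as photos/cat.jpg gives the appearance of folders even though the bucket namespace itself is flat.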

Functions & concepts of S3

Scalability. The amount of storage and bandwidth you need can scale as you like without any configuration changes.
Availability, speed, throughput, capacity, and robustness are not affected even if you gain 10,000 users overnight.
Unlimited storage. You pay as you go. Inexpensive and no capital outlay. Great for start-ups!
Data is accessible from any location. Since it is based on the Amazon infrastructure, it is probably more reliable than other cheap data storage providers.

Disadvantages of using S3
Not user-friendly for beginner-level computer users. S3 is basically UI-less.
Trust. Not all types of business or services might be comfortable with storing their data in the 'cloud', especially those with extremely sensitive and confidential data, e.g. banking.
Although it promises 99.9% uptime, in 2008 it had 2 major outages, in February and July, bringing down Web 2.0 start-ups like Twitter.
Back in 2007, S3 had speed issues with reading and writing of data.

Requirements
To get started using S3, an AWS account is needed. An AWS account is simply an Amazon.com account that has AWS services enabled.
Sign up at https://aws.amazon.com/
After creating the AWS account, you need to sign up for S3 by clicking the "Sign up for this web service" button.
A credit card needs to be associated with the account.
You will be given an Access Key ID and a Secret Access Key on successful creation (note: they are not emailed to you).

Pricing
Charges for using S3 are based on the location of your buckets.
You are billed according to storage (average), data transfer in and out, and the number of requests per month.
There is no minimum fee to use S3; you pay only for what you use.
You can view your current charges almost immediately on the S3 portal.
Detailed usage reports can also be downloaded in XML or CSV format.

Pricing US usage

Storage
$0.15 per GB-month of storage used

Data Transfer
$0.100 per GB - all data transfer in
$0.170 per GB - first 10 TB / month data transfer out
$0.130 per GB - next 40 TB / month data transfer out
$0.110 per GB - next 100 TB / month data transfer out
$0.100 per GB - data transfer out / month over 150 TB

Requests
$0.01 per 1,000 PUT, POST, or LIST requests
$0.01 per 10,000 GET and all other requests*
* No charge for delete requests
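
A worked example of how the tiers combine, as a minimal sketch. It uses only the prices listed above; the function names are hypothetical and it assumes 1 TB = 1024 GB, which the price list does not state.

```python
STORAGE_PER_GB_MONTH = 0.15
TRANSFER_IN_PER_GB = 0.10
# (tier size in GB, price per GB); None marks the open-ended last tier.
# Assumption: 1 TB is counted as 1024 GB.
TRANSFER_OUT_TIERS = [(10 * 1024, 0.17), (40 * 1024, 0.13), (100 * 1024, 0.11), (None, 0.10)]


def transfer_out_cost(gb):
    """Charge each slice of outbound traffic at the price of the tier it falls in."""
    cost, remaining = 0.0, gb
    for size, price in TRANSFER_OUT_TIERS:
        used = remaining if size is None else min(remaining, size)
        cost += used * price
        remaining -= used
        if remaining <= 0:
            break
    return cost


def monthly_cost(storage_gb, in_gb, out_gb, put_requests, get_requests):
    """Total monthly bill in US dollars; delete requests are free and not counted."""
    return (
        storage_gb * STORAGE_PER_GB_MONTH
        + in_gb * TRANSFER_IN_PER_GB
        + transfer_out_cost(out_gb)
        + put_requests / 1000 * 0.01
        + get_requests / 10000 * 0.01
    )


# 100 GB stored, 10 GB in, 50 GB out, 10,000 PUTs and 100,000 GETs in a month:
# 15.00 + 1.00 + 8.50 + 0.10 + 0.10 = 24.70 dollars
example_bill = monthly_cost(100, 10, 50, 10_000, 100_000)
```

For a small site the storage and transfer terms dominate; the request charges only become noticeable at millions of requests per month.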

Implementation
To start using S3, get hold of your S3 Access Key ID and Secret Access Key via the AWS portal.
Next, get hold of an application capable of managing S3. Here are a few resources:
- Spaceblock: Windows application
- S3 Web interface: web app/interface
- S3 Firefox Organizer: Firefox add-on
These applications make objects more manageable because they provide a directory structure similar to Windows Explorer.

Implementation
What can we use S3 for?
- HTML microsites
- Flash microsites
- Media storage
- Backups
For HTML and Flash microsites, custom URLs can be created by using a CNAME record to create a DNS alias.
No server-side processing should be used on S3, as it will not work without a web server (e.g. IIS, Apache).

Implementation

The Amazon Web Services API supports the ability to:
- Find buckets and objects
- Discover their metadata
- Create new buckets
- Upload new objects
- Delete existing buckets and objects
When manipulating buckets, you can optionally specify where they should be stored.
Use the REST API, preferably through something that abstracts out even that: JetS3t; s3cmd (command line)
BitTorrent access to S3 is also available

[Figure: AWS web hosting architecture. Labels: www.MyWebSite.com (dynamic data), media.MyWebSite.com (static data), Amazon Route 53 (DNS), Elastic Load Balancer, Amazon CloudFront, Auto Scaling group: Web Tier, Auto Scaling group: App Tier, Amazon EC2, Amazon RDS, Amazon S3, Availability Zone #1, Availability Zone #2]

Original author(s): MySQL AB
Developer(s): Oracle Corporation
Initial release: 23 May 1995
Stable release: 5.7.16 / 12 October 2016
Written in: C, C++
Operating system: Windows, Linux, Solaris, OS X, FreeBSD
Available in: English
Type: RDBMS
License: GPL (version 2) or proprietary
Website: www.mysql.com

MySQL was created by a Swedish company, MySQL AB, founded by David Axmark, Allan Larsson and Michael "Monty" Widenius.
The first version of MySQL appeared on 23 May 1995.
The Windows version was released on 8 January 1998 for Windows 95 and NT.
Sun Microsystems acquired MySQL AB in 2008.
Oracle acquired Sun Microsystems on 27 January 2010.
