Databases in Computer World

DBMS / RDBMS / NoSQL
By Himanshu Patel
1/23/17
DBMS vs RDBMS
No. | DBMS | RDBMS
1 | DBMS applications store data as files. | RDBMS applications store data in a tabular form.
2 | In DBMS, data is generally stored in either a hierarchical form or a navigational form. | In RDBMS, the tables have an identifier called a primary key and the data values are stored in the form of tables.
3 | Normalization is not present in DBMS. | Normalization is present in RDBMS.
4 | DBMS does not apply any security with regard to data manipulation. | RDBMS defines integrity constraints for the purpose of the ACID (Atomicity, Consistency, Isolation and Durability) properties.
5 | DBMS uses the file system to store data, so there will be no relation between the tables. | In RDBMS, data values are stored in the form of tables, so a relationship between these data values will be stored in the form of a table as well.
6 | DBMS has to provide some uniform methods to access the stored information. | An RDBMS supports a tabular structure of the data and a relationship between them to access the stored information.
7 | DBMS does not support distributed databases. | RDBMS supports distributed databases.
8 | DBMS is meant for small organizations and deals with small data. It supports a single user. | RDBMS is designed to handle large amounts of data. It supports multiple users.
9 | Examples of DBMS are file systems, XML etc. | Examples of RDBMS are MySQL, PostgreSQL, SQL Server, Oracle etc.
SQL vs NoSQL
No. | SQL (since 1970) | NoSQL (since 2000)
1 | Relational database | Non-relational database
2 | Stores data in a table structure | Stores data in a document store, key-value pairs, wide column store or graphs
3 | Vertically scalable | Horizontally scalable
4 | Adding a new property may require altering the schema | A new property can be added on the fly
5 | Good for structured data | Good for semi-structured data
6 | Strict schema | Flexible schema
7 | Supports ACID transactions | Supports BASE transactions
8 | Strong consistency supported | Consistency varies per solution; some solutions have tunable consistency
9 | Storage generally on one server | Storage on distributed servers
ACID vs BASE
No. | ACID (relational) | BASE (NoSQL)
1 | Strong consistency | Weak consistency
2 | Isolation | Last write wins
3 | Transaction | Program managed
4 | Robust database | Simple database
5 | Simple code (SQL) | Complex code
6 | Available and consistent | Available and partition-tolerant
7 | Scale-up (limited) | Scale-out (unlimited)
8 | Shared (disk, mem, proc etc.) | Nothing shared (parallelizable)
9 | Storage generally on one server | Storage on distributed servers
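To make the ACID column concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names and the amounts are hypothetical and not from the slides. It shows atomicity: a transfer that would break a constraint is rolled back as a whole, so no partial update remains.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move amount from src to dst atomically; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so both updates were undone.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # valid transfer
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the failed transfer the credit to bob is applied first and then undone, which is the behaviour a BASE-style store typically leaves to application code.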
Why NoSQL is better
It supports semi-structured data and volatile data
It does not have a schema
Read/write throughput is very high
Horizontal scalability can be achieved easily
Will support big data in volumes of terabytes and petabytes
Provides good support for analytic tools on top of big data
Can be hosted on cheaper hardware
In-memory caching option is available to increase the performance of queries
Faster development life cycles for developers
CAP Theorem
Database System
Number of systems per category, December 2016

Database Popularity
This chart shows the popularity of each category.

Database Model
Categories: Key-value stores, Column family, RDF stores, Multimodel, Document, Hierarchical
Databases: Amazon DynamoDB, Apache Jena, ArangoDB, MongoDB, Datomic, CouchDB, Bigtable, Redis, HBase, Riak, Hypertable, OrientDB, RethinkDB, Voldemort, Cassandra, FatDB, RavenDB, AlchemyDB, Terrastore, FoundationDB, Apache Accumulo, Sesame, BangDB, JasDB, KAI, RaptorDB, hamsterdb, djonDB, Tarantool, EJDB, Maxtable, densoDB, HyperDex, Couchbase, InterSystems Caché, GT.M
Comparison of Databases
Property | MSSQL | MongoDB | Cassandra
Data storage model | Relational DBMS | Document-oriented | Wide column store
JOINs | Yes | No | No
Transaction | ACID | No | No
Data schema | Fixed | Dynamic | Flexible
Scalability | Vertical | Horizontal | Horizontal
Replication | Yes (depending on software edition) | Primary-Secondary | Yes
Query language | SQL query language | JSON query language | CQL
MapReduce | No | Yes | Yes
Triggers | Yes | No | Yes
Foreign keys | Yes | No | No
Concurrency | Yes | Yes | Yes
Company | Microsoft | MongoDB, Inc | Apache Software Foundation
Licence | Commercial | Open Source | Open Source
Implementation language | C++ | C++ | Java
OS support | Windows | Windows, Linux, OS X, Solaris | BSD, Linux, OS X, Windows
Drivers for programming languages | .NET, Java, PHP, Python, Ruby, Visual Basic | Actionscript, C, C#, C++, Clojure, ColdFusion, D, Dart, Delphi, Erlang, Go, Groovy, Haskell, Java, JavaScript, Lisp, Lua, MatLab, Perl, PHP, PowerShell, Prolog, Python, R, Ruby, Scala, Smalltalk | C#, C++, Clojure, Erlang, Go, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala
Key-Value Data Model

Capabilities
The simplest model where each object is retrieved with a unique key, with values having no inherent model
Utilize in-memory storage to provide fast access with optional persistence
Other data models built on top of this model to provide more complex objects
Applications
Applications requiring fast access to a large number of objects, such as caches or queues
Applications that require fast-changing data environments like mobile, gaming, online ads
Limitations
Cannot update subset of a value
Does not provide querying
As number of objects becomes large, generating unique keys could become complex
Databases: BerkeleyDB, MemcacheDB, Redis, DynamoDB
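The key-value model can be sketched in a few lines of Python. The class, the key naming scheme and the session data below are hypothetical and not tied to any of the listed products. The sketch illustrates the first limitation: the store treats a value as opaque, so changing one field means reading the whole value and writing it back.

```python
class KeyValueStore:
    """Toy in-memory key-value store: values are opaque to the store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", {"user": "ada", "cart": ["book"]})

# No partial update: read the whole value, change it, write it back.
session = store.get("session:42")
updated = dict(session, cart=session["cart"] + ["pen"])
store.put("session:42", updated)
```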
Document-oriented Data Model

Capabilities
Extension of key-value model, where value is a structured document
Documents can be highly complex, hierarchical data structures without requiring pre-defined schema
Supports queries on structured documents
Search platforms are also document-oriented
Applications
Applications that need to manage a large variety of objects that differ in structure
Large product catalogs in e-commerce, customer profiles, content management applications
Limitations
No standard query syntax
Query performance not linearly scalable
Join queries across collections not efficient
Databases: MongoDB, CouchDB, Apache Solr, Elasticsearch
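As a rough illustration of querying structured documents without a pre-defined schema, the sketch below filters a list of Python dicts by field values, including nested fields written as dotted paths. The helper names and the product catalog are hypothetical; real document stores each use their own query syntax.

```python
def get_path(doc, path):
    """Follow a dotted path such as 'details.color'; None if it is absent."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, criteria):
    """Return the documents whose fields equal every value in criteria."""
    return [
        doc
        for doc in collection
        if all(get_path(doc, path) == value for path, value in criteria.items())
    ]


# Documents in one collection may differ in structure.
products = [
    {"_id": 1, "type": "book", "title": "Dune", "details": {"pages": 412}},
    {"_id": 2, "type": "shirt", "size": "M", "details": {"color": "blue"}},
    {"_id": 3, "type": "book", "title": "Emma"},
]

books = find(products, {"type": "book"})
blue_items = find(products, {"details.color": "blue"})
```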
Column-Oriented Data Model

Capabilities
Extension of key-value model, where the value is a set of columns (column-family)
A column can have multiple time-stamped versions
Columns can be generated at run-time and not all rows need to have all columns
Applications
Storing a large number of time-stamped data like event logs, sensor data
Analytics that involve querying entire columns of data such as trends or time series analytics
Limitations
No join queries or sub-queries
Limited support for aggregation
Ordering is done per partition, specified at table creation time
Databases: Cassandra, BigTable, HBase, Apache Accumulo
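The capabilities above can be sketched as a nested mapping from row key to column name to time-stamped versions. This is a toy model with hypothetical names and sensor data, not how any listed database lays out its storage.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column family: row key -> column name -> [(timestamp, value), ...]."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, timestamp):
        # Columns are created on first write; rows need not share columns.
        self._rows[row_key][column].append((timestamp, value))

    def get(self, row_key, column):
        """Return the value with the newest timestamp, or None if absent."""
        versions = self._rows.get(row_key, {}).get(column)
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]

    def columns(self, row_key):
        return sorted(self._rows.get(row_key, {}))


cf = ColumnFamily()
cf.put("sensor-1", "temp", 20.5, timestamp=1)
cf.put("sensor-1", "temp", 21.0, timestamp=2)   # second version of the column
cf.put("sensor-1", "humidity", 40, timestamp=1)
cf.put("sensor-2", "temp", 18.0, timestamp=1)   # this row has no humidity column
```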
Graph-oriented Data Model

Capabilities
Models graphs consisting of nodes and edges with properties (meta-data) describing them
Implement very fast graph traversal operations
Also support indexing of meta-data to enable graph traversal combined with search queries
Applications
Applications that deal with objects with a large number of inter-relations
Applications like social networking friends-networks, hierarchical role based permissions, complex decision trees, maps, network topologies
Limitations
Difficult to scale for large data sets for generic graphs
Giraph uses the Bulk Synchronous Parallel model to overcome some of the scalability limitations
Databases: Neo4J, OrientDB, Apache Giraph, AllegroGraph
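A friends-network traversal of the kind mentioned above can be sketched with a breadth-first search over an adjacency list. The graph data and the function name are hypothetical. A graph database would run a comparable traversal natively rather than in application code.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Breadth-first traversal: {node: hop count} for nodes within max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]
    return seen


friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}

# Friends and friends-of-friends of ann.
reachable = within_hops(friends, "ann", 2)
```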
Classification and comparison of NoSQL databases
Data model | Performance | Scalability | Flexibility | Complexity | Functionality
Key-Value Stores | high | high | high | none | variable (none)
Column Stores | high | high | moderate | low | minimal
Document Stores | high | variable (high) | high | low | variable (low)
Graph Databases | variable | variable | high | high | graph theory
Relational Databases | variable | variable | low | moderate | relational algebra
Database ranking (Dec 2016)
Source: http://db-engines.com/en/ranking
Rank | DBMS | Database Model | Score
1 | Oracle | Relational DBMS | 1417.1
2 | MySQL | Relational DBMS | 1362.65
3 | Microsoft SQL Server | Relational DBMS | 1214.18
4 | MongoDB | Document store | 318.8
- | PostgreSQL | Relational DBMS | 318.69
- | DB2 | Relational DBMS | 180.56
- | Cassandra | Wide column store | 135.06
- | Redis | Key-value store | 109.54
11 | Elasticsearch | Search engine | 99.12
14 | Solr | Search engine | 66.57
15 | HBase | Wide column store | 58.19
17 | Splunk | Search engine | -
21 | Neo4j | Graph DBMS | 36.45
22 | Couchbase | Document store | 29.3
23 | Memcached | Key-value store | 29.09
24 | Amazon DynamoDB | Document store | 28.98
27 | CouchDB | Document store | 22.18
32 | Riak KV | Key-value store | 10.88
33 | MarkLogic | Multi-model | 10.3
38 | Hazelcast | Key-value store | 7.53
39 | Sphinx | Search engine | 7.3
41 | Ehcache | Key-value store | 6.44
42 | OrientDB | Multi-model | 6.25
45 | InfluxDB | Time Series DBMS | 5.32
46 | RethinkDB | Document store | 5.23
47 | Titan | Graph DBMS | 5.12
55 | Adabas | Multivalue DBMS | 3.89
57 | Jackrabbit | Content store | 3.58
Database ranking (Dec 2016)
Rank | DBMS | Database Model | Score
58 | Accumulo | Wide column store | 3.43
67 | Google Search Appliance | Search engine | 2.73
68 | Virtuoso | Multi-model | 2.69
72 | Apache Drill | Multi-model | 2.53
73 | RRDtool | Time Series DBMS | 2.48
79 | ArangoDB | Multi-model | 2.15
80 | Caché | Object oriented DBMS | 2.08
85 | Graphite | Time Series DBMS | 1.9
86 | Jena | RDF store | 1.81
94 | Db4o | Object oriented DBMS | 1.59
96 | RDF4J | RDF store | 1.49
98 | OpenTSDB | Time Series DBMS | 1.47
99 | IMS | Navigational DBMS | 1.47
103 | Versant Object Database | Object oriented DBMS | 1.3
104 | Sedna | Native XML DBMS | 1.22
111 | D3 | Multivalue DBMS | 1.03
112 | Tamino | Native XML DBMS | 1.03
113 | jBASE | Multivalue DBMS | 1.02
116 | ObjectStore | Object oriented DBMS | 0.98
117 | Giraph | Graph DBMS | 0.95
119 | BaseX | Native XML DBMS | 0.92
122 | Matisse | Object oriented DBMS | 0.8
125 | Model 204 | Multivalue DBMS | 0.76
127 | Northgate Reality | Multivalue DBMS | 0.74
130 | Algebraix | RDF store | 0.7
138 | IDMS | Navigational DBMS | 0.62
140 | Druid | Time Series DBMS | 0.6
145 | Hypertable | Wide column store | 0.56
Database ranking (Dec 2016)
Rank | DBMS | Database Model | Score
172 | Google Cloud Bigtable | Wide column store | 0.34
186 | Event Store | Event Store | 0.26
192 | 4store | RDF store | 0.22
197 | eXist-db | Native XML DBMS | 0.2
199 | Redland | RDF store | 0.2
201 | InfiniteGraph | Graph DBMS | 0.19
203 | ModeShape | Content store | 0.18
214 | NEventStore | Event Store | 0.15
221 | Dgraph | Graph DBMS | 0.13
The History of Cassandra
Original author(s): Avinash Lakshman, Prashant Malik
Developer(s): Apache Software Foundation
Initial release: 2008
Stable release: 3.9 / Sep 29, 2016
Written in: Java
Operating system: Cross-platform
Website: cassandra.apache.org
Cassandra compared to HBase

In comparison to HBase, Cassandra supplies:
Higher performance
True continuous, "always on" availability with no single point of
failure
Powerful and easy multi-data center / cloud availability zone
support
A simpler architecture (masterless) with easier setup and fewer
requirements
Easier development (SQL-like language with CQL, and more)
Cassandra Architecture
Cassandra is a distributed, decentralized, fault-tolerant,
eventually consistent, linearly scalable, and
column-oriented data store.
Virtual nodes
HOW Cassandra WORKS
There are four data buckets that you need to know. The MemTable is a
hash table-like structure that stays in memory. It contains the actual
cell data. The SSTable is the disk version of a MemTable: when
MemTables are full, they are persisted to hard disk as SSTables.
The commit log is an append-only log of all the mutations that are sent
to the Cassandra cluster.
The commit log lives on disk and helps to replay uncommitted
changes. These three are basically the core data. Then there are
bloom filters and indexes. The bloom filter is a probabilistic data
structure. The bloom filter and the index both live in memory and
contain information about the location of data in the SSTable. Each
SSTable has one bloom filter and one index associated with it. The
bloom filter helps Cassandra to quickly detect which SSTables do
not have the requested data, while the index helps to find the
exact location of the data in the SSTable file.
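The write and read paths described here can be sketched as a toy, single-node model. This is only an illustration under simplifying assumptions: the class names, the tiny bloom filter and the flush threshold are hypothetical, the "SSTables" are in-memory dicts rather than files, and a plain dict lookup stands in for the per-SSTable index.

```python
import hashlib


class BloomFilter:
    """Small bloom filter: can say 'definitely absent' or 'possibly present'."""

    def __init__(self, size=64, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToyNode:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every mutation
        self.memtable = {}     # in-memory writes not yet flushed
        self.sstables = []     # flushed (bloom filter, data) pairs, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # log first, then memory
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        bloom = BloomFilter()
        for key in self.memtable:
            bloom.add(key)
        self.sstables.append((bloom, dict(self.memtable)))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for bloom, data in reversed(self.sstables):  # newest SSTable first
            # Skip SSTables whose bloom filter rules the key out.
            if bloom.might_contain(key) and key in data:
                return data[key]
        return None


node = ToyNode()
node.write("a", 1)
node.write("b", 2)   # memtable is full, so it is flushed to an SSTable
node.write("a", 3)   # newer value stays in the memtable
```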
Definition of Cassandra
Apache Cassandra is a
Distributed...
High performance...
Extremely scalable...
Fault tolerant (i.e. no single point of failure)...
Post-relational database solution. Cassandra can serve both as a real-time
datastore (the "system of record") for online/transactional applications
and as a read-intensive database for business intelligence systems.
Architecture Overview
Cassandra was designed with the
understanding that
system/hardware failures can and do occur
Peer-to-peer, distributed system
All nodes are the same
Data partitioned among all nodes in the
cluster
Custom data replication to ensure fault
tolerance
Read/write-anywhere design
Architecture Overview
The schema used in Cassandra is mirrored after Google
Bigtable. It is a row-oriented, column structure
A keyspace is akin to a database in the RDBMS world
A column family is similar to an RDBMS table but is more
flexible/dynamic
A row in a column family is indexed by its key. Other
columns may be indexed as well
No Single Point of Failure

All nodes the same
Customized replication affords tunable data
redundancy
Read/write from any node
Can replicate data among different physical
data centre racks
Big Data Scalability

Capable of comfortably scaling to petabytes
New nodes = Linear performance increases
Add new nodes online
Easy Replication / Data Distribution
Transparently handled by Cassandra

Multi-data centre capable
Exploits all the benefits of Cloud computing
Able to do hybrid Cloud/On-premise setup
No Need for Caching Software

Peer-to-peer architecture removes the need for a special caching
layer and the programming that goes with it
The database cluster uses the memory from all participating
nodes to cache the data assigned to each node
No irregularities between a memory cache and the database
are encountered
Tunable Data Consistency

Choose between strong and eventual consistency
(from all nodes to any node responding) depending on the need
Can be done on a per-operation basis, and for
both reads and writes
Handles multi-data center operations
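One way to reason about the strong-versus-eventual choice is the commonly cited quorum rule: if the number of replicas that must acknowledge a read plus the number that must acknowledge a write exceeds the replication factor, every read overlaps the latest write. The rule is not stated on the slides, and the function names below are hypothetical. The sketch covers only the arithmetic.

```python
def quorum(replication_factor):
    """Smallest majority of replicas."""
    return replication_factor // 2 + 1


def is_strongly_consistent(replication_factor, read_replicas, write_replicas):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


n = 3
quorum_reads_and_writes = is_strongly_consistent(n, quorum(n), quorum(n))
one_and_one = is_strongly_consistent(n, 1, 1)   # fast, but only eventual
read_one_write_all = is_strongly_consistent(n, 1, n)
```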
Flexible Schema
Dynamic schema design allows for much more flexible data
storage than rigid RDBMS
Handles structured, semi-structured, and unstructured data.
Counters also supported
No offline/downtime for schema changes
Supports primary and secondary indexes
Data Compression
Uses Google's Snappy data compression algorithm
Compresses data on a per column family level
Internal tests at DataStax show up to 80%+
compression of raw data
No performance penalty (and some increases in
overall performance due to less physical I/O)!
CQL Language
Very similar to RDBMS SQL syntax
Create objects via DDL (e.g. CREATE...)
Core DML commands supported: INSERT, UPDATE,
DELETE
Query data with SELECT
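The statements below sketch what the DDL and DML commands listed above can look like in CQL. The keyspace, table and column names are hypothetical. They are held as Python strings and passed to any object with an execute method. With the DataStax Python driver that object would be a session obtained from a cluster connection, which is an assumption here; the stand-in session only records the statements so the sketch runs without a cluster.

```python
CQL_STATEMENTS = [
    # DDL: create a keyspace and a table (hypothetical names).
    "CREATE KEYSPACE IF NOT EXISTS demo "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};",
    "CREATE TABLE IF NOT EXISTS demo.users (user_id int PRIMARY KEY, name text);",
    # DML: insert, update, query and delete a row.
    "INSERT INTO demo.users (user_id, name) VALUES (1, 'Ada');",
    "UPDATE demo.users SET name = 'Ada L.' WHERE user_id = 1;",
    "SELECT name FROM demo.users WHERE user_id = 1;",
    "DELETE FROM demo.users WHERE user_id = 1;",
]


def run_all(session, statements=CQL_STATEMENTS):
    """Execute each statement on an object that exposes execute(statement)."""
    return [session.execute(statement) for statement in statements]


class RecordingSession:
    """Stand-in for a driver session: it only records what it is asked to run."""

    def __init__(self):
        self.executed = []

    def execute(self, statement):
        self.executed.append(statement)
        return None


session = RecordingSession()
run_all(session)
```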
Who's Using Cassandra
What is AMAZON S3
S3 stands for Simple Storage Service.

It is storage for the Internet.
Provided via a web service interface (REST and SOAP)
Based on the same infrastructure Amazon uses for its
global network of websites.
Functions & concepts of S3

Allows unlimited storage of objects (files) containing
from 1 byte to 5 gigabytes each.
Objects consist of the raw object data and metadata.
Objects are stored and retrieved using a developer-assigned key.
Data is kept secure from unauthorised access
through authentication mechanisms.
Objects can be made available to the public via the HTTP
or BitTorrent protocol.
Functions & concepts of S3

All objects are stored in buckets.
A bucket is simply a container for objects. It is
used to partition the namespace of objects at the
highest level.
Buckets are similar to Internet domain names.
They are accessed via
bucketname.s3.amazonaws.com.
Each developer account has a limit of 100 buckets.
More information on buckets can be found at:
http://docs.amazonwebservices.com/AmazonS3/2006-03-01/index.html?UsingBucket.html

A key is the unique identifier for an object within
a bucket.
A bucket and a key together uniquely identify
each object in S3. Every object can be addressed
through its bucket and key combination.
For example, if your bucket name is mybucket
and the key is myhomepage.html, the URL for the
object will be
http://mybucket.s3.amazonaws.com/myhomepage.html
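The bucket-plus-key addressing scheme can be written as a small helper. The function name is hypothetical. It builds the URL in the form shown on the slide and percent-encodes characters in the key that are not URL-safe.

```python
from urllib.parse import quote


def object_url(bucket, key):
    """Build the object URL in the bucketname.s3.amazonaws.com form."""
    # quote() keeps "/" by default, so keys that look like paths stay readable.
    return f"http://{bucket}.s3.amazonaws.com/{quote(key)}"


home = object_url("mybucket", "myhomepage.html")
nested = object_url("mybucket", "docs/my file.txt")
```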

Scalability. The amount of storage and bandwidth you
need can scale as you like without any configuration
changes needed.
Availability, speed, throughput, capacity, and
robustness are not affected even if you gain 10,000
users overnight.
Unlimited storage. You pay as you go. Inexpensive
and no capital outlay. Great for start-ups!
Data is accessible from any location. Since it is
based on the Amazon infrastructure, it is probably
more reliable than other cheap data storage
providers.
Disadvantages of using S3
Not user-friendly for beginner-level
computer users. S3 is basically UI-less.
Trust. Not all types of business or services might
be comfortable with storing their data in the
'cloud', especially those with extremely sensitive
and confidential data, e.g. banking.
Although it promises 99.9% uptime, in
2008 it had 2 major outages, in February and July,
bringing down Web 2.0 start-ups like Twitter.
Back in 2007, S3 had speed issues with reading
and writing of data.
Requirements
To get started using S3, an AWS account
is needed. An AWS account is simply an
Amazon.com account that has AWS
services enabled.
Sign up at https://aws.amazon.com/
After creating the AWS account, you need to sign up for
S3 by clicking the "Sign up for this web service" button at
this
A credit card needs to be associated with the account.
You will be given an Access Key ID and a Secret Access Key
on successful creation. (Note: they are not emailed to you.)
Pricing
Charges for using S3 are based on the location of your
buckets.
You are billed according to storage (average), data transfer
in and out, and the number of requests per month.
There is no minimum fee to use S3; you
pay only for what you use.
You can view your current charges almost
immediately on the S3 portal.
Detailed usage reports can also be downloaded in XML or
CSV format.
Pricing US usage
Storage
$0.15 per GB-month of storage used
Data Transfer
$0.100 per GB - all data transfer in
$0.170 per GB - first 10 TB / month data transfer out
$0.130 per GB - next 40 TB / month data transfer out
$0.110 per GB - next 100 TB / month data transfer out
$0.100 per GB - data transfer out / month over 150 TB
Requests
$0.01 per 1,000 PUT, POST, or LIST requests
$0.01 per 10,000 GET and all other requests*
* No charge for delete requests
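The tiered rates above lend themselves to a short worked calculation. The sketch below uses only the prices listed on this slide, which are historical. It assumes 1 TB = 1024 GB, and the function names and sample usage figures are hypothetical.

```python
# (tier size in GB, price per GB) for data transfer out; None = no upper limit.
# Assumption: 1 TB = 1024 GB.
TRANSFER_OUT_TIERS = [
    (10 * 1024, 0.170),
    (40 * 1024, 0.130),
    (100 * 1024, 0.110),
    (None, 0.100),
]


def transfer_out_cost(gb):
    """Charge each slice of outbound transfer at its own tier's rate."""
    cost, remaining = 0.0, gb
    for tier_size, price in TRANSFER_OUT_TIERS:
        used = remaining if tier_size is None else min(remaining, tier_size)
        cost += used * price
        remaining -= used
        if remaining <= 0:
            break
    return round(cost, 2)


def monthly_cost(storage_gb, transfer_in_gb, transfer_out_gb, put_requests, get_requests):
    """Total monthly charge in dollars from the slide's US rates."""
    storage = storage_gb * 0.15
    transfer_in = transfer_in_gb * 0.100
    requests = put_requests / 1000 * 0.01 + get_requests / 10000 * 0.01
    return round(storage + transfer_in + transfer_out_cost(transfer_out_gb) + requests, 2)


# 100 GB stored, 10 GB in, 100 GB out, 10,000 PUTs and 100,000 GETs.
example = monthly_cost(100, 10, 100, 10_000, 100_000)
```

For the sample usage, storage contributes $15.00, transfer in $1.00, transfer out $17.00 and requests $0.20.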
Implementation
To start using S3, get hold of your S3 access
key ID and secret access key via the AWS
portal.
Next, get hold of an application capable of
managing S3. Here are a few resources:
Spaceblock: Windows Application
S3 Web interface: Web App/Interface
S3 Firefox organizer: Firefox add-on
These applications make objects more
manageable because they provide a directory
structure similar to Windows Explorer.
Implementation
What can we use S3 for?
- HTML microsites
- Flash microsites
- Media storage
- Backups
For HTML and Flash microsites, custom URLs can be created
by using a CNAME record to create a DNS alias.
No server-side processing should be used in S3, as it will not
work without a web server (e.g. IIS, Apache).
Implementation
The Amazon Web Services API supports the ability to:

Find buckets and objects
Discover their metadata
Create new buckets
Upload new objects
Delete existing buckets and objects
When manipulating buckets you can optionally specify
where they should be stored.
Use the REST API, or preferably something that abstracts even
that away: JetS3t; s3cmd (command line)
BitTorrent access to S3 is also available
[Architecture diagram: www.MyWebSite.com (dynamic data) and media.MyWebSite.com (static data); Amazon Route 53 (DNS); Elastic Load Balancer; Amazon CloudFront; Auto Scaling group: Web Tier (Amazon EC2); Auto Scaling group: App Tier; Amazon RDS in Availability Zone #1 and Availability Zone #2; Amazon S3]
Original author(s): MySQL AB
Developer(s): Oracle Corporation
Initial release: 23 May 1995
Stable release: 5.7.16 / 12 October 2016
Written in: C, C++
Operating system: Windows, Linux, Solaris, OS X, FreeBSD
Available in: English
Type: RDBMS
License: GPL (version 2) or proprietary
Website: www.mysql.com
MySQL was created by a Swedish company, MySQL AB, founded by David Axmark, Allan
Larsson and Michael "Monty" Widenius.
The first version of MySQL appeared on 23 May 1995.
The Windows version was released on 8 January 1998 for Windows 95 and NT.
Sun Microsystems acquired MySQL AB in 2008.
Oracle acquired Sun Microsystems on 27 January 2010.