
Amazon Redshift Performance Metrics vs competitors

Self-Hosting
According to Amazon's calculation, it generally costs between $19,000 and $25,000
per terabyte per year, at list prices, to build and run a good-sized data warehouse
on your own. Amazon Redshift, all-in, will cost you less than $1,000 per terabyte per
year.
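As a rough illustration of what these figures imply, the sketch below computes the yearly cost for a warehouse of a given size. The per-terabyte rates are the ones quoted above, with the Redshift figure treated as an upper bound; the 10 TB warehouse size and the function and constant names are assumptions made only for this example.

```python
# Per-terabyte, per-year list-price figures quoted in the text above.
SELF_HOSTED_LOW = 19_000   # USD, lower end of Amazon's self-hosting estimate
SELF_HOSTED_HIGH = 25_000  # USD, upper end of Amazon's self-hosting estimate
REDSHIFT_MAX = 1_000       # USD, quoted upper bound ("less than $1,000")


def annual_cost_range(terabytes):
    """Return (self_hosted_low, self_hosted_high, redshift_upper_bound) in USD per year."""
    return (
        terabytes * SELF_HOSTED_LOW,
        terabytes * SELF_HOSTED_HIGH,
        terabytes * REDSHIFT_MAX,
    )


# Hypothetical 10 TB warehouse, chosen only for illustration.
low, high, redshift = annual_cost_range(10)
print(f"Self-hosted: ${low:,} to ${high:,} per year")
print(f"Redshift:    under ${redshift:,} per year")
print(f"Ratio:       at least {low // redshift}x cheaper on these figures")
```

Because both sets of figures scale linearly with size, the ratio stays the same for any warehouse size; only the absolute saving changes.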

Redshift vs other vendor offerings

Columnar Data Storage

Advanced Compression

Supports Sort Key for better dynamic sorts

Can run on Virtualized Platforms

Index Support

Redshift

Teradata

HP Vertica

Oracle Database

EMC Greenplum

Available

Available

Available

Available

Available

Available

Available

Available

Available

Supported

Not Supported

Not Supported

Not Supported

Not Supported

Yes. Since Amazon Redshift is built upon PostgreSQL, it has the inherent capability to run on commodity machines running virtual platforms.

Not Available

Information not Available

Not Supported. Vertica 6.1 does support Hardware Virtual Machine, but that is nowhere close to Redshift's offering of Data as a Service.

Not Supported

Information not Available

Information not Available

No Information Available

Supported

Supported

Available

Redshift vs the Hadoop Open Source Platform

Apache Hadoop is an open-source software framework for distributed storage and distributed
processing of Big Data.

Nodes Possible
Redshift: 100
Hadoop: Unlimited

Max Node Size
Redshift: 16 TB
Hadoop: Unlimited

Performance
Redshift: Performs better on terabyte-level data (which is usually sufficient for most businesses).
Hadoop: Performs better on petabyte-level data (only relevant for large businesses, which will want to maintain their own warehouse anyway).

Ease of Migration
Redshift: Because it uses PostgreSQL as the underlying database and SQL for queries, it is already familiar to most developers.
Hadoop: System administrators will need to learn the Hadoop architecture and tools, which are quite different, and developers will need to learn to code in Pig or MapReduce.

Data Formats Accepted
Redshift: Limited. Presently no support for XML, data arrays, etc.
Hadoop: All data types supported.
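The Redshift limits listed in the comparison above imply a raw storage ceiling, which the short sketch below works out. It assumes every node is at the maximum size and ignores compression and replication, so the result is only a back-of-the-envelope figure; the helper names and the 50 TB and 5 PB sample sizes are hypothetical.

```python
MAX_NODES = 100        # "Nodes Possible" for Redshift, from the comparison above
MAX_NODE_SIZE_TB = 16  # "Max Node Size" for Redshift, in terabytes


def max_cluster_capacity_tb(nodes=MAX_NODES, node_size_tb=MAX_NODE_SIZE_TB):
    """Raw capacity in TB if every node is at the maximum size."""
    return nodes * node_size_tb


def fits_in_redshift(dataset_tb):
    """True if a dataset of this size is within the raw ceiling (hypothetical helper)."""
    return dataset_tb <= max_cluster_capacity_tb()


capacity = max_cluster_capacity_tb()
print(f"Raw ceiling: {capacity:,} TB (about {capacity / 1000:.1f} PB)")
print(fits_in_redshift(50))     # a 50 TB warehouse
print(fits_in_redshift(5_000))  # a 5 PB dataset
```

On these figures the ceiling is on the order of a petabyte or two, which is consistent with the comparison's point that Redshift targets terabyte-level data while Hadoop has no stated limit.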

Total Cost of Running Hadoop vs Redshift on a per-Query Basis

Thus we can conclude that Redshift is better suited to most businesses. The exception
is very large organizations (for example, a database for the entire Tata Group), where
Hadoop might be a better choice, albeit at a higher cost than Redshift.

Query Performance with other technologies

Some Additional Information which I thought might be useful for other parts of the project
The distinction between the previously available Amazon Relational Database
Service (RDS) and Redshift is that the latter is exclusively for warehousing and
analytics (as opposed to transactional database uses) and is capable of big-data
scale. "RDS is based on Microsoft SQL Server, Oracle and MySQL, and those aren't
systems that are designed to do petabyte-scale data warehousing."

http://www.informationweek.com/software/information-management/amazondebuts-low-cost-big-data-warehousing/d/d-id/1107568?
http://dwh-bi-etl-reviews.quora.com/Amazon-Redshift-%E2%80%93-Differentiatorsand-Limitations
http://www.vertica.com/2010/11/23/life-beyond-indices-the-query-benefits-ofstoring-sorted-data/
http://aws.amazon.com/documentation/redshift/
http://snowplowanalytics.com/blog/2013/09/27/how-much-does-snowplow-cost-torun/
