
2018 ETL Tools Comparison

by Garrett Alley
9 min read • 1 May 2018
Extract, Transform, and Load (ETL) tools enable organizations to make their data accessible,
meaningful, and usable across disparate data systems. Typically companies first realize a need
for ETL tools when they learn the cost and complexity of trying to code and build an in-house
solution.

When it comes to choosing the right ETL tool, you have several options. You can try to assemble
open source ETL tools to deliver a solution. This approach can work for some situations, but
companies often find themselves needing more — more functionality/features, more flexibility,
and more support.

The next option is to go with an incumbent provider, a solution that deals well with today’s
popular data sources and streams. Incumbent providers offer the stability and comfort of a big or
well-known brand.

The third category of ETL tool is the modern ETL platform. These are often cloud-based
solutions and offer end-to-end support for ETL of data from any existing data source to any
cloud data warehouse. They’re also built to support the ever-growing list of web-based data
streams.

For this post, we’ll dive into the world of incumbent ETL tools — the usual suspects, the
advantages and drawbacks — and then finish up with a quick look at the modern ETL platforms.

Incumbent ETL tools overview


Incumbent ETL tools make up the majority of the ETL tool market, and that stands to reason. They’ve been around the longest, and many were designed by very large companies (Microsoft, IBM, etc.), so they started with a substantial installed customer base.
Some of these tools consist of a suite of tools used together, customized to solve particular
problems. And because many companies have their data stored in legacy, monolithic databases
and systems, the manufacturers are well positioned to provide tools to migrate that data and to
support the existing batch-processing approach.

Popular incumbent ETL tools


This is not a complete list, but it does cover the major offerings.

IBM InfoSphere Information Server


IBM InfoSphere Information Server is an ETL tool and part of the IBM Information Platforms
Solutions suite and IBM InfoSphere. It uses a graphical notation to construct data integration
solutions and is available in various versions (Server Edition, Enterprise Edition, and MVS
Edition).

https://www.ibm.com/analytics/information-server

Informatica PowerCenter
Informatica PowerCenter is the general name for an ETL product suite including the
PowerCenter Client Tools, Server, and Repository.

Data is stored in the repository where it is accessed by the client tools and the server. Actions are
executed on the server, which connects to sources and targets to fetch the data, apply all
transformations, and load the data into target systems.

https://www.informatica.com/products/data-integration/powercenter.html

iWay Software
Information Builders’ iWay Integration Suite provides both application and data integration
capabilities. Customers use them to manage both structured and unstructured information. The
suite includes: iWay DataMigrator, iWay Service Manager, and iWay Universal Adapter
Framework.
http://www.informationbuilders.com.au/products/integration/suite

Microsoft SQL Server Integration Services


Microsoft SQL Server Integration Services (SSIS) is a platform for building high performance
data integration solutions, including ETL packages for data warehousing.

https://docs.microsoft.com/en-us/sql/integration-services/sql-server-integration-services?view=sql-server-2017

OpenText
The OpenText Integration Center is an integration platform that gives organizations the ability to
extract, enhance, transform, integrate and migrate data and content from one or many
repositories to any new destination.

https://www.opentext.com/what-we-do/products/discovery/information-access-platform/opentext-integration-center

Oracle GoldenGate
Oracle GoldenGate is a comprehensive software package for real-time data integration and
replication in heterogeneous IT environments.

http://www.oracle.com/technetwork/middleware/goldengate/overview/index.html

Pervasive Software
Pervasive’s Data Integrator platform is an enterprise data integration software solution that
enables companies to build connections between any kind of data source and application. Data
Integrator supports real-time integration scenarios.

http://www.pervasive.com/integration/Products/DataIntegrator.aspx

Pitney Bowes Software


Pitney Bowes offers a large suite of tools and solutions targeted around data integration. Sagent
Data Flow is a flexible integration engine that collates data from disparate sources and provides a
comprehensive set of data transformation tools to enhance its business value.
https://www.pitneybowes.com/us/customer-information-management/data-integration-management.html

SAP Business Objects Data Services


Formerly Business Objects Data Integrator, SAP BusinessObjects Data Services (BODS) is an ETL tool used for data integration, data quality, data profiling, and data processing. It allows you to integrate and transform trusted data into data warehouse systems for analytical reporting.

https://www.sap.com/products/data-services.html

SAS Data Management


Built on the SAS platform, SAS Data Management is SAS’s entry into the ETL tool market. The platform consists of a large suite (20+) of SAS tools and services.

http://support.sas.com/software/products/entdis/index.html#s1=1

Sun Java Composite Application Platform Suite


Sun’s ETL and data integration tools are part of the larger Java Composite Application Platform Suite (CAPS). CAPS, or Java CAPS, is a standards-based enterprise service bus software suite from Oracle Corporation. Java CAPS has several components that help integrate existing applications and deliver new business services in a service-oriented architecture environment.

https://docs.oracle.com/cd/E21454_01/html/821-2610/dsgn_di-taskoverview_c.html

Sybase
Sybase ETL includes Sybase ETL Development and Sybase ETL Server.

Sybase ETL Development is a GUI tool for creating and designing data transformation projects
and jobs. This tool provides a complete simulation and debugging environment, designed to
speed the development of ETL transformation flows. Sybase ETL Development includes an ETL
Development Server that controls the actual processing, such as connecting to databases and
executing procedures.
Sybase ETL Server is a scalable and distributed grid engine, which connects to data sources and
extracts and loads data to data targets using transformation flows (designed using Sybase ETL
Development).

http://www.sybase.com

SyncSort
SyncSort Cloud Solutions accesses and integrates data from various sources and facilitates moving that data to cloud repositories.

https://www.syncsort.com/en/Solutions/Cloud-Solutions

Limitations of incumbent ETL tools


The biggest limitation of incumbent tools is that they were designed to work in batch: gather
some data, upload it, gather more data, upload it, etc. Batch loading of data works in some
situations; however, there are issues with a batch-only approach.

Batch data transformation tools can be hard to implement for cross-platform data sources,
especially where Change Data Capture (CDC) is involved. When something goes wrong with
your batch data upload, you need to track down the problem, troubleshoot, and re-submit the job,
quickly. This kind of error handling is crucial as lost data can be a huge issue in cases where you
have, for example, surpassed your 24-hour allotment of API calls in the data warehouse, or
where the incoming data gets backed up and CDC information is lost or overwritten.
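To make that concrete, here is a minimal sketch of the bookkeeping a batch CDC job needs so a failed upload can be re-submitted without losing change data. The checkpoint file and the fetch_changes/load_to_warehouse helpers are hypothetical stand-ins for source- and warehouse-specific code:

```python
import json
import time

CHECKPOINT_FILE = "cdc_checkpoint.json"  # hypothetical store for the last loaded change offset

def load_checkpoint():
    """Read the last successfully loaded CDC offset, or start from zero."""
    try:
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)["last_offset"]
    except FileNotFoundError:
        return 0

def save_checkpoint(offset):
    """Persist the offset only after the warehouse confirms the batch."""
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"last_offset": offset}, f)

def run_batch(fetch_changes, load_to_warehouse, max_retries=3):
    """Fetch changes newer than the checkpoint, load them, and retry on failure.

    fetch_changes(offset) and load_to_warehouse(rows) are placeholders for
    source- and warehouse-specific code; each row is assumed to carry an "offset" key.
    """
    offset = load_checkpoint()
    rows = fetch_changes(offset)
    if not rows:
        return
    for attempt in range(1, max_retries + 1):
        try:
            load_to_warehouse(rows)
            save_checkpoint(rows[-1]["offset"])  # acknowledge only after a successful load
            return
        except Exception as exc:  # e.g. API quota exceeded, network error
            print(f"Batch load failed (attempt {attempt}): {exc}")
            time.sleep(2 ** attempt)  # back off before re-submitting
    raise RuntimeError("Batch could not be loaded; changes remain unacknowledged")
```

Without that checkpoint-and-retry discipline, a single failed batch can silently drop a window of change data.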

And what about the ever-growing number of streaming and other types of data sources? They are
not a good fit for toolsets designed and built around batch processing, especially with today’s
demands that the freshest data be available as quickly as possible.

Modern ETL tools overview


Modern ETL tools were built with real-time, streaming data processing and the cloud in mind. These latest entries were born to integrate well with advanced cloud data warehouses and to support the ever-growing number of data sources and streams.

Today’s trends continue to point to the cloud, and moving IT and ETL to the cloud only makes
sense. Cloud-based ETL services are the natural next step. They support the same batch model as
their predecessors, but they are taking ETL to the next stage, often offering support for real-time
data, intelligent schema detection, and more.
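For a sense of what “intelligent schema detection” involves, the naive sketch below infers a flat column-to-type mapping from a sample of incoming records; a real platform would also handle nested fields, type conflicts, and schema evolution:

```python
def infer_schema(records):
    """Infer a flat column -> SQL type mapping from a sample of dict records (naive illustration)."""
    type_map = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE PRECISION", str: "VARCHAR"}
    schema = {}
    for record in records:
        for column, value in record.items():
            if value is None:
                continue
            sql_type = type_map.get(type(value), "VARCHAR")
            # widen to VARCHAR if the same column arrives with conflicting types
            if schema.get(column, sql_type) != sql_type:
                sql_type = "VARCHAR"
            schema[column] = sql_type
    return schema

# e.g. infer_schema([{"id": 1, "email": "a@b.com"}, {"id": 2, "active": True}])
# -> {"id": "BIGINT", "email": "VARCHAR", "active": "BOOLEAN"}
```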

Modern demands on ETL processes render the batch approach nearly obsolete. Gone are the
days of nightly financial or inventory updates, as companies and their customers demand the
freshest data. Companies keeping up with the ever-growing list of data streams need real-time
ETL processing.

And with the need for real-time data access comes a fundamental change in architecture. Today’s
model is based on stream processing and distributed message queues such as Kafka. Modern
approaches from companies like Alooma and others incorporate these new technologies to offer
SaaS platforms and on-prem solutions. As part of the stream, modern ETL platforms offer differing levels of transformation, from almost none (instead, transformation happens in the data warehouse after loading, aka ELT) to full control via code (Python, Java, etc.).
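As a rough illustration of that streaming architecture, here is a minimal sketch using the open source kafka-python client. The topic name, the transform logic, and the load_row helper are assumptions, and a production platform would add batching, schema handling, and delivery guarantees:

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

def transform(event):
    """Light in-stream transformation; heavier work can be deferred to the warehouse (ELT)."""
    event["email"] = event.get("email", "").lower()
    event.pop("internal_debug_field", None)  # drop fields the warehouse should not see
    return event

def load_row(row):
    """Stand-in for a warehouse loader (e.g. a COPY/INSERT into Redshift, BigQuery, or Snowflake)."""
    print("loading:", row)

consumer = KafkaConsumer(
    "user_events",                       # hypothetical source topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for message in consumer:
    load_row(transform(message.value))   # events flow continuously, not in nightly batches
```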

The last piece of the puzzle is data integrity. What happens if part of the process lags behind or
fails? What happens to the data traveling through the pipeline? Any truly modern ETL platform
needs to have a robust safety net built in for error handling and reporting.
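One common safety net, sketched below with hypothetical helpers, is to catch failures per event and divert the raw record to a rejection store so it can be inspected and restreamed later instead of being silently dropped:

```python
import json
import logging

logger = logging.getLogger("pipeline")
REJECTION_LOG = "rejected_events.jsonl"   # hypothetical rejection store

def safe_process(event, transform, load_row):
    """Transform and load one event; on any failure, keep the original for restreaming."""
    try:
        load_row(transform(event))
    except Exception as exc:
        logger.error("event rejected: %s", exc)
        with open(REJECTION_LOG, "a") as f:
            f.write(json.dumps({"event": event, "error": str(exc)}) + "\n")

def restream(transform, load_row):
    """Replay previously rejected events once the underlying problem has been fixed.

    A real platform would also clear or compact the rejection log after a successful replay.
    """
    with open(REJECTION_LOG) as f:
        rejected = [json.loads(line)["event"] for line in f]
    for event in rejected:
        safe_process(event, transform, load_row)
```

The platforms below differ mainly in how much of this error handling and restreaming is built in versus left to the user.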

Popular modern ETL platforms & tools


Here’s a list of the most common modern ETL platforms and tools.

Alooma
Alooma is an enterprise data pipeline platform, built for the cloud. Alooma provides data teams a
modern, scalable cloud-based ETL solution, bringing together data from any data source into any
data warehouse, all in real time.
Error handling: Handling, monitoring/reporting, restreaming

Transformation: ETL with support for Python transforms

https://www.alooma.com/

Confluent
Confluent is a full-scale data streaming platform based on Apache Kafka, capable of publish/subscribe messaging as well as storage and processing of data within the stream. Confluent offers an open source version of its platform.

Error handling: Monitoring only

Transformation: ETL, Kafka Streams API

https://www.confluent.io/

Fivetran
Fivetran is a SaaS data integration tool that extracts data from different cloud services, databases,
and business intelligence (BI) tools and loads it into a data warehouse.

Error handling: Monitoring only

Transformation: ELT, limited

https://fivetran.com/

FlyData
FlyData is a SaaS data migration tool that enables management of the data load process from
MySQL, PostgreSQL, MariaDB, Percona, and logs in CSV/TSV/JSON to an Amazon Redshift
data warehouse.

Error handling: Yes, buffered


Transformation: ELT, limited

https://www.flydata.com/

Matillion
Matillion offers cloud data integration ETL tools built specifically for Amazon Redshift, Google
BigQuery, and Snowflake.

Error handling: Supported via code, not built-in

Transformation: ETL, graphical builder

https://www.matillion.com/

SnapLogic
SnapLogic provides data integration platform-as-a-service tools for connecting cloud data sources, SaaS applications, and on-prem business software applications.

Error handling: Supported, but not built-in

Transformation: ETL, graphical builder

https://www.snaplogic.com/

Stitch Data
Stitch is a cloud-first, developer-focused tool for rapidly moving data.

Error handling: Manual, records are logged in rejection table

Transformation: Quasi-ETL, limited

https://www.stitchdata.com/

StreamSets
StreamSets is a cloud-native collection of products for controlling data drift: the problem of changes in data, data sources, data infrastructure, and data processing.

Error handling: Yes (Error Record Handling)

Transformation: ETL, code and GUI

https://streamsets.com/

Striim
Striim (pronounced “stream”) is a real-time, streaming analytics and data integration platform.

Error handling: Monitoring only

Transformation: ETL, built-in and Java functions

https://www.striim.com/

Wrapping up
Today’s need for advanced data analytics requires a modern approach to data integration.
Whether you’re looking to incorporate data from databases, streaming services, files, or other
sources, choosing the right toolset is critical. A modern ETL platform, built in and for the cloud, can give your business the edge you need.

Ready to start? Get your ETL pipeline up and running in minutes with Alooma.
