by Garrett Alley
9 min read • 1 May 2018
Extract, Transform, and Load (ETL) tools enable organizations to make their data accessible,
meaningful, and usable across disparate data systems. Typically, companies first realize they
need ETL tools when they learn the cost and complexity of coding and building an in-house
solution.
When it comes to choosing the right ETL tool, you have several options. You can try to assemble
open source ETL tools to deliver a solution. This approach can work for some situations, but
companies often find themselves needing more — more functionality/features, more flexibility,
and more support.
The next option is to go with an incumbent provider, a solution that deals well with today’s
popular data sources and streams. Incumbent providers offer the stability and comfort of a big or
well-known brand.
The third category of ETL tool is the modern ETL platform. These are often cloud-based
solutions and offer end-to-end support for ETL of data from any existing data source to any
cloud data warehouse. They’re also built to support the ever-growing list of web-based data
streams.
For this post, we’ll dive into the world of incumbent ETL tools — the usual suspects, the
advantages and drawbacks — and then finish up with a quick look at the modern ETL platforms.
https://www.ibm.com/analytics/information-server
Informatica PowerCenter
Informatica PowerCenter is the general name for an ETL product suite including the
PowerCenter Client Tools, Server, and Repository.
Data is stored in the repository where it is accessed by the client tools and the server. Actions are
executed on the server, which connects to sources and targets to fetch the data, apply all
transformations, and load the data into target systems.
https://www.informatica.com/products/data-integration/powercenter.html
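The fetch, transform, and load flow described above can be illustrated with a minimal, generic sketch. This is not PowerCenter's API or repository model; the CSV sample, the `customers` table, and the function names are all assumptions made for illustration, using only Python's standard library.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for a real source system.
SOURCE_CSV = "id,name,signup\n1, Ada ,2018-05-01\n2,Grace,2018-04-30\n"


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast ids to integers and normalize names."""
    return [(int(r["id"]), r["name"].strip().upper(), r["signup"]) for r in rows]


def load(conn, records):
    """Load: write records to the target and return the target row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, signup TEXT)"
    )
    # INSERT OR REPLACE keeps a re-run from duplicating rows.
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", records)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]


def run_etl(conn, text=SOURCE_CSV):
    """Run the three stages in order against one target connection."""
    return load(conn, transform(extract(text)))


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")
    print(run_etl(target), "rows in target")
```

Keying the load on the primary key means the job can be re-run after a failure without creating duplicates, which is one of the details an in-house solution has to get right on its own.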
iWay Software
Information Builders’ iWay Integration Suite provides both application and data integration
capabilities. Customers use them to manage both structured and unstructured information. The
suite includes: iWay DataMigrator, iWay Service Manager, and iWay Universal Adapter
Framework.
http://www.informationbuilders.com.au/products/integration/suite
https://docs.microsoft.com/en-us/sql/integration-services/sql-server-integration-services?view=sql-server-2017
OpenText
The OpenText Integration Center is an integration platform that gives organizations the ability to
extract, enhance, transform, integrate and migrate data and content from one or many
repositories to any new destination.
https://www.opentext.com/what-we-do/products/discovery/information-access-platform/opentext-integration-center
Oracle GoldenGate
Oracle GoldenGate is a comprehensive software package for real-time data integration and
replication in heterogeneous IT environments.
http://www.oracle.com/technetwork/middleware/goldengate/overview/index.html
Pervasive Software
Pervasive’s Data Integrator platform is an enterprise data integration software solution that
enables companies to build connections between any kind of data source and application. Data
Integrator supports real-time integration scenarios.
http://www.pervasive.com/integration/Products/DataIntegrator.aspx
https://www.sap.com/products/data-services.html
http://support.sas.com/software/products/entdis/index.html#s1=1
https://docs.oracle.com/cd/E21454_01/html/821-2610/dsgn_di-taskoverview_c.html
Sybase
Sybase ETL includes Sybase ETL Development and Sybase ETL Server.
Sybase ETL Development is a GUI tool for creating and designing data transformation projects
and jobs. This tool provides a complete simulation and debugging environment, designed to
speed the development of ETL transformation flows. Sybase ETL Development includes an ETL
Development Server that controls the actual processing, such as connecting to databases and
executing procedures.
Sybase ETL Server is a scalable and distributed grid engine, which connects to data sources and
extracts and loads data to data targets using transformation flows (designed using Sybase ETL
Development).
http://www.sybase.com
SyncSort
SyncSort Cloud Solutions access and integrate data from various sources and facilitate moving
that data to cloud repositories.
https://www.syncsort.com/en/Solutions/Cloud-Solutions
Batch data transformation tools can be hard to implement for cross-platform data sources,
especially where Change Data Capture (CDC) is involved. When something goes wrong with
your batch data upload, you need to track down the problem, troubleshoot, and resubmit the job
quickly. This kind of error handling is crucial because lost data can be a huge issue, for example
when you have exhausted your 24-hour allotment of API calls in the data warehouse, or
when the incoming data gets backed up and CDC information is lost or overwritten.
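To make the resubmit step concrete, here is a minimal sketch of retrying a failed batch load with exponential backoff. The `load_batch_with_retry` function and the `FlakyWarehouse` class are hypothetical names invented for this example and do not correspond to any vendor's API; a real job would catch specific errors rather than every exception.

```python
import time
from typing import Callable, List, Tuple


def load_batch_with_retry(
    batch: List[dict],
    load: Callable[[List[dict]], None],
    max_attempts: int = 3,
    base_delay: float = 0.0,
) -> Tuple[bool, int]:
    """Try to load one batch, backing off between failed attempts.

    Returns (succeeded, attempts_used). The batch is never discarded here:
    on final failure the caller still holds it and can park or resubmit it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            load(batch)
            return True, attempt
        except Exception:  # sketch only: narrow this in real code
            if attempt == max_attempts:
                break
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))
    return False, max_attempts


class FlakyWarehouse:
    """Hypothetical target that fails a fixed number of times, then accepts."""

    def __init__(self, failures_before_success: int):
        self.remaining_failures = failures_before_success
        self.rows: List[dict] = []

    def load(self, batch: List[dict]) -> None:
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise RuntimeError("simulated quota or connectivity error")
        self.rows.extend(batch)


if __name__ == "__main__":
    warehouse = FlakyWarehouse(failures_before_success=2)
    print(load_batch_with_retry([{"id": 1}], warehouse.load))
```

Retrying alone does not help once an API quota is exhausted for the day, so a batch that still fails after the last attempt should be kept somewhere durable instead of dropped.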
And what about the ever-growing number of streaming and other types of data sources? They are
not a good fit for toolsets designed and built around batch processing, especially with today’s
demands that the freshest data be available as quickly as possible.
Today’s trends continue to point to the cloud, and moving IT and ETL to the cloud only makes
sense. Cloud-based ETL services are the natural next step. They support the same batch model as
their predecessors, but they are taking ETL to the next stage, often offering support for real-time
data, intelligent schema detection, and more.
Modern demands on ETL processes render the batch approach nearly obsolete. Gone are the
days of nightly financial or inventory updates, as companies and their customers demand the
freshest data. Companies keeping up with the ever-growing list of data streams need real-time
ETL processing.
And with the need for real-time data access comes a fundamental change in architecture. Today’s
model is based on stream processing and distributed message queues such as Kafka. Modern
approaches from companies like Alooma and others incorporate these new technologies to offer
SaaS platforms and on-prem solutions. As part of the stream, modern ETL platforms offer
differing levels of transformation, from almost none (transformation instead happens in the data
warehouse after loading, an approach known as ELT) to full control via code (Python, Java, etc.).
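The two ends of that transformation spectrum can be sketched in a few lines. This is an illustration only: a plain Python list stands in for a message queue such as Kafka, and `stream_pipeline`, `normalize_order`, and the event fields are hypothetical names chosen for the example, not any platform's interface.

```python
from typing import Callable, Iterable, Iterator, Optional

Event = dict


def stream_pipeline(
    events: Iterable[Event],
    transform: Optional[Callable[[Event], Optional[Event]]] = None,
) -> Iterator[Event]:
    """Yield events one at a time as they arrive.

    With no transform, events pass through untouched (the ELT end of the
    spectrum, where the warehouse reshapes data after loading). With a
    transform, each event is reshaped in flight, and a transform that
    returns None filters the event out.
    """
    for event in events:
        if transform is None:
            yield event
            continue
        result = transform(event)
        if result is not None:
            yield result


def normalize_order(event: Event) -> Optional[Event]:
    """Example in-stream transform: keep orders only, convert cents to dollars."""
    if event.get("type") != "order":
        return None
    return {"order_id": event["id"], "amount_usd": event["amount_cents"] / 100}


if __name__ == "__main__":
    incoming = [
        {"type": "order", "id": 7, "amount_cents": 2050},
        {"type": "ping"},
    ]
    print(list(stream_pipeline(incoming)))
    print(list(stream_pipeline(incoming, normalize_order)))
```

Because the pipeline is a generator, each event is handled as soon as it arrives instead of waiting for a full batch to accumulate.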
The last piece of the puzzle is data integrity. What happens if part of the process lags behind or
fails? What happens to the data traveling through the pipeline? Any truly modern ETL platform
needs to have a robust safety net built in for error handling and reporting.
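One common shape for such a safety net is to park failed events in a holding queue and replay them once the problem is fixed. The sketch below assumes that pattern; the `Pipeline` class, its `restream` method, and the in-memory queue are hypothetical and do not describe how any particular platform implements error handling.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class Pipeline:
    """Apply a transform to each event; park failures instead of losing them."""

    def __init__(self, transform: Callable[[dict], dict], sink: List[dict]):
        self.transform = transform
        self.sink = sink  # stands in for the data warehouse
        self.dead_letters: Deque[Tuple[dict, str]] = deque()

    def process(self, event: dict) -> bool:
        """Return True if the event reached the sink, False if it was parked."""
        try:
            self.sink.append(self.transform(event))
            return True
        except Exception as exc:  # sketch only: narrow this in real code
            # Keep the original event plus the reason, for reporting.
            self.dead_letters.append((event, repr(exc)))
            return False

    def restream(self) -> int:
        """Replay each parked event once; return how many recovered."""
        recovered = 0
        # Fix the count up front so events that fail again are re-parked
        # without being retried endlessly in the same pass.
        for _ in range(len(self.dead_letters)):
            event, _reason = self.dead_letters.popleft()
            if self.process(event):
                recovered += 1
        return recovered


if __name__ == "__main__":
    warehouse: List[dict] = []
    pipeline = Pipeline(lambda e: {"user": e["user"].lower()}, warehouse)
    pipeline.process({"user": "ANA"})
    pipeline.process({"usr": "BOB"})  # unexpected field name, gets parked
    # Deploy a fixed transform, then replay what was parked.
    pipeline.transform = lambda e: {"user": (e.get("user") or e["usr"]).lower()}
    print(pipeline.restream(), warehouse)
```

A production pipeline would keep the parked events in durable storage so that they survive a restart; the in-memory deque here only shows the control flow.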
Alooma
Alooma is an enterprise data pipeline platform, built for the cloud. Alooma provides data teams a
modern, scalable cloud-based ETL solution, bringing together data from any data source into any
data warehouse, all in real time.
Error handling: Handling, monitoring/reporting, restreaming
https://www.alooma.com/
Confluent
Confluent is a full-scale data streaming platform based on Apache Kafka. It supports
publish-and-subscribe messaging as well as storage and processing of data within the stream.
Confluent offers an open source version of its platform.
https://www.confluent.io/
Fivetran
Fivetran is a SaaS data integration tool that extracts data from different cloud services, databases,
and business intelligence (BI) tools and loads it into a data warehouse.
https://fivetran.com/
FlyData
FlyData is a SaaS data migration tool that enables management of the data load process from
MySQL, PostgreSQL, MariaDB, Percona, and logs in CSV/TSV/JSON to an Amazon Redshift
data warehouse.
https://www.flydata.com/
Matillion
Matillion offers cloud data integration ETL tools built specifically for Amazon Redshift, Google
BigQuery, and Snowflake.
https://www.matillion.com/
SnapLogic
SnapLogic provides data integration platform-as-a-service tools for connecting cloud data
sources, SaaS applications, and on-prem business software applications.
https://www.snaplogic.com/
Stitch Data
Stitch is a cloud-first, developer-focused tool for rapidly moving data.
https://www.stitchdata.com/
StreamSets
StreamSets is a cloud-native collection of products to control data drift: the problem of changes
in data, data sources, data infrastructure, and data processing.
https://streamsets.com/
Striim
Striim (pronounced “stream”) is a real-time, streaming analytics and data integration platform.
https://www.striim.com/
Wrapping up
Today’s need for advanced data analytics requires a modern approach to data integration.
Whether you’re looking to incorporate data from databases, streaming services, files, or other
sources, choosing the right toolset is critical. A modern ETL platform, built in and for the cloud,
can give your business the edge you need.
Ready to start? Get your ETL pipeline up and running in minutes with Alooma.