ETL stands for Extract-Transform-Load and is the most traditional and widespread form of data integration. An ETL pipeline first extracts data from the source, then transforms it into the data models that analysts can turn into reports and dashboards, and finally loads it into the data warehouse.
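To make that sequence concrete, here is a minimal sketch of an ETL pipeline in Python, using an in-memory SQLite database so it runs as-is. The orders table, column names, and transformation rules are hypothetical and not tied to any particular product; the point is only that the transform step sits between extraction and loading, so the warehouse only ever sees the pre-shaped result.

```python
import sqlite3

# Hypothetical source system and "warehouse" -- in-memory SQLite keeps the sketch self-contained.
source = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

source.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER, country TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, 1999, "us"), (2, 5400, "de"), (3, 120, "us")])

def extract(conn):
    # Pull raw rows out of the operational source.
    return conn.execute("SELECT id, amount_cents, country FROM orders").fetchall()

def transform(rows):
    # Shape the data into the reporting model *before* it reaches the warehouse:
    # convert cents to dollars and normalize country codes.
    return [(order_id, cents / 100.0, country.upper()) for order_id, cents, country in rows]

def load(conn, rows):
    # Only the transformed rows are ever stored in the warehouse.
    conn.execute("CREATE TABLE IF NOT EXISTS fact_orders (id INTEGER, amount_usd REAL, country TEXT)")
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()

# Extract -> Transform -> Load
load(warehouse, transform(extract(source)))
print(warehouse.execute("SELECT * FROM fact_orders").fetchall())
# [(1, 19.99, 'US'), (2, 54.0, 'DE'), (3, 1.2, 'US')]
```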
ETL has been around since the 1970s and was built around technological constraints that no longer exist: the high cost of computation, storage, and bandwidth, all of which cloud infrastructure has largely eliminated. Even though ETL’s roots are firmly in the 1970s, it wasn’t until the late 1980s and early 1990s, when data warehouses took center stage, that we really began to see tools built for the sole purpose of loading data into those warehouses. The very first tools weren’t very sophisticated, but they got the job done. As data volumes grew, so did the warehouses, and the tools grew more sophisticated along with them.
When cloud-based storage came on the scene, it changed the way we look at data processing and data storage forever.
ELT stands for Extract-Load-Transform. The first step in ELT is extraction, and that part works much the same as it does in the ETL approach: raw streams of data from applications, virtual infrastructure, and software are consumed either in their entirety or according to a predetermined set of rules. The next step is where ETL and ELT diverge.
In ELT, the second step is load. Instead of handing all the data off to an interim processing structure to be transformed, it is delivered directly to the target data warehouse. The data is then sorted and normalized there, and some or all of it is kept on hand so it remains accessible for customized reporting.
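Reusing the hypothetical orders example from the earlier sketch, an ELT version of the same pipeline might look like the following: the raw rows land in the warehouse untouched, and the transformation is just SQL run inside the warehouse afterwards, while the raw table stays available for other questions.

```python
import sqlite3

source = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

source.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER, country TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, 1999, "us"), (2, 5400, "de"), (3, 120, "us")])

# Extract + Load: copy the raw data straight into the warehouse, with no custom shaping.
raw_rows = source.execute("SELECT id, amount_cents, country FROM orders").fetchall()
warehouse.execute("CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER, country TEXT)")
warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

# Transform: plain SQL run against the warehouse itself, producing a reporting table
# while raw_orders stays on hand for any customized reporting later.
warehouse.execute("""
    CREATE TABLE fact_orders AS
    SELECT id, amount_cents / 100.0 AS amount_usd, UPPER(country) AS country
    FROM raw_orders
""")
print(warehouse.execute("SELECT * FROM fact_orders").fetchall())
```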
The main difference between ETL and ELT is how much data is stored in the data warehouse and where the transformation of the data takes place.
When using ELT, the data is loaded into the warehouse or the data lake as is; nothing is done to transform it before loading. The job is somewhat simpler this way, because the pipeline only needs an origin and a destination, and businesses and analysts can get at the information much more quickly. With ETL, on the other hand, the data is transformed before it is loaded into the data warehouse, so only the pre-processed results ever arrive.
Just keep in mind these key ways in which ETL and ELT differ.
The main problem with ETL as a data integration solution is that it was designed for a world in which cloud-based storage did not yet exist. Quite simply, it is outdated because it predates cloud storage solutions.
When building an ETL pipeline, data analysts and data engineers normally follow a workflow along these lines: scope out the reports the business needs, design the destination data models, write custom transformation code, and then build and maintain the pipeline that runs it. Let’s take a look at that workflow and see if we can spot the problems.
The above process is resource intensive and can turn into an endless cycle, because everything runs on custom code. That code can fail, leading to broken pipelines with little or no warning, and whenever the business has new requirements, the data engineers have to go back in and rework the code through extensive revisions.
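As a hypothetical illustration of how fragile that custom code can be, consider the hand-written transform from the earlier ETL sketch. It silently assumes the source schema never changes, so a routine change upstream is enough to take the pipeline down:

```python
# Hand-written transform from the earlier ETL sketch: it hard-codes the source schema.
def transform(rows):
    return [(order_id, cents / 100.0, country.upper()) for order_id, cents, country in rows]

# If the source team adds a fourth column (say, "currency"), every extracted row now has
# four fields, the unpacking fails, and the pipeline breaks with little warning.
try:
    transform([(1, 1999, "us", "USD")])
except ValueError as exc:
    print(f"Pipeline broken: {exc}")  # too many values to unpack (expected 3)
```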
ETL is really only used by two distinct groups today.
If a company has the talent, budget, time, and infrastructure to maintain a complex data pipeline, ETL can be pretty powerful. However, all of that is nearly impossible for a small or medium-sized company to manage.
Besides being outdated, labor intensive, and expensive, the ETL approach to data integration raises a few other issues. One is inflexibility: ETL forces data analysts to figure out in advance every single way they will want to use the data and every single report they may want to run, and any change after that costs a lot of time and money. Another is that data analysts may be flying blind: since they only see the data that is kept after transformation, they can draw incorrect conclusions when the data hasn’t been sliced correctly.
ETL is not only outdated in the age of cloud storage; it is also highly time intensive, easily broken, and not very cost efficient.
We saw what the ETL workflow looks like. Now let’s take a look at the ELT workflow in contrast. You’ll notice that the ELT workflow features a shorter cycle.
As you can see, under ELT the extraction and loading of the data are independent of the transformation phase. Because extraction and loading are separated from transformation, their output no longer requires customization. Take that together with the growth of the cloud, and you’ll realize that extraction and loading can now be automated, outsourced to a third party, or scaled up and down as needed.
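One way to picture that decoupling, again using the hypothetical SQLite example: the extract-and-load step can be a single generic routine that copies any table verbatim, which is exactly the kind of work a cloud service or third-party tool can automate, while the transformations remain ordinary SQL that analysts can change at any time without touching the loading code.

```python
import sqlite3

def replicate_table(source, warehouse, table):
    # Generic extract-and-load: copies any table verbatim, so it needs no customization
    # per data model and is easy to automate or hand off to a managed service.
    cols = [row[1] for row in source.execute(f"PRAGMA table_info({table})")]
    col_list = ", ".join(cols)
    rows = source.execute(f"SELECT {col_list} FROM {table}").fetchall()
    warehouse.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_list})")
    warehouse.executemany(
        f"INSERT INTO {table} VALUES ({', '.join('?' for _ in cols)})", rows)

# Hypothetical setup so the sketch runs end to end.
source = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER, country TEXT)")
source.execute("INSERT INTO orders VALUES (1, 1999, 'us')")

replicate_table(source, warehouse, "orders")

# The transformation is independent SQL inside the warehouse; changing it never
# requires touching replicate_table or re-extracting the data.
warehouse.execute(
    "CREATE TABLE fact_orders AS "
    "SELECT id, amount_cents / 100.0 AS amount_usd, UPPER(country) AS country FROM orders")
print(warehouse.execute("SELECT * FROM fact_orders").fetchall())
```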
A Few Benefits
In many ways, ELT is easier to use, more affordable, and far more practical than ETL as a data integration solution, and automated ELT will only continue to grow in importance and in preference.
ELT platforms will keep maturing, and cloud infrastructure costs will keep declining, driving more and more businesses to see the benefits of choosing ELT over ETL for their data integration solution.