Introduction to Azure Data Factory V2

It was only a matter of time before the complete Microsoft on-premises Business Intelligence stack moved to Azure as a PaaS offering. With the introduction of Azure Data Factory (ADF) back in 2015 and the recent launch of ADF V2, which is currently in public preview, it is now possible to orchestrate complex hybrid ETL/ELT operations in the cloud.

This also makes it possible to run a complete Business Intelligence platform as a service in Azure without having to provision virtual machines. ADF, in a nutshell, plays the role in the cloud that SQL Server Integration Services (SSIS) plays on-premises. ADF V2 is a new and improved version of V1 that supports more data sources and offers greater flexibility. This makes it a great tool for hybrid data integration solutions, an area Microsoft prides itself on, since it is one of the few vendors with both on-premises and cloud computing offerings.

Azure Data Factory V2 even makes it easy to lift and shift your existing SSIS packages to the cloud by providing an integration runtime that can be created and linked to an Azure SQL Database hosting the SSISDB. This is what we are using to deploy our on-premises packages.
There is a great demo video here which explains this in more detail. The immediate benefits of moving existing SSIS packages to ADF are to:

  • Reduce operational costs
  • Improve availability
  • Increase scalability

Another great feature of ADF V2 is the ability to create data pipelines directly from the Azure portal using the included visual development tool. Once you create your ADF V2 instance, you can use the Author & Monitor button to launch the development environment.

In addition, in ADF V2, all your data factory pipelines, datasets, connections and triggers can be source controlled with VSTS Git integration directly from the web interface. They can also be exported as ARM templates for easy replication across multiple environments.
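
To make this concrete, pipelines are stored as JSON definitions, which is what makes them easy to keep in Git. The following is a minimal sketch in Python of what a pipeline with a single Copy activity might look like. The pipeline, activity and dataset names are hypothetical, and the files your own factory produces may contain additional properties.

```python
import json


def copy_pipeline(name, source_dataset, sink_dataset):
    """Build a dict shaped like an ADF V2 pipeline definition with one Copy activity.

    All names passed in are illustrative placeholders, not real resources.
    """
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "CopyBlobToSql",  # hypothetical activity name
                    "type": "Copy",
                    "inputs": [
                        {"referenceName": source_dataset, "type": "DatasetReference"}
                    ],
                    "outputs": [
                        {"referenceName": sink_dataset, "type": "DatasetReference"}
                    ],
                    "typeProperties": {
                        "source": {"type": "BlobSource"},
                        "sink": {"type": "SqlSink"},
                    },
                }
            ]
        },
    }


# Hypothetical pipeline and dataset names, used only for illustration.
pipeline = copy_pipeline("CopySalesPipeline", "SalesBlobDataset", "SalesSqlDataset")

# Sorted keys and fixed indentation keep diffs small when the file is versioned.
pipeline_json = json.dumps(pipeline, indent=2, sort_keys=True)
print(pipeline_json)
```

Because the definition is plain text, a change to a pipeline shows up in a pull request as an ordinary, reviewable diff.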

This enables the DevOps practices of Continuous Integration (CI) and Continuous Delivery (CD), which historically were difficult to apply to data integration projects. This article here explains the approach.
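
One common way to replicate a factory across environments is to keep a single ARM template and pair it with one parameters file per environment. The sketch below generates such files in Python. The environment names and factory names are hypothetical, and the `factoryName` parameter is an assumption about what your exported template expects, so check the template's own parameters section first.

```python
import json

# Schema URL used by ARM deployment parameter files.
PARAMETERS_SCHEMA = (
    "https://schema.management.azure.com/schemas/2015-01-01/"
    "deploymentParameters.json#"
)


def build_parameters(values):
    """Wrap plain key/value pairs in the ARM parameters-file structure."""
    return {
        "$schema": PARAMETERS_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {key: {"value": value} for key, value in values.items()},
    }


# Hypothetical environments and factory names; "factoryName" is assumed to be
# a parameter defined by the exported template.
ENVIRONMENTS = {
    "dev": {"factoryName": "adf-demo-dev"},
    "prod": {"factoryName": "adf-demo-prod"},
}

parameter_files = {
    env: build_parameters(values) for env, values in ENVIRONMENTS.items()
}

for env, content in parameter_files.items():
    print(f"--- parameters.{env}.json")
    print(json.dumps(content, indent=2))
```

A release pipeline can then deploy the same template to each environment and vary only the parameters file it passes in.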

Another key feature of ADF V2 is its integration with Azure Databricks, the advanced analytics platform developed for Azure and based on Apache Spark. This makes running large data operations very fast, thanks to the in-memory computing power and scale of Spark.
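
As a rough illustration of that integration, a pipeline can call a Databricks notebook through a notebook activity. The Python sketch below builds such an activity definition as a dict. The linked service name, notebook path and parameter are hypothetical, and the exact properties available may differ while ADF V2 is in preview.

```python
import json


def databricks_notebook_activity(name, linked_service, notebook_path, parameters=None):
    """Build a dict shaped like an ADF V2 Databricks notebook activity.

    The linked service must already exist in the factory and point at a
    Databricks workspace; here it is only referenced by name.
    """
    return {
        "name": name,
        "type": "DatabricksNotebook",
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "notebookPath": notebook_path,
            # Values passed to the notebook at run time.
            "baseParameters": dict(parameters or {}),
        },
    }


# Hypothetical names and paths, used only for illustration.
activity = databricks_notebook_activity(
    name="TransformSales",
    linked_service="DatabricksLinkedService",
    notebook_path="/Shared/transform_sales",
    parameters={"run_date": "2018-01-01"},
)

print(json.dumps(activity, indent=2))
```

The activity would sit in a pipeline's activities list alongside other steps, so ADF handles the orchestration while Spark does the heavy processing.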

Look forward to more posts in this area with demos on some of these features.


Author:
Stefan Outschoorn
Consultant-Project Delivery