I see a lot of confusion about Azure Data Factory (ADF) and how it compares to SSIS. It is not simply “SSIS in the cloud”. See “What is Azure Data Factory?” for an overview of ADF; I’ll assume you already know SSIS. So how are they different?
SSIS is an Extract-Transform-Load (ETL) tool, but ADF is an Extract-Load (EL) tool: it does not perform any transformations itself. Instead, ADF hands the transformation off to another engine, for example by calling a stored procedure on a SQL Server that does the transformation, running a Hive job, or running a U-SQL job in Azure Data Lake Analytics. Think of it more as an orchestration tool. SSIS has the added benefit of doing transformations, but keep in mind that the performance of those transformations depends on the power of the server SSIS is installed on, because the data to be transformed has to be moved to that SSIS server. Other major differences:
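To make the ETL versus EL distinction concrete, here is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for SQL Server. It is an analogy only, not SSIS or ADF code: the table names, the sample rows, and the cents-to-dollars "transformation" are all hypothetical. In the ETL path every row is transformed inside the tool's own process (as SSIS does on its server); in the EL path the tool only copies raw rows and then issues a SQL statement so the database engine does the work (as ADF does when it calls a stored procedure).

```python
import sqlite3

# Hypothetical source rows: (order_id, amount in cents).
SOURCE_ROWS = [(1, 1250), (2, 999), (3, 40000)]


def etl_load(conn: sqlite3.Connection) -> None:
    """ETL (SSIS-like): transform inside the tool's process, then load.

    Every row passes through this Python process, so the transformation
    runs on whatever machine hosts the tool.
    """
    transformed = [(order_id, cents / 100.0) for order_id, cents in SOURCE_ROWS]
    conn.executemany(
        "INSERT INTO orders_etl (order_id, amount_usd) VALUES (?, ?)", transformed
    )


def elt_load(conn: sqlite3.Connection) -> None:
    """EL plus in-database transform (ADF-like): copy raw rows, then delegate.

    The tool only moves data and issues a command; the INSERT ... SELECT
    below stands in for a stored procedure running on the target server.
    """
    conn.executemany(
        "INSERT INTO orders_raw (order_id, amount_cents) VALUES (?, ?)", SOURCE_ROWS
    )
    conn.execute(
        "INSERT INTO orders_elt (order_id, amount_usd) "
        "SELECT order_id, amount_cents / 100.0 FROM orders_raw"
    )


def main() -> sqlite3.Connection:
    """Create the hypothetical tables, run both loads, and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders_etl (order_id INTEGER, amount_usd REAL);
        CREATE TABLE orders_raw (order_id INTEGER, amount_cents INTEGER);
        CREATE TABLE orders_elt (order_id INTEGER, amount_usd REAL);
        """
    )
    etl_load(conn)
    elt_load(conn)
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = main()
    etl = conn.execute("SELECT * FROM orders_etl ORDER BY order_id").fetchall()
    elt = conn.execute("SELECT * FROM orders_elt ORDER BY order_id").fetchall()
    print("Same result:", etl == elt)
```

Both paths end with the same rows; what differs is where the transformation runs, which is exactly why SSIS performance depends on the SSIS server while ADF relies on the power of the engine it calls.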
- ADF is a cloud-based service (via the ADF editor in the Azure portal) and, being a PaaS service, requires no hardware or installation. SSIS is a desktop tool (via SSDT) and requires a good-sized server that you have to manage, with SQL Server and SSIS installed on it
- ADF uses JSON scripts for its orchestration (coding), while SSIS uses drag-and-drop tasks (no coding)
- ADF is pay-as-you-go via an Azure subscription; SSIS is a license cost as part of SQL Server
- ADF can fire up HDInsight clusters and run Pig and Hive scripts. SSIS cannot
- SSIS has a powerful GUI, IntelliSense, and debugging. ADF has a basic editor with no IntelliSense or debugging
- SSIS is administered via SSMS, while ADF is administered via the Azure portal
- SSIS has a wider range of supported data sources and destinations
- SSIS has a programming SDK, automation via BIML, and third-party components. ADF has no programming SDK, offers automation via PowerShell, and has no third-party components
- SSIS has error handling. ADF does not
- ADF has “data lineage”, which tags and tracks data coming from different sources. SSIS does not have this
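To illustrate what “JSON scripts for orchestration” looks like, here is a minimal Python sketch that builds and prints a pipeline definition loosely modeled on the ADF (v1) pipeline JSON: a Copy activity that moves data, followed by a SqlServerStoredProcedure activity that asks SQL Server to do the transformation. Treat it as an assumption-laden sketch rather than a deployable definition: the pipeline, activity, dataset, and stored procedure names are hypothetical, scheduling properties such as the pipeline’s active period are omitted, and the property names should be checked against the ADF JSON reference before use.

```python
import json


def build_pipeline(name: str, raw_dataset: str, staging_dataset: str,
                   final_dataset: str, proc_name: str) -> dict:
    """Return a dict shaped like an ADF (v1) pipeline definition.

    Activity 1 only copies data (the "EL"); activity 2 calls a stored
    procedure so the transformation runs on SQL Server, not in ADF.
    All names passed in are hypothetical placeholders.
    """
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "CopyRawToStaging",
                    "type": "Copy",
                    "inputs": [{"name": raw_dataset}],
                    "outputs": [{"name": staging_dataset}],
                    "typeProperties": {
                        "source": {"type": "BlobSource"},
                        "sink": {"type": "SqlSink"},
                    },
                },
                {
                    "name": "TransformInSqlServer",
                    "type": "SqlServerStoredProcedure",
                    # Using the copy's output as input chains the two activities.
                    "inputs": [{"name": staging_dataset}],
                    "outputs": [{"name": final_dataset}],
                    "typeProperties": {"storedProcedureName": proc_name},
                },
            ]
        },
    }


if __name__ == "__main__":
    pipeline = build_pipeline(
        "ExamplePipeline", "RawBlobData", "StagingTable",
        "FinalTable", "usp_TransformStaging",
    )
    print(json.dumps(pipeline, indent=2))
```

Note that nothing in the definition transforms data itself; ADF only sequences the copy and the call, which is the orchestration role described above.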
Think of ADF as a complementary service to SSIS, whose main use case is inexpensively handling big data in the cloud.
Note that moving to the cloud requires you to think differently when it comes to loading a large amount of data, especially when using a product like SQL Data Warehouse.