Connecting Azure Data Factory to FTP servers with Private IP and using Logic App to run ADF Pipelines

Patrick Alexander
6 min read · May 20, 2020


How to use the Azure Data Factory service to connect to FTP via a static IP and orchestrate the ADF pipeline via Logic App

Business Case:

Almost all FTP servers in enterprises are protected behind firewalls. Firewalls are great for safeguarding infrastructure from unauthorized access and malicious attacks, but they create a constraint for users and services trying to reach the FTP (or SFTP) server. Most of the time, FTP servers only allow access from specific static IPs and ports.

This raises the problem of how to access the FTP server from various Azure services. Most Azure PaaS services are exposed through Azure region IP ranges. Customers are not willing to open a wide range of IPs for Azure services, and that's where the challenge becomes interesting.

Scenario:

The scenario I'm going to describe in this article is a very common case: accessing an FTP server from the Azure Data Factory (ADF) service.

My customer needs to do daily file drops to a 3rd-party organization that uses an FTP server, and the 3rd party only allows specific IPs to access it. My customer currently runs on-premises servers; as they move their operations to Azure, we need to migrate the FTP process to Azure as well.

Solution:

We chose ADF pipelines to build the export files, and we are going to use the FTP task in ADF to copy them. To keep things secure, we need a single static IP that the customer can open their firewall to.

ADF integration runtimes are either auto-resolve (Azure-managed) or self-hosted.

1-ADF with Auto-resolving Integration Runtime

ADF accessing FTP with IP range from AZ Regions

Reference: Azure Data Factory IP range for AZ Regions

2-ADF Self Hosted Integration Runtime server with Reserved Static IP

ADF accessing FTP with Private IP via Integration Runtime Server

Reference: How to create self hosted Azure ADF runtime

We built our own VM in Azure, installed the ADF runtime agents on it, and connected it to the ADF runtime environment.

After building the VM and connecting it to the ADF IR, we have a new runtime server (below is an example; the server shows as unavailable because the VM is stopped).

3-Optimizing the runtime and cost for ADF with a self-hosted server

Now that we know how to access the FTP server via a private IP, we need to optimize the cost and process of executing a pipeline in ADF. Obviously, we could keep the ADF Integration Runtime server online 24/7, but most of the time we only need it for a specific window. We can therefore control when the VM runs and execute the ADF pipelines on our desired schedule. The workflow looks like this:

  • Start ADF IR server
  • Run ADF Pipeline
  • Stop ADF IR server

Starting the IR server, waiting for it to warm up, and establishing the IR connection is a time-consuming process that depends on the VM type/size and the environment.
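The three steps above can be sketched as a small orchestration function. This is a minimal illustration only, not the actual implementation (in this solution the ordering is handled by a Logic App); `start_vm`, `run_pipeline`, and `stop_vm` are placeholders standing in for the webhook and ADF calls:

```python
import time

def run_export_workflow(start_vm, run_pipeline, stop_vm,
                        warmup_seconds=0, sleep=time.sleep):
    """Start the IR VM, run the ADF pipeline, then stop the VM.

    The callables are placeholders for the webhook / ADF calls;
    warmup_seconds models the IR warm-up delay described above.
    """
    start_vm()
    sleep(warmup_seconds)   # give the VM and the IR connection time to come up
    try:
        run_pipeline()
    finally:
        stop_vm()           # always release the VM, even if the run fails
```

Note the `finally`: the VM should be stopped even when the pipeline run fails, otherwise a failed run leaves the VM billing 24/7.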

4-Starting and Stopping VM via webhooks

In order to start or stop the VM, we can use Azure Automation runbooks to automate the operations and expose webhooks for starting and stopping the VM.
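Firing such a webhook is just an HTTP POST to the (secret) URL Azure Automation generates for the runbook. Here is a standard-library sketch; the URL in the test is a made-up placeholder, and any JSON body you send reaches the runbook as its `WebhookData` parameter:

```python
import json
import urllib.request

def build_webhook_request(webhook_url, payload=None):
    """Build the POST request that fires an Azure Automation webhook.

    Azure Automation webhooks are triggered by a plain HTTP POST to
    their URL; the optional JSON body is handed to the runbook as
    WebhookData.
    """
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def fire_webhook(webhook_url, payload=None, timeout=30):
    """Send the request; Azure Automation answers 202 Accepted on success."""
    req = build_webhook_request(webhook_url, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

The webhook URL embeds its security token, so treat it like a credential (Logic App stores it inside the HTTP action, as shown below in the workflow).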

Runbooks in Azure Automation

Reference: Azure Automation

Graphical view of the Runbook

Reference: Create a Graphical Runbook

Add Webhook for a Runbook

Reference: Azure automation via webhooks

Webhook

5-ADF Pipeline running over ADF IR server

Back in ADF, I built two pipelines: the first exports the file into Azure Blob storage, and the second FTPs the file to the 3rd-party server. I wrapped both in a single pipeline for execution.

Sample pipeline in ADF that runs on the ADF IR server

The important piece is to use the self-hosted Integration Runtime server for the FTP connection. This lets us expose only one static IP: the VM's.
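For reference, the wrapper pipeline can also be started through the ADF REST API's `createRun` endpoint (this is what the Logic App ADF action calls under the covers). A small helper that builds the URL; the subscription, resource group, factory, and pipeline names below are placeholders, and actually calling the URL still requires an Azure AD bearer token:

```python
# Management-plane endpoint that starts one run of an ADF pipeline.
ADF_CREATE_RUN = (
    "https://management.azure.com"
    "/subscriptions/{sub}/resourceGroups/{rg}"
    "/providers/Microsoft.DataFactory/factories/{factory}"
    "/pipelines/{pipeline}/createRun?api-version=2018-06-01"
)

def create_run_url(sub, rg, factory, pipeline):
    """Return the URL to POST to; the response body contains the new runId."""
    return ADF_CREATE_RUN.format(sub=sub, rg=rg,
                                 factory=factory, pipeline=pipeline)
```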

Azure VM hosting ADF Integration Runtime

Connecting Azure VM in ADF Integration Runtime

ADF Integration Runtime server
FTP connection via ADF IR Server

6-Logic App to the rescue

We now have all the pieces; we need to glue them together into a workflow that executes them in order. Azure Logic Apps is a great service for this purpose, and I'm going to take advantage of it.

I built a Logic App with the following workflow.

Logic App Workflow
  • Trigger every weekday at 7:00 AM
  • Start ADF IR VM over the webhook
  • Wait 7 min for the VM to start and the ADF runtime to connect
  • Run ADF Pipeline
  • Wait 5 min for the pipeline execution to complete
  • Stop ADF IR VM
  • Send confirmation Email

Let me describe each in more detail.

Trigger: In order to start a Logic App we need a trigger. I'm using a recurrence trigger that runs on a specific schedule.

Scheduled Trigger

Start VM: Using the webhook (via an HTTP POST action in the Logic App) that I created in the Azure Automation runbook

Webhook to start VM

Wait: 7 minutes. I found that it takes a few minutes for the VM to start and then another few minutes for the ADF runtime to connect to it, so I wait for that here.

Wait Delay
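A fixed delay works, but if the VM is occasionally slower to come up, the pipeline call fails. An alternative sketch is to poll until the runtime reports ready, with a timeout; `check_ready` below is a placeholder for whatever status probe you have available (for example, checking that the integration runtime shows as online):

```python
import time

def wait_until_ready(check_ready, timeout_s=600, interval_s=30,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll check_ready() until it returns True or timeout_s elapses.

    Returns True once ready, False on timeout. check_ready is a
    placeholder for a real status probe (e.g. IR status == 'Online').
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if check_ready():
            return True
        sleep(interval_s)   # back off between probes
    return False
```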

Run ADF Pipeline: We execute the ADF pipeline here

Run ADF Pipeline

Wait: to complete the Pipeline

Wait Delay

Stop VM: Using webhook to Stop VM

Webhook to Stop VM

Email Confirmation: Sending email via the SendGrid service

Email via SendGrid

Reference: Azure SendGrid

Logic App run history: Finally, I can run my Logic App and review the execution history.

Logic App Execution History

Conclusion:

Most of the time, building workflows with multiple Azure services is required to develop a complete solution. Understanding the capabilities and limitations of each service is the key to building and deploying a reliable and resilient solution. In this article, I tried to demonstrate the different bits and pieces of Azure services and how I utilized each to build my solution. I hope you find this helpful.



Written by Patrick Alexander

Data Solution Architect at Microsoft
