Managing Fabric Data Pipelines: a step-by-step guide to source control and deployment (2024)

Introduction.

In the post Microsoft Fabric: Integration with ADO Repos and Deployment Pipelines - A Power BI Case Study, we outlined key best practices for the seamless integration between Fabric and Git via Azure DevOps repositories and for the use of Fabric Deployment Pipelines, two features intended to improve collaborative development and agile application publishing in the Azure cloud.

The quality and value of any data analysis application depend on the quality of the data it consumes, drawn from the widest possible range of reliable, trustworthy data sources.

Fabric Data Pipelines serve as the backbone of data integration and orchestration, allowing organizations to streamline the flow of data across disparate systems, applications, and services.

By moving and manipulating data, Fabric Data Pipelines help ensure data consistency, accuracy, and timeliness, ultimately supporting informed decision-making and driving business value.

In this post we first delve into the integration of Fabric Data Pipelines and Azure DevOps Repos, aimed at improving collaborative development and source code control. We then address the key benefits of Fabric's content-based strategy for continuous deployment, and recommend including data pipelines as part of the content to be deployed and shared.

The role of Data Pipelines in Fabric.

Figure 1 briefly shows the stages for obtaining a data analytics solution.


Figure 1. Fabric Data Pipelines are a way to ingest and transform data into a Fabric solution.

There are many options in Fabric for data ingestion and transformations before building the semantic model of a Report or Lakehouse:

[Figure: data ingestion and transformation options in Fabric]

To date, Fabric lists the following items as eligible for source code control [Overview of Fabric Git integration - Microsoft Fabric | Microsoft Learn]:

  • Data pipelines
  • Lakehouse
  • Notebooks
  • Paginated reports
  • Reports (except reports connected to semantic models hosted in Azure Analysis Services, SQL Server Analysis Services or reports exported by Power BI Desktop that depend on semantic models hosted in MyWorkspace)
  • Semantic models (except push datasets, live connections, model v1, and semantic models created from the Data warehouse/lakehouse.)

The primary goal of a Data Pipeline, as an effective way to ingest data in Fabric, is to facilitate the efficient and reliable movement of data from various sources to designated destinations, while also enabling transformations and processing tasks along the way.
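To make that definition concrete, here is a minimal sketch, as a Python dictionary, of the JSON shape a pipeline with a single Copy activity takes when browsed in the repo. All names and type values are illustrative, not taken from a real workspace.

```python
import json

# Illustrative sketch of a pipeline definition with one Copy activity,
# loosely following the Data Factory JSON schema that Fabric pipelines use.
# Every name below (pipeline, activity, source, sink) is hypothetical.
pipeline = {
    "name": "IngestSalesData",
    "properties": {
        "activities": [
            {
                "name": "CopySalesToLakehouse",
                "type": "Copy",
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "LakehouseTableSink"},
                },
            }
        ]
    },
}

# Serializing shows the shape you would see when browsing the code in the repo.
print(json.dumps(pipeline, indent=2))
```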

Why use source control for Fabric Data Pipelines?

Developers frequently update data pipelines, for example to adjust incremental-load logic. And sometimes they need to recover a previous version, whether to fix errors or to reuse existing work.

Source control, also known as version control, is a foundational aspect of collaborative software development, providing a systematic approach to managing changes to code and configurations throughout the development lifecycle. For Fabric Data Pipelines, which play a crucial role in orchestrating data workflows and transformations, integrating source control is paramount for ensuring transparency, reproducibility, and reliability in data processing.

Source control is essential for managing Fabric’s data pipelines for several reasons:

  • It allows you to keep track of changes, revert to previous versions, and understand the evolution of your data pipeline over time.
  • Multiple team members can work on different parts of the pipeline simultaneously without overwriting each other’s work.
  • It ensures that any data analysis or transformation can be reproduced, which is critical for debugging and auditing.
  • In case of personnel changes, source control provides continuity, allowing new team members to understand the pipeline’s history and current state.

Next, we present a step-by-step guide to using source control and version management for a Data Pipeline in Fabric.

1. Integrate your workspace with Git, according to [Microsoft Fabric: Integration with ADO Repos and Deployment Pipelines - A Power BI Case Study], [Overview of Fabric Git integration - Microsoft Fabric | Microsoft Learn].

2. Create a data pipeline in your workspace. To create a new data pipeline in Fabric you can refer to [Module 1 - Create a pipeline with Data Factory - Microsoft Fabric | Microsoft Learn],[Activity overview - Microsoft Fabric | Microsoft Learn].

Figure 2 shows three pipelines created in a workspace named Workspace Dev1 and the workspace's settings for the integration with an ADO repository (more details in Microsoft Fabric: Integration with ADO Repos and Deployment Pipelines - A Power BI Case Study).


Figure 2. Workspace integrated with Git via an ADO Repo of a project.

3. Sync content with the ADO Repo.

The next figure shows all content synced after committing changes from Fabric UI.

[Figure: workspace content synced with the ADO repo]

If you add new data pipelines or update the content of existing ones, those items are marked as “Uncommitted”. Whenever you want to sync, select the “Source Control” button and commit the changes.

You will then see the three pipelines created in Workspace Dev1 in the ADO repo.

[Figure: the three pipelines shown in the ADO repo]

To retrieve a pipeline version from the repo in ADO, select the commit line in Azure DevOps/Repos/Commits and then Browse files.

[Figure: browsing the files of a commit in Azure DevOps]

Another way to retrieve the content of the pipeline is to go to Azure DevOps/Repos/Files and download the .zip file to obtain the code in JSON format.
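Once you have the JSON, you can inspect it programmatically, for example to list the activities a given version contains. A minimal sketch, assuming an export shaped like a Data Factory pipeline definition; the embedded definition and activity names are hypothetical:

```python
import json

# Inspect a pipeline definition exported from the ADO repo.
# In practice you would load the JSON file found inside the downloaded .zip,
# e.g. definition = json.load(open("pipeline-content.json")) - the exact file
# name is an assumption about the export layout.
raw = """
{
  "properties": {
    "activities": [
      {"name": "CopySalesToLakehouse", "type": "Copy"},
      {"name": "NotifyOnFailure", "type": "Teams"}
    ]
  }
}
"""

definition = json.loads(raw)
names = [a["name"] for a in definition["properties"]["activities"]]
print(names)  # → ['CopySalesToLakehouse', 'NotifyOnFailure']
```

This makes it easy to diff two commits of the same pipeline at the activity level, rather than eyeballing raw JSON.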

[Figure: downloading the pipeline code as a .zip file from Azure DevOps]

How to update an existing pipeline.

4. From this point, you can return to a previous version of a data pipeline.

To return to a previous version of a data pipeline in Microsoft Fabric, you can use the Update pipeline option [How to monitor pipeline runs - Microsoft Fabric | Microsoft Learn]. Here we list the steps to follow and then illustrate them with images.

- Navigate to your workspace and hover over your pipeline. Click the three dots to the right of the pipeline name to bring up a list of options, and select View run history to see all your recent runs and their statuses.

The following picture illustrates the recent run history of a data pipeline.

[Figure: recent run history of a data pipeline]

Selecting “Go to Monitoring hub” produces the following:

[Figure: the Monitoring hub]

Selecting “Back to Main View” shows all items that have already been run:

[Figure: items already run, shown in the Main View]

Open the pipeline you want to fix:

[Figure: opening the pipeline to fix]

And then, select Update pipeline.

[Figure: the Update pipeline option]

- Here, you can select Update pipeline to make changes from this screen. This selection takes you back to the pipeline canvas for editing, where you can change mappings, delete activities, and so on. You can then save it, validate it, and run it again.

[Figure: the pipeline canvas]

Another way is updating the JSON code.

You can update the JSON code here:

[Figure: the pipeline JSON code view]

- When you select this button, you can replace the pipeline’s code with the code obtained from Azure DevOps/Repos/Files.
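If you need to adjust the retrieved code before pasting it back, a small script keeps the change reproducible instead of hand-editing JSON. A sketch, assuming a Copy activity whose source store settings include a container name; the structure and all names are illustrative:

```python
import json

# Sketch: tweak a retrieved pipeline definition before pasting it back into
# the Fabric JSON editor. Property names below are illustrative, not a
# documented schema.
definition = {
    "properties": {
        "activities": [
            {
                "name": "CopySalesToLakehouse",
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "DelimitedTextSource",
                        "storeSettings": {"container": "sales-dev"},
                    }
                },
            }
        ]
    }
}

# Point every Copy source at a different container, leaving the rest untouched.
for activity in definition["properties"]["activities"]:
    if activity["type"] == "Copy":
        activity["typeProperties"]["source"]["storeSettings"]["container"] = "sales-test"

# The updated JSON is what you would paste into the editor.
print(json.dumps(definition, indent=2))
```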

Deployment of data pipelines.

You can define a deployment pipeline in the workspace that contains the most recent and updated items and deploy all of them to the TEST Workspace. If you want to learn more about Fabric Deployment Pipelines refer to Microsoft Fabric: Integration with ADO Repos and Deployment Pipelines - A Power BI Case Study.

You can add Data Pipelines to any workspace you want in Fabric. Common data pipeline code can go a long way toward ensuring reproducible results in your analysis.

Therefore, this type of content can be used in a deployment pipeline.

Sharing data pipeline code between a DEV workspace and a TEST workspace greatly reduces the potential for errors, by helping to guarantee that the transformed data used for model training is the same as the transformed data the models will use in production.

A good practice mentioned in Best practices for lifecycle management in Fabric - Microsoft Fabric | Microsoft Learn is to use different databases in each stage. That is, build separate databases for development and testing to protect production data and to avoid overloading the development database with the entire volume of production data.

For now, data pipelines cannot be managed by deployment parameter rules. You can learn more about deployment rules in Create deployment rules for Fabric's Application lifecycle management (ALM) - Microsoft Fabric | Microsoft Learn.

However, you can edit the pipeline inside the Test workspace to change the source (as long as it has the same data structure or file format), run the edited pipeline, and refresh data to obtain the desired results. Proceed similarly with the deployed data pipeline inside the Production workspace: edit it, run it, and refresh data.
The next figure shows the data source and destination to be configured in a data pipeline.
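Since deployment rules do not yet cover data pipelines, one pragmatic way to keep these per-stage edits consistent is a small lookup of stage-specific connection settings, applied whenever you edit the pipeline in each workspace. A sketch, with entirely hypothetical server and database names:

```python
# Per-stage connection settings to apply when editing the deployed pipeline
# in each workspace. All server and database names are made up for the sketch.
STAGE_SOURCES = {
    "dev":  {"server": "sql-dev.example.com",  "database": "sales_dev"},
    "test": {"server": "sql-test.example.com", "database": "sales_test"},
    "prod": {"server": "sql-prod.example.com", "database": "sales"},
}

def source_for(stage: str) -> dict:
    """Return the connection settings to use in the pipeline for a stage."""
    try:
        return STAGE_SOURCES[stage]
    except KeyError:
        raise ValueError(f"Unknown stage: {stage!r}") from None

print(source_for("test")["database"])  # → sales_test
```

Keeping the mapping in one place mirrors the separate-databases-per-stage practice above and makes the manual edit in each workspace a lookup rather than a guess.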

[Figure: data source and destination configured in a data pipeline]

Summary

Fabric Data Pipelines serve as the backbone of data integration and orchestration. Source control is essential for managing them for several reasons, the most significant being access to previous versions for reuse or error recovery, code sharing between developers, and visibility into the pipeline's evolution over time.

We have provided a step-by-step guide to bringing data pipelines under source control by means of Fabric-Git integration, describing how to retrieve a specific data pipeline's code from the commit history and how to update the data pipeline inside Fabric.

Data Pipelines should be included in the content shared through Deployment Pipelines, to ensure data consistency and security from the development stages through to production.

You can find more information here:

Create your first Data Ingestion Pipeline in Microsoft Fabric | Microsoft Fabric Tutorials (YouTube)

Microsoft Fabric Life Cycle Management ALM in Fabric by taik18 - YouTube

How to monitor pipeline runs - Microsoft Fabric | Microsoft Learn

Git integration and deployment for data pipelines - Microsoft Fabric | Microsoft Learn

Datasets - Refresh Dataset - REST API (Power BI REST APIs) | Microsoft Learn

Data Factory in Microsoft Fabric documentation - Microsoft Fabric | Microsoft Learn
