
Azure Databricks 1 click deployment via DevOps

The following article puts together the end-to-end list of steps for deploying Databricks Notebooks via DevOps Pipelines.

The strategy: let’s say we have 2 environments, DEV and MASTER (for clarity and simplicity’s sake), each represented by a different Databricks Workspace. We sync the Notebooks in DEV with DevOps so they are versioned. Once development is finished, a DevOps pipeline takes the final Notebook and deploys it into the MASTER environment. (A potential branch strategy is shown in more detail further down in the article.)

Before going into DevOps pipeline creation it is necessary to sync the Databricks Notebooks with DevOps as explained here.
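
As a side note, the same .py representation that the sync stores in the repo can also be retrieved directly through the Databricks Workspace API, which is handy to sanity-check what will land in the repo. A minimal sketch, assuming a personal access token (the workspace URL, token and notebook path below are placeholders, not values from this guide):

    import base64
    import requests

    HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder workspace URL
    TOKEN = "<personal-access-token>"                       # placeholder token

    # Export a notebook in SOURCE format, i.e. the same .py representation
    # that the DevOps sync stores in the repo.
    resp = requests.get(
        f"{HOST}/api/2.0/workspace/export",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"path": "/Shared/helloworld", "format": "SOURCE"},
    )
    resp.raise_for_status()
    print(base64.b64decode(resp.json()["content"]).decode("utf-8"))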

There are 2 packages in the marketplace:
the Script Deployment Task and the Notebook Deployment one.
Click on Script Deployment Task; after installing it we’ll see this screen:

Once installed, the tasks will be available in the list when creating a task within a job:

We need a build pipeline, which can be created as follows:

Choose Azure Repos Git, select your repo and press Continue.

Choose Empty Job

After giving it a name, let’s create a new agent job by clicking on the + button.

Search for “Publish Build” and add the Publish Build Artifacts task, which will retrieve the Databricks Notebooks from the repo and make them available for the release.

Let’s call the artifact “TakeFiles”, then “Save and Queue” the build pipeline.

When running it, notice that the run name takes the message of the last push that was done:

It is now possible to create a release pipeline by clicking on the + symbol, then New Release, choosing “Build” as the artifact type and selecting the name of the source build pipeline from the drop-down menu:

Let’s create an Empty Job stage in this pipeline:

Once it’s created, let’s go into the stage and add the Databricks task.

Attention: if you have both marketplace packages installed you may also find the package shown below (which has fewer properties, and is not the one used in this guide).

It’s now time to choose the folder to take the Notebook from, by clicking on the three dots:

A pop-up will appear, and we are going to take the output of the build, which picked up the notebook synced to DevOps from the Databricks dev environment.

In this case the output is a Python notebook. A Databricks archive has the .dbc format, but when the notebook is synced with DevOps it becomes a .py file with “# COMMAND ----------” lines that mark the cell boundaries you would see within the Databricks UI.
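
For reference, here is a minimal example of what such a synced source file looks like (the cell contents are just an illustration):

    # Databricks notebook source
    # First cell, exactly as it would appear in the Databricks UI
    print("hello from cell 1")

    # COMMAND ----------

    # Each "# COMMAND ----------" marker starts a new cell
    print("hello from cell 2")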

Given the following organization of the notebooks in the workspace:

We are going to indicate “master/” as the path where to publish the Notebook. It is now possible to create a release and run it (or set it to run automatically right after a build completes successfully).
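
As background on what the deployment task does under the hood: the published .py files presumably end up in the target workspace via the Databricks Workspace API import endpoint (an assumption about the task’s internals, but the API is the documented way to push source files into a workspace). A minimal sketch of the equivalent call, assuming a personal access token (workspace URL, token and paths are placeholders, not values from this guide):

    import base64
    import requests

    HOST = "https://<target-workspace>.azuredatabricks.net"  # placeholder production workspace URL
    TOKEN = "<personal-access-token>"                         # placeholder token

    def import_notebook(local_path: str, workspace_path: str) -> None:
        """Upload a synced .py notebook into the target workspace."""
        with open(local_path, "rb") as f:
            content = base64.b64encode(f.read()).decode("ascii")
        resp = requests.post(
            f"{HOST}/api/2.0/workspace/import",
            headers={"Authorization": f"Bearer {TOKEN}"},
            json={
                "path": workspace_path,  # must start with "/" and have no empty segments
                "format": "SOURCE",      # plain .py source, not a .dbc archive
                "language": "PYTHON",
                "content": content,
                "overwrite": True,
            },
        )
        resp.raise_for_status()

    import_notebook("helloworld.py", "/master/helloworld")

Note that the target path must be workspace-absolute: the two INVALID_PARAMETER_VALUE errors quoted in the comments below come precisely from a path with an empty segment (//Shared/…) and from a path that does not start with /.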

After running it, it is possible to monitor the release and see the outcome:

Now, going to the indicated path within Databricks, it is possible to find the newly deployed Notebooks.

Branch strategy: a simple potential strategy to deal with versions, Databricks Workspaces and DevOps branches could be the following:

The Databricks dev workspace is synced with the dev branch in DevOps. Once development and testing are finished, the code can be promoted to the master branch. The build and release pipelines take care of deploying the new notebook into the production environment (the Databricks production workspace).

Comments

  1. Hi Luca,

    I have been trying to upload notebooks using the method explained here, but failing.
    Can you please shed some light?

    Thanks,
    Ravi

    Error:
    1st error:
    "error_code":"INVALID_PARAMETER_VALUE","message":"Path (//Shared/Test/helloworld-5fd1c61fae851467f0091fae8d8f90d4896141e3.py) contains empty name"
    2nd error:
    "error_code":"INVALID_PARAMETER_VALUE","message":"Path (C:/Program Files/Git/Shared/Test/helloworld-5fd1c61fae851467f0091fae8d8f90d4896141e3.py) doesn't start with '/'"}

    1. lucavallarelli

      Hello Ravi. Where are you taking those notebooks from?
      The path should be the one within DevOps.
      In the Databricks environment you sync the notebooks with DevOps;
      DevOps has the repo containing the files,
      and the pipeline reads the notebooks from the directory within the DevOps repo.

    1. lucavallarelli

      Hello Mona, what’s the directory you’ve provided?
      Notebooks need to be synced to DevOps; then provide the path where to find them so they can be deployed into the new environment.
