Databricks and ArcGIS integrations

The Databricks Data Intelligence Platform provides a unified set of tools for building, deploying, sharing, and maintaining enterprise-grade data solutions at scale. Databricks integrates with cloud storage and security in your cloud account, and manages and deploys cloud infrastructure on your behalf. Databricks Runtime releases include open source technologies such as Apache Spark, as well as a number of proprietary tools that integrate with and extend these technologies to add optimized performance and ease of use.

Many organizations use Databricks and ArcGIS in both independent and integrated workflows, particularly for data engineering, data management, analytics, machine learning, and AI model preparation. One common implementation pattern that combines ArcGIS and Databricks functionality is described in the Big data analytics (Apache Spark) system pattern.

ArcGIS GeoAnalytics Engine in Databricks

ArcGIS GeoAnalytics Engine can be installed on Databricks in Azure, AWS, or Google Cloud Platform to add spatial data science and analysis capabilities to your Databricks workspace. After installing GeoAnalytics Engine, you can run spatial SQL functions and analysis tools using a Spark cluster managed by Databricks. Because GeoAnalytics Engine extends PySpark, you can spatially enable your data wherever it lives and seamlessly execute spatial analysis workflows alongside other data science and machine learning technologies in a Databricks notebook.
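As an illustrative sketch only, a typical GeoAnalytics Engine session in a Databricks notebook authorizes the library and then applies spatial SQL functions to a Spark DataFrame. The table name, column names, and credentials below are hypothetical placeholders, and the example assumes GeoAnalytics Engine is installed and licensed on the cluster:

```python
# Sketch: assumes GeoAnalytics Engine is installed on the Databricks cluster
# and licensed for your organization. Table and column names are placeholders.
import geoanalytics
geoanalytics.auth(username="user", password="password")  # or a license file

from geoanalytics.sql import functions as ST

# Read an existing Delta table and construct point geometries from its
# longitude/latitude columns using a spatial SQL function.
df = spark.table("main.default.store_locations")
points = df.withColumn("geometry", ST.point("longitude", "latitude", 4326))

# Spatial functions chain like any other PySpark column expression,
# e.g. buffering each point before a downstream spatial join or aggregation.
buffered = points.withColumn("buffer", ST.buffer("geometry", 1000))
```

Because the result is an ordinary PySpark DataFrame, it can be persisted back to Delta Lake or passed to other Spark-based tooling without leaving the notebook.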

This integration pattern is particularly useful for bringing spatial analysis and geographic tools to existing data stored in Databricks, as the scalability of a Spark-based analysis workflow can effectively process billions of records. Results can be persisted back to Databricks storage or published to ArcGIS Online or ArcGIS Enterprise as hosted feature layers, allowing other GIS applications to interact with them. In most cases, Databricks is best used as an analytical system that combines large datasets with spatial queries to generate more tailored results. This avoids unnecessary data duplication and builds on the batch analytic capabilities of Databricks.

ArcGIS API for Python in Databricks Notebooks

The ArcGIS API for Python can be used within Databricks notebooks to access, manage, and analyze GIS content hosted in ArcGIS Online or ArcGIS Enterprise. Users can authenticate securely, query feature layers, perform spatial analysis, and visualize results—all within the familiar Databricks environment. This approach is ideal for teams looking to combine ArcGIS’s web GIS capabilities with Databricks’ collaborative data science workflows, especially when integrating spatial data with machine learning models or orchestrating ETL pipelines.

See the blog post Use the ArcGIS API for Python in Databricks Notebooks for additional details.
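As a minimal sketch of this workflow, the example below authenticates to ArcGIS Online, retrieves a hosted feature layer, and queries it into a spatially enabled DataFrame. The portal URL is real, but the credentials, item ID, and query clause are placeholders; in a production notebook, credentials should come from Databricks secrets rather than plain text:

```python
# Sketch: credentials, item ID, and field names below are placeholders.
# In practice, retrieve credentials via dbutils.secrets rather than hardcoding.
from arcgis.gis import GIS

# Authenticate against ArcGIS Online (or an ArcGIS Enterprise portal URL).
gis = GIS("https://www.arcgis.com", "username", "password")

# Look up a hosted feature layer item by its ID and query the first layer
# into a spatially enabled (pandas) DataFrame for analysis in the notebook.
item = gis.content.get("<item-id>")
layer = item.layers[0]
sdf = layer.query(where="POPULATION > 100000", as_df=True)
print(sdf.head())
```

Since the query result is a pandas-based DataFrame, it can be combined with Spark workloads in the same notebook, for example by converting attribute columns with spark.createDataFrame for large-scale processing.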

ArcGIS Data Pipelines

ArcGIS Data Pipelines provides a low-code interface for building spatial ETL workflows that can connect directly to Databricks. By configuring pipelines to read from Delta Lake tables, IT teams can automate the movement and transformation of data with minimal scripting.

This integration supports enterprise-grade data engineering use cases, such as synchronizing authoritative business datasets from big data platforms, enriching tabular data with spatial context, and preparing geospatial inputs for AI/ML models—all while maintaining governance and scalability. This functionality is currently in Beta and is described further in the ArcGIS Data Pipelines documentation.
