Spark environments with GeoAnalytics Engine

Many organizations use a unified data system or converged analytics platform to manage and work with enterprise data assets. This trend toward consolidation has introduced a deployment and architecture pattern focused on compute resources located next to data sources, often accessed through a data engineering and analysis environment built with Apache Spark. As in the Big data analytics system pattern, spatial and temporal big data analytic results are typically written back to data stores for downstream analysis, or for visualization and further geographic analysis.

As an integration pattern, GeoAnalytics Engine allows existing systems to incorporate its spatial functions and tools into existing data processing pipelines or data engineering workflows. Another common approach combines enterprise business data (stored in a system accessible through Spark) with geospatial features loaded from an ArcGIS dataset for reporting or analysis. GeoAnalytics Engine can read a variety of data sources, including CSV, Parquet, and GeoJSON, and write results back to ArcGIS feature services or to data structures in a data lake or big data file system.
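
The sketch below illustrates this kind of pipeline: tabular business data is read with Spark, enriched with a geometry column, and the result is written back to a data lake. It assumes the geoanalytics package is installed and authorized in the Spark environment and uses the geoanalytics.sql.functions import path for spatial functions; the paths, column names, and the point-constructor signature are illustrative and should be verified against the current GeoAnalytics Engine API reference.

    # Sketch: enrich tabular business data with a geometry column and write
    # the result back to a data lake as Parquet. Paths and column names are
    # hypothetical placeholders.
    from pyspark.sql import SparkSession
    from geoanalytics.sql import functions as ST  # GeoAnalytics Engine spatial functions

    spark = SparkSession.builder.appName("spatial-enrichment").getOrCreate()

    # Read enterprise business data that is already accessible through Spark.
    orders = spark.read.parquet("s3://example-bucket/orders/")

    # Construct point geometries from longitude/latitude columns assumed to
    # exist in the source data (WGS84, spatial reference 4326).
    orders_with_geometry = orders.withColumn(
        "geometry", ST.point("longitude", "latitude", 4326)
    )

    # Write the enriched result back to the data lake for downstream analysis.
    orders_with_geometry.write.mode("overwrite").parquet(
        "s3://example-bucket/orders_enriched/"
    )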

For additional resources, see:

Integration patterns in ArcGIS

ArcGIS GeoAnalytics Engine includes documented deployment patterns for several specific technologies, each of which can read data from and write data back to ArcGIS Enterprise or ArcGIS Online feature services. The GeoAnalytics toolbox for ArcGIS Pro includes a subset of spatial functions and tools that can be used in desktop analysis workflows.
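
For example, the following sketch reads features from a hosted feature layer into a Spark DataFrame and writes an analysis result back to ArcGIS. It assumes the geoanalytics package has been authorized in the session and uses the feature-service data source described in the GeoAnalytics Engine documentation; the service URL, credentials, and write options are placeholders, and the exact option names may differ by release.

    # Sketch: read from and write back to ArcGIS feature services.
    # URLs, credentials, and options are hypothetical placeholders.
    import geoanalytics
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("arcgis-io").getOrCreate()

    # Authorize GeoAnalytics Engine (a license file or token can also be used).
    geoanalytics.auth(username="user", password="password")

    # Read a hosted feature layer as a Spark DataFrame.
    stores = spark.read.format("feature-service").load(
        "https://services.arcgis.com/<org-id>/arcgis/rest/services/Stores/FeatureServer/0"
    )

    # ... spatial functions, joins, or GeoAnalytics tools run here ...

    # Write results back as a hosted feature layer; check the current
    # documentation for the exact write options supported by your release.
    stores.write.format("feature-service") \
        .option("serviceName", "Stores_analysis_result") \
        .save()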

Capability                   ArcGIS Online   ArcGIS Enterprise   ArcGIS Location Platform   ArcGIS Pro
ArcGIS GeoAnalytics Engine   Full support    Full support        N/A                        Partial support

Best practices

  • Use GeoAnalytics Engine tools when an appropriate tool exists. These tools have been designed and optimized to solve specific business problems and provide a good baseline for further development.
  • In general, Spark analysis is well suited to Map/Reduce-style workloads, where very large datasets are distilled into specific results using spatial functions or other data engineering tools such as joins or summarization. Creating these smaller result datasets prior to visualization or further geographic analysis in ArcGIS is the recommended approach.
  • Most Spark-based analytics systems work within fixed compute resources, so it is important to test and tune queries and operations for effective analysis. Testing a query or tool against a subset of the data is recommended before running an analysis that may span billions of data elements or rows, as shown in the sketch after this list.
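
A minimal sketch of this practice, assuming the AggregatePoints tool and the setter/run pattern from the GeoAnalytics Engine documentation: the bin parameters, column names, and paths are illustrative, and the exact tool and parameter names should be verified against the current API reference.

    # Sketch: validate an aggregation on a small sample before running it at
    # full scale, then persist only the distilled summary. Paths and
    # parameters are hypothetical.
    from pyspark.sql import SparkSession
    from geoanalytics.tools import AggregatePoints

    spark = SparkSession.builder.appName("bin-summary").getOrCreate()

    # Large point dataset that already carries a geometry column.
    gps_points = spark.read.parquet("s3://example-bucket/gps_points/")

    # Dry run on roughly 1% of the data to check parameters and estimate cost.
    sample = gps_points.sample(fraction=0.01, seed=42)
    test_result = AggregatePoints() \
        .setBins(bin_size=5, bin_size_unit="Kilometers") \
        .run(sample)
    test_result.show(5)

    # Full run: distill billions of rows into a small summary dataset and
    # write only that summary back for downstream visualization and analysis.
    summary = AggregatePoints() \
        .setBins(bin_size=5, bin_size_unit="Kilometers") \
        .run(gps_points)
    summary.write.mode("overwrite").parquet("s3://example-bucket/point_bins/")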