Selecting a deployment pattern for big data analytics systems

Big data analytics systems are typically deployed using one of the following two deployment patterns:

- Apache Spark deployment pattern
- SaaS deployment pattern

Selecting an appropriate deployment pattern depends primarily on the source of the data used by the big data analytics system.

The Apache Spark deployment pattern uses ArcGIS capabilities delivered as Apache Spark libraries to perform analytics on persisted data in centralized locations (for example, data lakes, object storage, relational databases, and files) accessible to the Apache Spark environment. This is the most common pattern for data scientists conducting spatial big data analytics. The foundation of the Apache Spark deployment pattern is bringing spatial analytics into the environment the data scientist already knows: a Spark library lets data scientists add spatial functions and processes to new or existing analytic workflows. Apache Spark provides distributed compute capabilities that support access to a broad range of datasets, a robust set of analytic libraries, the ability to explore and interact with structured data, and the ability to produce results that stakeholders or downstream business processes can use.
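The core idea, adding a spatial function to a familiar analytic workflow, can be illustrated with a minimal plain-Python sketch. The `within_bbox` predicate and the sample records below are hypothetical; a real deployment would use an ArcGIS Spark library (such as ArcGIS GeoAnalytics Engine) inside a Spark DataFrame pipeline rather than plain lists and functions.

```python
# Point observations as they might arrive from a data lake extract:
# (sensor_id, longitude, latitude, reading)
records = [
    ("s1", -118.24, 34.05, 41.0),  # Los Angeles area
    ("s2", -122.42, 37.77, 38.5),  # San Francisco area
    ("s3", -118.30, 34.10, 44.2),  # Los Angeles area
]

def within_bbox(lon, lat, bbox):
    """Hypothetical spatial function: is (lon, lat) inside bbox?
    bbox = (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

# Existing analytic workflow: filter spatially, then aggregate.
la_bbox = (-118.7, 33.7, -117.6, 34.4)
la_readings = [r[3] for r in records if within_bbox(r[1], r[2], la_bbox)]
mean_reading = sum(la_readings) / len(la_readings)
```

The workflow shape (filter, then aggregate) stays the same; the spatial library only contributes the predicate, which is what makes this pattern comfortable for data scientists with existing Spark pipelines.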

Alternatively, big data analytics can be conducted as part of the SaaS deployment pattern for real-time data streaming and analytics. In that pattern, real-time sensor or event data is ingested into the SaaS offering and either analyzed in real time or archived for later use by a user-defined big data analytics process.
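The ingest-then-archive flow can be sketched as follows. This is an illustrative stdlib sketch only: the `ingest` function, the speed-based alert rule, and the in-memory `archive` are assumptions standing in for the SaaS offering's real-time analytics and archive store.

```python
from collections import deque

archive = deque()  # stands in for the SaaS archive store
alerts = []        # stands in for real-time analytic output

def ingest(event):
    """Handle one sensor/event record: analyze in real time, then archive."""
    if event["speed"] > 80:        # hypothetical real-time rule
        alerts.append(event["id"])
    archive.append(event)          # persisted for later batch analytics

for e in [{"id": "a", "speed": 72}, {"id": "b", "speed": 95}]:
    ingest(e)

# Later, a user-defined big data analytics process runs over the archive:
avg_speed = sum(e["speed"] for e in archive) / len(archive)
```

The key property of the pattern is that the same event stream serves both purposes: immediate real-time analysis and an accumulating archive for later big data analytics.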

There are many functional and non-functional differences between the Apache Spark and SaaS deployment patterns, such as the interface for designing analysis models and the specific analytic tools and capabilities provided. However, beyond organizational preferences for deployment models, the key decision point tends to be whether the big data analytics system will primarily be used to analyze real-time data and observations, or to analyze data persisted in existing big data stores within the organization (for example, data lakes). For more information, see the Apache Spark and SaaS deployment pattern pages.
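That decision point can be encoded as a small helper. The source categories and the `recommend_pattern` function below are hypothetical illustrations, not an Esri-provided API; they simply make the real-time versus persisted-data split explicit.

```python
# Hypothetical source categories for the example.
STREAMING_SOURCES = {"sensor_feed", "event_stream", "iot_telemetry"}
PERSISTED_SOURCES = {"data_lake", "object_storage", "relational_database", "files"}

def recommend_pattern(data_source):
    """Map a primary data source category to a deployment pattern."""
    if data_source in STREAMING_SOURCES:
        return "SaaS (real-time streaming and analytics)"
    if data_source in PERSISTED_SOURCES:
        return "Apache Spark"
    raise ValueError(f"Unknown data source: {data_source}")
```

In practice the choice also weighs organizational preference and the specific analytic capabilities each pattern provides, as noted above.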

For general information and considerations around these deployment approaches, see the ArcGIS products and deployment options page of the ArcGIS overview.