Design choices and considerations

The following considerations are organized around the architecture pillars of the ArcGIS Well-Architected Framework. The appropriate application of best practices and architectural approaches in each of these technical areas contributes significantly to the successful design and implementation of well-architected systems.

Performance and scalability

Workload separation

Workload separation was chosen to help achieve an optimal distribution of compute resources across the system. In the test study, editing requests generally took longer to process than standard map requests, so editing workloads were isolated on dedicated compute resources in the form of a separate ArcGIS GIS Server site. Additionally, isolating the system components themselves on different machines helps ensure they don’t compete for system resources and makes it possible to tailor machine types and sizes to the system requirements of each component.

GPU-enabled desktop machines

Selecting the proper GPU (Graphics Processing Unit) is essential for ensuring the performance of ArcGIS Pro in a virtualized environment. Tests revealed that adding a dedicated GPU to ArcGIS Pro virtual machines significantly improved end-user productivity and produced a net reduction in cost when operational expenses (labor costs) are considered. Learn more about GPU hardware selection and ArcGIS Pro virtualization in the ArcGIS Architecture Center.

Watching for vCPU:CPU in the cloud

It’s important to understand the ratio of virtual CPU (vCPU) to physical CPU when making design decisions so that system components can be assigned appropriate resources. There is a 2:1 ratio of vCPU:CPU for all machines in the diagram, but some virtualization options may have different ratios, such as 1:1. These decisions may also have Esri licensing implications. Public cloud providers such as AWS, Azure, and GCP document the ratios that apply to their machine types.
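The ratio arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `physical_cores` and the 8-vCPU machine size are assumptions made for the example, not values from the test study.

```python
def physical_cores(vcpus: int, vcpus_per_core: int = 2) -> float:
    """Convert a vCPU count to physical cores for a given vCPU:CPU ratio."""
    if vcpus_per_core <= 0:
        raise ValueError("vcpus_per_core must be positive")
    return vcpus / vcpus_per_core


# Hypothetical 8-vCPU machine under the two ratios mentioned in the text.
print(physical_cores(8))     # 2:1 ratio -> 4.0
print(physical_cores(8, 1))  # 1:1 ratio -> 8.0
```

The same machine size therefore represents half as many physical cores at 2:1 as at 1:1, which matters both for sizing and for any licensing that is counted in cores.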

GIS services configuration

Proper configuration of GIS services is critical to system performance and user experience, and misconfigured GIS service instances can introduce problems or reliability challenges in a system. For example, if the number of instances for a map or feature service is set too low, clients can experience long wait times and timeout errors.

Setting the instance count too high, however, can consume excessive machine resources, limiting the number of services that can be deployed on a fixed hardware configuration. When the maximum instance setting is higher than the minimum, the system can automatically add new instances in response to demand, but this can also be problematic because incoming requests must wait for the instance to start. For any system, it is important to understand service usage so that instance numbers and server resources can be adjusted to provide optimum performance.

In this test study, the ratio of service instances to physical CPU cores was set to 2:1 for each relevant service, with the minimum and maximum instance settings configured at that same value. The instance usage was monitored to determine when the system was overloaded. For example, at 8x design load, the service instances for a service on the hosting server were observed as active for 99% of the test period, which led to high wait times for read-only services. The services in this test were configured for dedicated instances. Learn more about configuring service instance settings.
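One way to turn monitored instance usage into an overload signal is to measure how often every instance of a service is busy. The sketch below assumes periodic samples of the busy-instance count are available; the function name, the sample values, and the 95% threshold are all hypothetical and are not taken from the test study.

```python
def saturation_fraction(busy_samples: list[int], max_instances: int) -> float:
    """Return the fraction of samples in which every instance was busy."""
    if not busy_samples:
        raise ValueError("at least one sample is required")
    saturated = sum(1 for busy in busy_samples if busy >= max_instances)
    return saturated / len(busy_samples)


# Hypothetical samples for a service limited to 6 instances per machine.
samples = [6, 6, 6, 5, 6, 6, 6, 6, 6, 6]
fraction = saturation_fraction(samples, max_instances=6)

# Assumed threshold: treat the service as overloaded above 95% saturation.
OVERLOAD_THRESHOLD = 0.95
print(f"saturated {fraction:.0%} of the time; overloaded: {fraction > OVERLOAD_THRESHOLD}")
```

A service that stays saturated for nearly the whole test period has no spare instances, so new requests queue and wait times rise.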

In this test study, the utility network services were configured as follows:

  • Minimum number of instances per service: 8
  • Maximum number of instances per service: 8

The total number of available instances was 16 because there were two ArcGIS GIS Servers in the site. The hosting servers were configured as follows:

  • Minimum number of instances per service: 6
  • Maximum number of instances per service: 6

The total number of available instances was 12 because there were two ArcGIS GIS Servers in the site.
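The totals above follow from multiplying the per-machine instance count by the number of machines in the site. A minimal sketch, using a hypothetical helper named `site_instances`:

```python
def site_instances(instances_per_machine: int, machines: int) -> int:
    """Total instances available to one service across a GIS Server site."""
    return instances_per_machine * machines


# Per-machine instance counts from the test study; both sites had two machines.
sites = {"utility network services": 8, "hosting server services": 6}
for name, per_machine in sites.items():
    print(f"{name}: {site_instances(per_machine, machines=2)} instances")
```

Because the minimum and maximum were set to the same value, these totals are fixed rather than ranges.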

The specified service timeouts were configured as follows:

  • Maximum time a client can use a service: 600 seconds
  • Maximum time a client will wait to get a service : 600 seconds
  • Maximum time an idle instance can be kept running: 1800 seconds
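For scripted administration, the instance and timeout values above can be collected into a single settings object. The sketch below assumes the property names used in ArcGIS Server service JSON (`minInstancesPerNode`, `maxInstancesPerNode`, `maxUsageTime`, `maxWaitTime`, `maxIdleTime`); verify them against the documentation for your version. The helper `build_service_settings` is hypothetical, and the sketch only builds and prints the dictionary without contacting a server.

```python
import json


def build_service_settings(instances: int) -> dict:
    """Build instance and timeout settings matching the tested configuration."""
    return {
        # Minimum and maximum are equal, so instances never start on demand.
        "minInstancesPerNode": instances,
        "maxInstancesPerNode": instances,
        "maxUsageTime": 600,   # seconds a client can use a service
        "maxWaitTime": 600,    # seconds a client will wait to get a service
        "maxIdleTime": 1800,   # seconds an idle instance can be kept running
    }


# Utility network services used 8 instances per machine; hosting services used 6.
print(json.dumps(build_service_settings(8), indent=2))
```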

Reliability

Backups

Backups are critical for Network Management Systems. Refer to the reference architecture for more information. While the tested design was not a production system, machine snapshots and database backups were captured for each test run and before making any changes to the system. Virtual machine snapshots were taken before and after any change in the environment (such as resizing a machine, installing a patch, or updating Windows). Snapshots were then cataloged to enable either:

  • Rollback of a specific machine to a specific point in time
  • Rollback of the entire environment to a specific point in time
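A snapshot catalog that supports both rollback options can be as simple as a list of records keyed by machine and capture time. The following is a minimal sketch under assumed names: the `Snapshot` fields, machine names, and snapshot IDs are hypothetical and do not describe the tooling used in the test study.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    machine: str
    taken_at: datetime
    snapshot_id: str


def latest_before(catalog, machine: str, point_in_time: datetime) -> Optional[Snapshot]:
    """Most recent snapshot of one machine taken at or before point_in_time."""
    candidates = [
        s for s in catalog
        if s.machine == machine and s.taken_at <= point_in_time
    ]
    return max(candidates, key=lambda s: s.taken_at, default=None)


def environment_rollback(catalog, point_in_time: datetime) -> dict:
    """Map every cataloged machine to its rollback snapshot (or None)."""
    machines = sorted({s.machine for s in catalog})
    return {m: latest_before(catalog, m, point_in_time) for m in machines}


# Hypothetical catalog entries.
catalog = [
    Snapshot("gis-1", datetime(2024, 1, 1, 9), "snap-001"),
    Snapshot("gis-1", datetime(2024, 1, 2, 9), "snap-002"),
    Snapshot("db-1", datetime(2024, 1, 1, 10), "snap-003"),
]

target = datetime(2024, 1, 1, 12)
for machine, snap in environment_rollback(catalog, target).items():
    print(machine, snap.snapshot_id if snap else "no snapshot available")
```

A machine with no snapshot at or before the target time maps to `None`, which flags that the whole environment cannot be restored consistently to that point.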

High availability

The choice to design this system with a high availability configuration of ArcGIS Enterprise components was based on business and technical system requirements, along with other organizational goals such as achieving uninterrupted operations and minimizing downtime. This configuration is illustrated in the design with redundant system components and a cloud-native, highly available file store. For testing purposes, this test study did not configure a highly available database, though relational database vendors offer a variety of approaches to high availability, including cloud-native services.

Note:

Keep in mind that high availability configurations can significantly increase the infrastructure and operational costs of the system, and require specialized skills to be successful. Learn more about design choices and considerations with regard to high availability for a Network Management System.

Observability

To perform successful system validation and deliver meaningful results, system monitoring and telemetry capture were key aspects of the test study.

ArcGIS Monitor and enterprise IT monitoring tools like Windows Performance Monitor were used to monitor the system’s performance and capture telemetry on its behavior under certain conditions. Logs were collected across different system components, including:

  • IIS web server
  • ArcGIS software components
  • Windows Events
  • ArcGIS Pro

Machine-level metrics such as CPU usage, RAM consumption, disk activity, and network activity were captured across all machines in the environment. Review the test results for more information.

Additionally, screen recordings of the conducted workflows were captured to observe and assess end-user experience and productivity.

Automation

Because the scope of the test study was primarily focused on load testing, most types of automation that would be recommended for a production system (like scripting administrative tasks) were not employed. However, in your environment, administrative scripts can have significant value to workflows and operations. Any automation scripting should be tested in a lower environment before deploying to production.

In this test study, automation was primarily used to simulate requests during load tests. Multiple workflows were run with virtual users at scale and could be applied at different load sizes, as illustrated in the test results.

Python scripts were used to analyze and identify patterns in service wait times, ArcSOC utilization, response times, and failed requests to inform needed system changes. Python, PowerShell, and SQL scripts were also used to restore the database to an original state after completing a load test.
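The kind of analysis described above can be sketched with the Python standard library. This is not the script used in the test study: the record fields (`wait_s`, `status`) and the sample values are hypothetical, and a request is counted as failed here when its HTTP status is 400 or higher.

```python
from statistics import mean, quantiles


def summarize(records: list[dict]) -> dict:
    """Summarize wait times and failures for a list of request records."""
    if len(records) < 2:
        raise ValueError("at least two records are required")
    waits = [r["wait_s"] for r in records]
    failed = sum(1 for r in records if r["status"] >= 400)
    return {
        "requests": len(records),
        "mean_wait_s": mean(waits),
        # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
        "p95_wait_s": quantiles(waits, n=20, method="inclusive")[-1],
        "failure_rate": failed / len(records),
    }


# Hypothetical request records from one load test run.
records = [
    {"wait_s": 0.02, "status": 200},
    {"wait_s": 0.05, "status": 200},
    {"wait_s": 1.40, "status": 200},
    {"wait_s": 9.80, "status": 500},
]

for key, value in summarize(records).items():
    print(f"{key}: {value}")
```

Comparing these summaries across load sizes makes it easier to see where wait times or failure rates begin to climb.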

Security

While security was not the focus of the test study, it is critical to consider security requirements early in the design process for any production system. ArcGIS software has been designed to work effectively within secure networks, including those that are fully disconnected from the internet. The test study design does include an identity provider for authentication and authorization.


Integration

While integrations were not within the scope of the test study, a Network Management System often requires integration with other enterprise systems such as Enterprise Asset Management (EAM), Customer Relationship Management (CRM), and Advanced Distribution Management (ADMS) systems. In addition to standard integration considerations with ArcGIS, the ArcGIS Utility Network capability has additional requirements to consider. Depending on the integration requirements, different APIs and/or SDKs may be supported. See Journey to the Utility Network: Integrations Overview for more information.
