The following considerations are organized around the architecture pillars of the ArcGIS Well-Architected Framework. The appropriate application of best practices and architectural approaches in each of these technical areas contributes significantly to the successful design and implementation of well-architected systems.
The system was designed for workload separation to help achieve an optimal distribution of compute resources. In the test study, editing requests generally took longer to process than standard map requests, so editing workloads were isolated on dedicated compute resources in the form of a separate ArcGIS GIS Server site. Additionally, placing the system components on different machines helps ensure they don’t compete for system resources and makes it possible to tailor machine types and sizes to the requirements of each component.
Selecting the proper GPU (Graphics Processing Unit) is essential for ensuring the performance of ArcGIS Pro in a virtualized environment. Tests revealed that adding a dedicated GPU to ArcGIS Pro virtual machines significantly improved end-user productivity and produced a net reduction in cost when operational expenses (labor costs) are considered. Learn more about GPU hardware selection and ArcGIS Pro virtualization in the ArcGIS Architecture Center.
It’s important to understand the ratio of virtual CPUs (vCPUs) to physical CPU cores when making design decisions so that system components can be assigned appropriate resources. All machines in the diagram use a 2:1 ratio of vCPU to physical CPU, but some virtualization options use different ratios, such as 1:1. These decisions may also have Esri licensing implications. For examples of vCPU ratios in public clouds, see the documentation for AWS, Azure, and GCP.
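The ratio arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name and the machine sizes in the loop are hypothetical and are not taken from the test study.

```python
def physical_cores(vcpus, vcpus_per_core=2):
    """Return the physical core equivalent of a vCPU count.

    vcpus_per_core is the vCPU:CPU ratio (2 for the 2:1 ratio described
    above, 1 for a 1:1 virtualization option).
    """
    if vcpus_per_core <= 0:
        raise ValueError("vcpus_per_core must be positive")
    return vcpus / vcpus_per_core


# Hypothetical machine sizes, used only to show the arithmetic.
for vcpus in (4, 8, 16):
    print(f"{vcpus} vCPU -> {physical_cores(vcpus):g} physical cores at 2:1, "
          f"{physical_cores(vcpus, 1):g} at 1:1")
```

Sizing and licensing discussions are usually framed in physical cores, so it helps to convert before comparing machine types across platforms.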
Proper configuration of GIS services is critical to system performance and user experience, and misconfigured GIS service instances can introduce reliability problems in a system. For example, if the number of instances for a map or feature service is set too low, clients can experience long wait times and timeout errors.
Setting the instance count too high, however, can consume excessive machine resources, limiting the number of services that can be deployed on a fixed hardware configuration. When the maximum instance setting is higher than the minimum, the system can automatically add new instances in response to demand, but this can also be problematic because incoming requests must wait for the instance to start. For any system, it is important to understand service usage so that instance numbers and server resources can be adjusted to provide optimum performance.
In this test study, the ratio of service instances to physical CPU cores was set to 2:1 for each relevant service, with the minimum and maximum instance settings both configured at that same value. The instance usage was monitored to determine when the system was overloaded. For example, at 8x design load, the service instances for a service on the hosting server were observed as active for 99% of the test period, which led to high wait times for read-only services. The services in this test were configured for dedicated instances. Learn more about configuring service instance settings.
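One way to turn instance monitoring into a single overload indicator is to measure how often every configured instance is busy. The sketch below assumes that busy-instance counts are sampled at regular intervals; the function name, the sampling approach, and the sample values are hypothetical and do not reproduce the tooling or data used in the test study.

```python
def saturation_fraction(busy_samples, max_instances):
    """Return the share of samples in which all instances were busy.

    busy_samples: busy-instance counts sampled at regular intervals.
    max_instances: configured maximum instances for the service.
    """
    if not busy_samples:
        raise ValueError("busy_samples must not be empty")
    saturated = sum(1 for busy in busy_samples if busy >= max_instances)
    return saturated / len(busy_samples)


# Hypothetical samples for a service configured with 8 instances.
samples = [8, 8, 7, 8, 8, 8, 8, 8, 8, 8]
print(f"Saturated for {saturation_fraction(samples, 8):.0%} of samples")
```

A value that stays near 100% suggests that incoming requests are queuing for a free instance, which is consistent with the high wait times described above.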
In this test study, the utility network services were configured as follows:
The total number of available instances was 16 because there were two ArcGIS GIS Servers in the site.
The hosting servers were configured as follows:
The total number of available instances was 12 because there were two ArcGIS GIS Servers in the site.
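The relationship between per-machine instance settings and the site totals quoted above can be sketched as follows. The 2:1 instance-to-core ratio and the totals of 16 and 12 come from the text; the function names are hypothetical, and the 4-core machine in the last line is an illustrative assumption rather than a figure from the study.

```python
INSTANCES_PER_CORE = 2  # 2:1 ratio of service instances to physical cores


def instances_for_machine(physical_cores):
    """Instances per service on one machine at the 2:1 ratio."""
    return physical_cores * INSTANCES_PER_CORE


def site_instances(per_machine, machine_count):
    """Total instances of a service across all machines in a site."""
    return per_machine * machine_count


def per_machine_from_total(total, machine_count):
    """Invert site_instances: recover the per-machine setting from a total."""
    per_machine, remainder = divmod(total, machine_count)
    if remainder:
        raise ValueError("total is not evenly divisible by machine_count")
    return per_machine


# Totals quoted in the text, each for a site with two machines.
print(per_machine_from_total(16, 2))  # utility network services site
print(per_machine_from_total(12, 2))  # hosting server site

# Assumed 4-core machine, for illustration only.
print(site_instances(instances_for_machine(4), 2))
```

Because the minimum and maximum were set to the same value, the per-machine figure is both the floor and the ceiling for each service.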
The specified service timeouts were configured as follows:
Backups are critical for Network Information Management Systems. Refer to the reference architecture for more information. While the tested design was not a production system, machine snapshots and database backups were captured for each test run and before making any changes to the system. Virtual machine snapshots were taken before and after any change in the environment (such as resizing a machine, installing a patch, or updating Windows). Snapshots were then cataloged to enable either:
The choice to design this system with a high availability configuration of ArcGIS Enterprise components was based on business and technical system requirements, along with other organizational goals such as achieving uninterrupted operations and minimizing downtime. This configuration is illustrated in the design with redundant system components and a cloud-native, highly available file store for file storage. This test study did not configure a highly available database for testing purposes, though relational database vendors offer a variety of approaches to high availability, including cloud-native services.
Keep in mind that high availability configurations can significantly increase the infrastructure and operational costs of the system, and they require specialized skills to be successful. Learn more about design choices and considerations with regard to high availability for a Network Information Management System.
To perform successful system validation and deliver meaningful results, system monitoring and telemetry capture were key aspects of the test study.
ArcGIS Monitor and enterprise IT monitoring tools like Windows Performance Monitor were used to monitor the system’s performance and capture telemetry on its behavior under certain conditions. Logs were collected across different system components, including:
Machine-level metrics such as CPU usage, RAM consumption, disk activity, and network activity were captured across all machines in the environment. Review the test results for more information.
Additionally, screen recordings of the conducted workflows were captured to observe and assess end-user experience and productivity.
Because the scope of the test study was primarily focused on load testing, most types of automation that would be recommended for a production system (like scripting administrative tasks) were not employed. However, in your environment, administrative scripts can have significant value to workflows and operations. Any automation scripting should be tested in a lower environment before deploying to production.
In this test study, automation was used primarily to simulate requests during load tests. Multiple workflows were run with virtual users at scale and could be applied at different load sizes, as illustrated in the test results.
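The idea of scaling virtual users to a multiple of a design load can be sketched with the Python standard library. This is not the load testing tool used in the study, which is not named here; `run_workflow` is a hypothetical stand-in that sleeps instead of sending requests, and the user counts are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_workflow(user_id):
    """Stand-in for one virtual user's workflow.

    A real load test would send map or editing requests here; this
    placeholder only sleeps so that the sketch stays self-contained.
    """
    start = time.perf_counter()
    time.sleep(0.01)
    return user_id, time.perf_counter() - start


def run_load(base_users, multiplier):
    """Run the workflow once per virtual user at a multiple of design load."""
    users = base_users * multiplier
    with ThreadPoolExecutor(max_workers=users) as pool:
        return list(pool.map(run_workflow, range(users)))


# Hypothetical design load of 2 users, run at 1x and 4x.
for multiplier in (1, 4):
    results = run_load(2, multiplier)
    print(f"{multiplier}x design load: {len(results)} workflows completed")
```

Keeping the workflow separate from the scheduling code means the same workflow definition can be replayed at each load size.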
Python scripts were used to perform analysis on and identify patterns in service wait times, ArcSOC utilization, response times, and failed requests to inform needed system changes. Python, PowerShell and SQL scripts were also used to restore the database to an original state after completing a load test.
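A minimal sketch of this kind of post-test analysis is shown below, using only the Python standard library. The record layout (`elapsed_s` and `status` keys), the failure threshold, and the sample values are assumptions made for illustration; they do not reflect the log format or the scripts used in the test study.

```python
import statistics


def summarize(records):
    """Summarize response times and failures for a list of request records.

    Each record is a dict with 'elapsed_s' (response time in seconds) and
    'status' (HTTP status code); both keys are assumed for this sketch.
    """
    if not records:
        raise ValueError("records must not be empty")
    times = [r["elapsed_s"] for r in records]
    failed = sum(1 for r in records if r["status"] >= 400)
    summary = {
        "requests": len(records),
        "failure_rate": failed / len(records),
        "mean_s": statistics.mean(times),
        "median_s": statistics.median(times),
    }
    if len(times) >= 2:
        # With n=20 the last cut point is the 95th percentile.
        summary["p95_s"] = statistics.quantiles(times, n=20)[-1]
    return summary


# Hypothetical records, for illustration only.
sample = [
    {"elapsed_s": 1.0, "status": 200},
    {"elapsed_s": 2.0, "status": 200},
    {"elapsed_s": 3.0, "status": 200},
    {"elapsed_s": 4.0, "status": 500},
]
print(summarize(sample))
```

Comparing these summaries across load sizes makes it easier to see where wait times or failure rates begin to climb.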
While security was not the focus of the test study, it is critical to consider security requirements early in the design process for any production system. ArcGIS software has been designed to work effectively within secure networks, including those that are fully disconnected from the internet. The test study design uses an identity provider for authentication and authorization.
While integrations were not within the scope of the test study, a Network Information Management System often requires integration with other enterprise systems such as Enterprise Asset Management (EAM), Customer Relationship Management (CRM), and Advanced Distribution Management (ADMS) systems. In addition to standard integration considerations with ArcGIS, the ArcGIS Utility Network capability has additional requirements to consider. Depending on the integration requirements, different APIs and/or SDKs may be supported. See Journey to the Utility Network: Integrations Overview for more information.