Enterprise systems are frequently complex and multi-functional. To run at a high level of availability and service, and to ensure that issues can be dealt with effectively, they must be well understood and transparently observable to engineering and IT staff. To manage and operate these systems, the responsible teams need access to information about how the system is running and about the operational status of the system, its applications, and its supporting services. Observability is a commonly used IT term for the availability of information about the internal workings of a system, such as the compute usage of a database cluster, the storage input/output metrics for a storage provider, or the number of requests and level of activity across different apps or services.
Observability supports two related goals of system operations: that the system maintains a consistent, stable steady state, and that when issues appear, supporting teams can respond quickly and make informed, effective changes to restore the system to its expected state. Observability enables this by providing ready access to consistent, detailed information about system operations, so that any deviation from expected behavior can be identified, assessed, and acted upon by the supporting teams.
Telemetry, monitoring, and observability are related concepts, but they have distinct meanings. You can think of telemetry as being a subset of monitoring, which is a subset of observability:
Telemetry is the collection of information about a system’s state. Well-architected systems define and capture telemetry for ArcGIS components. They ensure that applications, workflows, and custom components report regularly on status, log activities in a consistent way, and provide API methods that allow an external process to query and monitor status and progress.
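One way for a custom component to log activities in a consistent way is to emit structured records, one JSON object per line, with the same core fields every time. The sketch below uses only Python's standard `logging` module. The field names, the logger name `parcel-sync`, and the convention of passing extra fields through `extra={"fields": ...}` are illustrative assumptions, not an ArcGIS logging format.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "component": record.name,
            "message": record.getMessage(),
        }
        # Merge any structured fields supplied via extra={"fields": {...}}.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def get_component_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Return a logger for one component that writes JSON lines to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output through the root logger
    if not logger.handlers:
        handler = logging.StreamHandler(stream)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = get_component_logger("parcel-sync")  # hypothetical component name
    log.info(
        "batch complete",
        extra={"fields": {"workflow": "nightly-parcel-sync", "records": 1200, "duration_s": 42.7}},
    )
```

Because every line has the same shape, a log collector or monitoring tool can filter and aggregate on fields such as `workflow` or `duration_s` without parsing free text.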
Monitoring solutions extend telemetry by not only collecting information about a system’s state, but also visualizing that information and comparing it to predefined thresholds. Monitoring solutions typically use alerts and dashboards to provide awareness of the system’s state and reveal performance issues or abnormalities. Because these alerts and dashboards must be predefined, monitoring is most useful for problems that can be predicted or anticipated.
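As a minimal sketch of the threshold comparison that monitoring adds on top of telemetry, the function below checks current metric values against predefined warning and critical levels. The metric names and threshold values are hypothetical; a real deployment would read its metrics from its own telemetry source and route the results through its monitoring tool rather than printing them.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    metric: str      # hypothetical metric name, e.g. "cpu_percent"
    warning: float   # value at or above which a warning is raised
    critical: float  # value at or above which a critical alert is raised


def evaluate(metrics: dict[str, float], thresholds: list[Threshold]) -> list[tuple[str, str, Optional[float]]]:
    """Compare current metric values with predefined thresholds."""
    alerts: list[tuple[str, str, Optional[float]]] = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is None:
            # No reading at all may mean the telemetry itself has stopped.
            alerts.append((t.metric, "missing", None))
        elif value >= t.critical:
            alerts.append((t.metric, "critical", value))
        elif value >= t.warning:
            alerts.append((t.metric, "warning", value))
    return alerts


if __name__ == "__main__":
    thresholds = [
        Threshold("cpu_percent", warning=75, critical=90),
        Threshold("p95_response_ms", warning=1000, critical=3000),
    ]
    print(evaluate({"cpu_percent": 82.0, "p95_response_ms": 450.0}, thresholds))
    # [('cpu_percent', 'warning', 82.0)]
```

The thresholds have to be chosen in advance, which is exactly why this style of check catches anticipated problems well and unanticipated ones poorly.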
A fully observable system extends monitoring by providing tools to analyze information about the system. In addition to predefined alerts and dashboards that describe what is happening with the system, observability helps you understand why it is happening. These additional capabilities enable you to deal with more complex and unpredictable issues than is possible with monitoring alone.
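To illustrate the step from knowing that something is happening to understanding why, the sketch below takes individual request records and breaks the failures down by a chosen dimension. The record schema (`service`, `machine`, `status`) and the service and machine names are hypothetical. A monitoring alert might report only that the error rate rose; slicing the same records by different dimensions shows whether the failures cluster in one place.

```python
from __future__ import annotations

from collections import Counter


def failure_share(requests: list[dict], dimension: str) -> dict[str, float]:
    """Fraction of failed (HTTP 5xx) requests attributed to each value of `dimension`."""
    failures = [r for r in requests if r["status"] >= 500]
    if not failures:
        return {}
    counts = Counter(r[dimension] for r in failures)
    return {key: n / len(failures) for key, n in counts.most_common()}


if __name__ == "__main__":
    # Hypothetical request records, e.g. parsed from structured logs.
    requests = [
        {"service": "Parcels/MapServer", "machine": "gis1", "status": 500},
        {"service": "Parcels/MapServer", "machine": "gis2", "status": 503},
        {"service": "Roads/FeatureServer", "machine": "gis1", "status": 200},
        {"service": "Parcels/MapServer", "machine": "gis1", "status": 504},
        {"service": "Roads/FeatureServer", "machine": "gis2", "status": 500},
    ]
    print(failure_share(requests, "machine"))  # {'gis1': 0.5, 'gis2': 0.5}
    print(failure_share(requests, "service"))  # {'Parcels/MapServer': 0.75, 'Roads/FeatureServer': 0.25}
```

Here the failures are spread evenly across machines but concentrated in one service, which points the investigation at that service rather than at a particular host.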
Many ArcGIS software components include tools or patterns that support observability, including logging, direct monitoring, and reporting on status, usage or performance. ArcGIS Online usage reporting allows administrative users to regularly report on usage and activities. ArcGIS Pro includes both a Diagnostic Monitor and a Performance Assessment Tool for assisting with troubleshooting and monitoring of workflows. ArcGIS Enterprise observability features include service and system metrics, extensive logging, and the ability to integrate with third-party observability tools like Prometheus.
Identifier | Best Practice |
---|---|
O.1 | Ensure that your supporting system engineers are familiar with existing observability and monitoring options across ArcGIS software products. |
O.2 | Start by monitoring key workflows rather than all system operations, to focus on impact and value. Keep monitoring noise low enough that important signals are not missed. |
O.3 | Integrate with existing organizational observability investments, such as an existing monitoring tool or reporting framework, to encourage collaboration and build on previous successes. |
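One common way to keep monitoring noise down, in line with O.2, is to alert only when a threshold has been breached for several consecutive samples, and to suppress repeat alerts until the metric recovers. The class below is a generic debouncing sketch, not a feature of any particular ArcGIS or third-party tool; the class name and the default of three consecutive samples are assumptions.

```python
class DebouncedAlert:
    """Fire once after `required` consecutive breaches; re-arm after recovery."""

    def __init__(self, threshold: float, required: int = 3):
        if required < 1:
            raise ValueError("required must be at least 1")
        self.threshold = threshold
        self.required = required
        self._streak = 0       # consecutive samples at or above the threshold
        self._firing = False   # True while an alert has been sent and not yet cleared

    def observe(self, value: float) -> bool:
        """Record one sample; return True only when a new alert should be sent."""
        if value >= self.threshold:
            self._streak += 1
        else:
            # Recovery resets the streak and re-arms the alert.
            self._streak = 0
            self._firing = False
        if self._streak >= self.required and not self._firing:
            self._firing = True
            return True
        return False


if __name__ == "__main__":
    alert = DebouncedAlert(threshold=90, required=3)
    samples = [95, 50, 95, 96, 97, 98, 40, 91, 92, 93]
    print([alert.observe(s) for s in samples])
```

A single spike (the first `95`) never triggers an alert, and a sustained breach produces one alert rather than one per sample, which keeps the alert stream small enough for important signals to stand out.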