Following observability best practices results in a system that reports on its own health, provides relevant metrics to operators, and can be supported efficiently during troubleshooting or error mitigation.
Identifier | Best practice |
---|---|
O.1 | Follow existing observability patterns in ArcGIS software. Ensure that supporting system engineers are familiar with existing observability and monitoring options across ArcGIS products and can use these tools effectively in case of a workflow failure or system outage. |
O.2 | Monitor key workflows first, rather than all system operations, to focus on impact and value. Be sure that the noise of monitoring is not so loud that important signals are missed. |
O.3 | Alert to inform action, rather than sending a large volume of alerts that will quickly be ignored. |
O.4 | Integrate with existing organizational observability tools, such as an existing monitoring software package or reporting framework, to encourage collaboration and build on previous successes. |
O.5 | Create runbooks and technical documentation to help a system operator triage an issue or search for a root cause without a full understanding of the architecture. |
O.6 | Define and exercise escalation paths so that issues in a real-world outage are quickly sent to the appropriate team and everyone is informed. |
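
As one way to picture O.2 and O.3 together, the sketch below tracks health checks for a single key workflow and raises an alert only after several consecutive failures, then stays quiet until the workflow recovers. It is a minimal illustration rather than an ArcGIS feature: the `WorkflowAlerter` class, the workflow name `publish-feature-service`, and the threshold of three are all assumptions made for this example, and a real deployment would route the messages to the organization's existing monitoring tool instead of a list.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowAlerter:
    """Hypothetical helper: at most one alert per incident for one key workflow."""
    name: str
    failure_threshold: int = 3      # assumed value; tune per workflow
    consecutive_failures: int = 0
    alert_open: bool = False
    sent: list = field(default_factory=list)  # stand-in for a real notification channel

    def record(self, check_passed: bool) -> None:
        if check_passed:
            # Recovery closes the incident and notifies once.
            if self.alert_open:
                self.sent.append(f"RESOLVED: {self.name}")
            self.consecutive_failures = 0
            self.alert_open = False
            return
        self.consecutive_failures += 1
        # Alert only when the threshold is reached and no alert is already open.
        if self.consecutive_failures >= self.failure_threshold and not self.alert_open:
            self.alert_open = True
            self.sent.append(
                f"ALERT: {self.name} failed {self.consecutive_failures} checks in a row"
            )


alerter = WorkflowAlerter(name="publish-feature-service")
for result in [True, False, True, False, False, False, False, True]:
    alerter.record(result)
print(alerter.sent)
```

In this run, the single isolated failure produces no message, the run of four failures produces one alert, and the final passing check produces one resolution notice, so an operator receives two actionable messages instead of five failure notifications.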