Real-time data streaming and analytics system (Windows/Linux)
The real-time data streaming and analytics system pattern is typically deployed to Windows and Linux using the ArcGIS Enterprise for Windows and Linux software along with ArcGIS GeoEvent Server.
ArcGIS Enterprise for Windows and Linux includes several components that span the data, services/logic, and presentation tiers, and work together to provide a complete system. ArcGIS Enterprise for Windows and Linux is fully supported on virtual environments (running a supported operating system), as well as cloud providers running virtual machines that meet the system requirements. Esri also provides deployment tooling for cloud platforms including Amazon Web Services (AWS) and Microsoft Azure.
ArcGIS GeoEvent Server enables real-time event-based data streams to be integrated as data sources in your enterprise GIS. Event data can be filtered, processed, and sent to multiple destinations, allowing you to connect with virtually any type of streaming data and automatically alert personnel when specified conditions occur, all in real time. ArcGIS GeoEvent Server has the same operating system requirements as the ArcGIS Server software component in ArcGIS Enterprise.
Base architecture
The following is a typical base architecture for a real-time data streaming and analytics system deployed on Windows or Linux.
This diagram should not be taken as is and used as the design for your system. There are many important factors and design choices that should be considered when designing your system. Review the using system patterns topic for more information. Additionally, the diagram depicted below delivers only the base capabilities of the system; additional system components may be required when delivering extended capabilities.
The capabilities represented above reflect those available as of November 2024.
Key components of this architecture include:
- A base deployment of ArcGIS Enterprise, including the ArcGIS Data Store, ArcGIS Server, and Portal for ArcGIS. The ArcGIS Web Adaptor component of ArcGIS Enterprise is also recommended and may be required in some situations. The base deployment enables data publishing and hosting through hosted feature, vector tile, map tile, and scene services.
- ArcGIS Server with the ArcGIS GeoEvent Server software and role. ArcGIS GeoEvent Server delivers real-time capabilities to ArcGIS Enterprise. It is depicted as logically distinct from the ArcGIS Server that provides hosted and utility services (and that completes the base deployment described above) because the two play different roles in the system and are often designed and deployed separately at a physical level. The ArcGIS Web Adaptor component of ArcGIS Enterprise is also recommended and may be required in some situations. Learn more about ArcGIS GeoEvent Server components, server roles, capabilities, and licensing.
- Two deployments of the ArcGIS Data Store are common to real-time data streaming and analytics systems on Windows or Linux. One deployment of the ArcGIS Data Store handles relational and tile storage for hosted feature and scene services published to an ArcGIS Enterprise base deployment. The other deployment of ArcGIS Data Store serves as the spatiotemporal big data store, providing enhanced storage of large amounts of observational data. Both data stores provide ArcGIS-managed data storage.
- ArcGIS Online, Esri’s SaaS infrastructure, typically provides basemaps (for example, an imagery basemap), reference data (for example, places), as well as other location services (for example, geocoding and search) for this system. Alternatively, it is possible for the organization to host and manage their own location services instead of using Esri’s SaaS system. See the location services system pattern for more information.
- ArcGIS GeoEvent Server includes a comprehensive website for managing the real-time data streaming and analytics system as well as designing and running real-time analytics. The website is called ArcGIS GeoEvent Manager. ArcGIS GeoEvent Server exposes tools and APIs and is typically consumed by a wide range of applications and systems. Learn more about the applications used in a real-time data streaming and analytics system.
Key interactions in this architecture include:
- Client applications communicate with enterprise data services as well as location services over HTTPS, typically via stateless REST APIs.
- ArcGIS Server maintains persistent TCP connections to the ArcGIS Data Store deployment that provides managed storage for relational and tile data.
- ArcGIS Server communicates over HTTP and TCP with the ArcGIS Data Store deployment that provides spatiotemporal big data storage.
- ArcGIS GeoEvent Server ingests data from real-time, streaming sources through input connectors. ArcGIS GeoEvent Server works with a wide variety of ArcGIS, web, messaging, cloud, and data provider sources. Learn more about input connectors in ArcGIS GeoEvent Server.
- ArcGIS Monitor, recommended for monitoring and optimizing the GIS system components, communicates with a variety of ArcGIS and IT (for example, DBMS) components using a variety of mechanisms. See ArcGIS Monitor documentation for more information.
- References to location services hosted and managed by ArcGIS Online (for example, basemaps) are typically registered and made available for use in ArcGIS Enterprise. Some services are referenced automatically when installing ArcGIS Enterprise, though additional sharing of content and services between these two systems can be performed manually or automatically. See configure ArcGIS Online utility services, configure ArcGIS Living Atlas content, and distributed collaboration for more information.
Additional information on interactions between ArcGIS Enterprise components can be found in the ArcGIS Enterprise on Windows and Linux product documentation, including a diagram of ports used in an ArcGIS Enterprise on Windows and Linux deployment. Additional deployment considerations can be found in the ArcGIS GeoEvent Server product documentation.
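The stateless REST interaction described above can be illustrated with a minimal sketch. It builds a query URL for a feature layer using the `where`, `outFields`, and `f` parameters of the ArcGIS REST API query operation. The host name, service name, and the `speed` field are hypothetical, and token handling is omitted; treat this as an illustration of the request shape rather than a complete client.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_query_url(layer_url: str, where: str = "1=1", out_fields: str = "*") -> str:
    """Build a query URL for a feature layer REST endpoint.

    Every request carries all the parameters it needs, so the server
    does not have to remember anything between calls (stateless).
    """
    params = {"where": where, "outFields": out_fields, "f": "json"}
    return f"{layer_url.rstrip('/')}/query?{urlencode(params)}"


# Hypothetical layer URL: replace the host and service name with your own.
LAYER_URL = "https://gis.example.com/server/rest/services/Hosted/Vehicles/FeatureServer/0"

url = build_query_url(LAYER_URL, where="speed > 50")
print(url)

# Decode the query string again to confirm what will be sent.
print(parse_qs(urlsplit(url).query))
```

Because the URL is self-describing, the same request can be issued by any HTTPS client, including a browser or a load-balanced application tier.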
Capabilities
The capabilities of the real-time data streaming and analytics system on Windows and Linux are described below. See the capability overview and the comparison of capability support across deployment patterns for more information.
Capabilities used in a real-time data streaming and analytics system but typically provided by other systems, such as basemaps, geocoding, and other location services provided by a location services system, are not listed below. Learn more about related system patterns.
Base capabilities
Base capabilities represent the most common capabilities delivered by real-time data streaming and analytics systems and are enabled by the base architecture presented above.
- Feed ingest connects the system to external sources of real-time, observational data such as Internet of Things (IoT), message brokers, and third-party APIs. These external sources are referred to as feeds and can be configured as input to the real-time streaming and analytics system. ArcGIS GeoEvent Server supports multiple input connectors, including ArcGIS, cloud, web, messaging, and data provider feeds. Learn more about input connectors in ArcGIS GeoEvent Server.
- Data ingest enables data to be loaded into the system for batch analysis and processing. The primary use of data ingest in ArcGIS GeoEvent Server is to store real-time data in a spatiotemporal big data store. Batch analysis and processing of historical, stored observational data is typically performed by a big data analytics system, which is considered outside the scope of this system pattern. Learn more about the big data analytics system pattern as well as spatiotemporal big data stores in ArcGIS Enterprise.
- Spatial joins and relationships enable rows from two feeds or datasets to be combined based on a spatial relationship. A variety of spatial relationships, including intersect, erase, union, identity, and symmetrical difference, may be applied. Spatial joins and relationships can be used in spatial filters as well as in processors including, but not limited to, the event joiner processor and the intersector processor. Learn more about filters and processors in ArcGIS GeoEvent Server.
- Pattern analysis identifies spatial and temporal patterns in data. Pattern analysis is typically performed via batch analysis or processing against big data, which is considered outside the scope of the real-time data streaming and analytics system pattern. However, ArcGIS GeoEvent Server does provide some limited pattern analysis capabilities through filters as well as some processors like the incident detector processor. Learn more about filters and processors in ArcGIS GeoEvent Server, as well as the big data analytics system pattern.
- Proximity analysis looks at the proximity of spatial data to other spatial data. Processors that perform proximity analysis include, but are not limited to, the buffer creator processor and the range fan calculator processor. Learn more about filters and processors in ArcGIS GeoEvent Server.
- Track analysis works with time-enabled points correlated to moving objects. Track analysis is typically performed via batch analysis or processing against big data, which is considered outside the scope of the real-time data streaming and analytics system pattern. However, ArcGIS GeoEvent Server does provide some limited track analysis capabilities through processors like the track gap detector processor. Learn more about filters and processors in ArcGIS GeoEvent Server, as well as the big data analytics system pattern.
- Geofence analysis is a form of real-time spatial analysis in which features (often track points) are assessed using areas of interest (often polygon areas). Most commonly, point-based observations are analyzed to determine if they have entered or exited a virtual perimeter. ArcGIS GeoEvent Server supports geofencing in several of the processors as well as geofence analysis through spatial filters.
- Data management supports operating on geometries and other fields in real-time feeds and big data. Processors that perform data management include, but are not limited to, the field calculator processor and the field mapper processor. Learn more about filters and processors in ArcGIS GeoEvent Server.
- Mapping and visualization of analysis results is a powerful step to provide context and to help uncover patterns, trends, and relationships in data. Visualizing and mapping is analogous to charting and plotting with non-spatial data; it enables analysts to verify their analysis, iterate, and create shareable and engaging results.
- Data publishing and hosting provides for secure storage, management, and access of data as a service for data ingested into the system or persisted from real-time feeds. Data is typically hosted in the system using either the relational store or the spatiotemporal big data store. Datasets are typically published as feature and map services for access by users and applications.
- Feed publishing and hosting provides for new feeds to be published to and hosted by the system. Feeds hosted by the system are typically published as stream services for access by users and applications.
- Send and store messages is an output of real-time streaming and analytics that sends or stores processed feed data (messages) to external systems including message brokers, object stores, and other messaging systems like email and SMS. Learn more about output connectors in ArcGIS GeoEvent Server.
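The geofence analysis capability described above, in which point observations are checked for entering or exiting a virtual perimeter, can be sketched in a few lines. This is an illustrative sketch only, not the ArcGIS GeoEvent Server implementation: it assumes planar coordinates, a simple polygon without holes, and a hypothetical `GeofenceMonitor` class that remembers the last known state of each track.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, ring: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a ray going right crosses."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


class GeofenceMonitor:
    """Emit ENTER/EXIT when a track's inside/outside state changes."""

    def __init__(self, fence: List[Point]) -> None:
        self.fence = fence
        self._last_inside: Dict[str, bool] = {}  # per-track state

    def observe(self, track_id: str, pt: Point) -> Optional[str]:
        now_inside = point_in_polygon(pt, self.fence)
        was_inside = self._last_inside.get(track_id)
        self._last_inside[track_id] = now_inside
        # The first observation only establishes state; no event is emitted.
        if was_inside is None or was_inside == now_inside:
            return None
        return "ENTER" if now_inside else "EXIT"


# Hypothetical square geofence and a short track for one vehicle.
fence = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
monitor = GeofenceMonitor(fence)
for position in [(-1.0, 5.0), (5.0, 5.0), (6.0, 5.0), (11.0, 5.0)]:
    print(position, monitor.observe("vehicle-1", position))
```

Note that enter and exit detection depends on remembering the previous observation for each track, which makes this kind of analytic stateful.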
Extended capabilities
Extended capabilities are typically added to meet specific needs or support industry-specific data models and solutions, and may require additional software components or architectural considerations.
Considerations
The considerations below apply the pillars of the ArcGIS Well-Architected Framework to the real-time data streaming and analytics system pattern on Windows and Linux. The information presented here is not meant to be exhaustive, but rather highlights key considerations for designing and implementing this specific combination of system and deployment pattern. Learn more about the architecture pillars of the ArcGIS Well-Architected Framework.
Reliability
Reliability ensures your system provides the level of service required by the business, as well as your customers and stakeholders. For more information, see the reliability pillar overview.
- Lower levels of availability are common. High availability SLAs can be achieved, though the real-time nature of incoming data via feeds makes architecting for high availability more complex than with most other system patterns.
- ArcGIS GeoEvent Server has unique characteristics and considerations related to reliability. Learn more about strategies for scalability, reliability, and resiliency as well as other deployment considerations with ArcGIS GeoEvent Server.
- This system and deployment pattern does not inherently guarantee delivery of messages. Special consideration needs to be given when processing messages of a critical nature to guard against unintentional dropping of messages.
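One common way to guard against dropped messages, sketched below, is for the sending side to retry until the receiver acknowledges the message and to set aside anything that still fails. This is a generic pattern under stated assumptions, not a feature of ArcGIS GeoEvent Server: the `send` callable, the retry counts, and the `dead_letters` list are all hypothetical.

```python
import time
from typing import Callable, List


def deliver_with_retry(
    message: dict,
    send: Callable[[dict], bool],
    dead_letters: List[dict],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> bool:
    """Try to deliver a message; keep it in dead_letters if every attempt fails.

    `send` is expected to return True only when the receiver acknowledged
    the message. Exceptions are treated the same as a missing acknowledgment.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if send(message):
                return True
        except Exception:
            pass  # treat any send error as a failed attempt
        if attempt < max_attempts:
            # Exponential backoff between attempts.
            time.sleep(base_delay * 2 ** (attempt - 1))
    dead_letters.append(message)  # never silently drop a critical message
    return False


# Hypothetical sender that fails twice before acknowledging.
attempts = {"count": 0}


def flaky_send(msg: dict) -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


dead: List[dict] = []
delivered = deliver_with_retry({"id": 1}, flaky_send, dead, base_delay=0.0)
print(delivered, attempts["count"], dead)
```

Retrying gives at-least-once behavior, so the receiving system should tolerate duplicates, for example by ignoring a message `id` it has already processed.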
Security
Security protects your systems and information. For more information, see the security pillar overview.
- Authentication and authorization are required for designing and running analytics, as well as managing the real-time data streaming and analytics system. It is also common for outputs like ArcGIS feature and stream layers to be secured, requiring authentication and authorization for access.
- User access and data collaboration are governed by role-based access controls and modern authorization and authentication models including OAuth, SAML, and multi-factor authentication.
Explore the ArcGIS Enterprise Hardening Guide to learn about strategies and associated settings that can be implemented to improve the security posture of ArcGIS Enterprise deployments.
Performance and scalability
Performance and scalability aim to optimize the overall experience users have with the system, as well as ensure the system scales to meet evolving workload demands. For more information, see the performance and scalability pillar overview.
- SLAs requiring high performance are common.
- ArcGIS GeoEvent Server has unique characteristics and considerations related to performance and scalability. Learn more about best practices for system architecture, machine resource allocation, strategies for scalability, reliability, and resiliency, as well as other deployment considerations with ArcGIS GeoEvent Server.
- Factors that tend to impact performance and scalability include:
  - Message size and velocity
  - Complexity of real-time analytics
  - Number of geofences
  - Bandwidth of output (including storage)
- Scalability of real-time data streaming and analytics systems on Windows/Linux has special considerations.
  - Consider planning for peak capacity demand.
  - Vertical scaling tends to be easier and more commonly implemented than horizontal scaling.
  - Stateful analytics add additional complexity to horizontal scaling.
  - Auto-scaling is not common with this system pattern.
  - There are practical limits to scalability.
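To make the relationship between message size, velocity, and output bandwidth concrete, the following back-of-the-envelope sketch estimates peak throughput and daily storage. All numbers are illustrative assumptions rather than Esri sizing guidance, and the function names are hypothetical.

```python
from typing import Tuple


def required_throughput(
    events_per_sec: float, avg_bytes: float, peak_factor: float = 1.0
) -> Tuple[float, float]:
    """Return (peak events per second, peak megabits per second)."""
    peak_events = events_per_sec * peak_factor
    peak_mbps = peak_events * avg_bytes * 8 / 1_000_000
    return peak_events, peak_mbps


def daily_storage_gb(events_per_sec: float, avg_bytes: float) -> float:
    """Estimate raw storage written per day, in gigabytes (10^9 bytes)."""
    seconds_per_day = 86_400
    return events_per_sec * avg_bytes * seconds_per_day / 1e9


# Assumed workload: 2,000 events/s on average, 500 bytes each, 3x peaks.
peak_events, peak_mbps = required_throughput(2_000, 500, peak_factor=3.0)
storage = daily_storage_gb(2_000, 500)
print(f"peak: {peak_events:.0f} events/s, {peak_mbps:.1f} Mbit/s")
print(f"storage: {storage:.1f} GB/day before indexing or replication")
```

Under these assumptions the system must sustain 6,000 events per second (24 Mbit/s) at peak and absorbs about 86.4 GB of raw data per day. Sizing for the peak rather than the average matters because auto-scaling is uncommon with this pattern.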
Automation
Automation aims to reduce effort spent on manual deployment and operational tasks, leading to increased operational efficiency as well as a reduction in human-introduced system anomalies. For more information, see the automation pillar overview.
Integration
Integration connects this system with other systems for delivering enterprise services and amplifying organizational productivity. For more information, see the integration pillar overview.
- Integration with other systems can take the form of real-time feed and big data ingest into the real-time data streaming and analytics system. The outputs from real-time data streaming and analytics systems are also commonly integrated into other systems across an organization’s enterprise, and therefore may also support business operations that are unknown or unavailable to systems administrators.
Observability
Observability provides visibility into the system, enabling operations staff and other technical roles to keep the system running in a healthy, steady state. For more information, see the observability pillar overview.
- Real-time data, which is typically moving at high velocity, comes with some unique observability considerations. This is especially true when the velocity and/or stability of incoming feeds is inconsistent.
- The delivery of real-time services to the whole organization (and possibly beyond) may lead to usage patterns and growth not anticipated by the system designers or operators. Monitoring helps people make decisions about when to scale and evolve to meet demand while continuing to operate properly (and in accordance with SLAs).
- ArcGIS Enterprise on Windows/Linux components, including ArcGIS GeoEvent Server, can be observed in a variety of ways including server logs and server statistics. Monitoring of system availability, performance, and usage is most critical to this system pattern. In addition to monitoring the ArcGIS Enterprise software, it is important to monitor all supporting components and infrastructure such as the Windows or Linux operating system, databases and other data stores, as well as compute, network, security, and other infrastructure. Learn more about monitoring system health and reliability.
- Additional observation of user logins and account changes may be possible through the configured identity provider when using SAML and/or OpenID Connect logins.
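Because the velocity of incoming feeds can be inconsistent, one simple observability technique is to track the arrival rate over a sliding time window and flag a feed whose rate falls below an expected minimum. The sketch below is a generic illustration: the `FeedRateMonitor` class, window length, and threshold are hypothetical and are not part of ArcGIS GeoEvent Server or ArcGIS Monitor.

```python
from collections import deque
from typing import Deque


class FeedRateMonitor:
    """Track event arrivals and report the rate over a sliding window."""

    def __init__(self, window_seconds: float, min_rate: float) -> None:
        self.window_seconds = window_seconds
        self.min_rate = min_rate  # expected minimum events per second
        self._arrivals: Deque[float] = deque()

    def record(self, timestamp: float) -> None:
        """Record one event arrival (timestamps must be non-decreasing)."""
        self._arrivals.append(timestamp)

    def rate(self, now: float) -> float:
        """Events per second over the window ending at `now`."""
        cutoff = now - self.window_seconds
        # Discard arrivals that have aged out of the window.
        while self._arrivals and self._arrivals[0] <= cutoff:
            self._arrivals.popleft()
        return len(self._arrivals) / self.window_seconds

    def is_stalled(self, now: float) -> bool:
        return self.rate(now) < self.min_rate


# Simulated feed: one event per second for 20 seconds, then silence.
monitor = FeedRateMonitor(window_seconds=10.0, min_rate=1.0)
for second in range(20):
    monitor.record(float(second))

print(monitor.rate(now=19.0), monitor.is_stalled(now=19.0))
print(monitor.rate(now=40.0), monitor.is_stalled(now=40.0))
```

In practice the stalled signal would feed an alerting system, and the threshold would be tuned per feed, since a rate that is normal for one source may indicate an outage for another.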
Other
Additional considerations for designing and implementing a real-time data streaming and analytics system on Windows and Linux include:
- Successful operation requires a strong understanding of GIS and IT concepts as well as technology.
- Data governance and alignment with IT policies and roles should be strongly considered when implementing this system pattern.