Real-time data streaming and analytics system
A real-time data streaming and analytics system allows organizations to ingest, visualize, and analyze data from real-time feeds, such as sensors, assets, and other dynamic data sources. Data can be used for real-time mapping, stored as spatiotemporal big data for further analysis, and leveraged to trigger actions and alerts based on occurring events. This pattern connects to live data feeds and other rapidly emitting data sources such as the Internet of Things (IoT). Feeds and other streaming data sources can be moving or stationary, analyzed in real-time, and disseminated to outbound systems and applications.
A real-time data streaming and analytics system pattern delivers value to an organization through various characteristics, such as:
- Integrating with and connecting to real-time data sources like vehicle locations, sensors, personnel, and other sources (including cloud-native services and queues) for improved real-time, geographically informed decision making.
- Performing real-time analytics on high-velocity data to drive automated actions, real-time push notifications, and alerts when patterns of interest are detected.
- Storing observations of any frequency, from intermittent to highly frequent, for future visualizations and analyses.
If you’re new to ArcGIS system patterns, review the introduction first.
User personas and workflows
The user personas who most commonly interact with real-time data streaming and analytics systems, along with the types of workflows and tasks they typically perform using this system, include:
- Data analyst, scientist, and engineer. Data analysts, scientists, and engineers work with this system, including both real-time data feeds and historical, stored real-time data, to design, develop, and conduct analysis routines. When working with real-time data feeds, these personas often design analytic models that are published to the real-time data streaming and analytics system, resulting in new, derived outputs that may include new feeds or triggered alerts. When working with historical real-time data at rest, these user personas interact with this system much like they would a big data analytics system.
- GIS analysts. GIS analysts work with this system, often alongside data analysts, scientists, and engineers. GIS analysts often work hands-on with the system, setting up feeds, performing analytic workflows, and more, but may also play a supporting role to the data analyst, scientist, and engineer personas above to ensure that important spatial concepts are understood and that best practices for working with geospatial data and analysis methods and tools are applied.
- Data owners. Data owners are accountable for the feeds and datasets used in real-time data streaming and analytics systems. Feed data owners may be consulted as the feeds are incorporated into the system. When storing historical real-time data, data owners are typically involved in the design and oversight aspects of data management, including data modeling, quality control, and governance.
- Data stewards. Data stewards are typically responsible for day-to-day management of feeds and datasets used in real-time data streaming and analytics systems. Common workflows and activities may include loading and importing data, auditing data updates, as well as structuring and governing data management workflows.
To get the most value out of a real-time data streaming and analytics system, consider involving GIS analysts in addition to data analysts, scientists, and engineers, or individuals possessing skills from both personas.
Applications
Real-time data streaming and analytics systems expose APIs and tools, along with design, modeling, and management interfaces familiar to data analysts, scientists, and engineers. These interfaces differ between deployment patterns, but both patterns include a web interface that the user personas described above use to manage real-time data feeds and to design, publish, and conduct real-time data analytics.
ArcGIS provides a wide range of applications including dashboards, web application builders, native mobile applications, and desktop applications that can work with real-time feeds and data emitted from a real-time data streaming and analytics system. These applications are typically provided in combination with (or through integration with) a self-service mapping, analysis, and sharing system or other ArcGIS system patterns. Learn more about using, integrating, and composing system patterns.
This system also includes a portal website, which serves as a general web interface into ArcGIS systems. In real-time data streaming and analytics systems the portal website is used to manage users and content as well as administer system components. ArcGIS Pro, the desktop application designed for GIS professionals, is also commonly used for more advanced spatial and GIS analysis work.
Custom applications built with ArcGIS Maps SDKs, which include immersive experiences (XR), are also commonly used with real-time data streaming and analytics systems.
For more information on the full spectrum of applications provided by ArcGIS, see application architecture in the ArcGIS overview.
Capabilities
The primary capabilities provided by a real-time data streaming and analytics system are introduced below. Capabilities used in real-time data streaming and analytics workflows but typically provided by other systems, such as basemaps and other location services provided by a location services system, are not listed below. Learn more about related system patterns.
Not all capabilities described below are available in all deployment patterns. See selecting a deployment pattern and the deployment pattern pages for more information on which capabilities apply to various deployment contexts.
- Feed ingest connects the system to external sources of real-time, observational data such as Internet of Things (IoT), message brokers, and third-party APIs. These external sources are referred to as feeds and can be configured as input to the real-time streaming and analytics system. Supported feed types differ across deployment patterns.
- Data ingest enables data to be loaded into the system for batch analysis and processing. This system pattern does not support performing big data analytics directly at data source locations external to ArcGIS. All batch analysis and processing on stored data performed in the real-time data streaming and analytics system is done within the system. Learn more about big data analytics systems.
- Spatial joins and relationships enable rows from two feeds or datasets to be combined based on a spatial relationship. A variety of spatial relationships, including intersect, erase, union, identity, and symmetrical difference, may be applied, though capabilities vary based on the selected deployment pattern.
- Pattern analysis identifies spatial and temporal patterns in data. This includes tools such as find hot spots, find similar locations, and various regression-based analysis methods for modeling trends and generating predictions. Pattern analysis is typically performed on big data, not real-time feeds.
- Proximity analysis looks at the proximity of spatial data to other spatial data. This includes tools such as create buffers and calculate distance.
- Summarization analysis aggregates or summarizes data into higher order data structures. This includes tools such as aggregate points, calculate density, and summarize within. Summarization analysis is typically performed on stored data, not real-time feeds.
- Track analysis works with time-enabled observations correlated to distinct entities, typically moving objects. This includes tools such as reconstruct tracks, snap to network, and tools to analyze journeys and dwell locations.
- Geofencing is a form of real-time spatial analysis in which features (often track points) are assessed using areas of interest (often polygon areas). Most commonly, point-based observations are analyzed to determine if they have entered or exited a virtual perimeter.
- Data management supports operating on geometries and other fields in real-time feeds and big data. This includes tools such as calculate field, project, and map fields.
- Custom input connectors allow for new input connectors to be developed using code. Options for custom input connectors differ substantially between deployment patterns; see selecting a deployment pattern for more information.
- Custom analysis tools allow for new analysis tools to be developed using code. Options for custom analysis tools differ substantially between deployment patterns; see selecting a deployment pattern for more information.
- Custom output connectors allow for new output connectors to be developed using code. Options for custom output connectors differ substantially between deployment patterns; see selecting a deployment pattern for more information.
- Mapping and visualization of real-time data and analytical output is a powerful step to provide context and to help uncover patterns, trends, and relationships in data. Visualizing and mapping is analogous to charting and plotting with non-spatial data; it enables analysts to verify their analysis, iterate, and create shareable and engaging results.
- Data publishing and hosting provides for secure storage, management, and access of data as a service for data ingested into the system or persisted from real-time feeds. Data hosted in the system is typically made available as feature services or feature layers.
- Feed publishing and hosting provides for new feeds to be published to and hosted by the system. Feeds hosted by the system are typically made available as stream services or stream layers.
- Send or store messages is an output of real-time analytics that sends or stores processed feed data (messages) to external systems including message brokers, object stores, and other messaging systems like email and SMS. Supported output types for sending and storing messages differ across deployment patterns.
- Sharing of analysis results is supported by ArcGIS but is considered outside of the scope of the real-time data streaming and analytics system. See related system patterns for more information.
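As an illustration of the geofencing capability described above, the following is a minimal, standard-library Python sketch of enter and exit detection for point observations against a single polygon area of interest. It is not the ArcGIS implementation or API. The names `Observation`, `point_in_polygon`, and `geofence_events` are hypothetical, coordinates are treated as planar, and the first observation of each track only initializes state rather than raising an event.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, float]  # (x, y); treated as planar coordinates in this sketch


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon ring."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


@dataclass(frozen=True)
class Observation:
    """Hypothetical shape of one feed message: a tracked entity at a time and place."""
    track_id: str
    timestamp: float
    location: Point


def geofence_events(
    observations: Iterable[Observation], fence: List[Point]
) -> Iterator[Tuple[str, str, float]]:
    """Yield (track_id, "enter" | "exit", timestamp) when a track crosses the fence."""
    last_inside: Dict[str, bool] = {}
    for obs in observations:
        now_inside = point_in_polygon(obs.location, fence)
        was_inside = last_inside.get(obs.track_id)
        # The first observation of a track only records state; no event is raised.
        if was_inside is not None and now_inside != was_inside:
            yield (obs.track_id, "enter" if now_inside else "exit", obs.timestamp)
        last_inside[obs.track_id] = now_inside


# Example: a square area of interest and a short, made-up feed.
fence = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
feed = [
    Observation("truck-1", 1.0, (-5.0, 5.0)),  # outside
    Observation("truck-1", 2.0, (5.0, 5.0)),   # inside: enter
    Observation("truck-2", 3.0, (5.0, 5.0)),   # first sighting, no event
    Observation("truck-1", 4.0, (15.0, 5.0)),  # outside: exit
]
events = list(geofence_events(feed, fence))
```

Keeping per-track state in a dictionary lets the function process one observation at a time, which mirrors how a streaming analytic evaluates messages as they arrive instead of waiting for a complete dataset.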
Architecture considerations
Real-time data streaming and analytics systems are built using ArcGIS. This section describes in more detail how real-time data streaming and analytics systems align with and focus on specific aspects of the ArcGIS architecture.
For more detailed architecture considerations, see selecting a deployment pattern.
Data (persistence)
Real-time data streaming and analytics systems work with real-time feeds as well as big data sources.
Real-time feeds are rapidly emitting external sources of real-time, observational data, such as Internet of Things (IoT), message brokers, and third-party APIs. Real-time feeds can be ingested into the real-time data streaming and analytics system, allowing for real-time observational data to be processed, analyzed, and output from the system. Output types include object stores as well as feeds and other message brokers and systems. Supported feed and output types differ across deployment patterns.
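The ingest, process, and output flow described above can be sketched in plain Python. This is an illustrative outline only, not the ArcGIS API: the function names (`ingest`, `analyze`, `run_pipeline`), the message fields (`sensor_id`, `value`), and the threshold rule are assumptions made for the example, and the in-memory lists stand in for real outputs such as an object store or a message broker.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List

Message = Dict[str, object]


def ingest(raw_lines: Iterable[str]) -> Iterator[Message]:
    """Parse newline-delimited JSON from a feed, skipping malformed records."""
    for line in raw_lines:
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(msg, dict):
            yield msg


def analyze(messages: Iterable[Message], threshold: float) -> Iterator[Message]:
    """Keep only messages whose numeric 'value' exceeds the threshold; flag them."""
    for msg in messages:
        value = msg.get("value")
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and value > threshold:
            yield {**msg, "alert": True}


def run_pipeline(
    raw_lines: Iterable[str],
    threshold: float,
    outputs: List[Callable[[Message], None]],
) -> int:
    """Stream messages through ingest and analysis, fanning results out to outputs."""
    delivered = 0
    for msg in analyze(ingest(raw_lines), threshold):
        for send in outputs:
            send(msg)
        delivered += 1
    return delivered


# Example with made-up sensor readings and two in-memory stand-in outputs.
raw_feed = [
    '{"sensor_id": "s1", "value": 71.5}',
    "not json",
    '{"sensor_id": "s2", "value": 88.0}',
]
object_store: List[Message] = []
broker: List[Message] = []
delivered = run_pipeline(raw_feed, 80.0, [object_store.append, broker.append])
```

Because each stage is a generator, messages flow through one at a time, and the same processed message can be sent to several outputs, which reflects the one-feed, many-outputs arrangement the paragraph describes.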
The real-time data streaming and analytics system includes a managed NoSQL, document-based big data store for managing big data persisted from real-time feeds. Learn more about ArcGIS managed data stores.
Services (logic)
Real-time data streaming and analytics systems provide the real-time analytic services for ArcGIS and, in the SaaS deployment pattern specifically, big data analytic services.
The real-time data streaming and analytics system can also be used for querying, accessing, spatial referencing, enriching, and managing big data stored in the system. Using this system for extract, transform, and load (ETL) workflows is common as well. The real-time data streaming and analytics system makes use of interactive mapping with basemaps and reference layers for visualizing analysis results. The sharing of analysis results and other content through portal services is common, though this is typically accomplished through another ArcGIS system. See related system patterns for more information.
Applications (presentation)
The primary outputs of the real-time data streaming and analytics system are messages, feeds, and other services consumed by other systems and applications outside the scope of this system pattern. This system typically exposes only lower-level user interfaces familiar to data analysts, scientists, and engineers. These user interfaces, or applications, vary depending on the selected deployment pattern. See applications for more information.
Support
Real-time data streaming and analytics systems can vary in terms of support needs.
When used primarily for real-time data streaming, often in conjunction with real-time analysis, it is common for feeds and messages output from the system to be used in business or mission critical workflows. In these cases, systems should be designed and operated with high levels of reliability, security, observability, performance, and scalability in mind. Strong governance practices and standards can also help ensure that the real-time data streaming and analytics system is able to mature, expand, and evolve according to the needs of the business while at the same time adhering to nonfunctional IT requirements.
When used primarily for batch analysis and processing of historical, observational data, this system may have less rigorous technical requirements related to areas like reliability. However, for these use cases, scalability as well as performance and security can be of concern for the system-managed big data store.
Integration with other systems can take the form of real-time feed and big data ingest into the real-time data streaming and analytics system. The outputs from real-time data streaming and analytics systems are also commonly integrated into other systems across an organization’s enterprise, and therefore may also support business operations that are unknown or unavailable to systems administrators. This typically demands use of metadata, monitoring, and service level agreements (SLAs).
For general support and architecture considerations, see architecture practices as well as the architecture pillars of the ArcGIS Well-Architected Framework.
Real-time data streaming and analytics systems may be integrated or combined with other ArcGIS system patterns.
For more information on integrating or composing system patterns, see using system patterns.
Examples
Industry-specific system examples for this system pattern include:
- Natural resources. Petroleum and pipeline companies can use this pattern to integrate high-velocity data from SCADA systems into a real-time picture of their assets. They may also be interested in using this pattern to track vehicles, analyze mobile activities, monitor weather and the environment, and automatically trigger work orders based on inspection results.
- Utilities. Electric utilities can use this pattern to monitor hazards such as lightning strikes, floods, and wildfires and assess whether they threaten infrastructure. Gas utilities can use this pattern to monitor readings from smart meters, track vehicles and equipment, and analyze the activities and safety of mobile crews. Telecommunications providers can monitor antenna performance and review dropped call patterns to highlight areas where network improvements might be required.
- Water and wastewater. This pattern lets water utilities visualize and analyze streaming data from smart water meters and SCADA systems. With this data, they can monitor flow in real-time, detect leaks and identify their locations, and analyze consumption trends to identify water loss.
- Transportation. Transportation organizations can use this pattern to monitor and analyze automatic vehicle location (AVL) and transit asset data to track work area activity and route deviations for service vehicles. They can also track assets and equipment, detect traffic anomalies, and communicate weather conditions and parking availability. In addition, they can consume Automatic Identification System (AIS) feeds and other telemetry data from ships and aircraft to monitor compliance and optimize port and airport scheduling. Airports can analyze passenger movements in wait lines and terminals, and they can monitor aircraft locations on the ground to better coordinate fuel, luggage, and maintenance vehicles.
- Public safety. Public safety organizations can use this pattern to track and archive the locations of personnel, identify vehicles and trace their locations using license plates, detect gunshots using data from ShotSpotter and similar APIs, and monitor potential security threats. Public safety organizations can also use this pattern for incident detection and monitoring of critical infrastructure.
- Local government. Smart cities are interested in using data from connected sensors, vehicles, meters, and other devices to track public works assets, monitor police activities, and make data-driven decisions. With this pattern, they can track feeds across the city or county, dispatch staff where they are needed, and automate the behavior of systems and right-of-way assets.
- National government. Like local governments, national government agencies can use this pattern to monitor and manage infrastructure, assets, personnel, and facilities. They can also synthesize data feeds related to agriculture, weather, vessel locations, disasters, seismic activity, air quality, and other phenomena, then apply that data in research, forecasts, and decisions.