Data pipelines and ETLs

Most organizations store important data in multiple systems, and data connection tools such as pipelines and ETLs are critical for moving data between those systems, whether to combine and analyze it alongside other sources or to keep related data up to date. Many technical solutions exist for moving data, ranging from manual approaches where data is copied on demand to automated systems that perform complex data transformations.

Use of the term “integration” in relation to data pipelines or ETLs can be confusing, because an integration usually implies closer, more real-time access to data. ETLs are an appropriate tool for moving data between systems when changes happen infrequently, or when workflows do not depend on having the absolute latest copy of the data, such as when a nightly sync is sufficient.

ArcGIS Data Pipelines

ArcGIS Data Pipelines is an ArcGIS product that provides integration support for connecting your data with ArcGIS. With Data Pipelines, you can connect to and read data from where it is stored, perform data preparation operations, and write the data out to a feature layer that is readily available in ArcGIS. You can use the Data Pipelines interface to construct, run, and reproduce your data preparation workflows.

Data Pipelines works with vector data (for example, points, lines, and polygons) and tabular data (for example, data represented as a table). You can connect to a variety of data sources including Amazon S3, Google BigQuery, Snowflake, feature layers, and more. Once connected, you can use tools to blend, build, and integrate datasets for use in your workflows.

Data Pipelines tools are organized into tool sets by capability, such as clean, construct, integrate, and format. For example, the following workflows are supported (a scripted sketch of comparable operations follows the list):

  • Manipulate dataset schemas by updating field names or types
  • Select a subset of fields to extract targeted information
  • Find and replace attribute values to clean or simplify the data
  • Combine datasets using join or merge functionality
  • Calculate fields using Arcade functions
  • Create geometry or time fields for use in spatial or temporal analysis
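
Data Pipelines itself is configured interactively in its interface rather than in code, but the transformations above are easier to picture with a concrete example. The following Python sketch uses pandas (a standalone illustration with hypothetical field names, not the Data Pipelines API) to perform comparable operations on a small table:

    import pandas as pd

    # Sample input table: hydrant inspections with inconsistent values.
    df = pd.DataFrame({
        "HYD_ID": [101, 102, 103],
        "status": ["ok", "OK", "needs repair"],
        "lon": [-117.19, -117.20, -117.21],
        "lat": [34.05, 34.06, 34.07],
        "inspected": ["2024-01-05", "2024-02-11", "2024-03-02"],
    })

    # Manipulate the schema: rename a field and set a field type.
    df = df.rename(columns={"HYD_ID": "hydrant_id"})
    df["inspected"] = pd.to_datetime(df["inspected"])  # time field for temporal analysis

    # Find and replace attribute values to clean the data.
    df["status"] = df["status"].str.lower().replace({"needs repair": "repair"})

    # Combine datasets with a join (a hypothetical lookup table).
    zones = pd.DataFrame({"hydrant_id": [101, 102, 103], "zone": ["A", "A", "B"]})
    df = df.merge(zones, on="hydrant_id")

    # Calculate a field (Data Pipelines would use an Arcade expression here).
    df["label"] = df["zone"] + "-" + df["hydrant_id"].astype(str)

    # Select a subset of fields to extract targeted information.
    df = df[["hydrant_id", "zone", "label", "status", "inspected", "lon", "lat"]]
    print(df)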

While building a data pipeline and configuring its tools, you can preview results to inspect and refine the data before writing it out. Once you’ve completed the data pipeline, you can run it to create or update an ArcGIS feature layer that will be available in your content. You can configure geometry and time properties for the output feature layer so it’s ready for use in additional workflows such as spatial or temporal analysis, dashboards, or web maps.

For more information on Data Pipelines, see Introduction to Data Pipelines.

ArcGIS Data Interoperability

The ArcGIS Data Interoperability extension is an integrated spatial extract, transform, and load (ETL) tool set for both ArcGIS Pro and ArcGIS Enterprise that runs in the geoprocessing framework using Safe Software’s FME technology.

Data Interoperability provides users with an authoring experience using workbenches, which can connect to a wide variety of data sources, perform simple or complex operations on data in transit, and then write data to a wide variety of destinations. This is a fully featured ETL system, where workbenches can be run manually for inspection or testing, automated to run on a regular basis, or invoked as part of a geoprocessing service.

Data Interoperability can be accessed in ArcGIS Pro through its tools and workbenches, or in ArcGIS Server through the Data Interoperability extension for Server, which allows authors to publish ETLs as web services.
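
Because Data Interoperability tools run in the geoprocessing framework, they can also be invoked from a script. The sketch below assumes the extension is licensed and uses the Quick Import tool to load a non-native format into a file geodatabase; the format and paths are hypothetical:

    import arcpy

    # Check out the Data Interoperability extension license (assumes it is available).
    arcpy.CheckOutExtension("DataInteroperability")

    # Quick Import reads a format supported by the extension (here, a hypothetical
    # GML file) and writes its contents to a new file geodatabase.
    arcpy.interop.QuickImport(
        "GML,C:/data/parcels.gml",  # source format and dataset (hypothetical path)
        "C:/data/parcels.gdb",      # output file geodatabase created by the tool
    )

    arcpy.CheckInExtension("DataInteroperability")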

For more information, see the ArcGIS Data Interoperability documentation.

Other ETL options

One additional common approach to building ETLs is to use Python, specifically a combination of arcpy and the arcgis module of the ArcGIS API for Python. The Python developer community, along with many commercial providers, has created hundreds of libraries that assist with connecting to almost any imaginable data source, from database clients to storage types, file parsers, and web client libraries. This means that any data source you can connect to can be pulled into Python, transformed or reshaped, and pushed to an output, which in an ArcGIS-based system is often an editable feature service, an imagery file, or a 3D dataset.
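
For example, a minimal ETL in Python might read records from an outside source, reshape them with pandas, and push the result to a hosted feature layer. The sketch below uses the ArcGIS API for Python; the endpoint URL, item ID, credentials, and field names are all hypothetical:

    import pandas as pd
    import requests
    from arcgis.gis import GIS
    from arcgis.features import GeoAccessor

    # Extract: pull records from an outside web source (hypothetical endpoint).
    records = requests.get("https://example.com/api/sensors").json()

    # Transform: reshape into a spatially enabled DataFrame.
    df = pd.DataFrame(records)
    df = df.rename(columns={"id": "sensor_id"})
    sdf = GeoAccessor.from_xy(df, x_column="lon", y_column="lat")

    # Load: update a hosted feature layer (hypothetical item ID and credentials).
    gis = GIS("https://www.arcgis.com", "username", "password")
    item = gis.content.get("abc123def456")
    layer = item.layers[0]

    # Replace existing features with the new extract (truncate, then append).
    layer.manager.truncate()
    layer.edit_features(adds=sdf.spatial.to_featureset())

Scheduling a script like this (for example, with a cron job or ArcGIS Notebooks) turns it into the kind of recurring sync described earlier, where a nightly refresh is sufficient.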

Many other ETL offerings are available from cloud providers, independent software vendors, and the open-source community. Each offering supports different inputs, processors, and outputs, so it is important to consider the data movement requirements for your system before choosing one. Both ArcGIS Data Pipelines and ArcGIS Data Interoperability include prebuilt connections for writing data into ArcGIS, which can be an important efficiency boost when creating a system that implements this pattern.
