On the Union page, you can append data from one or more datasets to an existing dataset.

For example, if you have multiple datasets containing transactional data, such as log files, you can use a union to combine daily or weekly slices of this data into a single dataset.

In a union operation, columns are automatically matched across the datasets where possible. As needed, you can manually adjust the matching and decide which columns to include in or exclude from the resulting dataset.
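Conceptually, a union appends the rows of each added dataset to the existing dataset under one shared set of columns. The minimal Python sketch below assumes that columns are matched by exact name, that the output schema is taken from the existing dataset, and that an output column with no matching source column is filled with `None`. It is an illustration only, not the product's actual matching logic, and the function name `union_by_name` is hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def union_by_name(base: list[Row], others: list[list[Row]]) -> list[Row]:
    """Append rows from each dataset in `others` to `base`, matching columns by name.

    Hypothetical illustration: the output columns are the base dataset's columns.
    Source columns with no same-named output column are dropped, and output
    columns missing from a source row become None.
    """
    # Assumption: the base dataset's first row defines the output schema.
    columns = list(base[0].keys()) if base else []
    output = [dict(row) for row in base]
    for dataset in others:
        for row in dataset:
            output.append({col: row.get(col) for col in columns})
    return output


# Two daily slices of a log; Tuesday's slice has its columns in a different order
# and an extra "host" column that has no counterpart in the existing dataset.
monday = [{"ts": "09:00", "event": "login"}]
tuesday = [{"event": "logout", "ts": "17:30", "host": "web1"}]
print(union_by_name(monday, [tuesday]))
# [{'ts': '09:00', 'event': 'login'}, {'ts': '17:30', 'event': 'logout'}]
```

Matching by name rather than by position means that the differing column order in the second slice does not misalign the data.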

Tip: Depending on the types of operations you need to perform, you may want to place your union steps earlier or later in the recipe. See Optimize Job Processing.

To open the Union page, enter union in the textbox of the Search panel.


Figure: Union Page

Dataset Actions:

Mapping Schema

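The left panel displays the schema of the output generated by the union operation. Each panel on the right displays the columns of one source dataset. In every panel, the upper section lists included columns and the lower section lists excluded columns: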

Panel | Left side                         | Right side 1                 | Right side 2
Upper | Output dataset: included columns  | Dataset 1: included columns  | Dataset 2: included columns
Lower | Output dataset: excluded columns  | Dataset 1: excluded columns  | Dataset 2: excluded columns

Custom Column Mappings

As needed, you can modify the default column mappings in your dataset. To remap a column, hover over the column entry in the right panel, and then click the Plus icon.

Figure: Custom Column Mapping

In the dialog that appears, select the column in the current dataset that should appear in that position. Use this dialog to remap the column order in each dataset.
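A custom mapping can be thought of as an explicit lookup from each output column to the source column that fills it. The hypothetical Python sketch below takes one such lookup, `mapping`, for a single source dataset. As assumptions for illustration, an output column that is not listed in `mapping` falls back to a same-name match, and mapping a column to `None` leaves it empty. The function name `apply_mapping` is not part of any product API.

```python
from typing import Any, Optional

Row = dict[str, Any]


def apply_mapping(row: Row, columns: list[str], mapping: dict[str, Optional[str]]) -> Row:
    """Build one output row from a source row using a custom column mapping.

    `mapping` maps an output column to the source column that should fill it.
    A value of None leaves that output column empty. Output columns that are
    not listed in `mapping` fall back to a same-name match (hypothetical default).
    """
    result: Row = {}
    for col in columns:
        source_col = mapping.get(col, col)
        result[col] = row.get(source_col) if source_col is not None else None
    return result


# Dataset 2 names its timestamp column "time" instead of "ts", so name matching
# alone would leave "ts" empty; the custom mapping points "ts" at "time".
row = {"time": "17:30", "event": "logout"}
print(apply_mapping(row, ["ts", "event"], {"ts": "time"}))
# {'ts': '17:30', 'event': 'logout'}
```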

Output Panel

In the left panel, you can review and modify which columns are included in and excluded from the output. By default, all matching columns are included in the output. If there are no initial matching columns, all columns from the original dataset are included by default. The source columns for each output column appear on the same line in the right panel.
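The default inclusion rule above can be sketched in a few lines of Python. The page does not define exactly when a column counts as "matching", so this sketch assumes that an original column matches when every added dataset has a column with the same name. The function name `default_included_columns` is hypothetical.

```python
def default_included_columns(original: list[str], others: list[list[str]]) -> list[str]:
    """Return the output columns included by default (hypothetical rule).

    Assumption: an original column "matches" when every added dataset has a
    column with the same name. If no column matches, all original columns are
    included, following the documented fallback.
    """
    matching = [col for col in original if all(col in cols for cols in others)]
    return matching if matching else list(original)


# "user" has no counterpart in the added datasets, so it is excluded by default.
print(default_included_columns(["ts", "event", "user"], [["ts", "event"], ["event", "ts", "host"]]))
# ['ts', 'event']

# No names match at all, so every original column is included.
print(default_included_columns(["ts", "event"], [["time", "action"]]))
# ['ts', 'event']
```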

Column Actions:

Updates

To modify a union after it has been created, click the Edit icon for the entry in the Recipe panel. See Recipe Panel.

After you add the union to your recipe, changes to the underlying source data should automatically propagate to the dataset into which they are unioned. You do not need to refresh the data.

However, later changes to your source datasets can cause problems in the output and in downstream references. You can fix these dependency issues. See Fix Dependency Issues.

Tip: If you need to freeze the data that you are adding, create a snapshot copy of it and union in the copy instead. See Dataset Details Page.

To use the copy, edit the union transform in your recipe and switch its source from the original dataset to the copy. See Fix Dependency Issues.