On the Union page, you can append data from one or more datasets to an existing dataset. For example, if you have multiple datasets containing transactional data, such as log files, you can use a union to combine daily or weekly slices of this data into a single dataset.
In a union operation, the application attempts to match columns across the datasets. As needed, you can adjust the matching manually and choose which columns to include in or exclude from the resulting dataset.
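As a rough mental model only (this is not the application's implementation), a union that matches columns by name can be sketched in Python. The sketch assumes each dataset is a list of row dictionaries and that the output keeps the columns of the existing (base) dataset. A base column missing from an additional dataset is filled with None, and an additional column with no same-named base column is dropped. The function name union_by_name is hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def union_by_name(base: list[Row], *others: list[Row]) -> list[Row]:
    """Append rows from each dataset in `others` to `base`, matching columns by name.

    Hypothetical sketch: the output schema is the base dataset's columns.
    Columns in `others` without a same-named base column are dropped, and
    base columns missing from another dataset are filled with None.
    """
    # Take the output schema from the first row of the base dataset.
    columns = list(base[0].keys()) if base else []
    result = [dict(row) for row in base]
    for dataset in others:
        for row in dataset:
            # Keep only base columns; absent values become None.
            result.append({col: row.get(col) for col in columns})
    return result


monday = [{"ts": "2024-01-01", "status": 200}]
tuesday = [{"status": 404, "ts": "2024-01-02", "agent": "curl"}]
print(union_by_name(monday, tuesday))
```

Here the agent column in the second slice has no match in the base dataset, so it does not appear in the output, while the ts and status columns are matched by name regardless of their order.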
Tip: Depending on the types of operations you need to perform, you may want to place your union step earlier or later in the recipe. See Optimize Job Processing.
In the Search panel, enter union in the textbox.
When auto align is enabled, column matching also considers the similarity between sampled data in the datasets.
NOTE: Auto align is not available after you have selected the dataset to union. Auto align may add a few seconds to the union operation.
The left panel displays the schema of the output that the union operation generates.
| Panel | Left side | Right side 1 | Right side 2 |
|---|---|---|---|
| Upper | Output dataset: included columns | Dataset 1: included columns | Dataset 2: included columns |
| Lower | Output dataset: excluded columns | Dataset 1: excluded columns | Dataset 2: excluded columns |
As needed, you can modify the default column mappings in your dataset. To remap a column, hover over the column entry in the right panel. Then, click the Plus icon to open the Custom Column Mapping dialog.
In this dialog, select the column from the current dataset that should appear in that location. Use this dialog to remap the column order in each dataset.
You can also select No Match, in which case no data from this column is imported into the unioned dataset.
Tip: To map one of the dropped columns in your additional data to a source column, hover over the empty No Match area next to the source column entry. Click the Plus icon to open the column mapping dialog. Then, select the column from your additional data to place in that location.
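The effect of a custom mapping can be sketched in the same conceptual style. This is a hypothetical model, not the product's behavior or API: the mapping dictionary sends each output column to the column in the additional dataset that should fill it, or to None to represent No Match. Output columns left out of the mapping fall back to a same-named column, which is an assumption made for illustration. The function name union_with_mapping is hypothetical.

```python
from typing import Any, Optional

Row = dict[str, Any]


def union_with_mapping(
    base: list[Row],
    other: list[Row],
    mapping: dict[str, Optional[str]],
) -> list[Row]:
    """Append `other` to `base` using an explicit column mapping.

    Hypothetical sketch: `mapping` maps an output (base) column to the column
    in `other` that fills it, or to None for "No Match". A None entry imports
    no data for that output column. Output columns absent from `mapping`
    fall back to a same-named column in `other`, if one exists.
    """
    columns = list(base[0].keys()) if base else []
    result = [dict(row) for row in base]
    for row in other:
        new_row: Row = {}
        for col in columns:
            source = mapping.get(col, col)  # default: match by name
            new_row[col] = row.get(source) if source is not None else None
        result.append(new_row)
    return result


base = [{"ts": "t1", "status": 200}]
other = [{"time": "t2", "status": 500}]
# Remap "time" into "ts", and set "status" to No Match.
print(union_with_mapping(base, other, {"ts": "time", "status": None}))
```

In this sketch, remapping lets a differently named column (time) fill the ts slot, and the No Match entry leaves status empty for the appended rows.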
In the left panel, you can review and modify which columns are included in and excluded from the output. By default, all matching columns are included in the output; if no columns match initially, all columns from the original dataset are included by default. The source columns for each output column appear on the same row in the right panel.
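The default include rule described above can be expressed as a small function. This is a hypothetical sketch: it assumes that a base column "matches" when every additional dataset has a column with the same name, although the application may match columns by other criteria as well. The function name default_column_split is illustrative only.

```python
def default_column_split(
    base_columns: list[str], other_columns: list[list[str]]
) -> tuple[list[str], list[str]]:
    """Return (included, excluded) base columns under the default rule.

    Hypothetical sketch: a base column counts as matching if every
    additional dataset has a same-named column. Matching columns are
    included; if none match, all base columns are included.
    """
    matching = [c for c in base_columns if all(c in cols for cols in other_columns)]
    if not matching:
        # No initial matches: fall back to including every original column.
        return list(base_columns), []
    excluded = [c for c in base_columns if c not in matching]
    return matching, excluded


print(default_column_split(["ts", "status", "host"], [["ts", "status"]]))
print(default_column_split(["a", "b"], [["x"]]))
```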
To add the union as specified, click Add to Recipe.
To modify a union after it has been created, click the Edit icon for the entry in the Recipe panel. See Recipe Panel.
After you have added the union to your recipe, changes to the underlying data should automatically propagate to the dataset into which they have been unioned. No refreshing of the data is necessary.
However, it is possible that subsequent changes to your sources can cause problems in the output and downstream references. You can fix these dependency issues.
Tip: If you must freeze the data that you are adding in, you should create a copy of it as a snapshot and union in the copy. See Dataset Details Page.
To use the copy, edit the