In the Union page, you can append data from one or more datasets to an existing dataset. For example, if you have multiple datasets containing transactional data, such as log files, you can use the union operation to join daily or weekly slices of this data into a single dataset.
In a union operation, the application attempts to match columns between multiple datasets. As needed, you can perform manual tweaks to the matching and decide which columns to include or exclude in the resulting dataset.
- A union operation is different from a join operation. In a join operation, data from two datasets is brought together based on a defined primary key. The type of join determines the columns included in the output. For more information, see Join Page.
- In the Transform Builder, enter
unionin the Choose a transformation textbox.
Tip: You should perform your union operations as early as possible in your recipe. Open the Recipe panel, and move the mouse beneath the line where you want to create the union. Click when a horizontal line appears to indicate that you can insert a step at the location. See Recipe Panel.
The schema of the output that is to be generated by the union operation is displayed in the left panel.
- The column names of the first dataset are used to populate the column names of the generated dataset, where applicable.
- Each dataset that has been added to the union is displayed in the right panel.
|Panel||Left Side||Right Side 1||Right Side 2|
|Upper||Output dataset - included cols.||Dataset 1 - included cols.||Dataset 2 - included cols.|
|Lower||Output dataset - excluded cols.||Dataset 1 - excluded cols.||Dataset 2 - excluded cols.|
Custom column mappings:
As needed, you can modify the default column mappings in your dataset. To remap a column, hover over the column entry in the right panel, Then, click the Plus icon:
Custom Column Mapping
In the window, you can select the column in the source dataset to which to map to the field where you selected the icon. Additional information indicates if the column is part of the union or not.
- Click the Search columns field and begin typing to locate fields in the selected dataset.
- You can also specify that no match should be performed, which results in no data being imported from this column to the unioned dataset.
- To add another dataset to the union, click Add datasets.
- Select one or more datasets to add to the union and choose one of the following methods to match columns:
- Add Datasets and Align by Name. Matches are made based on the name of each column. Partial matches might be identified as matches, as well.
- Add Datasets and Align by Position. Matches are made based on horizontal position of each column in each dataset. Extra columns will be dropped. This method might be useful if column names have changed between instances of each dataset.
- To remove a dataset from the union operation, click the X next to the dataset's name in the right panel.
- You cannot remove the original dataset from which the Union page was opened.
In the left panel, you can review and modify the columns to be included in and excluded from the output. By default, all matching columns are included in the output; if there are no initial matching columns, all columns from the original dataset are included in the output by default. You can see the columns that are sources for the union output column on the same line in the right panel.
- Each column entry contains a data type identifier for the column. You can change the data type after the union is completed.
- To the right of the column name, you can see the number of datasets in the union where the column occurs.
- To review the top five values for any column, click the Expand icon. You can see the count of each value across all included datasets.
- To remove a column from the union output, click the X icon to the left of the column entry in the upper panel.
- To add a column to the union output, click the + icon next to the left of the column entry in the lower panel.
- To include all available columns in the output, click include all.
To add the union as specified, click Add to Recipe.
To modify a union after it has been created, click the Edit icon for the entry in the Recipe panel. See Recipe Panel.
After you have added the union to your recipe, changes to the underlying datasets should automatically propagate to the dataset into which they have been unioned. No refreshing of the data is necessary.
However, it is possible that subsequent changes to the dataset can cause problems in the downstream, consuming dataset. You can fix these dependency issues.
Tip: If you must freeze the data in the dataset that you are adding in, you should create a copy of the dataset as a snapshot and union in the copy. See Dataset Details Page.
To use the copy, edit the