Page tree

 

Support | BlogContact Us | 844.332.2821

 

Contents:

This documentation applies to Trifacta Wrangler. Download this free product.
Registered users of this product or Trifacta Wrangler Enterprise should login to Product Docs through the application.

In the Union page, you can append data from one or more datasets to an existing dataset. For example, if you have multiple datasets containing transactional data, such as log files, you can use the union operation to join daily or weekly slices of this data into a single dataset.

In a union operation, the application attempts to match columns between multiple datasets. As needed, you can perform manual tweaks to the matching and decide which columns to include or exclude in the resulting dataset. 

  • A union operation is different from a join operation. In a join operation, data from two datasets is brought together based on a defined primary key. The type of join determines the columns included in the output. For more information, see Join Page.
  • In the Transform Builder, enter union in the Choose a transformation textbox. 

Tip: You should perform your union operations as early as possible in your recipe. Open the Recipe panel, and move the mouse beneath the line where you want to create the union. Click when a horizontal line appears to indicate that you can insert a step at the location. See Recipe Panel.

Figure: Union Page

The schema of the output that is to be generated by the union operation is displayed in the left panel. 

  • The column names of the first dataset are used to populate the column names of the generated dataset, where applicable.
  • Each dataset that has been added to the union is displayed in the right panel. 
PanelLeft SideRight Side 1Right Side 2
UpperOutput dataset - included cols.Dataset 1 - included cols.Dataset 2 - included cols.
LowerOutput dataset - excluded cols.Dataset 1 - excluded cols.Dataset 2 - excluded cols.

Custom column mappings:

As needed, you can modify the default column mappings in your dataset. To remap a column, hover over the column entry in the right panel, Then, click the Plus icon:

Figure: Custom Column Mapping

In the window, you can select the column in the source dataset to which to map to the field where you selected the icon. Additional information indicates if the column is part of the union or not.

  • Click the Search columns field and begin typing to locate fields in the selected dataset.
  • You can also specify that no match should be performed, which results in no data being imported from this column to the unioned dataset.

Dataset Actions: 

  • To add another dataset to the union, click Add datasets
  • Select one or more datasets to add to the union and choose one of the following methods to match columns:
    • Add Datasets and Align by Name. Matches are made based on the name of each column. Partial matches might be identified as matches, as well.
    • Add Datasets and Align by Position. Matches are made based on horizontal position of each column in each dataset. Extra columns will be dropped. This method might be useful if column names have changed between instances of each dataset.
  • To remove a dataset from the union operation, click the X next to the dataset's name in the right panel.
    • You cannot remove the original dataset from which the Union page was opened. 

In the left panel, you can review and modify the columns to be included in and excluded from the output. By default, all matching columns are included in the output; if there are no initial matching columns, all columns from the original dataset are included in the output by default. You can see the columns that are sources for the union output column on the same line in the right panel.

  • Each column entry contains a data type identifier for the column. You can change the data type after the union is completed.
  • To the right of the column name, you can see the number of datasets in the union where the column occurs.

Column Actions:

  • To review the top five values for any column, click the Expand icon. You can see the count of each value across all included datasets.
  • To remove a column from the union output, click the X icon to the left of the column entry in the upper panel.
  • To add a column to the union output, click the + icon next to the left of the column entry in the lower panel.
  • To include all available columns in the output, click include all.
  • To add the union as specified, click Add to Recipe

    NOTE: Unnest, union, or join transforms may significantly increase the number of rows or columns in your dataset. To prevent overloading the browser's memory, the application may apply a limit function to the results to artificially limit the number of rows displayed in your sample. You can generate a new sample if desired. This limitation is not applied during the job execution.


Dataset Updates:

To modify a union after it has been created, click the Edit icon for the entry in the Recipe panel. See Recipe Panel.

After you have added the union to your recipe, changes to the underlying datasets should automatically propagate to the dataset into which they have been unioned. No refreshing of the data is necessary. 

However, it is possible that subsequent changes to the dataset can cause problems in the downstream, consuming dataset. You can fix these dependency issues.

Tip: If you must freeze the data in the dataset that you are adding in, you should create a copy of the dataset as a snapshot and union in the copy. See Dataset Details Page.

To use the copy, edit the union transform in the target dataset and change the dataset that is in use. See Fix Dependency Issues.

Your Rating: Results: PatheticBadOKGoodOutstanding! 4 rates

This page has no comments.