You can join together data based on the presence of one or more keys in your source dataset and the joined-in dataset or recipe. A join is a data operation in which two or more tables or datasets are merged into one based on the presence of matching values in one or more key columns that you specify. These shared columns are called the join keys of the two sets of rows that you are attempting to join.
You can join a recipe or dataset to any of the following objects:
For more information on the interface for building joins, see Join Window.
You can join datasets through the following mechanisms:
Flow View: Select a dataset or recipe object. Right-click and select Append Join.
Tip: The join you specify from Flow View is added as the last step to the recipe. If you selected a dataset to which to add the join, a recipe is created from the object, and the join is added as the first step of the new recipe.
See Flow View Page.
join datasets. See Transform Builder.
Joins are created through a workflow process described below. For more information on these steps, see Join Window.
For discussion purposes, the following datasets are referenced in the sections below.
CustIdcolumn is shared between both datasets. This column is the join key, as there are no matches between the other colums.
CustIdin one dataset do not appear in the other.
The first dataset to which you are joining in another is typically called the left dataset.
The second dataset that you are joining in to the first is typically called the right dataset.
There are multiple types of joins, which generate very different results. When you perform a join, you specify the type of join that is applied. The joined-together rows that appear in the output dataset are determined by the type of join that you selected and matching of values in the join key columns.
The following are the basic join types. The Example column references Dataset A (left) and Dataset B (right) from above.
|inner join||If a join key value appears in the left dataset and the right dataset, the joined rows are included in the output dataset.||In the above output, rows |
|left join||In a left join, all of the rows that appear in the left dataset appear in the output, even if there is no matching join key value in the right dataset.|
In the above output, rows
|right join||In a right join, all of the rows that appear in the right dataset appear in the output, even if there is no matching join key value in the right dataset.|
In the above output, rows
|outer join||An outer join combines the effects of a left and a right join. Each key value from both datasets is included in the output. If the key value is not present in one of the datasets, then null values are written into the columns from that dataset.|
In the above output, rows
A cross join matches every row in the source dataset with a row in the joined-in dataset, regardless of whether the join keys match.
|If Dataset A has 5 rows and Dataset B has 4 rows, the output has 20 rows.|
|self join||A self join matches the rows in the left dataset with a version of itself (dataset or recipe) on the right side. Some limitations apply.|
For more information, see Join Types.
In the output dataset, the column names are taken directly from the column names in the source dataset. Potential issues:
You can apply a prefix to the column names that are sourced from the left dataset, the right dataset, or both.
In the recipe step that produces the join, the columns that you select are mentioned specifically by name. Optionally, you can choose to automatically add in all columns to your output. For example, if your source data for an imported dataset is augmented with 10 new columns, when you re-run your join, those new columns can be automatically added to the output dataset.
Tip: You should consider using these options if the schema of your data sources is likely to change in the future.
You should review the columns that are previewed as in the data grid.
NOTE: If you modify the selected dataset to join, the joined dataset, the join keys, or the fields to include in the output, subsequent steps in your transform recipe can be broken by the change. After you modify the join, you should select the last step in your recipe to validate all steps in the recipe.
You can apply the following modifications to how keys are matched. To modify a join key and condition, click the Pencil icon in the Join Keys & Conditions panel.
Optionally, you can configure the to ignore the following special characters, when matching values in join keys:
A fuzzy join applies a fuzzy matching algorithm to String values in the join key column to account for slight differences in how values are written.
NOTE: Fuzzy joins can only be applied to String data types. Other data types cannot be fuzzy-matched using the algorithm.
This algorithm relies on the metaphone function, which attempts to normalize text values based on how the string is spoken by an English speaker. For more information, see https://en.wikipedia.org/wiki/Metaphone.
NOTE: This feature may need to be enabled in your environment. See Workspace Settings Page.
Values in the join key columns are matched across a range of values, instead of matching single value to single value. When range joins are enabled, you can set the Condition value between the two join key columns when specifying the join keys. For more information, see Configure Range Join.
For more complex join operations, you can add additional join keys to evaluate. Multi-key joins can be helpful for:
To add a second join key, click Add when modifying the join keys and conditions. Specify the keys in each dataset as needed.