In the Join window of the , you can join your current dataset with another dataset or recipe based upon information that is common to both datasets.
For example, you could join together two sets of regional sales data based upon the product identifiers that they both use to track sales. In the Search panel, enter
join datasets or select the Join icon from the toolbar.
Tip: Depending on the types of operations you need to perform, you may need to perform joins earlier or later in your recipe. For more information, see Optimize Job Processing.
In the Search panel, enter
join datasets. Then, select the dataset or recipe that you wish to join with your current dataset.
Select dataset or recipe to join
You can use the Data tab to preview the data in the selected object.
NOTE: You must have read access to the object to join it to your dataset.
In the next step, you specify the type of join and one or more join keys (columns).
Specify join type and join keys
Mouse over the Dataset samples indicator to see the current samples from the datasets that are part of the join. For more information, see Samples Panel.
From the drop-down, select the type of join to apply. For more information, see Join Types.
In the above image, the platform has determined that the item number (
Item_Nbr) field of Region 1 data and the item number (
ITEM_NBR) field from
REF_PROD should be used as the keys for performing the join.
NOTE: By default, displays a maximum of three rows of data for each join key value in your sample. So, when you specify your join, it may seem like there are joined values that are missing from the data grid panel. When the job is run across the entire dataset, however, the join generates the appropriate number of rows. For more information on changing the maximum number of rows that are previewed in the join, see Miscellaneous Configuration.
To add more join keys, click Add.
NOTE: Be careful applying multiple join keys. Depending on the join type, this type of join can greatly expand the size of the generated data.
By default, matches between join keys are performed on a strict, case-sensitive matching between key values in the selected columns. In some cases, it may be useful to loosen the conditions under which matches are found.Depending on the type of join, you can specify a range of matching values for your join conditions. For more information, see Configure Range Join.The following options are applied to the join key columns in both sources to attempt to find matches. After the join is executed, no data in either column is changed based on these selections.
Use a fuzzy matching algorithm for key value matching.
Fuzzy matching uses the doublemetaphone algorithm for matching strings (keys). Both primary encodings of each key value must match. See DOUBLEMETAPHONEEQUALS Function.
|Ignore case||Ignore case differences between the join key values for matching purposes.|
|Ignore special characters||Ignore all characters that are not alphanumeric, accented Latin characters, or whitespace, prior to testing for a match.|
|Ignore whitespace||Ignore all whitespace characters, including spaces, tabs, carriage returns, and newlines.|
You can use these metrics to identify the likelihood of accurate matching between the join keys and the row count generated in the output.
From the selected datasets, you can specify the columns to include in the output.
Select output columns
Review the list of available columns, which are displayed for both sources.
You can apply prefixes to column names in the joined dataset, which can be helpful for tracking the source of a column in complex datasets. For example, you may wish to prepend each column from a dataset called,
salesRegion01 with the prefix:
After you have joined in another set of data, subsequent changes to that data can be automatically reflected in the output of the join:
NOTE: After you add your join to the recipe, if the data grid is empty, then the keys that you specified in the join may not have a match in the currently selected sample. You should revisit the keys used in your join. If the join still generates an empty grid on the current sample, you should collect a new sample. See Samples Panel.
Tip: If you must freeze the data in the dataset that you are joining in, you should create a copy of the dataset as a snapshot and join in the copy. See Dataset Details Page.
To join in the copy, edit the join and change the source that is being joined. See Fix Dependency Issues.
Click Save and Continue.
After you have selected your columns and any advanced settings, click Review.
Review the join that you have specified. To modify any aspect of it, click the appropriate Edit link.
To add the specified join to your recipe, click Add to Recipe.