In the Join page, you can join your current dataset with another dataset or recipe based upon information that is common to both datasets. For example, you could join together two sets of regional sales data based upon the product identifiers that they both use to track sales. In the Search panel, enter join datasets.

Tip: You should perform your join operation as late as possible in your recipe steps. If your joined dataset has not been completely transformed, subsequent steps might impact the data in the dataset to which it was joined. If needed, you can modify your join after its creation. Details are below.

Before You Begin

Step 1 - Select Objects to Join

For the current recipe loaded in the Transformer page, you can select any imported dataset, reference dataset, or recipe to join with the currently loaded data.

 

Join Object TypeDescription
imported dataset

Join is performed on the stored version of the data for any imported dataset to which you have access. If this data is overwritten, the data in the join is changed.

NOTE: When you join to an imported dataset that is not in your flow and generating the initial sample takes longer than expected, you may encounter an error indicating that the dataset is broken. Instead, create a recipe off of the imported dataset and join to that. This is a known issue.

reference dataset

Join is performed on a reference to a recipe in another flow. If the recipe changes the schema of the data, then the join is affected by those changes.

NOTE: You must explicitly create references in the source flow from the source recipe. The reference dataset is then available for inclusion in other flows.

recipe

You can join to any recipe in your current flow.

Tip: Where possible, join to a recipe instead of an imported dataset. With a recipe, you can switch to a random sample and then to use that in the join.

In the following image, the columns of the current dataset (Point of sale data from region 1) are displayed in the left column. On the right side, you can select the imported dataset, reference dataset, or recipe that you'd like to join with the current dataset. In this case, the REF_PROD.txt file has been selected, meaning that the join is performed on this imported dataset: 

Join Page

Step 2 - Specify Join Parameters

In the preview of the new dataset, you can see how the two datasets have been joined. By default,  attempts to specify the keys to use for the join and the type of join to perform.

Specify Join Keys

To review and modify the columns used as the keys for the join, click the Join Keys tab.

The following image displays the preview of the currently configured join. It has determined that the item number (Item_Nbr) field of Region 1 data and the item number (ITEM_NBR)  field from REF_PROD should be used as the keys for performing the join.  

NOTE: By default, displays a maximum of three rows of data for each join key value in your sample. So, when you specify your join, it may seem like there are joined values that are missing from the data grid panel. When the job is run across the entire dataset, however, the join generates the appropriate number of rows. For more information on changing the maximum number of rows, see Miscellaneous Configuration.

Review join keys and join type

Specify Other Join Parameters

  1. From the drop-down, you can select the type of join to apply. For more information, see Join Types below.
  2. Join keys: 
    1. To make changes to the two join keys, click the Pencil icon.
      1. In the left part of the panel, select the field from source 1 to use as its key. In the right, select the field from source 2.
      2. For more information on the other options, see Fuzzy Joins below.
      3. To save changes, click OK.

    2. To remove the two columns as join keys, click the Trash icon.

    3. To add more join keys, click Add

      NOTE: Be careful applying multiple join keys. Depending on the join type, this type of join can greatly expand the size of the generated data.

  3. Review the list of available columns, which are displayed for both sources. 
    1. After the currently selected keys, fields from the source 1 are listed, followed by the fields of source 2.
    2. To modify the settings for these columns, click Edit Settings. For more information, see Join Column Settings below.
    3. Use the checkboxes to select the columns to include in the joined dataset.
  4. To add the specified join to your dataset, click Add to Recipe. The joined dataset is displayed in the Transformer page.

Cross Joins

cross join combines each row of the first data set with each row of the second dataset, where every combination is represented in the output. As a result, the number of total rows in the join are:

Rows(DatasetA) * Rows(DatasetB)

NOTE: Depending on the size of your datasets, a cross join can greatly expand the size of the output, which may increase costs in some environments.

To specify a cross join, delete all join keys.

Fuzzy Joins

By default, matches between join keys are performed on a strict, case-sensitive matching between key values in the selected columns. In some cases, it may be useful to loosen the conditions under which matches are found.

You can select from the following options to perform more fuzzy joins. These options are applied to the join key columns in both sources to attempt to find matches. After the join is executed, no data in either column is changed based on these selections.

OptionDescription
Ignore CaseIgnore case differences between the join key values for matching purposes.
Ignore Special CharactersIgnore all characters that are not alphanumeric, accented Latin characters, or whitespace, prior to testing for a match.
Ignore WhitespaceIgnore all whitespace characters, including spaces, tabs, carriage returns, and newlines.

Auto-updating Joins

After you have joined in another set of data, subsequent changes to that data are automatically reflected in the output of the join. 

NOTE: After you add your join to the recipe, if the data grid is empty, then the keys that you specified in the join may not have a match in the currently selected sample. You should revisit the keys used in your join. If the join still generates an empty grid on the new sample, you should collect a new sample. See Samples Panel.

Tip: If you must freeze the data in the dataset that you are joining in, you should create a copy of the dataset as a snapshot and join in the copy. See Dataset Details Page.

To join in the copy, edit the join and change the source that is being joined. See Fix Dependency Issues.

Join Column Settings

In the Join Column Settings dialog, you can apply prefixes to column names in the joined dataset, which can be helpful for tracking the source of a column in complex datasets. For example, you may wish to prepend each column from a dataset called, salesRegion01 with the prefix: sR01.

Join Types

The following types of joins are supported. For example, the following tables contains information about employees and departments.

Employee table:

NameDepartmentIDRole
Dave Smith001Product Marketing Manager
Julie Jones002Software Engineer
Scott Tanner001Director of Demand Gen
Ted Connors002Software Engineer
Margaret Lane001VP of Marketing
Mary Martin004Receptionist

Department table:

NameDepartmentID
Marketing001
Engineering002
Accounting003

In the above example, DepartmentID is the key to use in both tables for any joins.

Inner Join

An inner join requires that key values exist in both tables for the records to appear in the results table. Records appear in the merge only if there are matches in both tables for the key values.

For the preceding example tables, an inner join on the DepartmentID table produces the following result table:

Employee.NameEmployee.DepartmentIDEmployee.RoleDepartment.NameDepartment.DepartmentID
Dave Smith001Product Marketing ManagerMarketing001
Julie Jones002Software EngineerEngineering002
Scott Tanner001Director of Demand GenMarketing001
Ted Connors002Software EngineerEngineering002
Margaret Lane001VP of MarketingMarketing001

Notes:

Left Outer Join

A left outer join (or left join) does not require that there be matching records for each value in the key value of the source (left) table. Each row in the left table appears in the results, regardless of whether there are matches in the right table.

For the preceding example tables, a left join on the DepartmentID table produces the following result table:

Employee.NameEmployee.DepartmentIDEmployee.RoleDepartment.NameDepartment.DepartmentID
Dave Smith001Product Marketing ManagerMarketing001
Julie Jones002Software EngineerEngineering002
Scott Tanner001Director of Demand GenMarketing001
Ted Connors002Software EngineerEngineering002
Margaret Lane001VP of MarketingMarketing001
Mary Martin004ReceptionistNULLNULL

Notes:

Right Outer Join

A right outer join (or right join) is the reverse of a left join. A right join does not require that there be matching records for each value in the key value of the secondary (right) table. Each row in the right table appears in the results, regardless of whether there are matches in the left table.

For the preceding example tables, a right join on the DepartmentID table produces the following result table:

Employee.NameEmployee.DepartmentIDEmployee.RoleDepartment.NameDepartment.DepartmentID
Dave Smith001Product Marketing ManagerMarketing001
Julie Jones002Software EngineerEngineering002
Scott Tanner001Director of Demand GenMarketing001
Ted Connors002Software EngineerEngineering002
Margaret Lane001VP of MarketingMarketing001
NULLNULLNULLAccounting003

Notes:

Full Outer Join

full outer join combines the effects of a left join and a right join. If there is a match between the key values, a row is written in the result.

Employee.NameEmployee.DepartmentIDEmployee.RoleDepartment.NameDepartment.DepartmentID
Dave Smith001Product Marketing ManagerMarketing001
Julie Jones002Software EngineerEngineering002
Scott Tanner001Director of Demand GenMarketing001
Ted Connors002Software EngineerEngineering002
Margaret Lane001VP of MarketingMarketing001
Mary Martin004ReceptionistNULLNULL
NULLNULLNULLAccounting003

Notes:

Joins Together

The following diagram summarizes the relationships between the types of supported joins. In each venn diagram, the area of intersection is the set of records that contain shared key values.

Join Types

After you have created a join, you can modify it through the Recipe panel. For more information, see Edit a Join.