Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Excerpt

You can join together data based on the presence of one or more keys in your source dataset and the joined-in dataset or recipe. A join is a data operation in which two or more tables or datasets are merged into one based on the presence of matching values in one or more key columns that you specify. These shared columns are called the join keys of the two sets of rows that you are attempting to join.

Overview

Using a Join transformation, you You can join a recipe or dataset to any of the following objects:

  • Another recipe
  • An imported dataset
  • A reference dataset

For more information on the interface for building joins, see Join Window.

Create Join

You can join datasets through the following mechanisms:

  • Flow View: Select a dataset or recipe object. Right-click and select Append Join

    Tip

    Tip: The join you specify from Flow View is added as the last step to the recipe. If you selected a dataset to which to add the join, a recipe is created from the object, and the join is added as the first step of the new recipe.

    See Flow View Page.

  • Transform Builder: Search for and select Joinjoin datasets. See Transform Builder.

Joins are created through the Join window. This workflow is a workflow process described below. For more information on these steps, see Join Window.

Step - Choose dataset or recipe to join

...

The first dataset to which you are joining in another is typically called the left dataset.

CustIdLastNameFirstName
c001JonesJack
c002KimKen
c003LeeLarry
c004MillerMike
c005

Dataset B:

The second dataset that you are joining in to the first is typically called the right dataset.

CustIdRegionCompanyName
c002EastACME, Inc.
c003WestTrifax, Inc.
c005NorthExample Co.
c006SouthAce Industries

Join types

There are multiple types of joins, which generate very different results. When you perform a join, you specify the type of join that is applied. The joined-together rows that appear in the output dataset are determined by the type of join that you selected and matching of values in the join key columns.

The following are the basic join types. The Example column references Dataset A (left) and Dataset B (right) from above.

Join TypeDescriptionExample
inner joinIf a join key value appears in the left dataset and the right dataset, the joined rows are included in the output dataset.In the above output, rows c002 and c003 are included only.
left joinIn a left join, all of the rows that appear in the left dataset appear in the output, even if there is no matching join key value in the right dataset.

In the above output, rows c001, c002, c003, c004, and c005 are included.

Rows c006 is excluded.

right joinIn a right join, all of the rows that appear in the right dataset appear in the output, even if there is no matching join key value in the right dataset.

In the above output, rows c002, c003, c005, and c006 are included.

Rows c001 and c004 are excluded.

outer joinAn outer join combines the effects of a left and a right join. Each key value from both datasets is included in the output. If the key value is not present in one of the datasets, then null values are written into the columns from that dataset.

In the above output, rows c001, c002, c003, c004, c005, and c006 are included.

Rows c001, c004, c005, and c006 contain some null values.

cross join

A cross join matches every row in the source dataset with a row in the joined-in dataset, regardless of whether the join keys match.

Info

NOTE: A cross join can greatly expand the number of rows in your dataset, which may impact performance.


If Dataset A has 5 rows and Dataset B has 4 rows, the output has 20 rows.
self joinA self join matches the rows in the left dataset with a version of itself (dataset or recipe) on the right side. Some limitations apply.

For more information, see Join Types.

Step - Specify output columns for the join

...

  • Ignore case: Ignore differences in case between values in the join key columns. MyValue matches with MYVALUE
  • Ignore special characters: Ignore special characters that appear in the join key values. 
  • Ignore whitespace: Ignore spaces, tabs, and other whitespace values that may appear in join key values.

Create fuzzy join

fuzzy join applies a fuzzy matching algorithm to String values in the join key column to account for slight differences in how values are written.

Info

NOTE: Fuzzy joins can only be applied to String data types. Other data types cannot be fuzzy-matched using the algorithm.

This algorithm relies on the

...

doublemetaphone function, which attempts to normalize text values based on how the string is spoken by an English speaker. For more information, see https://en.wikipedia.org/wiki/Metaphone#Double_Metaphone.

  • Fuzzy match: Enable fuzzy matching based on English language pronunciation using the

...

  • doublemetaphone function.

Add multiple join keys

For more complex join operations, you can add additional join keys to evaluate. Multi-key joins can be helpful for:

...

To add a second join key, click Add when modifying the join keys and conditions. Specify the keys in each dataset as needed. D s alsoinCQLtruelabel(label = "join")