Page tree



Contents:

The cloud-based version of Trifacta Wrangler is now available! Read all about it, and register for your free account.

Contents:


The following types of joins are supported. For example, the following tables contains information about employees and departments.

Employee table:

NameDepartmentIDRole
Dave Smith001Product Marketing Manager
Julie Jones002Software Engineer
Scott Tanner001Director of Demand Gen
Ted Connors002Software Engineer
Margaret Lane001VP of Marketing
Mary Martin004Receptionist

Department table:

NameDepartmentID
Marketing001
Engineering002
Accounting003

In the above example, DepartmentID is the key to use in both tables for any joins.

Inner Join

An inner join requires that key values exist in both tables for the records to appear in the results table. Records appear in the merge only if there are matches in both tables for the key values.

  • If you want to include rows containing non-matching values, you must use some form of an outer join. See below.

For the preceding example tables, an inner join on the DepartmentID table produces the following result table:

Employee.NameEmployee.DepartmentIDEmployee.RoleDepartment.NameDepartment.DepartmentID
Dave Smith001Product Marketing ManagerMarketing001
Julie Jones002Software EngineerEngineering002
Scott Tanner001Director of Demand GenMarketing001
Ted Connors002Software EngineerEngineering002
Margaret Lane001VP of MarketingMarketing001

Notes:

  • All fields are included in the merged result set. Fields from the first dataset are listed first.
  • The row for Mary Martin is excluded, since there is no reference in the Department table for her department identifier. The row for Accounting is excluded, since there is no reference in the Employee table for the department identifier.
    • To include these rows, you either need to augment the data or perform a form of an outer join.
  • A null value in one table does not match a null value in another table. So, rows with null values in a join key are never included in an inner join. These values should be fixed.

    Tip: An inner join can be used to eliminate rows with null values in their key fields.

Left Join

A left join (or left outer join) does not require that there be matching records for each value in the key value of the source (left) table. Each row in the left table appears in the results, regardless of whether there are matches in the right table.

For the preceding example tables, a left join on the DepartmentID table produces the following result table:

Employee.NameEmployee.DepartmentIDEmployee.RoleDepartment.NameDepartment.DepartmentID
Dave Smith001Product Marketing ManagerMarketing001
Julie Jones002Software EngineerEngineering002
Scott Tanner001Director of Demand GenMarketing001
Ted Connors002Software EngineerEngineering002
Margaret Lane001VP of MarketingMarketing001
Mary Martin004ReceptionistNULLNULL

Notes:

  • In this left join, the Mary Martin row has been added to the result, since her record in the Employee table does contain an entry for the DepartmentID. However, since there are no corresponding values in the Department table, the corresponding fields in the result table are NULL values.

Right Join

A right join (or right outer join) is the reverse of a left join. A right join does not require that there be matching records for each value in the key value of the secondary (right) table. Each row in the right table appears in the results, regardless of whether there are matches in the left table.

For the preceding example tables, a right join on the DepartmentID table produces the following result table:

Employee.NameEmployee.DepartmentIDEmployee.RoleDepartment.NameDepartment.DepartmentID
Dave Smith001Product Marketing ManagerMarketing001
Julie Jones002Software EngineerEngineering002
Scott Tanner001Director of Demand GenMarketing001
Ted Connors002Software EngineerEngineering002
Margaret Lane001VP of MarketingMarketing001
NULLNULLNULLAccounting003

Notes:

  • In this right join, the Accounting entry is added. However, since there is no entry in the Employee table for the DepartmentID value, those fields are NULL values in the result set.

Full Outer Join

full outer join combines the effects of a left join and a right join. If there is a match between the key values, a row is written in the result.

  • If there is no match for a key value that appears in either table, a single record is written to the result, with NULL values inserted for the fields from the other table. 
Employee.NameEmployee.DepartmentIDEmployee.RoleDepartment.NameDepartment.DepartmentID
Dave Smith001Product Marketing ManagerMarketing001
Julie Jones002Software EngineerEngineering002
Scott Tanner001Director of Demand GenMarketing001
Ted Connors002Software EngineerEngineering002
Margaret Lane001VP of MarketingMarketing001
Mary Martin004ReceptionistNULLNULL
NULLNULLNULLAccounting003

Notes:

  • Any duplicated rows between joining from left-to-right and from right-to-left are removed from the results.

Cross Join

cross join combines each row of the first data set with each row of the second dataset, where every combination is represented in the output. As a result, the number of total rows in the join are:

Rows(DatasetA) * Rows(DatasetB)

NOTE: Depending on the size of your datasets, a cross join can greatly expand the size of the output, which may increase costs in some environments.

Self Join

A self join is a join operation between a dataset and a copy of itself. For example, you can use a self-join to invert the structure of hierarchical data, such as brand-product or manager-employee.

Trifacta Wrangler supports joins between a recipe and any upstream recipe or dataset. 

    • You cannot join a recipe to itself.
    • You can join it to its source imported dataset. When a self-join is performed with a recipe connected to its source dataset, only one line connects the imported dataset with the recipe in Flow View. This is as designed.
    • You can join a recipe to any recipe upstream of it. Examples:
      • You can create an empty recipe after the recipe from which you wish to self-join. In this new empty recipe, you add the join step back to the original recipe.
      • You can insert an empty recipe between an imported dataset and the recipe where the self-join is performed. When you perform the self join in the first recipe, you join to the empty recipe you just created in between. 
      • In both examples, you can see multiple lines in Flow View to indicate the self-join. See Flow View Page.

Joins Together

The following diagram summarizes the relationships between the types of supported joins. In each venn diagram, the area of intersection is the set of records that contain shared key values.

Figure: Join Types

Your Rating: Results: 1 Star2 Star3 Star4 Star5 Star 12 rates

This page has no comments.