Skip to main content

Join Types

The following types of joins are supported. For example, the following tables contains information about employees and departments.

Employee table:

Name

DepartmentID

Role

Dave Smith

001

Product Marketing Manager

Julie Jones

002

Software Engineer

Scott Tanner

001

Director of Demand Gen

Ted Connors

002

Software Engineer

Margaret Lane

001

VP of Marketing

Mary Martin

004

Receptionist

Department table:

Name

DepartmentID

Marketing

001

Engineering

002

Accounting

003

In the above example, DepartmentID is the key to use in both tables for any joins.

Inner Join

An inner join requires that key values exist in both tables for the records to appear in the results table. Records appear in the merge only if there are matches in both tables for the key values.

  • If you want to include rows containing non-matching values, you must use some form of an outer join. See below.

For the preceding example tables, an inner join on the DepartmentID table produces the following result table:

Employee.Name

Employee.DepartmentID

Employee.Role

Department.Name

Department.DepartmentID

Dave Smith

001

Product Marketing Manager

Marketing

001

Julie Jones

002

Software Engineer

Engineering

002

Scott Tanner

001

Director of Demand Gen

Marketing

001

Ted Connors

002

Software Engineer

Engineering

002

Margaret Lane

001

VP of Marketing

Marketing

001

Notes:

  • All fields are included in the merged result set. Fields from the first dataset are listed first.

  • The row for Mary Martin is excluded, since there is no reference in the Department table for her department identifier. The row for Accounting is excluded, since there is no reference in the Employee table for the department identifier.

    • To include these rows, you either need to augment the data or perform a form of an outer join.

  • A null value in one table does not match a null value in another table. So, rows with null values in a join key are never included in an inner join. These values should be fixed.

    Tip

    An inner join can be used to eliminate rows with null values in their key fields.

Left Join

A left join (or left outer join) does not require that there be matching records for each value in the key value of the source (left) table. Each row in the left table appears in the results, regardless of whether there are matches in the right table.

For the preceding example tables, a left join on the DepartmentID table produces the following result table:

Employee.Name

Employee.DepartmentID

Employee.Role

Department.Name

Department.DepartmentID

Dave Smith

001

Product Marketing Manager

Marketing

001

Julie Jones

002

Software Engineer

Engineering

002

Scott Tanner

001

Director of Demand Gen

Marketing

001

Ted Connors

002

Software Engineer

Engineering

002

Margaret Lane

001

VP of Marketing

Marketing

001

Mary Martin

004

Receptionist

NULL
NULL

Notes:

  • In this left join, the Mary Martin row has been added to the result, since her record in the Employee table does contain an entry for the DepartmentID. However, since there are no corresponding values in the Department table, the corresponding fields in the result table are NULL values.

Right Join

A right join (or right outer join) is the reverse of a left join. A right join does not require that there be matching records for each value in the key value of the secondary (right) table. Each row in the right table appears in the results, regardless of whether there are matches in the left table.

For the preceding example tables, a right join on the DepartmentID table produces the following result table:

Employee.Name

Employee.DepartmentID

Employee.Role

Department.Name

Department.DepartmentID

Dave Smith

001

Product Marketing Manager

Marketing

001

Julie Jones

002

Software Engineer

Engineering

002

Scott Tanner

001

Director of Demand Gen

Marketing

001

Ted Connors

002

Software Engineer

Engineering

002

Margaret Lane

001

VP of Marketing

Marketing

001

NULL
NULL
NULL

Accounting

003

Notes:

  • In this right join, the Accounting entry is added. However, since there is no entry in the Employee table for the DepartmentID value, those fields are NULL values in the result set.

Full Outer Join

A full outer join combines the effects of a left join and a right join. If there is a match between the key values, a row is written in the result.

  • If there is no match for a key value that appears in either table, a single record is written to the result, with NULL values inserted for the fields from the other table.

Employee.Name

Employee.DepartmentID

Employee.Role

Department.Name

Department.DepartmentID

Dave Smith

001

Product Marketing Manager

Marketing

001

Julie Jones

002

Software Engineer

Engineering

002

Scott Tanner

001

Director of Demand Gen

Marketing

001

Ted Connors

002

Software Engineer

Engineering

002

Margaret Lane

001

VP of Marketing

Marketing

001

Mary Martin

004

Receptionist

NULL
NULL
NULL
NULL
NULL

Accounting

003

Notes:

  • Any duplicated rows between joining from left-to-right and from right-to-left are removed from the results.

Cross Join

A cross join combines each row of the first data set with each row of the second dataset, where every combination is represented in the output. As a result, the number of total rows in the join are:

Rows(DatasetA) * Rows(DatasetB)

Note

Depending on the size of your datasets, a cross join can greatly expand the size of the output, which may increase costs in some environments.

Self Join

A self join is a join operation between a dataset and a copy of itself. For example, you can use a self-join to invert the structure of hierarchical data, such as brand-product or manager-employee.

Designer Cloud Powered by Trifacta Enterprise Edition supports joins between a recipe and any upstream recipe or dataset.

    • You cannot join a recipe to itself.

    • You can join it to its source imported dataset.When a self-join is performed with a recipe connected to its source dataset, only one line connects the imported dataset with the recipe in Flow View. This is as designed.

    • You can join a recipe to any recipe upstream of it. Examples:

      • You can create an empty recipe after the recipe from which you wish to self-join. In this new empty recipe, you add the join step back to the original recipe.

      • You can insert an empty recipe between an imported dataset and the recipe where the self-join is performed. When you perform the self join in the first recipe, you join to the empty recipe you just created in between.

      • In both examples, you can see multiple lines in Flow View to indicate the self-join.

Joins Together

The following diagram summarizes the relationships between the types of supported joins. In each venn diagram, the area of intersection is the set of records that contain shared key values.

JoinVennDiagram.png

Figure: Join Types