Page tree

Trifacta Documentation


Contents:

   

Contents:


You can manage missing values by applying methods such as copying values from another column, inserting constants, using functions, or deleting rows.

Missing values can be:

  • Empty values, such as an empty string:

    ''
  • Null values, in which no data is present at all.

In the Designer Cloud® application, these different types of values are represented as simply missing values. You can use the visual tools in the Transformer page to standardize missing values to either empty or null values, depending on your downstream requirements.

Locate Missing Values

At the top of each column, you can see the column histogram: 

Figure: Example column histogram

Color barDescription
greenValid values for the current data type of the column
redInvalid values for the current data type of the column
blackMissing values could be empty or null.

Manage Missing Values

Steps:

  1. In the Transformer page, click the black bar in the column histogram to select missing values in a column. 
  2. A set of predictive suggestions is displayed in the right panel. Review the suggested transformations. You can click them to preview the results. 
    1. For example, if the column is a primary key identifier for rows in the dataset, an empty or null value could be an indicator of bad data in the row. You may choose the suggestion to Delete rows.
  3. To make modifications to the suggested transformation before you add, click Edit.
  4. To add the selected suggestion, click Add. The sample of the data in transformed. 

    Tip: You can also use the Recipe panel and add the required functions to manage the missing data function.

Manage Missing and Null Values

When you edit the above transformation, it should look like the following:

Transformation Name Filter rows
Parameter: Condition Custom formula
Parameter: Type of formula Custom single
Parameter: Condition ismissing([Primary_Website_or_URL])
Parameter: Action Delete matching rows

In the above, the operative test is contained in the Formula textbox. The formula tests if the value is missing:

ismissing([Primary_Website_or_URL])

However, a missing value may be treated differently from a null value, so we can add to this formula a test to see if the value is null:

or(ismissing([Primary_Website_or_URL]),isnull([Primary_Website_or_URL]))

The above test is a bit more complicated. It tests if the value is missing or is null. If either of those conditions is met, then the row is deleted.

Tip: Functions can be nested.

The edited transformation should look like the following:

Transformation Name Filter rows
Parameter: Condition Custom formula
Parameter: Type of formula Custom single
Parameter: Condition or(ismissing([Primary_Website_or_URL]),ISNULL([Primary_Website_or_URL]))
Parameter: Action Delete matching rows

This page has no comments.