Trifacta Documentation



You can locate invalid or mismatched data and fix it using several methods.

Locate Invalid Values

At the top of each column, you can review the visualizations of the data quality bar and the column histogram. 

Figure: Example column histogram

In the data quality bar, the following colors correspond to evaluations of the column's values with respect to the column's data type.

Color bar    Description
green        Valid values for the current data type of the column
red          Invalid values for the current data type of the column
black        Missing values, which could be empty or null
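
To make the three categories concrete, the following is a minimal Python sketch of how values in a Float column could be sorted into valid, mismatched, and missing groups. It is only an illustration and not how the product evaluates data types; the function names `classify` and `quality_summary` and the sample values are hypothetical.

```python
from collections import Counter


def classify(value):
    """Classify one value against a Float data type (illustrative rules only)."""
    # Assumption: None and empty or whitespace-only strings count as missing.
    if value is None or str(value).strip() == "":
        return "missing"
    try:
        float(value)  # parses as a number, so treat it as valid
        return "valid"
    except (TypeError, ValueError):
        return "mismatched"


def quality_summary(values):
    """Count values per category, similar in spirit to the data quality bar."""
    counts = Counter(classify(v) for v in values)
    return {key: counts.get(key, 0) for key in ("valid", "mismatched", "missing")}


# Hypothetical sample column
sample = ["12.50", "n/a", "", None, "7"]
summary = quality_summary(sample)
print(summary)
```

The relative sizes of the three counts correspond to the green, red, and black segments of the bar.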

Fix Invalid Data

You can use the following steps to fix invalid data for a column's data type.

Steps:

  1. From the Transformer page, click the red bar of the data quality bar for the column. You can review the total count of mismatched data. 

  2. A set of predictive suggestions is displayed in the right panel. Review the suggested transformations. You can click them to preview the results.  

    Tip: To show only the rows affected by the previewed transformation, click the Show Only Affected Rows checkbox in the status bar at the bottom of the screen. 

    1. In the following example, the red bar in the POS_Sales column has been clicked to select the values that do not match the column's current data type. You can select a transformation (for example, a set transformation) to address the mismatched values.

      Figure: Suggestions for fixing mismatched data

  3. To modify the suggested transformation, click Edit. For example, you can modify the formula to set mismatched values to 0. Your transformation should look like the following:

    Transformation Name    Edit with formula
    Parameter: Columns     Multiple
    Parameter: Formula     ifmismatched($col, ['Float'], 0)
    Parameter: Action      Set


  4. To add the selected suggestion, click Add. The sampled data is transformed. 

Tip: You can also create an Edit with formula transformation directly from the Recipe panel.
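
For readers who want to reason about what the formula ifmismatched($col, ['Float'], 0) in step 3 does, here is a rough Python analogue. It is a sketch under stated assumptions, not the product's implementation: the helper name `if_mismatched_float` is hypothetical, and it assumes that missing values are left untouched because they are not counted as mismatched.

```python
def if_mismatched_float(value, default=0):
    """Return `default` when `value` is not a valid Float, else the value itself."""
    # Assumption: missing values (None or empty) are not mismatches, so keep them.
    if value is None or str(value).strip() == "":
        return value
    try:
        float(value)
    except (TypeError, ValueError):
        return default  # mismatched against Float, so replace it
    return value  # valid values pass through unchanged


# Hypothetical POS_Sales sample
pos_sales = ["12.50", "n/a", "", "7"]
cleaned = [if_mismatched_float(v) for v in pos_sales]
print(cleaned)
```

Only the value that fails to parse as a number is replaced; valid and missing values are returned as they were.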

Build Data Quality Rules to Identify Invalid Data for Your Requirements

Data quality rules enable you to define requirements for columns of data in your datasets. These rules are not part of your recipe and persist in the Transformer page throughout the recipe development process. 

For example, suppose you have a column called POS_Sales which contains the values of individual transactions. In your organization, this value might:

  • Never be less than 0.
  • Very rarely exceed 1,000,000.00.

You can create data quality rules to identify the values in this column that do not comply with the above criteria. 

Steps:

  1. In the Transformer page, click the Data quality rules icon.
  2. Click Add rule.
  3. To address the above criteria, you can add a rule like the following:

    Data Quality Rule           In Range
    Parameter: Input type       Column
    Parameter: Column           POS_Sales
    Parameter: Minimum value    0
    Parameter: Maximum value    1000000
  4. Review other settings as needed.
  5. Check the preview in the data grid. 
  6. If all looks good, click Add rule.

The rule is added to the Transformer page. 

NOTE: After you create a data quality rule, it remains present in the Transformer page. Subsequent updates to your recipe may result in values that violate your data quality rules. In this manner, data quality rules are helpful guides to ensure that your data remains consistent throughout the recipe development process.
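
The In Range rule above can be pictured as a simple check applied to every value in POS_Sales. The Python sketch below is illustrative only and does not reflect how the product evaluates rules; the function name `in_range` is hypothetical, and treating both bounds as inclusive and non-numeric values as violations are assumptions.

```python
def in_range(value, minimum=0, maximum=1_000_000):
    """Return True when `value` is numeric and lies within [minimum, maximum]."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return False  # assumption: non-numeric values fail the rule
    # Assumption: both bounds are inclusive.
    return minimum <= number <= maximum


# Hypothetical POS_Sales values
sales = [19.99, -5, 2_500_000, 1_000_000]
violations = [v for v in sales if not in_range(v)]
print(violations)
```

Re-running a check like this after each recipe change mirrors how a persistent rule surfaces values that newly fall outside the range.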
