
Trifacta Documentation



You can build data quality rules to validate the quality of your data. These rules highlight exceptions when there is a flaw or mismatch in your data. Data quality rules persist with your recipe; as you update your recipe, the rules automatically monitor those changes.

Add a Rule

Steps:

  1. In the Transformer page, click the Data quality rules icon.
  2. From the Data quality rules panel, click Add rule. The Add data quality rule panel is displayed with the available types of data quality rules.

    NOTE: If you have not created any rules, the panel is empty. To review a set of suggested data quality rules based on your dataset, click View suggestions and add the required rules to your dataset.

  3. From the Add data quality rules panel, select or search for the required data quality rule. 

  4. When a data quality rule is selected, enter the required details. See the following example for detailed steps.

    NOTE: The options vary based on the selected data quality rule.

  5. To add the new rule, click Add. The new rule is displayed in the Data quality rules panel.

Example - Less Than rule

The options vary based on different types of data quality rules.

For the Less Than rule, the following options are displayed:

  1. From the Input Type column drop-down, select the required column or metric to check for data quality. For example, Average.
  2. From the Column drop-down, select the required column to which the input is applied.
  3. For aggregation functions, you can group the evaluation of your rule based on the values in your grouping column. This step is optional. If unspecified, the metric is calculated over a single group with all rows.
  4. In the Maximum value field, enter the required upper bound of the range. 
  5. To exclude the maximum value from the range, select the Exclude maximum value from range checkbox.

    NOTE: The Exclude minimum and maximum from range rule parameter is applicable only to In Range, Greater Than, and Less Than rule types.

  6. To accept missing values in the specified column, select the May be missing checkbox.

    NOTE: The May be missing rule parameter is not applicable to Not Null, Not Missing, Not Equal, Not In Set, and Formula rule types.

  7. Review the previewed results.

    Tip: To simplify the preview, click the Show Only Affected Columns checkbox in the status bar.

  8. To add the defined rule, click Add. The new rule is displayed in the Data quality rules panel and is immediately applied to your data.
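
To make the Exclude maximum value from range and May be missing options concrete, here is a minimal Python sketch of a row-level Less Than check. It is an illustration only, not the product's implementation: the function name `check_less_than`, the treatment of `None` and empty strings as missing, and the assumption that the bound is inclusive when the checkbox is cleared are all hypothetical.

```python
def check_less_than(value, maximum, exclude_maximum=False, may_be_missing=False):
    """Return True if a single value passes a Less Than style check."""
    # Assumption: None and empty strings count as missing values.
    if value is None or value == "":
        # A missing value passes only when "May be missing" is selected.
        return may_be_missing
    # Assumption: the upper bound is inclusive unless it is explicitly excluded.
    if exclude_maximum:
        return value < maximum
    return value <= maximum


# Hypothetical sample values checked against a maximum of 24.
values = [10, 24, 30, None]
inclusive = [check_less_than(v, 24) for v in values]
exclusive = [check_less_than(v, 24, exclude_maximum=True) for v in values]
lenient = [check_less_than(v, 24, may_be_missing=True) for v in values]
```

In this sketch, the boundary value 24 passes only while the maximum is included, and the missing value passes only when `may_be_missing` is set.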

Data Quality Rule          Less Than
Parameter: Input type      Maximum
Parameter: Column          NET_SHIP_QTY
Parameter: Group rows by   POS_QTY
Parameter: Maximum Value   24
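
The grouped evaluation in this example can be sketched in Python as follows. This is a rough illustration under stated assumptions rather than the product's logic: the function name `less_than_metric_rule` and the sample rows are invented, and the sketch assumes that every row inherits the pass or fail result of its group and that the maximum value is inclusive.

```python
def less_than_metric_rule(rows, column, group_by, maximum):
    """Return one pass/fail flag per row, based on the Maximum metric of its group."""
    # Compute the Maximum metric of `column` for each value of the grouping column.
    group_max = {}
    for row in rows:
        key = row[group_by]
        value = row[column]
        if key not in group_max or value > group_max[key]:
            group_max[key] = value

    # Assumption: each row passes when its group's metric is within the bound.
    return [group_max[row[group_by]] <= maximum for row in rows]


# Hypothetical sample data using the column names from the example above.
rows = [
    {"POS_QTY": 1, "NET_SHIP_QTY": 10},
    {"POS_QTY": 1, "NET_SHIP_QTY": 24},
    {"POS_QTY": 2, "NET_SHIP_QTY": 30},
]
results = less_than_metric_rule(rows, "NET_SHIP_QTY", "POS_QTY", 24)
```

Here the group with `POS_QTY` equal to 1 has a maximum of 24 and passes, while the group with `POS_QTY` equal to 2 has a maximum of 30 and fails.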

Add a Custom Rule

You can add custom rules using formulas.

Steps:

  1. From the Data quality rules panel, click Add rule.
  2. Under Other Rules, select Formula.
  3. In the Formula text box, enter the required formula.

    NOTE: The formula that you provide must evaluate to true or false. true values are highlighted in green in the data quality bar for the rule.

  4. For aggregation functions, you can group the evaluation of your rule based on the values in your grouping column. This step is optional. If unspecified, the metric is calculated over a single group with all rows.

  5. To add the rule, click Add.

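Example - Formula

Suppose that you want to flag values in a column when the total sales is less than 75. You can create a custom rule using the Formula option. The formula evaluates to true for values that pass the check, so it is written as the condition that acceptable values must meet.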

Data Quality Rule          Custom Formula
Parameter: Formula         TotalSales >= 75
Parameter: Group rows by   Sales
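
As a rough illustration of how a true/false formula separates passing and failing rows, the following Python sketch applies the condition from the example to a few invented rows. The function name `evaluate_formula_rule` and the sample data are hypothetical, the condition is written in Python rather than in the product's formula language, and grouping is left out to keep the sketch short.

```python
def evaluate_formula_rule(rows, predicate):
    """Apply a true/false predicate to every row and return one flag per row."""
    return [bool(predicate(row)) for row in rows]


# Hypothetical sample rows; TotalSales is the column name used in the example.
rows = [
    {"TotalSales": 120},
    {"TotalSales": 75},
    {"TotalSales": 40},
]

# Python equivalent of the example formula: TotalSales >= 75
results = evaluate_formula_rule(rows, lambda row: row["TotalSales"] >= 75)

# Rows for which the formula is false are the ones the rule flags.
failing = [row for row, ok in zip(rows, results) if not ok]
```

Only the row with a total of 40 ends up in `failing`, because it is the only one for which the condition is false.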

Add a Metric-Based Rule

You can create a data quality rule using custom metrics. You can use the calculated metric type (derived metrics) as a data quality input type and create a metric-based data quality rule. 

NOTE: Metric-based rules are supported only for some metric types. 


For example, you can create a metric-based rule to check whether the minimum value metric is within an acceptable range: select the In Range data quality rule, select the Minimum input metric, and enter the other required parameters.

Steps:

  1. In the Transformer page, click the Data quality rules icon.
  2. From the Data quality rules panel, click Add rule. The Add data quality rule panel is displayed with the available types of data quality rules.

  3. From the Add data quality rules panel, select or search for the required data quality rule that supports metrics. For example, In Range. The selected data quality rule panel is displayed.
  4. From the Input Type column drop-down, select the required column or metric to check for data quality. For example, Average.
  5. From the Column drop-down, select the required column for which you want to add a metric-based rule.
  6. You can group the evaluation of your rule based on the values in your grouping column. This step is optional. If unspecified, the metric is calculated over a single group with all rows.
  7. In the Minimum value and Maximum value fields, enter the required lower and upper bound values.
  8. To exclude the maximum value from the range, select the Exclude maximum value from range checkbox.

  9. To accept missing values in the specified column, select the May be missing checkbox.

    NOTE: The May be missing rule parameter is not applicable to Not Null, Not Missing, Not Equal, Not In Set, and Formula rule types.

  10. Review the previewed results.

    Tip: To simplify the preview, click the Show Only Affected Columns checkbox in the status bar.

  11. To add the new rule, click Add. The new rule is displayed in the Data quality rules panel.

Example - In Range metric rule

You can create an In Range rule to check that a metric for a column stays within a range. For example, the average of the POS_SALES column should be within a minimum of 3 and a maximum of 7.

Data Quality Rule          In Range
Parameter: Input type      Average
Parameter: Column          POS_SALES
Parameter: Minimum Value   3
Parameter: Maximum Value   7
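
The check in this example can be approximated with a short Python sketch. It is illustrative only: the function name `in_range_metric_rule` and the sample values are invented, and the sketch assumes that missing values (`None`) are skipped when the average is computed, that both bounds are inclusive unless excluded, and that a column with no values fails the rule.

```python
def in_range_metric_rule(values, minimum, maximum,
                         exclude_minimum=False, exclude_maximum=False):
    """Return True if the average of `values` lies within the given range."""
    # Assumption: missing values (None) are ignored when computing the metric.
    present = [v for v in values if v is not None]
    if not present:
        # Assumption: with no values there is no metric, so the rule fails.
        return False
    average = sum(present) / len(present)

    # Assumption: bounds are inclusive unless explicitly excluded.
    above_min = average > minimum if exclude_minimum else average >= minimum
    below_max = average < maximum if exclude_maximum else average <= maximum
    return above_min and below_max


# Hypothetical POS_SALES values checked against the range 3 to 7.
within = in_range_metric_rule([2, 5, 8], 3, 7)       # average is 5.0
outside = in_range_metric_rule([8, 9, 10], 3, 7)     # average is 9.0
boundary = in_range_metric_rule([7, 7], 3, 7, exclude_maximum=True)
```

The individual values 2 and 8 fall outside the range, but the rule still passes for the first list because it is the average, not each value, that is compared against the bounds.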

Using Rules

In the data quality rule bar, a green bar indicates that the row values passed the rule check, and a red bar indicates that the row values failed. You can hover over the displayed color to see the row counts and percentage.
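
The counts and percentages shown on hover correspond to a simple tally of per-row results. The following Python sketch shows one way to compute such a summary from a list of pass/fail flags; the function name `summarize_rule_results` and the dictionary keys are hypothetical and do not reflect how the product stores or reports these values.

```python
def summarize_rule_results(results):
    """Count passing and failing rows and express each as a percentage."""
    total = len(results)
    passed = sum(1 for ok in results if ok)
    failed = total - passed

    def percent(count):
        # Avoid division by zero when there are no rows.
        return 100.0 * count / total if total else 0.0

    return {
        "passed": passed,
        "failed": failed,
        "passed_pct": percent(passed),
        "failed_pct": percent(failed),
    }


# Hypothetical per-row results: three rows pass and one fails.
summary = summarize_rule_results([True, True, True, False])
```

For the four sample flags, the summary reports 3 passing rows (75.0 percent) and 1 failing row (25.0 percent).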

Context menu:

The following context menu options are available when you create a rule:

  • Edit rule: Edit the data quality rule.
  • Delete rule: Delete the data quality rule.

    NOTE: Deleting a data quality rule does not affect your data.

  • Show Failing Values Only: Highlights the values that have failed the rule.
  • Show Passing Values Only: Highlights the values that have passed the rule.
  • Clear Preview: Removes the value highlighting from the data grid.
