Trifacta Documentation




You can create data quality rules to continuously validate that your data is accurate, valid, error-free, and ready for use. These rules persist with your recipe, so they provide continuous testing of your data. When these rules indicate failures, additional transformations are suggested to fix your data.

Data quality rules provide an automated way to identify data inaccuracy and highlight exceptions to monitor and track data cleanliness over time. Data quality rules enable you to continuously assess different qualitative dimensions such as accuracy, completeness, uniqueness, and validity.
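As a rough illustration of what these dimensions measure, the Python sketch below counts passing and failing values for a completeness check, a uniqueness check, and a validity check over a small in-memory sample. It is not how Designer Cloud evaluates rules; the function names, column names, and sample rows are all hypothetical.

```python
from collections import Counter

# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"order_id": "A1", "email": "ana@example.com"},
    {"order_id": "A2", "email": ""},
    {"order_id": "A2", "email": "not-an-email"},
    {"order_id": "A3", "email": None},
]

def summarize(flags):
    """Return (passed, failed) counts from a list of booleans."""
    passed = sum(flags)
    return passed, len(flags) - passed

def completeness(rows, column):
    # A value passes when it is neither None nor an empty string.
    return summarize([row[column] not in (None, "") for row in rows])

def uniqueness(rows, column):
    # A value passes when it occurs exactly once in the column.
    counts = Counter(row[column] for row in rows)
    return summarize([counts[row[column]] == 1 for row in rows])

def validity(rows, column, predicate):
    # A value passes when the caller-supplied predicate accepts it.
    return summarize([bool(predicate(row[column])) for row in rows])

def looks_like_email(value):
    # Deliberately crude check, for illustration only.
    return isinstance(value, str) and "@" in value and "." in value.split("@")[-1]
```

Each function returns a pair of passed and failed counts, which is roughly the information a pass/fail bar for a rule conveys.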

NOTE: Data quality rules are not transformation steps, but they assess the current state of the sampled data and can be used in constructing transformation steps to improve data quality.

In addition, you can use calculated metrics as a source of data quality inputs to create a metric-based data quality rule. For example, you can create a data quality rule where the average Price has to be greater than 50.00.
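The average Price example can be sketched in Python as follows. This is only an illustration of the idea that a metric-based rule compares one calculated value against a threshold; the function name and the sample prices are hypothetical, and the product's own evaluation may differ.

```python
from statistics import mean

def average_price_rule(prices, threshold=50.00):
    """Return (metric, passed) for the rule 'average Price > threshold'."""
    metric = mean(prices)
    return metric, metric > threshold

# Hypothetical Price column values.
prices = [42.00, 55.50, 61.25, 48.75]
metric, passed = average_price_rule(prices)
print(f"average Price = {metric:.2f}, rule passed: {passed}")
```

In this sketch the rule yields a single pass or fail result for the whole column, because the metric is one aggregated value rather than a per-row check.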

From the Transformer page, click the Data quality rules icon. If you have not created any rules, the panel is empty. To create a new rule, click Add rule.

Tip: To review a set of suggested data quality rules based on your dataset, click View suggestions. Designer Cloud can automatically suggest a series of rules to validate various data quality aspects.

Figure: Data quality rule suggestions

Tip: You can hover over the color bars to view the failed and passed values. You can also select the Show only affected checkbox to view only the passed or failed columns.

Rule Categories

Data quality rules evaluate the values in one or more columns against test criteria that you define. Designer Cloud has a set of pre-defined data quality rule types. You can select the required rule type to monitor data quality during the import, transformation, and export of your datasets.

Custom-based rules

You can create custom-based rules using formulas containing Wrangle functions.

Metric-based rules

You can use custom metrics to assess data quality. A calculated metric (derived metric) can serve as the data quality input type for a metric-based data quality rule. For example, you can create a constraint that the inventory quantity should be within a specific range.

Metric input types are supported for the following rules:

  • In Range

  • Greater Than

  • Less Than

  • Equals

  • Not Equals

  • In Set

  • Not In Set
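To make the semantics of the rule types listed above concrete, here is a minimal Python sketch that maps each rule name to a predicate over a single metric value. The mapping, the `check_metric` helper, and the inventory figures are hypothetical, and treating In Range as inclusive of both bounds is an assumption rather than documented behavior.

```python
# Hypothetical mapping from the rule names above to Python predicates.
RULES = {
    # Assumption: both bounds are inclusive.
    "In Range": lambda m, low, high: low <= m <= high,
    "Greater Than": lambda m, bound: m > bound,
    "Less Than": lambda m, bound: m < bound,
    "Equals": lambda m, expected: m == expected,
    "Not Equals": lambda m, expected: m != expected,
    "In Set": lambda m, allowed: m in allowed,
    "Not In Set": lambda m, excluded: m not in excluded,
}

def check_metric(rule, metric, *args):
    """Evaluate one metric value against a named rule type."""
    return RULES[rule](metric, *args)

# Hypothetical inventory quantities; the metric is their total.
inventory = [120, 80, 100]
total = sum(inventory)

print(check_metric("In Range", total, 250, 350))
print(check_metric("Not In Set", total, {0, 1000}))
```

Keeping the predicates in a dictionary makes it easy to see that every rule type reduces to a comparison between the calculated metric and one or more user-supplied values.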

Create Rules

For more information on creating rules, see Build Data Quality Rules.

Data Quality in Job Details

After you have successfully run your job, you can review the results of your data quality rules applied across the entire dataset in the Rules tab on the Job Details page.

NOTE: To display data quality results in your job details, visual profiling must be enabled for job execution.

Figure: Data quality job details
