
You can create data quality rules to continuously validate that your data is accurate, valid, error-free, and ready for use. These rules persist with your recipe, so they provide continuous testing of your data. When a rule reports failures, additional transformations are suggested to fix your data.

Data quality rules provide an automated way to identify inaccurate data and highlight exceptions, so you can monitor and track data cleanliness over time. They let you continuously assess qualitative dimensions such as accuracy, completeness, uniqueness, and validity.
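The application measures these dimensions for you. The plain-Python sketch below only illustrates what two of them, completeness and uniqueness, can mean numerically for a single column. It is not the product's API: the function names are hypothetical, and the definition of uniqueness used here (the share of non-empty values that occur exactly once) is an assumption chosen for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def completeness(values: Sequence[Optional[object]]) -> float:
    """Share of values that are present (not None). Hypothetical helper."""
    if not values:
        return 1.0
    present = sum(1 for v in values if v is not None)
    return present / len(values)


def uniqueness(values: Sequence[Optional[object]]) -> float:
    """Share of present values that occur exactly once (assumed definition)."""
    present = [v for v in values if v is not None]
    if not present:
        return 1.0
    counts = Counter(present)
    unique = sum(1 for v in present if counts[v] == 1)
    return unique / len(present)


customer_ids = ["C1", "C2", "C2", None, "C3"]
print(completeness(customer_ids))  # 0.8 (4 of 5 values present)
print(uniqueness(customer_ids))    # 0.5 (C1 and C3 are unique; C2 repeats)
```

A completeness rule could then require, for example, that this ratio stay at 1.0 for a mandatory column.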


NOTE: Data quality rules are not transformation steps. Instead, they assess the current state of the sampled data, and you can use their results to build transformation steps that improve data quality.

In addition, you can use calculated metrics as data quality inputs to create a metric-based data quality rule. For example, you can create a rule requiring that the average of the Price column be greater than 50.00.

From the Transformer page, click the Data quality rules icon. If you have not created any rules, the panel is empty. To create a new rule, click Add rule.

Tip: The application can automatically suggest a series of rules that validate various aspects of data quality. To review the rules suggested for your dataset, click View suggestions.



Figure: Data quality rule suggestions

Tip: You can hover over the color bars to view the passed and failed values. You can also select the Show only affected checkbox to view only the passed or failed columns.

Rule Categories

Data quality rules evaluate the values in one or more columns against test criteria that you define. 
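As a conceptual illustration only, the sketch below models a rule as a test applied to every row, with passing and failing rows tallied, much like the passed and failed counts behind the color bars. `evaluate_rule`, `RuleResult`, and the Price/Quantity rule are hypothetical names invented for this example; they are not part of the product.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class RuleResult:
    """Tally of one rule's outcome over a set of rows (hypothetical type)."""
    passed: int = 0
    failed: int = 0
    failed_values: List[Dict[str, Any]] = field(default_factory=list)


def evaluate_rule(rows: List[Dict[str, Any]],
                  test: Callable[[Dict[str, Any]], bool]) -> RuleResult:
    """Apply the test to every row and record which rows pass or fail."""
    result = RuleResult()
    for row in rows:
        if test(row):
            result.passed += 1
        else:
            result.failed += 1
            result.failed_values.append(row)
    return result


rows = [
    {"Price": 42.0, "Quantity": 3},
    {"Price": -5.0, "Quantity": 1},
    {"Price": 19.5, "Quantity": 0},
]
# Hypothetical two-column rule: Price must be positive and Quantity at least 1.
result = evaluate_rule(rows, lambda r: r["Price"] > 0 and r["Quantity"] >= 1)
print(result.passed, result.failed)  # 1 2
```

Because the test receives the whole row, the same shape covers single-column checks and rules that combine several columns.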

The application provides a set of predefined data quality rule types. Select the rule type that matches the check you need to monitor data quality during the import, transformation, and export of your datasets.

Custom-based rules

You can create custom-based rules by using formulas that contain functions from the product's transformation language.

Metric-based rules

You can use custom metrics to assess data quality. A calculated (derived) metric can serve as the input to a data quality rule, which produces a metric-based rule. For example, you can require that the inventory quantity fall within a specific range.

Metric input types are supported for the following rules:

  • In Range

  • Greater Than

  • Less Than

  • Equals

  • Not Equals

  • In Set

  • Not In Set
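The rule types listed above can be read as comparisons between a single metric value and the rule's parameters. The sketch below models that idea in plain Python, using the average Price and inventory range examples from this page. `METRIC_RULES` and `check_metric` are hypothetical, the sample values are invented, and treating In Range bounds as inclusive is an assumption; consult the rules reference for the product's actual semantics.

```python
from statistics import mean
from typing import Any, Callable, Dict

# Hypothetical mapping from each metric rule type to a predicate on the metric.
# In Range bounds are assumed to be inclusive.
METRIC_RULES: Dict[str, Callable[..., bool]] = {
    "In Range": lambda m, low, high: low <= m <= high,
    "Greater Than": lambda m, bound: m > bound,
    "Less Than": lambda m, bound: m < bound,
    "Equals": lambda m, target: m == target,
    "Not Equals": lambda m, target: m != target,
    "In Set": lambda m, allowed: m in allowed,
    "Not In Set": lambda m, excluded: m not in excluded,
}


def check_metric(rule_type: str, metric: Any, *params: Any) -> bool:
    """Evaluate one metric-based rule; returns True when the rule passes."""
    return METRIC_RULES[rule_type](metric, *params)


prices = [40.0, 55.0, 70.0]
avg_price = mean(prices)  # 55.0
print(check_metric("Greater Than", avg_price, 50.00))  # True

total_inventory = 120
print(check_metric("In Range", total_inventory, 0, 100))  # False
```

Unlike the row-level rules, a metric-based rule produces a single pass or fail result, because the metric summarizes the whole column.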

Create Rules

For more information on creating rules, see Build Data Quality Rules.

Data Quality in Job Details

After you have successfully run your job, you can review the results of your data quality rules applied across the entire dataset in the Rules tab on the Job Details page.


NOTE: To display data quality results in your job details, visual profiling must be enabled for job execution. 


Figure: Data quality job details
