You can create rules to validate the quality of the data in your sample. Once created, a rule highlights the values that violate it, which helps you build the data cleansing steps of your recipe.
- A data quality rule evaluates the values in one or more columns against test criteria that you define.
- A library of predefined data quality rule types is included. For more information, see Data Quality Rules Reference.
- You can also create a custom rule using functions in the language.
- Data quality rules are one of several features available for monitoring data quality during import, transformation, and export of your datasets. For more information, see Overview of Data Quality.
NOTE: Data quality rules are not transformation steps. They assess the current state of the sampled data in the Transformer page.
NOTE: As you apply transformation steps to the data, the state of your data quality rules is automatically updated to reflect the changes. If you delete columns or other elements referenced in the data quality rules, errors are generated in the Transformer page.
You can add a rule from inside the Transformer page.
- In the toolbar at the top of the screen, click the Data Quality Rules icon on the right side.
- The Data Quality rules panel opens in the context panel. For more information, see Data Quality Rules Panel.
- If you have not created any rules, the panel is empty. To create a new rule, click Add Rule.
- The available types of data quality rules are displayed. Select your rule type.
- A simple one is Not Null. See Examples below.
- You can also add custom rules based on formulas that you specify. See "Add Custom Rule" below.
- Select the column or columns to which the rule applies.
Tip: Some rules can be applied to multiple columns.
- Specify the other parameters as needed.
- Review the previewed results.
Tip: To simplify the preview, click the Show Only Affected Columns checkbox in the status bar.
- When finished, click Add to add the rule.
The new rule is displayed in the Data Quality Rules panel. In the data quality bar for the rule, green indicates the row values that have passed the rule, and red indicates the row values that failed.
- Hover over either color to see the row counts and percentage.
- Select either color to highlight the indicated rows in the data grid.
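The counts and percentages shown in the data quality bar can be pictured with a minimal Python sketch. It assumes that a rule yields one true/false result per row; the function name `quality_bar` and the one-decimal rounding are hypothetical and are not taken from the product.

```python
def quality_bar(results):
    """Summarize per-row pass/fail results as counts and percentages."""
    total = len(results)
    passed = sum(1 for r in results if r)
    failed = total - passed

    def pct(n):
        # Guard against an empty sample to avoid dividing by zero.
        return round(100.0 * n / total, 1) if total else 0.0

    return {
        "passed": passed,
        "failed": failed,
        "passed_pct": pct(passed),
        "failed_pct": pct(failed),
    }

# Hypothetical per-row results for a rule: three rows pass, one fails.
summary = quality_bar([True, True, False, True])
```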
Tip: After creating a rule, you can jump back and forth between the Recipe panel and this panel to review how your changes to your recipe steps affect the data quality bars for your rules.
Additional options are available in the context menu for the rule. For more information, see Data Quality Rules Panel.
Example - storeAddress column is Not Missing
The following data quality rule tests the values in the storeAddress column to see if they are missing (empty) values.
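As an illustration outside the product, the same check can be sketched in Python. The sample rows are hypothetical, and treating None and whitespace-only strings as missing is an assumption; the product's definition of a missing value may differ.

```python
# Hypothetical sample rows; only the storeAddress column name comes from the example.
rows = [
    {"storeId": 1, "storeAddress": "12 Main St"},
    {"storeId": 2, "storeAddress": ""},
    {"storeId": 3, "storeAddress": None},
    {"storeId": 4, "storeAddress": "7 Oak Ave"},
]

def is_missing(value):
    """Assumption: None and empty or whitespace-only strings count as missing."""
    return value is None or str(value).strip() == ""

def not_missing_rule(rows, column):
    """Split rows into those that pass and those that fail a Not Missing check."""
    passed = [r for r in rows if not is_missing(r.get(column))]
    failed = [r for r in rows if is_missing(r.get(column))]
    return passed, failed

passed, failed = not_missing_rule(rows, "storeAddress")
```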
Example - primaryKey column is Unique
The following rule evaluates the primaryKey column to determine if all values in it are unique.
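A rough Python equivalent of a uniqueness check is sketched below. It assumes that every occurrence of a duplicated value fails the rule, which may not match how the product flags duplicates; the sample keys are hypothetical.

```python
from collections import Counter

def unique_rule(values):
    """Return one boolean per value: True if the value occurs exactly once."""
    counts = Counter(values)
    return [counts[v] == 1 for v in values]

# Hypothetical primaryKey values; 102 appears twice.
primary_keys = [101, 102, 103, 102, 104]
results = unique_rule(primary_keys)
all_unique = all(results)
```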
Example - SKU column matches pattern of SKU + 6 digits
Suppose the values of your SKUs must be in the form of "SKU + 6 digits".
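One way to picture this pattern test is with a regular expression in Python. This sketch assumes that "SKU + 6 digits" means the literal uppercase prefix SKU followed by exactly six ASCII digits and nothing else; the product's own pattern syntax is not shown here.

```python
import re

# Assumed reading of "SKU + 6 digits": literal "SKU", then exactly six ASCII digits.
SKU_PATTERN = re.compile(r"SKU[0-9]{6}")

def matches_sku(value):
    """Return True only when the whole value matches the assumed SKU pattern."""
    return isinstance(value, str) and SKU_PATTERN.fullmatch(value) is not None

# Hypothetical sample values.
samples = ["SKU123456", "SKU12345", "sku123456", "SKU1234567", None]
results = [matches_sku(v) for v in samples]
```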
Example - orderColor must be "Blue", "Yellow" or "Green"
This rule tests the values in the orderColor column to verify that all values are "Blue", "Yellow", or "Green".
The Acceptable values must be formatted as an array.
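The membership test behind this rule can be sketched in Python, with the acceptable values held in a list (the Python counterpart of an array). The comparison here is case-sensitive, which is an assumption about the rule's behavior, and the sample colors are hypothetical.

```python
ACCEPTABLE = ["Blue", "Yellow", "Green"]  # values taken from the example title

def in_set_rule(values, acceptable):
    """Return one boolean per value: True if it is one of the acceptable values."""
    allowed = set(acceptable)
    return [v in allowed for v in values]

# Hypothetical orderColor values.
order_colors = ["Blue", "Red", "Green", "yellow"]
results = in_set_rule(order_colors, ACCEPTABLE)
```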
Add Custom Rule
You can add custom rules using formulas that you specify.
- In the Data Quality Rules panel, click Add Rule.
- Under Other Rules, select
- In the Formula textbox, enter the formula to test your data.
NOTE: The formula that you provide must evaluate to true or false. Rows where it evaluates to true are highlighted in green in the data quality bar for the rule.
For aggregation functions, you can group the evaluation of your rule based on the values in your grouping column.
Tip: You can group by multiple columns. The first column is the outermost grouping.
To add the rule, click Add.
Example - sum of daily sales >= 100
You can use data quality rules to perform some data analysis functions. For example, suppose you want to flag the dates where the total sales of all of your orders was less than 100.
When this rule is added, the rows whose date total is less than 100 are flagged in red.
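The grouped evaluation can be approximated in Python as follows. This is a sketch under stated assumptions: the column names `date` and `sales`, the sample orders, and the function `sum_by_group_rule` are hypothetical, and the actual rule is written as a formula in the product rather than in Python.

```python
from collections import defaultdict

def sum_by_group_rule(rows, group_cols, value_col, threshold):
    """Flag each row True when the total of value_col for its group is >= threshold."""
    totals = defaultdict(float)
    for r in rows:
        # A tuple key supports grouping by several columns, outermost first.
        key = tuple(r[c] for c in group_cols)
        totals[key] += r[value_col]
    return [totals[tuple(r[c] for c in group_cols)] >= threshold for r in rows]

# Hypothetical orders: the first date totals 110, the second totals 70.
orders = [
    {"date": "2024-01-01", "sales": 60},
    {"date": "2024-01-01", "sales": 50},
    {"date": "2024-01-02", "sales": 40},
    {"date": "2024-01-02", "sales": 30},
]
flags = sum_by_group_rule(orders, ["date"], "sales", 100)
```

Every row that belongs to a date whose total falls below the threshold receives a False flag, which corresponds to the rows shown in red.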
To edit a rule, select Edit rule from the context menu for the rule in the panel.
To delete a rule, select Delete rule from the context menu for the rule in the panel.
When you generate a profile as part of your job results, you can download the profile in JSON or PDF format.
When you download the profile in JSON format, the set of rules for the job is also included. Search for profilerRules in the JSON file.
For more information, see Job Details Page.
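If you prefer to locate the rules programmatically, a recursive key search is one option. The sketch below makes no assumption about where profilerRules sits in the file; the sample document is a hypothetical stand-in, and the layout of a real downloaded profile may differ.

```python
import json

def find_key(node, key):
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)

# Hypothetical stand-in for a downloaded profile.
sample = json.loads('{"profile": {"profilerRules": [{"name": "example rule"}]}}')
matches = list(find_key(sample, "profilerRules"))
```

For a real file, load it with `json.load` on the opened file and pass the result to `find_key` in the same way.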
When flows are exported and imported, the rule definitions for the recipes in the flow are also exported. For more information, see Export Flow.