Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version next

...

Info

NOTE: Data quality rules are not transformation steps. They assess the current state of the sampled data in the Transformer page.


Info

NOTE: As As you apply transformation steps to the data, the state of your data quality rules is automatically updated to reflect the changes. If you delete columns or other elements referenced in the data quality rules, errors are generated in the Transformer page.

Add Rule

You can add a rule from inside the Transformer page.

...

  1. You create rules inside the Transformer page. In the toolbar at the top of the screen, click the Data Quality Rules icon on the right side of the toolbar. 
  2. The Data Quality rules panel opens in the context panel. For more information, see Data Quality Rules Panel.
  3. If you have not created any rules, the panel is empty. To create a new rule, click Add Rulerule.

    Tip

    Tip: You can review a set of suggested data quality rules that are based on your dataset and add them as needed. Click View suggestions. For more information, see Data Quality Rules Panel.

  4. The available types of data quality rules are displayed. Select your rule type. 
    1. A simple one is Not Null. See Examples below.
    2. You can also add custom rules based on formulas that you specify. See "Add Custom Rule" below.
  5. May be missing: Some rule types support the May be missing checkbox. When it is enabled, the Data Quality rule allows missing values to be acceptable for a specified column. 

    Info

    NOTE: The May be missing rule parameter is not applicable to Not Null, Not Missing, Not Equal, Not In Set, and Formula rule types.

  6. Select the column or columns to which the rule applies. 

    Tip

    Tip: Some rules can be applied to multiple columns.

  7. Click Add.

  8. Specify the other parameters as needed.
  9. Review the previewed results.

    Tip

    Tip: To simplify the preview, click the Show Only Affected Columns checkbox in the status bar.

  10. When finished, click Add to add the rule.

...

Additional options are available in the context menu for the rule. For more information, see Data Quality Rules Panel.

Examples

Example - storeAddress column is Not Missing

...

Suppose the values of your SKUs must be in the form of "SKU + 6 digits".

Following uses 

d-s-itemlang
itempatterns
 to perform the match. For more information on 
d-s-itemlang
itempatterns
, see Text Matching.

...

D trans
Typedq
p01NameColumn
p01ValueorderColor
p02NameAcceptable values
p02Value['Blue','Yellow','Green']
SearchTermIn Set

Add Metric-Based Rule

For some rule types, you can create a data quality rule using custom metrics to assess the data quality. You can use the calculated metric type (derived metrics) as a data quality input type and create a metric-based data quality rule. For more information on input types, see Data Quality Rules Reference.

For example, you can create a metric-based rule to find out if the minimum value metric is within the acceptable range. To do this, you can select a In Range data quality rule, select Minimum input metric, and other required parameters to create a metric-based rule. 

Info

NOTE: Metric-based rules are supported only for some metric types. For more information on the rule types that support metrics, see Data Quality Rules Reference.

Steps:

  1. You create rules inside the Transformer page. In the toolbar at the top of the page, click the Data Quality Rules icon on the right side of the toolbar. 
  2. The Data Quality rules panel opens in the context panel. For more information, see Data Quality Rules Panel.
  3. If you have not created any rules, the panel is empty. To create a new rule, click Add rule.

  4. The available types of data quality rules are displayed. 
  5. Select one the rule types types that support metrics. For example, select In Range
  6. Select the required input column value from the Input type drop-down. For example, select Average. For more information on available metrics, see Data Quality Rules Reference.
  7. Select the required column for which you want to add a metric-based rule. Additional examples are below.
  8. For Group by rows, you can select the column whose individual values are used to group the calculated metric rule. For example, if you group a maximum order by product identifier column, then the rule is calculated for individual product identifiers. 

    Info

    NOTE: If you do not group by row values, then the metric is calculated over all rows in the dataset. For large datasets, calculating flat aggregates can impact performance.

  9. Enter minimum and maximum values.
    1. Minimum value: Indicates the minimum (lower bound) of the range.
    2. Maximum value: Indicates the maximum (upper bound) of the range.

      Info

      NOTE: For Greater than and Less than rule types, Minimum value and Maximum value options are displayed.

  10. Exclude minimum and maximum from range: When this option is enabled, the Data Quality rule excludes the minimum and maximum values from the acceptable range.

    Info

    NOTE: The Exclude minimum and maximum from range rule parameter is applicable only to In Range, Greater than, and Less than rule types.

  11. May be missing: When this checkbox is selected, the Data Quality rule allows missing values to be acceptable for a specified column. 

  12. Click Add. The metric-based data quality rule is added.

The new rule is displayed in the Data Quality Rules panel. In the data quality bar for the rule, the green bar indicates the row values that have passed the rule, and the red bar indicates the row values that failed.

  • Hover over either color to see the row counts and percentage. 
  • Select either color to highlight the indicated rows in the data grid.

Additional options are available in the context menu for the rule. For more information, see Data Quality Rules Panel.

Examples

Example - Metrics for In Range rule type

This rule checks if the values in the POS_SALES column fall between the minimum and maximum range.

D trans
p03ValuePOS_QTY
Typedq
p05NameMaximum Value
p01NameInput type
p01ValueMinimum
p02NameColumn
p02ValuePOS_SALES
p05Value7
p03NameGroup rows by
p04Value3
p04NameMinimum Value
SearchTermIn Range

When this rule is added, the rows that do not fall within the range are flagged in red.

Example - Metrics for Greater than rule type

This rule checks if the values in the POS_QTY column are greater than the minimum value.

D trans
p03ValuePOS_SALES
Typedq
p01NameInput type
p01ValueAverage
p02NameColumn
p02ValuePOS_QTY
p03NameGroup rows by
p04Value10.4
p04NameMinimum Value
SearchTermGreater than

When this rule is added, the rows that do not fall within the range are flagged in red.

Example - Metrics for Less than rule type

This rule checks if the values in the NET_SHIP_QTY column are less than the maximum value.

D trans
p03ValuePOS_QTY
Typedq
p01NameInput type
p01ValueMaximum
p02NameColumn
p02ValueNET_SHIP_QTY
p03NameGroup rows by
p04Value24
p04NameMaximum Value
SearchTermLess than

When this rule is added, the rows that do not fall within the range are flagged in red.

Tip

Tip: You can use the available checkboxes to include the missing values and to exclude the minimum and maximum values from range.

Add Custom Rule

You can add custom rules using formulas containing 

D s lang
 functions. 

...

  1. In the Data Quality Rules panel, click Add Rule.
  2. Under Other Rules, select Formula.
  3. In the Formula textbox, enter the 

    D s lang
     formula to test your data. 

    Info

    NOTE: The formula that you provide must evaluate to true or false. true values are highlighted in green in the data quality bar for the rule.

  4. For aggregation functions, you can group the evaluation of your rule based on the values in your grouping column. 

    Tip

    Tip: You can group by multiple columns. The first column is the outermost grouping.

  5. To add the rule, click Add.

Examples

Example - sum of daily sales >= 100

...