Locate Invalid Values
At the top of each column, you can review the visualizations of the data quality bar and the column histogram.
Figure: Example column histogram
In the data quality bar, the following colors correspond to evaluations of the column's values with respect to the column's data type.
Color bar | Description |
---|---|
green | Valid values for the current data type of the column |
red | Invalid values for the current data type of the column |
black | Missing values (empty or null) |
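To make the three categories concrete, here is a minimal Python sketch of how values might be bucketed for a column typed as Float. It is an illustration only, not the product's type-inference logic: the `classify` and `quality_bar` helpers are hypothetical, and treating whitespace-only strings as missing is an assumption.

```python
from collections import Counter


def classify(value):
    """Bucket one value for a hypothetical Float column."""
    # Assumption: None, empty, and whitespace-only strings count as missing.
    if value is None or str(value).strip() == "":
        return "missing"
    try:
        float(value)
    except (TypeError, ValueError):
        # Present, but cannot be read as a float: the red segment.
        return "invalid"
    return "valid"


def quality_bar(values):
    """Count values per category, mirroring the green/red/black segments."""
    counts = Counter(classify(v) for v in values)
    return {key: counts.get(key, 0) for key in ("valid", "invalid", "missing")}


sample = ["12.50", "7", "N/A", "", None, "3.1e2"]
summary = quality_bar(sample)
print(summary)
```

In this sample, `"N/A"` is the only value that would land in the red segment, while the empty string and `None` would land in the black segment.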
Fix Invalid Data
You can use the following steps to fix invalid data for a column's data type.
Steps:
From the Transformer page, click the red bar of the data quality bar for the column. You can review the total count of mismatched data.
A set of predictive suggestions is displayed in the right panel. Review the suggested transformations. You can click them to preview the results.
Tip: To show only the rows affected by the previewed transformation, click the Show Only Affected Rows checkbox in the status bar at the bottom of the screen.
In the following example, the red bar in the POS_Sales column has been clicked to select the mismatched values in the column compared to its current data type. You can select a transformation (for example, the set transformation) to address the mismatched values.
Figure: Suggestions for fixing mismatched data
To make modifications to the suggested transformation, click Edit. For example, you can modify the formula to set the value to 0 if it is mismatched. Your transformation should look like the following:
Transformation Name | Edit with formula |
---|---|
Parameter: Columns | Multiple |
Parameter: Formula | ifmismatched($col, ['Float'], 0) |
Parameter: Action | Set |
To add the selected suggestion, click Add. The sampled data is transformed.
Tip: You can also create an Edit with formula transformation directly from the Recipe panel.
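As a rough illustration of what the `ifmismatched($col, ['Float'], 0)` formula above does to a column, the following Python sketch replaces values that cannot be read as floats with 0. The `is_float` and `if_mismatched` helpers are hypothetical stand-ins, not the product's implementation, and leaving missing values untouched is an assumption.

```python
def is_float(value):
    """Return True if the value can be parsed as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def if_mismatched(value, replacement=0):
    """Return the replacement when the value is present but not a float."""
    # Assumption: missing values (None or empty) are not mismatched,
    # so they are passed through unchanged.
    if value is None or value == "":
        return value
    return value if is_float(value) else replacement


sales = ["19.99", "abc", "", "250"]
cleaned = [if_mismatched(v) for v in sales]
print(cleaned)
```

Only `"abc"` is replaced here. Valid values are returned unchanged rather than converted, so the sketch shows the replacement step and nothing more.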
Build Data Quality Rules to Identify Invalid Data for Your Requirements
Data quality rules enable you to apply your requirements to columns of data in your datasets. These rules are not part of your recipe and persist in the Transformer page throughout the recipe development process.
For example, suppose you have a column called POS_Sales, which contains the values of individual transactions. In your organization, this value might:
- Never be less than 0.
- Very rarely exceed 1,000,000.00.
You can create and use data quality rules to identify the values in this column that do not comply with the above criteria.
Steps:
- In the Transformer page, click the Data quality rules icon.
- Click Add rule.
- To address the above criteria, you can add a rule like the following:
Data Quality Rule | In Range |
---|---|
Parameter: Input type | Column |
Parameter: Column | POS_Sales |
Parameter: Minimum value | 0 |
Parameter: Maximum value | 1000000 |
- Review other settings as needed.
- Check the preview in the data grid.
- If all looks good, click Add rule.
The rule is added to the Transformer page.
NOTE: After you create a data quality rule, it remains present in the Transformer page. Subsequent updates to your recipe may result in values that violate your data quality rules. In this manner, data quality rules are helpful guides to ensure that your data remains consistent throughout the recipe development process.
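The In Range rule defined above can be thought of as a per-value check against the minimum and maximum. The Python sketch below shows that idea for the POS_Sales example. The `in_range` helper is hypothetical, and two behaviors are assumptions rather than documented facts: both bounds are inclusive, and non-numeric values count as violations.

```python
def in_range(value, minimum=0, maximum=1_000_000):
    """Return True if the value is numeric and within [minimum, maximum]."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        # Assumption: values that are not numeric fail the rule.
        return False
    # Assumption: both bounds are inclusive.
    return minimum <= number <= maximum


sales = ["19.99", "-5", "1000000", "2500000.00", "abc"]
violations = [v for v in sales if not in_range(v)]
print(violations)
```

Re-running a check like this after each recipe change is analogous to how a persisted rule keeps flagging values that a later transformation pushes out of range.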
Learn More
Find Bad Data: | Dataprep by Trifacta | Designer Cloud | Designer Cloud Enterprise Edition |
Column Histograms: | Dataprep by Trifacta | Designer Cloud | Designer Cloud Enterprise Edition |
Overview of Data Quality: | Dataprep by Trifacta | Designer Cloud | Designer Cloud Enterprise Edition |
Add Data Quality Rule: | Dataprep by Trifacta | Designer Cloud | Designer Cloud Enterprise Edition |