Trifacta Documentation



In Designer Cloud, you can easily locate bad or missing data and fix it with a few clicks.

In the Transformer page, above each column of data is a data quality bar and histogram. 

Figure: Example data quality bar and column histogram

The top bar is the data quality bar. The data quality bar segments the values found in the column into three color-coded bands:

Color bar    Description
green        Valid values for the current data type of the column
red          Invalid values for the current data type of the column
black        Missing values, which could be empty or null
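The three bands can be illustrated outside the product with a minimal Python sketch. It assumes a column typed as Decimal and treats any string that Python's float() accepts as valid; the function names and sample values are hypothetical, and the product's own validation rules may differ.

```python
from collections import Counter
from typing import Iterable, Optional


def classify_decimal(value: Optional[str]) -> str:
    """Return 'missing', 'valid', or 'invalid' for a Decimal-typed column.

    Assumption: "valid" means Python's float() can parse the text.
    """
    if value is None or value.strip() == "":
        return "missing"  # null or empty (black band)
    try:
        float(value)
        return "valid"  # parses as a decimal number (green band)
    except ValueError:
        return "invalid"  # present but not a decimal number (red band)


def quality_bar(values: Iterable[Optional[str]]) -> dict:
    """Count how many values fall into each of the three bands."""
    counts = Counter(classify_decimal(v) for v in values)
    return {band: counts.get(band, 0) for band in ("valid", "invalid", "missing")}


# Hypothetical sample values for the POS_Sales column.
pos_sales = ["12.50", "7", "", None, "n/a", "3.25"]
bar = quality_bar(pos_sales)
print(bar)
```

The relative size of each count corresponds to the width of the matching band in the data quality bar.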

In the image above, you can identify the data type of the column based on the icon to the left of the column name (POS_Sales). In this case, the data type is Decimal. 

Tip: You can change the data type of the column by clicking the data type icon for the column. Details are below.

Below the data quality bar is the histogram, which identifies the frequency of specific values in the column. 
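As a rough illustration of what the histogram summarizes, the following Python sketch counts how often each value occurs in a column. The column name and values are hypothetical, and the sketch ignores any binning that the product may apply to numeric columns.

```python
from collections import Counter


def value_histogram(values):
    """Return (value, count) pairs, most frequent first, skipping missing values."""
    present = [v for v in values if v not in (None, "")]
    return Counter(present).most_common()


# Hypothetical sample column.
region = ["East", "West", "East", "", "North", "East", "West"]
hist = value_histogram(region)

# Print a simple text rendering: one '#' per occurrence.
for value, count in hist:
    print(f"{value:<6} {'#' * count}")
```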

Find Missing Data

In the data quality bar, rows missing values in the column are identified by a black bar. 

When you click the black bar in the data quality bar, you select all of the missing values in the column. You can then define a transformation to fix them. For more information, see Manage Missing Data.

Find Invalid Data

In the data quality bar, rows that contain invalid values for the current column data type are identified by a red bar. 

When you click the red bar in the data quality bar, you select all of the invalid values in the column. You can then define a transformation to fix them. For more information, see Fix Invalid Data.

Change column data type

In some cases, invalid data can be fixed by simply changing the column data type. You can click the current data type indicator to review and select a more appropriate data type.

Tip: No value is invalid for the String data type.


Figure: Change column data type

For more information on changing data types, see Transform Columns.
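To see why a change of data type can clear invalid values, consider this minimal Python sketch. It counts the invalid values in the same column under three types. The type names are used as plain labels, and the validity checks (Python's int() and float() parsers) are assumptions for illustration, not the product's validation logic.

```python
def is_valid(value: str, data_type: str) -> bool:
    """Check one value against a data type, using Python parsers as a stand-in."""
    if data_type == "String":
        return True  # no value is invalid for String
    if data_type == "Integer":
        parser = int
    elif data_type == "Decimal":
        parser = float
    else:
        raise ValueError(f"unsupported data type: {data_type}")
    try:
        parser(value)
        return True
    except ValueError:
        return False


def count_invalid(values, data_type: str) -> int:
    """Number of values that do not match the given data type."""
    return sum(1 for v in values if not is_valid(v, data_type))


# Hypothetical column values.
values = ["10", "2.5", "7", "abc"]
invalid_by_type = {t: count_invalid(values, t) for t in ("Integer", "Decimal", "String")}
print(invalid_by_type)
```

In this sample, moving from Integer to Decimal removes one invalid value, and String removes all of them, at the cost of losing numeric validation.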

Find Outlier Data

You can explore the details of a column of data to review statistical metrics on the data and to locate outlier values. In the column menu, select Column Details.

Figure: Column Details

Tip: When you click or SHIFT-click the bars shown in Column Details, the selected values are used to prompt suggestions for how to transform them.

Tip: You can explore the patterns in the data in the Patterns tab, where you can also use these patterns to standardize the formatting of your data.

For more information, see Transform Columns.
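This page does not state how outliers are determined. As one common convention, the Python sketch below flags values that fall more than 1.5 times the interquartile range outside the first and third quartiles. The threshold, function name, and sample data are assumptions for illustration only.

```python
import statistics


def iqr_outliers(values, k: float = 1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the 1.5 * IQR fence is a common rule of thumb, not
    necessarily the definition used in Column Details.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical numeric column with one extreme value.
sales = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(sales)
print(outliers)
```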

Monitor Data Quality Issues

You can create data quality rules to monitor the specific requirements of your data. A data quality rule is a test of the data in a column that you define. Data quality rules can be one of many types, including custom rules.

Tip: Data quality rules are not recipe steps. They exist outside of your recipe and are tested continuously, so data quality rules become an effective means of ensuring that your data remains within acceptable boundaries throughout the recipe development process.

A custom data quality rule allows you to apply a formula to a set of data. For example, you can create a data quality rule to identify if any values in the Discount column are greater than 0.25 (25%):

Discount > 0.25

Violations of data quality rules are highlighted in a data quality bar and can be selected for further transformation.
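The Discount example can be sketched in Python as a check that runs over the rows without changing them. This sketch assumes that rows where the formula evaluates to true are the ones flagged; the helper name and sample rows are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def find_flagged_rows(rows: List[Row], formula: Callable[[Row], bool]) -> List[int]:
    """Return the indexes of rows for which the formula is true.

    The rows themselves are not modified, mirroring the point that a
    data quality rule is a test rather than a recipe step.
    """
    return [i for i, row in enumerate(rows) if formula(row)]


# Hypothetical sample data.
rows = [
    {"Discount": 0.10},
    {"Discount": 0.30},
    {"Discount": 0.25},
    {"Discount": 0.50},
]

# Equivalent of the formula: Discount > 0.25
flagged = find_flagged_rows(rows, lambda row: row["Discount"] > 0.25)
print(flagged)
```

A value of exactly 0.25 is not flagged because the comparison is strict.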
