|Method||Description||Recommended Uses||How to Use|
can identify similar values using one of the available algorithms for comparing values. You can compare values based on spelling or language-independent pronunciation.
- Standardize values to correct spelling differences, capitalization, whitespace, and other errors.
- Values must be consistent across rows of the column.
- Primarily used for string-based data types.
Available through the Standardize Page
can identify common patterns in a set of values and suggest transformations to standardize the values to a common format.
- Standardize values to follow a consistent format, such as phone numbers or social security numbers.
- Data type follows a somewhat consistent format and needs reshaping.
|Available in the Patterns tab in Column Details Panel|
|By function||You can apply one or more specific functions to cleanse your data of minor errors in formatting or structure.|
- Good method for improving the performance of pattern- or algorithm-based matching.
- Some functions are specific to a data type, while others have more general application.
|Edit column with formula in the Transform Builder.|
|Mix-and-match||You can use combinations of the above methods for more complex use cases.|
- Combine function-based standardization for global changes to all values with cluster- or pattern-based standardization for individual value changes.
These standardization techniques assume that your column contains only valid or empty values.
- The Columns value is a wildcard, which in this case applies the transformation across all columns in the dataset (
In the Formula, you see a nested expression. If the value in the column is valid against String data type, then, do the following to the column value:
NOTE: The IFVALID function tests each row value for validation against the specified data type. It does not test the column against the data type. See IFVALID Function.
- The TRIM function removes leading and trailing whitespace, which may register as a difference between values. See TRIM Function.
- The UPPER function then converts the output of the TRIM function to all uppercase. So, differences in capitalization are eliminated. See UPPER Function.
- The LEFT function truncates the output of the UPPER function to a maximum of 32 characters. See LEFT Function.