|Method||Description||Recommended Uses||How to Use|
|By clustering||The application can identify similar values using one of the available algorithms for comparing values. You can compare values based on spelling or language-independent pronunciation.|
- Standardize values to correct spelling differences, capitalization, whitespace, and other errors.
- Values must be consistent across rows of the column.
- Primarily used for string-based data types.
|Available through the Standardize Page|
|By pattern||The application can identify common patterns in a set of values and suggest transformations to standardize the values to a common format.|
- Standardize values to follow a consistent format, such as phone numbers or social security numbers.
- Data type follows a somewhat consistent format and needs reshaping.
|Available in the Patterns tab in the Column Details Panel|
|By function||You can apply one or more specific functions to cleanse your data of minor errors in formatting or structure.|
- Good method for improving the performance of pattern- or algorithm-based matching.
- Some functions are specific to a data type, while others have more general application.
|Edit column with formula in the Transform Builder.|
|Mix-and-match||You can use combinations of the above methods for more complex use cases.|
- Combine function-based standardization for global changes to all values with cluster- or pattern-based standardization for individual value changes.
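To illustrate the mix-and-match idea outside the application, here is a minimal Python sketch that applies a function-style cleanup to every value and then a pattern-based reshaping to phone numbers. The helper names `clean_text` and `standardize_phone`, the sample values, and the `NNN-NNN-NNNN` target format are assumptions made for this example; they are not the product's functions.

```python
import re


def clean_text(value: str) -> str:
    """Function-based cleanup: collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", value).strip()


def standardize_phone(value: str) -> str:
    """Pattern-based reshaping: rewrite 10-digit numbers as NNN-NNN-NNNN.

    Values that do not contain exactly 10 digits are returned unchanged,
    so they can be reviewed individually.
    """
    digits = re.sub(r"\D", "", value)
    if len(digits) != 10:
        return value
    return f"{digits[0:3]}-{digits[3:6]}-{digits[6:]}"


# Hypothetical sample column values.
raw = ["  (415) 555-0100 ", "415.555.0101", "555-0102"]

# Global cleanup first, then the pattern-specific change.
cleaned = [standardize_phone(clean_text(v)) for v in raw]
```

Running the global cleanup first means the pattern step sees fewer variants, which is the reason function-based cleanup is suggested as a way to improve pattern- or algorithm-based matching.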
Using one of the supported matching algorithms, the application can cluster together similar column values. You can review the clusters of values to determine if they should be mapped to the same value. If so, you can apply the mapping of these values within the application. For more information, see Overview of Cluster Clean.
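The cluster-review-map flow described above can be sketched in a few lines of Python. This sketch uses a simple fingerprint (key collision) comparison based on spelling, which is a stand-in chosen for illustration and not necessarily one of the application's matching algorithms. The function names and sample values are hypothetical.

```python
import re
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Build a key that ignores case, punctuation, spacing, and token order."""
    tokens = re.sub(r"[^\w\s]", "", value.lower()).split()
    return " ".join(sorted(set(tokens)))


def cluster_values(values):
    """Group values whose fingerprints collide."""
    clusters = defaultdict(list)
    for v in values:
        clusters[fingerprint(v)].append(v)
    return clusters


def build_mapping(values):
    """Map every value to the most frequent member of its cluster."""
    mapping = {}
    for members in cluster_values(values).values():
        canonical = Counter(members).most_common(1)[0][0]
        for m in members:
            mapping[m] = canonical
    return mapping


# Hypothetical sample column values.
values = ["Acme Inc.", "acme inc", "ACME  Inc", "Acme Inc.", "Globex"]
mapping = build_mapping(values)
standardized = [mapping[v] for v in values]
```

In practice the mapping would be reviewed before it is applied, since a fingerprint collision only suggests that two values refer to the same thing.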
Standardize Formatting by Patterns