Cluster clean enables users to standardize values in a column by clustering similar values together. Using one of the supported matching algorithms, the application can group similar column values into clusters. You can review each cluster to determine whether its values should be mapped to the same value. If so, you can apply the mapping of these values within the application.
- For more information on how to apply cluster clean, see Standardize Page.
- For more information on other methods of standardization, see Overview of Standardization.
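The cluster, review, and map flow described above can be illustrated with a minimal sketch. It is not the product's implementation: `cluster_key` is a hypothetical similarity key (lowercase, punctuation stripped, tokens sorted) standing in for whichever supported matching algorithm is selected, and all function names and sample values are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def cluster_key(value: str) -> str:
    """Hypothetical similarity key: lowercase, drop punctuation, sort unique tokens."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in value.lower())
    return " ".join(sorted(set(cleaned.split())))


def build_clusters(values):
    """Group raw values by key, counting how often each raw spelling occurs."""
    clusters = defaultdict(Counter)
    for v in values:
        clusters[cluster_key(v)][v] += 1
    return clusters


def propose_mapping(clusters):
    """For each cluster with more than one spelling, map every spelling to the most common one."""
    mapping = {}
    for counts in clusters.values():
        if len(counts) < 2:
            continue  # a single spelling needs no standardization
        target = counts.most_common(1)[0][0]
        for v in counts:
            mapping[v] = target
    return mapping


# Illustrative column values (assumed data, not taken from the product).
column = ["Acme Inc.", "acme inc", "ACME, Inc.", "Acme Inc.",
          "Globex", "globex ", "Globex", "Initech"]

clusters = build_clusters(column)      # step 1: cluster similar values
mapping = propose_mapping(clusters)    # step 2: proposed mapping to review
standardized = [mapping.get(v, v) for v in column]  # step 3: apply the mapping
```

In this sketch the review step is simply inspecting `mapping` before it is applied; values that fall into a cluster of their own, such as `"Initech"`, are left untouched.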
...
You can apply cluster-based standardization through the Standardize Page. See Standardize Page.
Clustering Algorithms
The following algorithms for clustering values are supported.
...
- If you have auto-standardized values, the value applied during job execution is the one that appeared most frequently in the sample displayed when the cluster clean step was defined. The most common value is not recalculated from the entire dataset.
- Values that were not part of the displayed sample may not be factored into the standardization process during job execution.
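The two limitations above can be made concrete with a small sketch. It assumes a simplified model in which the mapping is computed once from the sample and then applied unchanged to the full dataset; the function names, the trim-and-lowercase key, and the data are hypothetical, and the product's actual handling of values outside the sample may differ.

```python
from collections import Counter


def key(value: str) -> str:
    """Hypothetical similarity key: trim whitespace and lowercase."""
    return value.strip().lower()


def mapping_from_sample(sample):
    """Build a value-to-target mapping using only the sample's frequencies."""
    groups = {}
    for v in sample:
        groups.setdefault(key(v), Counter())[v] += 1
    mapping = {}
    for counts in groups.values():
        target = counts.most_common(1)[0][0]  # most common value in the sample
        for v in counts:
            mapping[v] = target
    return mapping


# Assumed data: "NY" dominates the sample, but "ny" dominates the full dataset.
sample = ["NY", "NY", "ny", "Ny"]
full_dataset = ["NY", "ny", "ny", "ny", "ny", "Ny", "nY"]

mapping = mapping_from_sample(sample)  # fixed when the step is defined
result = [mapping.get(v, v) for v in full_dataset]  # applied at job execution
```

Here every sampled spelling is mapped to `"NY"` even though `"ny"` is the most frequent value in the full dataset, and `"nY"`, which never appeared in the sample, passes through unchanged in this model.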