Cluster clean enables users to standardize values in a column by clustering similar values together. Using one of the supported matching algorithms, the application groups similar column values into clusters. You can review each cluster to determine whether its values should be mapped to the same value. If so, you can apply the mapping within the application.


You can apply cluster-based standardization through the Standardize Page. See Standardize Page. 

Clustering Algorithms

The following algorithms for clustering values are supported.


  • If you have auto-standardized values, the most common value that is applied during job execution is the value that appeared most frequently in the sample that was displayed when the cluster clean step was defined. The most common value is not redetermined based on the entire dataset.
  • Values that were not part of the displayed sample may not be factored into the standardization process during job execution.
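
The sample-based behavior described above can be illustrated with a minimal sketch. This is not the product's implementation: the `fingerprint` key (lowercase, strip punctuation, sort unique tokens) is an assumed stand-in for a matching algorithm, and the function names and data are hypothetical. The sketch builds the mapping from a sample only, then applies it to a larger dataset.

```python
import string
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Hypothetical similarity key: lowercase, drop punctuation, sort unique tokens."""
    cleaned = value.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def build_mapping(sample: list[str]) -> dict[str, str]:
    """Cluster sample values by key; map each key to its most frequent sample value."""
    clusters: dict[str, Counter] = defaultdict(Counter)
    for value in sample:
        clusters[fingerprint(value)][value] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in clusters.items()}


def standardize(values: list[str], mapping: dict[str, str]) -> list[str]:
    """Apply the sample-derived mapping; values in unseen clusters pass through unchanged."""
    return [mapping.get(fingerprint(v), v) for v in values]


# The mapping is determined from the sample only (assumed data).
sample = ["Acme Inc.", "Acme Inc.", "acme inc", "Globex"]
mapping = build_mapping(sample)

# In the full dataset, "ACME INC" is the most frequent spelling,
# but the most common value is not redetermined at execution time.
full_dataset = sample + ["ACME INC"] * 5 + ["Initech, LLC", "initech llc"]
result = standardize(full_dataset, mapping)
```

In this sketch, every Acme variant is mapped to "Acme Inc." because that spelling was most frequent in the sample, even though "ACME INC" is more frequent in the full dataset. The two Initech spellings never appeared in the sample, so they are left as they were.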
