Through the Standardize page, you can review similar column values and standardize them to values that you specify.
For example, master data on customers and products may use different names for the same product. For the Web team, the product may be called, "ACME Cookies Chocolate Chip," while the data from the Accounting team refers to this product as, "Cookies - Choc Chip." Through the Standardize page, you can normalize these values to a single consistent value for easier consumption downstream.
Standardizations applied through this page are stored in a connected database. These standardizations cannot be migrated between instances or workspaces.
To open the Standardize page:
standardize column. Then, select the column whose values you wish to standardize and click Next. See Search Panel.
In the above image, groups the various references to product names into its interpretation of meaningful clusters. This clustering is based on pattern-matching between values in the column.
NOTE: The Standardize page displays column values from the currently selected sample only. If the sample does not span the entire dataset, column values that are not captured in the display are not affected by standardization changes. You may need to take additional samples to capture column values outside the current sample.
In the left side of the screen, you can review the clusters of values that have been detected in the column. In the above image, you can see that the platform has identified a number of clusters based on simple differences in capitalization.
Undo: Undo the last action in the Standardize page
NOTE: This action does not undo recipe steps that have already been added.
Auto Standardize: The Wand tool automatically standardizes values based in a cluster to the most common value, as long as the most common value occurs in 25% of the clustered rows.
NOTE: For auto-standardization, the most common value is determined based on the cluster of values that are displayed in the current sample.
Tip: The Wand tool is recommended for beginning the standardization process. If values within a cluster have been modified, the cluster is not affected by the Wand tool. You can also apply the Wand tool on selected values in a cluster.
To standardize values:
If needed, change the clustering algorithm to apply to the values.
From the left panel, select the set of values that you wish to standardize.
Tip: Unclustered values are listed at the bottom of the panel. You should review these values when you are selecting clustered values.
To select multiple values, press
COMMAND/CTRL + click or select multiple values in the left column.
Tip: To select all values in a cluster, click the cluster header.
SHIFT+ click, which works across clusters of values.
After you have selected all values to standardize, you specify the new value to apply to these selected values in the right panel. This new value applies to all instances of the selected value or values. Changes are previewed next to the source values.
Tip: When you have values selected in a cluster, you can use one of the source values as your standardized value. Hover over the value in the left panel, and then click the icon that appears.
At any time, you can revert the changes to the cluster values. Click Revert to source.
Repeat the previous steps as needed.
Tip: You can perform multiple replacements in a single recipe step. So, you can configure all of your standardization steps before adding the single step to your recipe. For debugging purposes, you may want to separate some or all standardization into separate steps.
NOTE: You cannot copy and paste standardization steps in the Recipe panel.