In the Column Details panel, you can review additional details about a column of your dataset. Select Column Details from any column menu or the Action menu in the column browser.
Tip: Use the Column Details panel to explore the values in an individual column when the context of each value is not important for your current exploration. For example, you can identify outlier values in the column, or compare the number of unique values to the number of rows to determine whether the column could serve as a key.
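The key check described in this tip can be approximated outside the product. The following is a minimal sketch, not the platform's implementation: the function name `key_candidate_summary` and the sample lists are hypothetical, and it assumes that empty strings and `None` count as missing values.

```python
def key_candidate_summary(values):
    """Compare the number of distinct non-missing values to the row count."""
    rows = len(values)
    # Assumption: None and empty strings are treated as missing.
    present = [v for v in values if v is not None and v != ""]
    unique = len(set(present))
    return {
        "rows": rows,
        "missing": rows - len(present),
        "unique": unique,
        # A column is a key candidate only if every row has a distinct value.
        "could_be_key": rows > 0 and unique == rows,
    }


order_ids = ["A1", "A2", "A3", "A4"]
cities = ["Oslo", "Oslo", "Lima", ""]

print(key_candidate_summary(order_ids))
print(key_candidate_summary(cities))
```

A column with repeated or missing values, such as `cities` here, fails the check because its unique count is lower than its row count.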
Column Details panel - Overview tab
You can use this view to review basic counts and percentages of the values in the currently selected column. In addition to basic computations on valid, mismatched, and missing values, you can see breakdowns for the most frequent values and outlier values.
Depending on the data type of the column, additional statistics provide information on data quality and variation. For more information, see Column Statistics Reference.
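To make the valid, mismatched, and missing categories concrete, here is a rough sketch of how such counts and percentages could be computed for a column expected to hold integers. The helper names (`quality_counts`, `looks_like_integer`) and the validity rule are assumptions for illustration. The platform's own type checks may differ.

```python
def looks_like_integer(text):
    """Hypothetical validity rule: the value parses as an integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def quality_counts(values, is_valid):
    """Count values as valid, mismatched, or missing."""
    counts = {"valid": 0, "mismatched": 0, "missing": 0}
    for v in values:
        if v is None or v.strip() == "":
            counts["missing"] += 1
        elif is_valid(v):
            counts["valid"] += 1
        else:
            counts["mismatched"] += 1
    return counts


ages = ["34", "41", "", "n/a", "29", None]
counts = quality_counts(ages, looks_like_integer)
percentages = {name: 100 * n / len(ages) for name, n in counts.items()}

print(counts)
print(percentages)
```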
In the Patterns tab, you can review patterns identified by the platform in the selected column's data and then create steps based on patterns that you select. Pattern profiling automatically finds and groups clusters of the column's values based on similarities in format and structure, such as differently formatted phone numbers, addresses, log entries, and name fields. For example, if some of your dataset's address values include apartment numbers, you can create a split transform based on a pattern that includes the apartment numbers.
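The idea of grouping values by format and structure can be illustrated with a simplified sketch. It is not the platform's profiling algorithm: it assumes a "shape" in which every digit becomes `9`, every letter becomes `A`, and all other characters are kept, and the function names are hypothetical.

```python
from collections import defaultdict


def shape(value):
    """Reduce a value to its format: digits -> 9, letters -> A, others kept."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def group_by_shape(values):
    """Group non-blank values that share the same shape."""
    groups = defaultdict(list)
    for v in values:
        if v:
            groups[shape(v)].append(v)
    return dict(groups)


phones = ["415-555-0100", "(415) 555-0101", "415-555-0102", "4155550103"]
for pattern, members in group_by_shape(phones).items():
    print(pattern, len(members), members)
```

In this sample, the two dash-separated phone numbers fall into one group, while the parenthesized and unformatted numbers each form their own group.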
NOTE: In this tab, the count of values and the
NOTE: Wide columns, such as Arrays, Objects, or freeform text, might take a while to profile.
Tip: You can see data in the data grid while exploring patterns through the context panel. See Pattern Details Panel.
Column Details panel - Patterns tab
All non-blank values are captured in the all patterns category, which you can expand to display the patterns that capture subsets of all values. Patterns are displayed in a tree structure, with each lower level describing a subset of the parent pattern.
Tip: Hover over a pattern or sub-pattern to see the affected values in the example data beneath it.
Tip: When you select a pattern group, you may be presented with suggestions for standardizing the values in the column to a single format. In some cases, you might want to remove unnecessary data first. For example, standardization of phone numbers is easier if any
Tip: Pattern suggestions are created based on the first few thousand rows of data in your sample. For best results, you should generate a random sample with a representative set of patterns in the first rows in the column.
Below the top level, patterns are displayed in order of decreasing frequency in the column, so you can choose the level of granularity at which you want to address data issues in the column. For each pattern, you can review the count of values that match the pattern.
In the above example, all values that have been identified as matching the url pattern are contained in the first category.
When you apply the transform to your recipe, the Patterns tab is updated automatically.
Tip: When you see a pattern that you wish to reuse, select the pattern and one of its suggestion cards and then modify the step.
Expand the caret next to any pattern to explore its sub-patterns, which identify subsets of values within the broader pattern.
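The relationship between patterns and sub-patterns, ordered by decreasing frequency, can be pictured with a small sketch. This is an illustrative assumption, not the platform's method: a parent pattern collapses runs of digits or letters (`9+`, `A+`), while a sub-pattern keeps the exact length of each run. All names are hypothetical.

```python
import re
from collections import Counter


def fine_shape(value):
    """Sub-pattern: every digit -> 9, every ASCII letter -> A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def coarse_shape(value):
    """Parent pattern: runs of digits -> '9+', runs of letters -> 'A+'."""
    return re.sub(r"A+", "A+", re.sub(r"9+", "9+", fine_shape(value)))


def pattern_tree(values):
    """Return (parent, count, [(sub_pattern, count), ...]) by decreasing count."""
    parents = Counter()
    children = {}
    for v in values:
        if not v:
            continue  # blank values are not part of any pattern
        parent = coarse_shape(v)
        parents[parent] += 1
        children.setdefault(parent, Counter())[fine_shape(v)] += 1
    return [(p, n, children[p].most_common()) for p, n in parents.most_common()]


codes = ["AB-12", "CD-34", "E-5", "12345", "EF-56"]
tree = pattern_tree(codes)
for parent, count, subs in tree:
    print(parent, count)
    for sub, sub_count in subs:
        print("   ", sub, sub_count)
```

Each sub-pattern's count is a subset of its parent's count, which mirrors how each lower level of the tree describes a subset of the parent pattern.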
For more information on pattern standardization, see Standardize Non-Numeric Values.
For more information on standardizing numeric values, see Normalize Numeric Values.
After patterns have been selected, they can be reused through the Transform Builder. See Pattern History Panel.
Column patterns can also be reviewed in the context panel. See Pattern Details Panel.