
...

In the Column Details panel, you can review additional details about a column of your dataset. Select Column Details from any column menu or from the Action menu in the column browser.

...

Figure: Column Details panel - Overview tab

Column statistics:

You can use this view to review basic counts and percentages of the values in the currently selected column. In addition to basic computations on valid, mismatched, and missing values, you can see breakdowns for the most frequent values and outlier values. 


Depending on the data type of the column, additional statistics provide information on data quality and variation. For more information, see Column Statistics Reference.
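
The counts and percentages described above can be illustrated with a minimal sketch. It is not the platform's implementation: the function names (`profile_column`, `is_integer`) and the rule that blank or null values count as missing are assumptions made for this example.

```python
from collections import Counter


def profile_column(values, is_valid):
    """Count valid, mismatched, and missing values (hypothetical helper)."""
    counts = Counter()
    for value in values:
        # Assumption: None and blank strings are treated as missing.
        if value is None or str(value).strip() == "":
            counts["missing"] += 1
        elif is_valid(value):
            counts["valid"] += 1
        else:
            counts["mismatched"] += 1
    total = len(values)
    # Map each category to (count, percentage of all rows).
    return {
        category: (
            counts[category],
            round(100.0 * counts[category] / total, 1) if total else 0.0,
        )
        for category in ("valid", "mismatched", "missing")
    }


def is_integer(value):
    """Validity check for a column typed as Integer (assumed for the example)."""
    try:
        int(str(value))
        return True
    except ValueError:
        return False


sample = ["12", "7", "abc", "", None, "42"]
stats = profile_column(sample, is_integer)

# Most frequent non-missing values, similar in spirit to a top-values breakdown.
top_values = Counter(v for v in sample if v).most_common(2)
```

Here `stats` maps each quality category to a count and a percentage, which mirrors the three segments of the data quality bar.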

...

  • To change the data type, click the type indicator next to the column title in the Column Details panel.
  • To perform commands on the column, select from the drop-down next to the column title. For more information, see Column Menus.
  • Use the data quality bar to select categories of values: valid, mismatched, or missing. Based on your selection, the context panel is updated with recommended recipe steps. See Suggestion Cards Panel.

Patterns tab

In the Patterns tab, you can review patterns identified by the platform in the selected column's data and then create steps based on patterns that you select. Pattern profiling automatically finds and groups clusters of the column's values based on similarities in format and structure, such as differently formatted phone numbers, addresses, log entries, and name fields. For example, if some of your dataset's address values include apartment numbers, you can create a split transform based on a pattern that includes the apartment numbers.
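
As a rough illustration of grouping values by format, the sketch below generalizes each value into a "shape" and counts how many values share it. The `<digits>` and `<letters>` token names and the `shape_of` helper are hypothetical; they are not the platform's pattern syntax or its profiling algorithm.

```python
import re
from collections import Counter

# One pass over the value: a run of digits or a run of ASCII letters.
_RUN = re.compile(r"[0-9]+|[A-Za-z]+")


def shape_of(value):
    """Replace digit runs and letter runs with placeholder tokens."""
    def generalize(match):
        return "<digits>" if match.group()[0].isdigit() else "<letters>"
    return _RUN.sub(generalize, value)


phone_values = [
    "555-1234",
    "555-9876",
    "(415) 555-0000",
    "555 4321",
    "415-555-1111",
]

# Group the column's values by shape and list shapes by decreasing frequency.
groups = Counter(shape_of(v) for v in phone_values)
for shape, count in groups.most_common():
    print(f"{count:>3}  {shape}")
```

Values that share a shape, such as `555-1234` and `555-9876`, fall into the same group, which is the kind of cluster you could then target with a single transform.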

...

  • Each non-blank value in the column is represented by one of the displayed patterns. Patterns are specified as a combination of literal values and the platform's patterns. For more information on these patterns, see Text Matching.
  • Patterns might be more generalized than the constraints of the column's data type. 
  • Token values are the platform's patterns without braces.

Figure: Column Details panel - Patterns tab

All non-blank values are captured in the all patterns category, which you can expand to display the patterns that capture subsets of all values. Patterns are displayed in a tree structure, with each lower level describing a subset of the parent pattern.

Tip: Hover over a pattern or sub-pattern to see the affected values in the example data beneath it.

...

Below the top level, patterns are displayed in order of decreasing frequency in the column, so you can choose the level of granularity at which to address data issues in the column. For each pattern, you can review the counts of values matching the pattern.

In the above example, all values that have been identified as matching the url pattern are contained in the first category.

  • Select a pattern to trigger a set of suggestion cards to apply to the represented data. 
    • When you select values from a pattern's histogram, all suggestions match the pattern. You cannot select the values that do not match the pattern from the histogram.
    • For more information, see Explore Suggestions.
  • Select a token within a pattern or a highlighted block of text among the example values to trigger suggestion cards that apply the token within the pattern.
  • You can modify the selected suggestion in the Transform Builder. See Transform Builder.
    • When you apply the transform to your recipe, the Patterns tab is updated automatically.

      Tip: When you see a pattern that you wish to reuse, select the pattern and one of its suggestion cards and then modify the step.

  • Expand the caret next to any pattern to explore its sub-patterns, which identify subsets of values within the broader pattern.

    NOTE: The Other pattern is a special category that contains values and counts not recognized by the currently selected pattern or sub-pattern. For example, when you select the url pattern, the Other pattern captures the non-URL values. When you explore a sub-pattern of URLs, the Other category captures the values not recognized within the sub-pattern.
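
The behavior of the Other category can be sketched as a partition that is always relative to the current selection. The regular expressions and the `split_by_pattern` helper below are simplified assumptions for illustration; the platform's url pattern is likely more thorough.

```python
import re

# Simplified stand-ins for a url pattern and one of its sub-patterns (assumptions).
URL = re.compile(r"^https?://\S+$")
HTTPS_URL = re.compile(r"^https://\S+$")


def split_by_pattern(values, pattern):
    """Return (values matching the pattern, values left over as "Other")."""
    matched = [v for v in values if pattern.match(v)]
    other = [v for v in values if not pattern.match(v)]
    return matched, other


column = [
    "http://example.com",
    "https://example.com/docs",
    "n/a",
    "example dot com",
]

# Top level: Other holds the non-URL values.
urls, other = split_by_pattern(column, URL)

# Sub-pattern level: Other holds URLs that the sub-pattern does not recognize.
https_urls, other_urls = split_by_pattern(urls, HTTPS_URL)
```

At the top level, `other` contains the two non-URL strings. After drilling into the sub-pattern, `other_urls` contains only the `http://` URL, because Other is recomputed against the values of the parent pattern.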

...

After patterns have been selected, they can be reused through the Transform Builder. See Pattern History Panel.

Column patterns can also be reviewed in the context panel. See Pattern Details Panel.