Through the Column Browser panel, you can use histograms, data quality bars, and data type information to perform a basic review of data across many columns. You can use these tools to select data of interest for display in the Data Grid or Column Details views, or to prompt suggestions for recipe steps.
When you select values in the histograms, corresponding values are highlighted in the other columns, and suggestions are updated. As a result, you can make rough visual assessments of your data quality and of the relationships between values in your dataset, and then quickly construct recipe steps based on your selections to address any issues.
Tip: Use the Column Browser to explore relationships between values across multiple columns.
- You can also use the Column Browser to toggle the display of individual columns in the Transformer page.
- To open the Column Browser, click the Columns icon in the toolbar of the Transformer page.
- See Data Grid Panel.
- See Column Details Panel.
The Column Browser works with the Data Grid or the Column Details view. Below, the Column Browser is displayed with the Data Grid:
- To return to the Transformer page, click Grid in the toolbar.
Column Browser panel
- Click and drag to select a range of values in a histogram.
- CTRL-click to select multiple discrete values.
- CTRL-click and drag to select a range of values. Continue holding the CTRL key to select a second range of values.
When histogram values are selected, the other two panels of the Transformer page are updated.
- To explore column data in more detail, mouse over the column in the Column Browser. Then, click the Microscope icon. See Column Details Panel.
Data quality bar: The bar to the right of each histogram shows the proportions of valid (green), mismatched (red), and missing (black) values. Click a colored segment of the bar to highlight the corresponding values in the other histograms. This technique is useful to:
- Identify whether issues, such as mismatched values in one column, also appear in other columns for the same rows, which may indicate bad data.
- Assess correlations between specific values in one column (for example, Rating = 10) and values in other columns (for example, Repeat Customer = true).
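The following Python sketch illustrates, under stated assumptions, what the data quality bar summarizes and how selecting a value or a bar segment in one column relates to the other columns. The dataset, the `VALIDATORS` type checks, and the functions `quality_bar`, `highlight`, and `values_for` are hypothetical and are not part of the product; the sketch only models the counting and row matching described above.

```python
from typing import Callable, Optional

# Hypothetical in-memory dataset: one list of raw values per column.
# None stands in for a missing value.
ROWS: dict[str, list[Optional[str]]] = {
    "Rating": ["10", "7", "abc", None, "10"],
    "Repeat Customer": ["true", "false", "maybe", None, "true"],
}

def is_int(v: str) -> bool:
    try:
        int(v)
        return True
    except ValueError:
        return False

def is_bool(v: str) -> bool:
    return v in ("true", "false")

# Hypothetical validators standing in for each column's data type.
VALIDATORS: dict[str, Callable[[str], bool]] = {
    "Rating": is_int,
    "Repeat Customer": is_bool,
}

def classify(value: Optional[str], valid: Callable[[str], bool]) -> str:
    """Classify one cell as 'missing', 'valid', or 'mismatched'."""
    if value is None or value == "":
        return "missing"
    return "valid" if valid(value) else "mismatched"

def quality_bar(column: str) -> dict[str, int]:
    """Count valid, mismatched, and missing cells: the data behind one quality bar."""
    counts = {"valid": 0, "mismatched": 0, "missing": 0}
    for v in ROWS[column]:
        counts[classify(v, VALIDATORS[column])] += 1
    return counts

def highlight(column: str, category: str) -> dict[str, list[str]]:
    """Select one bar segment, then report the other columns' status for the same rows."""
    rows = [i for i, v in enumerate(ROWS[column])
            if classify(v, VALIDATORS[column]) == category]
    return {other: [classify(ROWS[other][i], VALIDATORS[other]) for i in rows]
            for other in ROWS if other != column}

def values_for(column: str, value: str, other: str) -> list[Optional[str]]:
    """Select one histogram value and list the corresponding values in another column."""
    return [ROWS[other][i] for i, v in enumerate(ROWS[column]) if v == value]

print(quality_bar("Rating"))                             # {'valid': 3, 'mismatched': 1, 'missing': 1}
print(highlight("Rating", "mismatched"))                 # {'Repeat Customer': ['mismatched']}
print(values_for("Rating", "10", "Repeat Customer"))     # ['true', 'true']
```

In this toy data, the mismatched row in Rating is also mismatched in Repeat Customer, which is the kind of co-occurrence that may point to a bad row rather than a column-specific issue.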
Sort Columns: Select how to sort the columns in the browser.
- Default: Restore the default column order, which is the order in the source data as modified by the current state of the recipe.
- Name: Sort by column name.
- Type: Sort by the data type of each column. You can change the type of a column. See Column Options below.
- Unique Values: Sort by the number of unique values in each column. Columns with the highest number of unique values are listed first.
- Mismatched Values: Sort by the number of mismatched values, so that you can focus first on the columns where the data is most problematic.
- Missing Values: Sort by the number of missing values.
Tip: Use the data quality bar to select the missing or mismatched values in the column that has the most of them. Then, check the highlighting of these rows in the other columns. If other columns also contain missing or mismatched values for these rows, the data in these rows may be bad.
- Outlier Values: Sort by the number of values that are considered to be outliers. For more information, see Column Statistics Reference.
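To make the sort options concrete, here is a minimal Python sketch that orders hypothetical column summaries. The `ColumnStats` fields, the mode names, and `sort_columns` are assumptions for illustration. The text above states only that the Unique Values sort lists the highest counts first; this sketch assumes the other count-based sorts are also descending, and it keeps the current order for ties.

```python
from dataclasses import dataclass

@dataclass
class ColumnStats:
    # Hypothetical per-column summary; field names are illustrative only.
    name: str
    data_type: str
    unique: int
    mismatched: int
    missing: int
    outliers: int

COUNT_MODES = ("unique", "mismatched", "missing", "outliers")

def sort_columns(columns: list[ColumnStats], mode: str) -> list[str]:
    """Return column names in the order the given sort mode would list them."""
    if mode == "default":
        # Keep the current order (source data as modified by the recipe).
        ordered = list(columns)
    elif mode == "name":
        ordered = sorted(columns, key=lambda c: c.name.lower())
    elif mode == "type":
        ordered = sorted(columns, key=lambda c: c.data_type)
    elif mode in COUNT_MODES:
        # Assumed: highest count first. sorted() is stable, so ties keep their order.
        ordered = sorted(columns, key=lambda c: getattr(c, mode), reverse=True)
    else:
        raise ValueError(f"unknown sort mode: {mode!r}")
    return [c.name for c in ordered]

columns = [
    ColumnStats("Rating", "Integer", unique=10, mismatched=1, missing=1, outliers=0),
    ColumnStats("customer_id", "Integer", unique=500, mismatched=0, missing=0, outliers=2),
    ColumnStats("City", "String", unique=40, mismatched=3, missing=5, outliers=0),
]
print(sort_columns(columns, "unique"))      # ['customer_id', 'City', 'Rating']
print(sort_columns(columns, "mismatched"))  # ['City', 'Rating', 'customer_id']
```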
Edit: See Batch Column Editor below.
Sometimes, data is assigned the incorrect type on import. For example, unique identifiers may be typed as integers when they should be handled as strings.
To change the type for a column, click the type indicator next to the column name in the Column Browser. Select the new data type.
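Why identifiers belong in a string type can be shown with a short Python sketch. The ID values are made up, and Python's `int()` only stands in for integer typing in general, not for how the product converts types: once leading zeros are dropped, two distinct IDs can become the same number.

```python
def as_integer_ids(raw_ids: list[str]) -> list[int]:
    """Simulate typing an ID column as integers: leading zeros are discarded."""
    return [int(v) for v in raw_ids]

def as_string_ids(raw_ids: list[str]) -> list[str]:
    """Simulate typing the same column as strings: values are kept verbatim."""
    return list(raw_ids)

# Hypothetical identifiers as they might appear in a source file.
raw_ids = ["00417", "10023", "099", "0099"]
int_ids = as_integer_ids(raw_ids)
str_ids = as_string_ids(raw_ids)
print(int_ids)                               # [417, 10023, 99, 99]
print(len(set(int_ids)), len(set(str_ids)))  # 3 4
```

Typed as integers, "099" and "0099" collapse into the same value, so four distinct identifiers appear as three.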
Column Commands: Click the drop-down to perform actions on the column. For more information on these options, see Data Grid Panel.
Show in Grid: Display the current column in the Data Grid.
Batch Column Editor
- Enter text in the filter textbox to match strings anywhere in a column name.
- Use the predefined filters to toggle the display of the relevant columns:
- Shown/Hidden: Toggle the display of visible or hidden columns.
- Data type buttons: Click the buttons for the data types of interest. Each button shows the number of columns of that data type.
For each column, you can review and select the following information:
- Data Type: Click the data type icon to change the type of the column.
- Eye icon: Toggle the display of the column in the Data Grid. You can use the Shown/Hidden buttons to select or deselect these columns.
- Grid icon: Reposition the Data Grid to show the selected column.
SHIFT-click to select multiple columns.
Tip: You can drag and drop columns to re-order them. The column is repositioned in the Data Grid.
After you have selected your columns, click Edit X Selected. You can perform one of the following actions on the selected columns:
NOTE: Except as noted, these commands generate steps in your recipe.
- Send to start/end: Move the selected columns to the start or end of the dataset. This action also creates a recipe step.
- Change Type: Change the data types of the selected columns.
- Show/Hide: Show or hide the selected columns in the Column Browser and Data Grid. These actions do not generate recipe steps.
Tip: If you have hidden a column, you must show it again in this panel before you can work with it. If there are hidden columns, their count is displayed at the bottom of the screen. Hidden columns are still present in any generated output.
- Drop: Drop the selected columns.
- Drop others: Drop the columns that are not selected.
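The reordering and drop actions can be modeled as simple list operations on the column order. This Python sketch uses hypothetical function names and assumes that moved columns keep their relative order; it does not generate recipe steps and is not the product's implementation.

```python
def send_to(order: list[str], selected: set[str], where: str) -> list[str]:
    """Move the selected columns to the start or end; each group keeps its relative order (assumed)."""
    picked = [c for c in order if c in selected]
    rest = [c for c in order if c not in selected]
    if where == "start":
        return picked + rest
    if where == "end":
        return rest + picked
    raise ValueError(f"unknown position: {where!r}")

def drop(order: list[str], selected: set[str]) -> list[str]:
    """Drop: remove the selected columns."""
    return [c for c in order if c not in selected]

def drop_others(order: list[str], selected: set[str]) -> list[str]:
    """Drop others: keep only the selected columns."""
    return [c for c in order if c in selected]

order = ["id", "name", "city", "rating", "notes"]
selected = {"rating", "id"}
print(send_to(order, selected, "end"))  # ['name', 'city', 'notes', 'id', 'rating']
print(drop_others(order, selected))     # ['id', 'rating']
```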
Enter a string in the search box. The list of columns is filtered immediately based on the string that you enter.
You can apply one or more filters to limit the set of columns displayed in the browser:
NOTE: Filters are additive.
- Filter by data type: Select one or more data types to display only those columns.
- Filter by data quality: Display only columns that contain mismatched values, missing values, or both.
- Filter by visibility: Select to display visible or hidden columns.
To restart your filtering, click Clear all filters.
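The filters above can be sketched in Python over hypothetical column summaries. This sketch reads "Filters are additive" as each active filter being applied on top of the others, so a column is listed only if it passes all of them; that reading, the case-insensitive name match, and the names `Column` and `filter_columns` are assumptions, not documented product behavior. Calling the function with no filters mirrors Clear all filters.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Column:
    # Hypothetical column summary used only for this illustration.
    name: str
    data_type: str
    mismatched: int
    missing: int
    visible: bool

def filter_columns(
    columns: list[Column],
    text: str = "",
    types: Optional[set[str]] = None,
    quality: Optional[set[str]] = None,
    visible: Optional[bool] = None,
) -> list[str]:
    """Apply each active filter on top of the previous ones (assumed reading of "additive")."""
    result = list(columns)
    if text:
        # Search box: match anywhere in the column name (case-insensitive, assumed).
        result = [c for c in result if text.lower() in c.name.lower()]
    if types:
        # Data type filter: keep columns of any selected type.
        result = [c for c in result if c.data_type in types]
    if quality:
        # Data quality filter: keep columns with any selected issue ("mismatched", "missing").
        result = [
            c for c in result
            if ("mismatched" in quality and c.mismatched > 0)
            or ("missing" in quality and c.missing > 0)
        ]
    if visible is not None:
        # Visibility filter: True keeps shown columns, False keeps hidden ones.
        result = [c for c in result if c.visible == visible]
    return [c.name for c in result]

columns = [
    Column("customer_id", "Integer", mismatched=0, missing=0, visible=True),
    Column("customer_name", "String", mismatched=2, missing=0, visible=True),
    Column("city", "String", mismatched=0, missing=4, visible=False),
    Column("rating", "Integer", mismatched=1, missing=3, visible=True),
]
print(filter_columns(columns, text="CUSTOMER"))                        # ['customer_id', 'customer_name']
print(filter_columns(columns, types={"String"}, quality={"missing"}))  # ['city']
```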
In the browser, you can manually select one or more columns or apply one of the predefined selections.
- To select a range of columns, click the first column, press SHIFT, and then click the last column.
- To select multiple discrete columns, press COMMAND and click additional columns.
- To toggle selection of a column, click it again.
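The selection gestures above can be modeled as a small state machine. This Python sketch is hypothetical: the `ColumnSelection` class is not a product API, and it assumes that a plain click on an unselected column replaces the selection while a plain click on a selected column deselects it.

```python
from typing import Optional

class ColumnSelection:
    """Hypothetical model of column selection in the browser; click semantics are assumed."""

    def __init__(self, columns: list[str]) -> None:
        self.columns = columns
        self.selected: set[str] = set()
        self.anchor: Optional[str] = None  # start point for SHIFT-click ranges

    def click(self, col: str) -> None:
        # Clicking a selected column again toggles it off;
        # clicking an unselected column makes it the only selection (assumed).
        if col in self.selected:
            self.selected.discard(col)
        else:
            self.selected = {col}
            self.anchor = col

    def command_click(self, col: str) -> None:
        # COMMAND-click adds or removes one column, leaving the rest of the selection alone.
        self.selected ^= {col}
        self.anchor = col

    def shift_click(self, col: str) -> None:
        # SHIFT-click selects every column from the anchor to the clicked column, inclusive.
        if self.anchor is None:
            self.click(col)
            return
        i, j = sorted((self.columns.index(self.anchor), self.columns.index(col)))
        self.selected |= set(self.columns[i : j + 1])

    def ordered(self) -> list[str]:
        """Selected column names in their display order."""
        return [c for c in self.columns if c in self.selected]

sel = ColumnSelection(["id", "name", "city", "rating", "notes"])
sel.click("name")
sel.shift_click("rating")  # range: name, city, rating
sel.command_click("id")    # add one discrete column
sel.click("city")          # click a selected column again to deselect it
print(sel.ordered())       # ['id', 'name', 'rating']
```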
For any individual column:
- Click the Eye icon to hide or show the column in the Transformer page.
NOTE: Hidden columns are only removed from view in the Transformer page. They still appear in any generated output.
- Click the color bars in the data quality bar to review counts.
- Right-click a column to display a list of actions in the context menu. Column actions apply only to the selected column and depend on its data type. See Column Menus.
For multiple selected columns: from the Action menu, you can choose from the actions that apply to all of the data types of the selected columns.
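The rule that a multi-column action must apply to every selected data type amounts to a set intersection. In this Python sketch, `ACTIONS_BY_TYPE` and its action names are hypothetical placeholders, not the product's actual menus.

```python
# Hypothetical mapping from data type to the actions its column menu offers.
# These action names are illustrative only; they are not the product's actual menus.
ACTIONS_BY_TYPE: dict[str, set[str]] = {
    "String": {"Rename", "Change type", "Drop", "Split", "Format"},
    "Integer": {"Rename", "Change type", "Drop", "Round"},
    "Date": {"Rename", "Change type", "Drop", "Format"},
}

def common_actions(selected_types: list[str]) -> set[str]:
    """Actions offered for a multi-column selection: those valid for every selected type."""
    action_sets = [ACTIONS_BY_TYPE[t] for t in selected_types]
    return set.intersection(*action_sets) if action_sets else set()

print(sorted(common_actions(["String", "Date"])))
# ['Change type', 'Drop', 'Format', 'Rename']
```

Adding a column of another type can only shrink the menu, never grow it.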