In the data grid, you can review how the current recipe applies to the individual columns in your sample.
- The grid is the default view in the Transformer page.
- To open the data grid, click the Grid View icon in the Transformer bar at the top of the page.
Data Grid Panel
- Click column headings to review suggestions for transforms to apply to the column.
Select specific values in a column for suggestions on those strings.
NOTE: Values in a cell cannot exceed 25,000 characters in length.
Tip: If you select a single value in the data grid, the transform suggestion cards suggest operations specific to that string. If you select multiple values, the suggestions can apply any pattern shared between the values. For example, selecting ", CA" and ", NY" results in suggestions for how to handle state abbreviations in a column.
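The idea of a pattern shared between selected values can be sketched as follows. This is only an illustration under assumed rules (each character is generalized to a coarse class); the function names `generalize` and `shared_pattern` are hypothetical, and this is not how the product computes its suggestions.

```python
import re

def generalize(selection):
    """Map each character to a coarse class: uppercase, lowercase, digit, or literal."""
    parts = []
    for ch in selection:
        if ch.isupper():
            parts.append("[A-Z]")
        elif ch.islower():
            parts.append("[a-z]")
        elif ch.isdigit():
            parts.append("[0-9]")
        else:
            parts.append(re.escape(ch))  # punctuation and spaces stay literal
    return "".join(parts)

def shared_pattern(selections):
    """Return one regex if all selections generalize to the same pattern, else None."""
    patterns = {generalize(s) for s in selections}
    return patterns.pop() if len(patterns) == 1 else None

pattern = shared_pattern([", CA", ", NY"])
# The shared pattern also matches other two-letter state abbreviations.
matches_other_state = re.fullmatch(pattern, ", TX") is not None
```

Under these assumptions, selecting ", CA" and ", NY" yields a single pattern (a comma, a space, and two uppercase letters), while two selections with different shapes yield no shared pattern.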
- When you select a column or data in the grid, you can review suggestion cards for possible transforms to apply to the data.
- These cards appear in the context panel on the right side of the screen.
- Suggestions are also generated when you select one or more values in the data histogram for a column or individual values in the displayed rows of the sample.
- See Transform Suggestion Cards Panel.
- Use the vertical scroll bar to the right of the displayed rows of data to show other rows in the sample. To review rows of the sample data that are not displayed, you may click values in a column and then scroll down through the sampled data.
- Use horizontal scrolling to review additional columns that are off-screen.
Tip: If the contents of a cell are too large for the display, you can click the Caret ( > ) icon to the right of the cell value in the data grid to display the entire contents of the cell.
Add or Edit:
- To add a selected suggestion card to your recipe, click Add.
- To modify a suggested recipe step, select its suggestion card and click Edit. See Transform Builder.
- Click the Edit icon to toggle display of the Recipe panel, where you can review the current recipe for your dataset. See Recipe Panel.
- To add a new recipe step from scratch, mouse over the Edit icon and then click the Plus icon. See Transform Builder.
- To review details about an individual column, select Column Details from the column drop-down. See Column Details Panel.
- To review details about a selection of columns, click the Column View icon in the Transformer bar. See Column Browser Panel.
Below the data grid, you can review summary information about the data in your currently selected sample.
Click the Eye icon to open the Visible Columns panel, where you can toggle the display of individual columns. For more information, see Visible Columns Panel.
The status bar contains metrics about the current dataset sample for the currently selected recipe step.
- For example, if your first recipe step removes 100 rows of data, when you create your next recipe step, the status bar should indicate a row count that is 100 less than the row count at the start of the recipe. The other counts may be affected as well.
- The number of columns reflects the count that is currently displayed in the data grid. Toggling visibility of columns or applying column-based filters changes this value.
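The relationship between recipe steps and the status bar counts can be illustrated with a small sketch. The sample data, the column names, and the `status_metrics` helper are hypothetical and exist only to show the arithmetic described above.

```python
def status_metrics(rows, visible_columns):
    """Return (row_count, column_count) for the current state of the sample."""
    return len(rows), len(visible_columns)

# Hypothetical sample of 250 rows with two visible columns.
sample = [{"id": i, "state": "CA" if i % 2 else "NY"} for i in range(250)]
columns = ["id", "state"]

before = status_metrics(sample, columns)

# Hypothetical first recipe step: remove the first 100 rows.
after_step_1 = sample[100:]
after = status_metrics(after_step_1, columns)

# Hiding a column changes only the column count.
after_hiding = status_metrics(after_step_1, ["id"])
```

In this sketch, the row count after the first step is 100 less than at the start of the recipe, and hiding a column reduces the reported column count without changing the row count.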
Tip: Before you begin transforming your data, you might want to verify the columns and count of data types against the data before it was imported. If there are discrepancies, you might want to investigate the differences. Unless your sample includes the entire dataset, row counts should differ.
NOTE: In the Photon running environment, results can differ between executions of the same recipe due to Photon's parallel execution and data limiting within the Transformer page. In particular, joins with multiple matches per key can sometimes cause a difference in the number of reported rows when the job is re-executed.
- To review and analyze across multiple columns, click Columns. See Column Browser Panel.
- To return to the data grid, click Grid.
- To review the samples for the loaded dataset, click the dataset link. See Sampling Menu.
Click the data types link to review the data type for each column. See Supported Data Types.
Filter: Enter text to begin filtering the rows for display in the data grid.
Tip: When you are previewing a recipe step, click the Transformed link to display in the data grid only the rows that have been modified by the currently configured recipe.
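As a rough illustration of what limiting the display to modified rows means, the sketch below compares a sample before and after a previewed step. It assumes the step neither adds nor removes rows, so rows can be compared by position; the `affected` helper and the data are hypothetical.

```python
def affected(before, after):
    """Return (indexes of changed rows, sorted names of changed columns)."""
    rows, cols = [], set()
    for i, (old, new) in enumerate(zip(before, after)):
        # A column counts as changed if it is new or its value differs.
        changed = {c for c in new if old.get(c) != new[c]}
        if changed:
            rows.append(i)
            cols |= changed
    return rows, sorted(cols)

before = [{"city": "sf", "state": "CA"}, {"city": "NYC", "state": "NY"}]
# Hypothetical previewed step: convert the city column to uppercase.
after = [{**row, "city": row["city"].upper()} for row in before]

changed_rows, changed_columns = affected(before, after)
```

Only the first row changes here, because the second row's city value is already uppercase.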
Data Grid Options
Use the right-hand menu to toggle display of various features in the Transformer page:
- Preview: Toggle display of previews of changes. For large datasets containing dozens of columns, disabling previews can improve performance.
- Show Whitespace: When enabled, non-visible characters such as spaces, new lines, and tabs appear in the data as pink icons.
NOTE: If you are using imported datasets with fixed-width columns, you should disable the Show Whitespace option. These columns can conflict with the platform's ability to compute a cell's whitespace, which can impact performance and can cause the browser to crash.
- Card Suggestions: Toggle display of the transform cards, which can be used for quickly applying transform steps on individual columns. Cards are only displayed if a suggestion for your current selection is available.
- Script: Toggle display of recipe steps in either natural, readable language or in the platform's native language.
Show only affected:
When transform steps are previewed, you can use these checkboxes to display only the previewed changes for affected rows, columns, or both.
Tip: These options assist in narrowing the data grid display to only the rows and columns affected by the current recipe step.
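The effect of the Show Whitespace option described above can be approximated outside the application with a short sketch. The marker characters chosen here are arbitrary assumptions for illustration; the application renders its own icons.

```python
# Hypothetical visible markers for otherwise invisible characters.
MARKERS = str.maketrans({
    " ": "\u00b7",   # middle dot for a space
    "\t": "\u2192",  # rightwards arrow for a tab
    "\n": "\u21b5",  # return arrow for a new line
})

def show_whitespace(cell):
    """Replace spaces, tabs, and new lines in a cell value with visible markers."""
    return cell.translate(MARKERS)

rendered = show_whitespace("a b\tc\n")
```

A rendering like this makes trailing spaces and embedded tabs easy to spot before you write a step to remove them.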
In a wide dataset, it can be easier to use the Find Column bar to locate the column of interest.
Find column search bar
- Use the up and down arrows to view the list of the columns in the dataset.
- You can start typing a column name to filter the list.
Column header, data quality bar, and histogram
At the top of the column, you can review:
- the data type and name of the column
- the data quality bar
- the data histogram that summarizes the range of values in the sampled column.
Tip: These values represent counts of the displayed sample.
Column header and histogram
Tip: If you hover over a bar in the histogram, you can review specific values, the count of that value, and the percentage that value represents of the total count of values in the column.
NOTE: In the column header, counts reflect only the counts in the currently loaded sample. They do not reflect counts across the entire dataset, unless the entire dataset is the sample.
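The count and percentage shown when hovering over a histogram bar can be reproduced with simple arithmetic. The sketch below assumes that the total is the number of values in the loaded sample, as the note above describes; the `histogram_tooltips` name is hypothetical.

```python
from collections import Counter

def histogram_tooltips(values):
    """Map each distinct value to (count, percentage of all values in the sample)."""
    counts = Counter(values)
    total = len(values)
    return {value: (count, 100.0 * count / total) for value, count in counts.items()}

tips = histogram_tooltips(["CA", "CA", "NY", "TX"])
ca_count, ca_percent = tips["CA"]
```

Because the input is the sample, these figures match the full dataset only when the sample is the entire dataset.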
|Data type||Identifies the selected data type, which can be inferred by the application based on the contents of the column. Click the icon to change the data type. See Supported Data Types.|
|Column name||To change the column name, select Rename... from the column menu.|
|Column menu||Depending on the column data type, you can select from a set of predefined recipe steps in the column menu under the caret on the right side of the menu. See Column Menus.|
|Data quality bar||The horizontal line shows valid, missing, and mismatched values in the column compared to the column's data type. See Data Quality Bars.|
|Histogram||For each column, you can see the range and frequency of values in the column. See Column Histograms.|
You can click and drag to select values in a column:
- Select a single value in the column to prompt a set of suggestions at the bottom of the screen.
- Select multiple values in a single cell to receive a different set of suggestions.
- You can select individual values in the column histogram or categories of values in the data quality bar, prompting a set of suggestions.
- See Transform Suggestion Cards Panel.
- Double-click to select an individual word, and triple-click to select an entire cell value.
- When you select values, some values in other columns may be highlighted in a darker color, which provides some indication of correlation between values.
Data quality bar
Just below the column name is a horizontal band, which identifies data quality issues among the sample values in the column. Each color band identifies the relative number of records that are valid, mismatched, or missing when compared to the column's data type.
You can use a column's data quality bar to build a recipe step to address selected data. For example, click the red set of values in the data quality bar to generate a set of transform cards or the beginning of a step in the Transform Editor to address mismatched values in the column.
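The valid, mismatched, and missing categories can be sketched as a simple classification against a type check. This is a minimal illustration, assuming that empty or whitespace-only values count as missing and using a hypothetical integer validator; the application's own type inference and validation rules are more involved.

```python
def quality_counts(values, is_valid):
    """Count values as valid, mismatched, or missing for a given type check."""
    counts = {"valid": 0, "mismatched": 0, "missing": 0}
    for v in values:
        # Assumption: None, empty, and whitespace-only values are missing.
        if v is None or str(v).strip() == "":
            counts["missing"] += 1
        elif is_valid(v):
            counts["valid"] += 1
        else:
            counts["mismatched"] += 1
    return counts

def looks_like_integer(v):
    """Hypothetical validator for an Integer column."""
    try:
        int(str(v))
        return True
    except ValueError:
        return False

bar = quality_counts(["1", "2", "", "abc", None, "7"], looks_like_integer)
```

The relative sizes of the three counts correspond to the widths of the bands in the bar, and the mismatched group is the set you would target when you click the red portion.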
Tip: The histogram may also show you unwanted variation in your values. For example, if the column stores latitude data, the precision may be too fine (e.g.
Below the data quality bar, a column histogram displays the count of each detected value in the column (for string data) or the count of values within a numeric range (for number data). You can use this histogram to identify unusual values or outlier values, which should be removed or corrected.
- Select one or more values in the histogram to prompt a set of suggestions for addressing the data.
CTRL-click to select multiple values.
Individual bars in a histogram for a numeric column often represent a range of values. Some notes:
- For a numeric range bar that overlaps values in another bar, values are inclusive on the lower bound and exclusive on the upper bound. For example, if a histogram bar represents the values 0-10, it includes the count of instances of 0 and does not include the count of instances of 10. The count of instances of 10 is part of the adjacent bar in the histogram.
- The above applies only when there are overlapping values between data ranges. If there are no overlapping values, then the range includes the values of the lower and upper boundaries.
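The boundary rule above can be sketched with a small binning function. It assumes the bars are defined by a sorted list of edges, which is an illustrative simplification; the `bin_counts` name is hypothetical.

```python
from bisect import bisect_right

def bin_counts(values, edges):
    """Count values into bars defined by consecutive, sorted edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # outside the range covered by the histogram
        # A value equal to an edge shared by two bars lands in the upper bar,
        # so each bar is inclusive on its lower bound and exclusive on its upper.
        i = bisect_right(edges, v) - 1
        # The final edge is not shared with another bar, so it stays in the last bar.
        i = min(i, len(counts) - 1)
        counts[i] += 1
    return counts

# Two bars: 0-10 and 10-20. The two instances of 10 count toward the second bar.
counts = bin_counts([0, 5, 10, 10, 15, 20], [0, 10, 20])
```

With these inputs the first bar holds the values 0 and 5, and the second bar holds both instances of 10 along with 15 and 20.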
Tip: When you resize the width of a column, the number of bars displayed in the column histogram changes accordingly. You can use this dynamic resizing to change the granularity displayed in histograms.
On the left side of the screen, you can see a column of black dots. If you hover over one of these, you can see the current row number and, if the information is still available, the row number for the row from the original source data. These values apply only to the sample in the current dataset.
Tip: To review the original row number for a row, hover over the black dot in the data grid. These values can be referenced using the
Filter Data Grid
From the Filters drop-down, you can define filters to apply to columns, rows, or both in the data grid. See Filter Panel.
Before a transform in development has been added to the recipe, a preview of the results is generated in the data grid. See Transform Preview.