In the data grid, you can review how the current recipe applies to the individual columns in your sample.
- The grid is the default view in the Transformer page.
- To open the data grid, click the Grid View icon in the Transformer bar at the top of the page.
- Click column headings to review suggestions for transforms to apply to the column.
- Select another column heading to review a different set of suggestions to apply to your selected columns.
Select specific values in a column for suggestions on those strings.
NOTE: Values in a cell cannot exceed 25,000 characters in length.
Tip: If you select a single value in the data grid, the suggestion cards propose operations specific to that string. If you select multiple values, the suggestions can apply to any pattern shared among the values. For example, selecting values that end in ", CA" and ", NY" results in suggestions for how to handle state abbreviations in a column.
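The idea of a pattern shared by several selected values can be sketched in Python. This is a minimal sketch, not the application's suggestion engine: the candidate patterns in the hypothetical `CANDIDATE_PATTERNS` table and the sample city values are assumptions made for illustration. A candidate counts as shared only if it matches every selected value.

```python
import re

# Hypothetical candidate patterns; the application's real pattern detection is not documented here.
CANDIDATE_PATTERNS = {
    "state abbreviation suffix": re.compile(r", [A-Z]{2}$"),
    "five-digit number": re.compile(r"\b\d{5}\b"),
    "trailing digits": re.compile(r"\d+$"),
}


def shared_patterns(selected_values):
    """Return the names of candidate patterns that match every selected value."""
    return [
        name
        for name, pattern in CANDIDATE_PATTERNS.items()
        if all(pattern.search(value) for value in selected_values)
    ]


# Hypothetical selections: one value versus two values that must agree on a pattern.
print(shared_patterns(["Albany, NY 12207"]))            # ['five-digit number', 'trailing digits']
print(shared_patterns(["San Jose, CA", "Albany, NY"]))  # ['state abbreviation suffix']
```

Adding values to the selection can only narrow the set of shared patterns, which mirrors how a multi-value selection shifts suggestions from one specific string to what the values have in common.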
- When you select a column or data in the grid, you can preview the changes in the data grid and review a set of suggestion cards for possible transforms to apply to the data. These cards appear in the context panel on the right side of the screen.
Tip: Continue selecting columns to receive suggestions that apply to multiple columns.
- Suggestions are also generated when you select one or more values in the data histogram for a column or individual values in the displayed rows of the sample.
- See Suggestion Cards Panel.
- Use the vertical scroll bar to the right of the displayed rows to show other rows in the sample. To review rows of the sample that are not currently displayed, you can click values in a column and then scroll down through the sample.
- Use horizontal scrolling to review additional columns that are off-screen.
Tip: If the contents of a cell are too large for the display, you can click the Caret ( > ) icon to the right of the cell value in the data grid to display the entire contents of the cell.
Add or Edit:
- To add a selected suggestion card to your recipe, click Add.
- To modify a suggested recipe step, select its suggestion card and click Edit. See Transform Builder.
- To review details about an individual column, select Column Details from the column drop-down. See Column Details Panel.
- To review details about a selection of columns, click the Column View icon in the Transformer bar. See Column Browser Panel.
- You can reorder the rows based on the values in a column. From the Column menu, select Edit Column > Sort. For more information, see Column Menus.
Using the group parameter in a transform can result in non-deterministic re-ordering of rows in the data grid. However, if you are running your job on the Spark running environment, you should apply the group parameter, or your job may run out of memory and fail. To enforce a consistent row ordering, use the sort transform. For more information, see Sort Transform.
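To see why an explicit sort is the reliable way to fix row order, here is a toy Python model. It does not reproduce how the application or Spark executes a job; the rows, the `region` grouping column, and the seeded shuffle that imitates partitions finishing in arbitrary order are all assumptions.

```python
import random

# Hypothetical rows; "region" stands in for the column passed to a group parameter.
ROWS = [
    {"id": 3, "region": "west"},
    {"id": 1, "region": "east"},
    {"id": 2, "region": "west"},
    {"id": 4, "region": "east"},
]


def grouped_output(rows, seed):
    """Toy model of a grouped step whose groups come back in an unpredictable order."""
    groups = {}
    for row in rows:
        groups.setdefault(row["region"], []).append(row)
    keys = list(groups)
    # The seeded shuffle stands in for partitions finishing in arbitrary order.
    random.Random(seed).shuffle(keys)
    return [row for key in keys for row in groups[key]]


def sorted_output(rows, seed):
    """Follow the grouped step with an explicit sort so the row order is always the same."""
    return sorted(grouped_output(rows, seed), key=lambda row: row["id"])


# Every seed yields the same ordering once the sort is applied.
orderings = {tuple(row["id"] for row in sorted_output(ROWS, seed)) for seed in range(10)}
print(orderings)  # {(1, 2, 3, 4)}
```

The grouped step alone makes no promise about order; only the final sort key determines it, which is the reasoning behind adding a sort step when row order matters.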
At the top of the data grid, you can use the toolbar to quickly build common transformations, filter the display, and perform other operations. See Transformer Toolbar.
Below the data grid, you can review summary information about the data in your currently selected sample.
Click the Eye icon to open the Visible Columns panel, where you can toggle the display of individual columns. For more information, see Visible Columns Panel.
The status bar contains metrics about the current dataset sample for the currently selected recipe step.
- For example, if your first recipe step removes 100 rows of data, when you create your next recipe step, the status bar should indicate a row count that is 100 less than the row count at the start of the recipe. The other counts may be affected as well.
- The number of columns reflects the count that is currently displayed in the data grid. Toggling visibility of columns or applying column-based filters changes this value.
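The status bar arithmetic can be sketched in Python. The column names, type labels, and sample rows are hypothetical, and treating the data type count as the number of distinct types among displayed columns is an assumption for illustration, not documented behavior.

```python
# Hypothetical column names and type labels, used only to illustrate the counts.
COLUMN_TYPES = {"name": "String", "age": "Integer", "city": "String"}

ROWS = [
    {"name": "Ann", "age": 34, "city": "Oslo"},
    {"name": "Bo", "age": None, "city": "Lima"},
    {"name": "Cy", "age": 51, "city": "Pune"},
    {"name": "Di", "age": None, "city": "Cork"},
    {"name": "Ed", "age": 27, "city": "Rome"},
]


def status_metrics(rows, visible_columns):
    """Row count, visible column count, and number of distinct data types among visible columns."""
    visible = [col for col in COLUMN_TYPES if col in visible_columns]
    return {
        "rows": len(rows),
        "columns": len(visible),
        "data_types": len({COLUMN_TYPES[col] for col in visible}),
    }


before = status_metrics(ROWS, set(COLUMN_TYPES))
# A recipe step that deletes rows with a missing age, followed by hiding the "city" column.
after_rows = [row for row in ROWS if row["age"] is not None]
after = status_metrics(after_rows, {"name", "age"})
print(before)  # {'rows': 5, 'columns': 3, 'data_types': 2}
print(after)   # {'rows': 3, 'columns': 2, 'data_types': 2}
```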
Tip: Before you begin transforming your data, you might want to verify the columns and count of data types against the data before it was imported. If there are discrepancies, you might want to investigate the differences. Unless your sample includes the entire dataset, row counts should differ.
Show only affected:
When transform steps are previewed, you can use these checkboxes to display only the previewed changes for affected rows, columns, or both.
Tip: These options help narrow the data grid display to only the rows and columns affected by the current recipe step.
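How affected rows and columns might be determined can be sketched by diffing the current rows against the previewed rows. This is a hypothetical model of the idea behind the Show only affected checkboxes; the function name, the data, and the assumption that the preview keeps rows aligned by position are illustrative.

```python
def affected(before, after):
    """Return (row indexes, column names) whose values differ between the current and previewed rows.

    Assumes the previewed step neither adds nor removes rows, so rows line up by position.
    """
    rows, columns = set(), set()
    for index, (old, new) in enumerate(zip(before, after)):
        # Compare every column present in either version, so added columns count as affected.
        for column in old.keys() | new.keys():
            if old.get(column) != new.get(column):
                rows.add(index)
                columns.add(column)
    return sorted(rows), sorted(columns)


# Hypothetical preview of a step that uppercases the "state" column.
current = [{"city": "Albany", "state": "NY"}, {"city": "San Jose", "state": "ca"}]
preview = [{"city": "Albany", "state": "NY"}, {"city": "San Jose", "state": "CA"}]
print(affected(current, preview))  # ([1], ['state'])
```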
In a wide dataset, click the Find icon in the Transformer toolbar to locate the column of interest.
- Use the up and down arrows to view the list of the columns in the dataset.
- You can start typing a column name to filter the list.
At the top of each column, you can review the following:
NOTE: In the column header, counts reflect only the currently loaded sample. They do not reflect counts across the entire dataset, unless the entire dataset is the sample.

| Item | Description |
|---|---|
| Data type | Identifies the selected data type, which the application can infer from the contents of the column. Click the icon to change the data type. Tip: Before you start transforming your data based on mismatched values, check that the data types for these columns are correct. See Supported Data Types. |
| Column name | To change the column name, select Rename... from the column menu. |
| Column menu | Depending on the column data type, you can select from a set of predefined recipe steps in the column menu, under the caret on the right side of the column header. See Column Menus. |
| Data quality bar | The horizontal bar shows valid, missing, and mismatched values in the column, compared to the column's data type. Tip: You can click these colored bars to generate suggestion cards for transforms that act on these types of values. See Data Quality Bars. |
| Column histogram | Shows the range and frequency of values in the column. Tip: You can select one or more values in the histogram to generate suggestion cards. See Column Histograms. |

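The two column summaries can be approximated in Python: per-value classification for a data quality bar and frequency counts for a histogram. This is a simplified sketch; the Integer rule in `INTEGER`, the definition of missing, and the sample values are assumptions, and the application's actual type validation and histogram binning are not reproduced.

```python
import re
from collections import Counter

# Hypothetical validity rule for an Integer column: optional sign followed by digits.
INTEGER = re.compile(r"[+-]?\d+")


def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or value.strip() == ""


def quality_counts(values, is_valid):
    """Count valid, missing, and mismatched values, as summarized by a data quality bar."""
    counts = Counter()
    for value in values:
        if is_missing(value):
            counts["missing"] += 1
        elif is_valid(value):
            counts["valid"] += 1
        else:
            counts["mismatched"] += 1
    return dict(counts)


def histogram(values):
    """Frequency of each non-missing value, most frequent first."""
    return Counter(v for v in values if not is_missing(v)).most_common()


column = ["12", "7", "", "abc", "12", None, "-3"]
print(quality_counts(column, lambda v: INTEGER.fullmatch(v) is not None))
# {'valid': 4, 'missing': 2, 'mismatched': 1}
print(histogram(column))
# [('12', 2), ('7', 1), ('abc', 1), ('-3', 1)]
```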
You can click and drag to select values in a column:
- Select a single value in the column to prompt a set of suggestions.
- Select multiple values in a single column to receive a different set of suggestions.
- See Suggestion Cards Panel.
- Double-click to select an individual word, and triple-click to select an entire cell value.
- When you select values, some values in other columns may be highlighted in a darker color, which provides some indication of correlation between values.
On the left side of the screen, you can see a column of black dots. If you hover over one of these, you can see the current row number and, if the information is still available, the row number for the row from the original source data. These values apply only to the sample in the current dataset.
Tip: The original row numbers can be referenced in your recipe steps using the SOURCEROWNUMBER function. Some transform steps, such as union, may make the original row information invalid or otherwise unavailable, which disables this option. See SOURCEROWNUMBER Function.
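The difference between the current row number and the original source row number can be modeled in a few lines of Python. This is a hypothetical sketch, not how SOURCEROWNUMBER is implemented: the function names are invented, and dropping the source row number after a union is a modeling choice that reflects the statement that union may make the original row information unavailable.

```python
def read_source(values):
    """Pair each value with its 1-based row number in the source."""
    return [(source_row, value) for source_row, value in enumerate(values, start=1)]


def keep_rows(rows, predicate):
    """A filter step: surviving rows keep their original source row numbers."""
    return [(source_row, value) for source_row, value in rows if predicate(value)]


def union(rows_a, rows_b):
    """Combine two sources. In this model the source row number becomes ambiguous, so it is dropped (None)."""
    return [(None, value) for _, value in rows_a + rows_b]


source = read_source(["alpha", "", "beta", "gamma"])
filtered = keep_rows(source, lambda value: value != "")
for current_row, (source_row, value) in enumerate(filtered, start=1):
    print(current_row, source_row, value)
# 1 1 alpha
# 2 3 beta
# 3 4 gamma
combined = union(filtered, read_source(["delta"]))
print([source_row for source_row, _ in combined])  # [None, None, None, None]
```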
Filter Data Grid
From the Filters drop-down, you can define filters to apply to columns, rows, or both in the data grid. See Filter Panel.
Before a transform in development has been added to the recipe, a preview of the results is generated in the data grid. See Transform Preview.
Target Matching Bar
When a target has been assigned to your recipe, you can review the names and data types expected for the target in the Target Matching bar above the column histograms.
- You can assign a dataset to be the target for the recipe you are constructing. This imported dataset, reference dataset, or recipe output contains the set of columns to which you are targeting your wrangling activities. When a target has been assigned, it is displayed in the data grid and column browser to assist you in defining your wrangling steps to match the target.
- For more information, see Overview of Target Matching.
In the Target Matching bar, you can review how the target above matches the current recipe below. For each column, matching assesses:
- Current column name vs. target column name
- Current column data type vs. target column data type
- Current column position vs. target column position
- If you hover over different schema tags, you can review the detected differences between the target and the current column.
- Click the schema tag to add a recipe step or steps at the current location to create a match between the two columns.
For more information on the schema tags, see Column Browser Panel.
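The three comparisons listed above can be sketched as a schema diff in Python. This is a hypothetical model: pairing columns by exact name, the type labels, and the message format are assumptions, and the application's schema tags and suggested fix-up steps are not reproduced.

```python
def match_columns(current, target):
    """Compare current and target schemas given as ordered lists of (name, data type).

    Columns are paired by name, then their data types and 1-based positions are compared.
    """
    issues = []
    target_index = {name: i for i, (name, _) in enumerate(target)}
    current_names = {name for name, _ in current}
    for i, (name, dtype) in enumerate(current):
        if name not in target_index:
            issues.append(f"{name}: not in target")
            continue
        t = target_index[name]
        target_type = target[t][1]
        if dtype != target_type:
            issues.append(f"{name}: type {dtype} vs target {target_type}")
        if i != t:
            issues.append(f"{name}: position {i + 1} vs target {t + 1}")
    # Target columns that the current recipe does not produce at all.
    for name, _ in target:
        if name not in current_names:
            issues.append(f"{name}: missing from current recipe")
    return issues


current = [("id", "Integer"), ("state", "String"), ("amount", "String")]
target = [("id", "Integer"), ("amount", "Decimal"), ("state", "String"), ("region", "String")]
for issue in match_columns(current, target):
    print(issue)
```

Pairing by name first keeps a renamed or reordered column from being reported as a type mismatch against whatever column happens to share its position.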