In the data grid, you can review how the current recipe applies to the individual columns in your sample.
- The grid is the default view in the Transformer page.
- To open the data grid, click the Grid View icon in the Transformer bar at the top of the page.
Figure: Data Grid Panel
Select:
- Click column headings to review suggestions for transforms to apply to the column.
- Select specific values in a column for suggestions on those strings.
NOTE: Values in a cell cannot exceed 25,000 characters in length.
Tip: If you select a single value in the data grid, the suggestion cards suggest operations specific to that string. If you select multiple values, the suggestions can apply any pattern shared between the values. For example, selecting ", CA" and ", NY" results in suggestions for how to handle state abbreviations in a column.
- When you select a column or data in the grid, the data grid previews the changes, and a set of suggestion cards shows possible transforms to apply to the data.
- These cards appear in the context panel on the right side of the screen.
- Suggestions are also generated when you select one or more values in the data histogram for a column or individual values in the displayed rows of the sample.
- See Suggestion Cards Panel.
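The idea of a pattern shared between multiple selected values, as in the ", CA" and ", NY" example, can be illustrated with a minimal Python sketch. This is not how the application generates suggestions; the functions `generalize` and `shared_pattern` are hypothetical names, and the rule used here (runs of uppercase letters and digits are generalized, everything else is kept literal) is an assumption chosen for illustration.

```python
import re

# Split a value into runs of uppercase letters, runs of digits, or single other characters.
_TOKEN = re.compile(r"(?P<upper>[A-Z]+)|(?P<digit>[0-9]+)|(?P<other>.)", re.DOTALL)


def generalize(value):
    """Turn a literal string into a regex (hypothetical generalization rule)."""
    parts = []
    for match in _TOKEN.finditer(value):
        text = match.group()
        if match.lastgroup == "upper":
            parts.append("[A-Z]{%d}" % len(text))
        elif match.lastgroup == "digit":
            parts.append("[0-9]{%d}" % len(text))
        else:
            # Keep punctuation, spaces, and lowercase text as literals.
            parts.append(re.escape(text))
    return "".join(parts)


def shared_pattern(selections):
    """Return one regex if every selection generalizes to the same pattern, else None."""
    patterns = {generalize(value) for value in selections}
    return patterns.pop() if len(patterns) == 1 else None


pattern = shared_pattern([", CA", ", NY"])
# The shared pattern also matches other two-letter abbreviations such as ", TX".
matches_other_state = re.fullmatch(pattern, ", TX") is not None
```

Selecting a single value would yield only a pattern for that one string, which is why a multi-value selection can produce broader suggestions.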
Scroll:
- Use the vertical scroll bar to the right of the displayed rows of data to show other rows in the sample. To review rows of the sample data that are not displayed, you may click values in a column and then scroll down through the sampled data.
- Use horizontal scrolling to review additional columns that are off-screen.
Tip: If the contents of a cell are too large for the display, you can click the Caret ( > ) icon to the right of the cell value in the data grid to display the entire contents of the cell.
Add or Edit:
- To add a selected suggestion card to your recipe, click Add.
- To modify a suggested recipe step, select its suggestion card and click Edit. See Transform Builder.
- Click the Edit icon to toggle display of the Recipe panel, where you can review the current recipe for your dataset. See Recipe Panel.
- To add a new recipe step from scratch, mouse over the Edit icon and then click the Plus icon. Search for the transformation to add. See Search Panel.
- To review details about an individual column, select Column Details from the column drop-down. See Column Details Panel.
- To review details about a selection of columns, click the Column View icon in the Transformer bar. See Column Browser Panel.
Ordering:
- You can reorder the rows based on the values in a column. From the Column menu, select Edit Column > Sort. For more information, see Column Menus.
Use of the group parameter in a transform can result in non-deterministic re-ordering in the data grid. However, you should apply the group parameter, particularly on larger datasets, or your job may run out of memory and fail.
To enforce row ordering, you can use the sort transform. For more information, see Sort Transform.
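As a rough analogy in Python (not the application's transform language or its execution engine), the sketch below shows why grouped output can arrive in a different order from run to run, and how an explicit sort restores a stable order. The function `group_sum` and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_sum(rows, key, value):
    """Sum `value` per `key`; output order follows the order in which keys are first seen."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return [{key: k, value: v} for k, v in totals.items()]


rows = [
    {"state": "CA", "n": 1},
    {"state": "CA", "n": 3},
    {"state": "NY", "n": 2},
]

# Simulate two executions that receive the same rows in a different order.
run_a = group_sum(rows, "state", "n")
run_b = group_sum(list(reversed(rows)), "state", "n")

# The grouped totals are identical, but the row order is not.
same_order = run_a == run_b

# An explicit sort on the group key makes the order deterministic.
sorted_a = sorted(run_a, key=lambda r: r["state"])
sorted_b = sorted(run_b, key=lambda r: r["state"])
```

Here `run_a` lists CA first and `run_b` lists NY first, while the sorted results are identical.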
Transformer Toolbar
At the top of the data grid, you can use the toolbar to quickly build common transformations, filter the display, and other operations. See Transformer Toolbar.
Status Bar
Below the data grid, you can review summary information about the data in your currently selected sample.
Figure: Sample Status Bar
Click the Eye icon to open the Visible Columns panel, where you can toggle the display of individual columns. For more information, see Visible Columns Panel.
The status bar contains metrics about the current dataset sample for the currently selected recipe step.
- For example, if your first recipe step removes 100 rows of data, when you create your next recipe step, the status bar should indicate a row count that is 100 less than the row count at the start of the recipe. The other counts may be affected as well.
- The number of columns reflects the count that is currently displayed in the data grid. Toggling visibility of columns or applying column-based filters changes this value.
Tip: Before you begin transforming your data, you might want to verify the columns and count of data types against the data before it was imported. If there are discrepancies, you might want to investigate the differences. Unless your sample includes the entire dataset, expect the row counts to differ.
NOTE: In the Photon running environment, results can differ between executions of the same recipe due to Photon's parallel execution and data limiting within the Transformer page. In particular, joins with multiple matches per key can sometimes cause a difference in the number of reported rows when the job is re-executed.
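The effect described in the note can be sketched in Python. The sketch assumes that data limiting behaves like a simple "first N rows" cut and that parallel execution can deliver rows in a different order on each run; both are assumptions made for illustration, and `limited_join` is a hypothetical helper, not part of Photon.

```python
def limited_join(left, right, key, limit):
    """Inner join `left` to the first `limit` rows of `right` on `key`."""
    right_sample = right[:limit]  # assumed stand-in for data limiting
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in right_sample
        if l_row[key] == r_row[key]
    ]


left = [{"id": 1, "name": "a"}]
right = [
    {"id": 1, "x": "p"},
    {"id": 1, "x": "q"},  # second match for the same key
    {"id": 2, "x": "r"},
]

# Same inputs, but the right side arrives in a different order on the second run.
first_run = limited_join(left, right, "id", limit=2)
second_run = limited_join(left, list(reversed(right)), "id", limit=2)

row_counts = (len(first_run), len(second_run))  # (2, 1)
```

Because key 1 has two matches, the number of joined rows depends on how many of those matches fall inside the limited set.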
Show only affected:
When transform steps are previewed, you can use these checkboxes to display only the previewed changes for affected rows, columns, or both.
Tip: These options assist in narrowing the data grid display to only the rows and columns affected by the current recipe step.
Find Column
In a wide dataset, click the Find icon in the Transformer toolbar to locate the column of interest.
Figure: Find column search bar
- Use the up and down arrows to view the list of the columns in the dataset.
- You can start typing a column name to filter the list.
NOTE: An imported dataset requires about 15 rows to properly infer column data types and the row, if any, to use for column headers.
Column Information
Figure: Column header, data quality bar, and histogram
At the top of the column, you can review:
NOTE: In the column header, counts reflect only the counts in the currently loaded sample. They do not reflect counts across the entire dataset, unless the entire dataset is the sample.
Item | Description |
---|---|
Data type | Identifies the selected data type, which can be inferred by the application based on the contents of the column. Click the icon to change the data type. Tip: Before you start performing transformations on your data based on mismatched values, you should check the data type for these columns to ensure that they are correct. For more information, see Supported Data Types. |
Column name | To change the column name, select Rename... from the column menu. |
Column menu | Depending on the column data type, you can select from a set of predefined recipe steps in the column menu under the caret on the right side of the menu. See Column Menus. |
Data quality bar | The horizontal line shows valid, missing, and mismatched values in the column compared to the column's data type. Tip: You can click these colored bars to generate suggestion cards for transforms to act on these types of values. See Data Quality Bars. |
Column histogram | For each column, you can see the range and frequency of values in the column. Tip: You can select one or more values in a histogram to generate suggestion cards. See Column Histograms. |
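The three categories shown in the data quality bar (valid, missing, and mismatched) can be illustrated with a small Python sketch. This is not the application's validation logic: `classify` and `quality_counts` are hypothetical names, and treating `None` and the empty string as missing is an assumption.

```python
from collections import Counter


def classify(value, parser):
    """Classify one cell against a data type, represented here by a parser function."""
    if value is None or value == "":
        return "missing"  # assumption: None and "" count as missing
    try:
        parser(value)
    except ValueError:
        return "mismatched"
    return "valid"


def quality_counts(values, parser):
    """Count valid, missing, and mismatched values in a column sample."""
    return Counter(classify(value, parser) for value in values)


# A sample column checked against an integer data type.
sample = ["1", "2", "", "x", None, "30"]
counts = quality_counts(sample, int)
# counts["valid"] == 3, counts["missing"] == 2, counts["mismatched"] == 1
```

As with the counts in the column header, these numbers describe only the sample that was passed in, not the full dataset.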
Selecting values
You can click and drag to select values in a column:
- Select a single value in the column to prompt a set of suggestions.
- Select multiple values in a single column to receive a different set of suggestions.
- See Suggestion Cards Panel.
- Double-click to select an individual word, and triple-click to select an entire cell value.
- When you select values, some values in other columns may be highlighted in a darker color, which provides some indication of correlation between values.
Row Information
On the left side of the screen, you can see a column of black dots. If you hover over one of these, you can see the current row number and, if the information is still available, the row number for the row from the original source data. These values apply only to the sample in the current dataset.
Tip: To review the original row number for a row, hover over the black dot in the data grid. These values can be referenced using the SOURCEROWNUMBER function in your recipe steps. Some transform steps, such as pivot and union, may make the original row information invalid or otherwise unavailable, which disables this option. See SOURCEROWNUMBER Function.
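The difference between the current row number and the original source row number can be sketched in Python. This is a conceptual illustration only, not the SOURCEROWNUMBER function itself; the `source_row` field, the helper names, and the 1-based numbering are assumptions.

```python
def with_source_row_numbers(rows):
    """Tag each row with its position in the original source (1-based, by assumption)."""
    return [{"source_row": i, **row} for i, row in enumerate(rows, start=1)]


def keep(rows, predicate):
    """Filter rows; each surviving row keeps its original source row number."""
    return [row for row in rows if predicate(row)]


source = [{"v": 5}, {"v": -1}, {"v": 7}]
tagged = with_source_row_numbers(source)
kept = keep(tagged, lambda row: row["v"] > 0)

# Pairs of (current row number, original source row number) after the filter.
numbering = [(i, row["source_row"]) for i, row in enumerate(kept, start=1)]
# numbering == [(1, 1), (2, 3)]
```

A step that combines or reshapes rows has no single source row to carry forward, which is consistent with the information becoming unavailable after steps such as pivot and union.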
Filter Data Grid
From the Filters drop-down, you can define filters to apply to columns, rows, or both in the data grid. See Filter Panel.
Transform Preview
Before a transform in development has been added to the recipe, a preview of the results is generated in the data grid. See Transform Preview.
Target Matching Bar
When a target has been assigned to your recipe, you can review the column names and data types that are expected for the target in the Target Matching bar above the column histograms.
- You can assign a dataset to be the target for the recipe you are constructing. This imported dataset, reference dataset, or recipe output contains the set of columns to which you are targeting your wrangling activities. When a target has been assigned, it is displayed in the data grid and column browser to assist you in defining your wrangling steps to match the target.
- For more information, see Overview of Target Matching.
Figure: Target Matching Bar
In the Target Matching bar, you can review how the target above matches the current recipe below. For each column, matching assesses:
- Current column name vs. target column name
- Current column data type vs. target column data type
- Current column position vs. target column position
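The three comparisons listed above can be sketched in Python. The sketch assumes that current columns are paired with target columns by exact name, which the source does not confirm; `compare_to_target` and the sample schemas are hypothetical.

```python
def compare_to_target(current, target):
    """Compare ordered (name, data type) lists and report name, type, and position matches."""
    target_index = {name: (pos, dtype) for pos, (name, dtype) in enumerate(target)}
    report = []
    for pos, (name, dtype) in enumerate(current):
        if name not in target_index:
            # Without a name match, type and position cannot be assessed in this sketch.
            report.append((name, {"name": False, "type": None, "position": None}))
            continue
        target_pos, target_dtype = target_index[name]
        report.append(
            (name, {"name": True, "type": dtype == target_dtype, "position": pos == target_pos})
        )
    return report


current = [("id", "Integer"), ("state", "String"), ("zip", "Integer")]
target = [("id", "Integer"), ("zip", "String"), ("state", "String")]

report = dict(compare_to_target(current, target))
# "id" matches on all three checks; "state" differs in position;
# "zip" differs in both data type and position.
```

Each entry in the report corresponds loosely to what a schema tag summarizes for one column.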
Actions:
- If you hover over different schema tags, you can review the detected differences between the target and the current column.
- Click the schema tag to add a recipe step or steps at the current location to create a match between the two columns.
For more information on the schema tags, see Column Browser Panel.