A Quick Tour
The Transformer page experience is based on the design principles of predictive transformation. Predictive transformation lets you explore your data and guides you toward possible next steps based on your explorations. Predictive transformation works in the following phases:
- Visualize: Display your sampled data in tabular format with visualizations to indicate areas of interest, such as missing or invalid data or outlier values.
- Interact: You can directly interact with the data to select things of interest to you. For example, you can click a black bar in a column visualization to select all of the missing values in the column.
- Predict: Based on your selection(s), the predictive transformation model offers a set of suggested transformations. You can select one from the list. A suggestion may have multiple variations for further refinement. If you want to take it a step further, you can edit the suggested transformation directly to fine-tune the step.
- Present: Each time you make a selection in the data grid or among the suggestion cards, the effects of the selected transformation are previewed in the data grid.
If all looks good, you add the transformation. The change is added to your recipe, which is the sequence of steps to transform your data.
Then, you repeat the above steps for the next data of interest. You can see this process in action below.
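Conceptually, a recipe is an ordered list of steps, each applied to the output of the previous one. The Python sketch below models that idea under simplifying assumptions: rows are plain dictionaries, each step is an ordinary function, and the names `apply_recipe`, `trim_market_name`, and `drop_missing_website` (as well as the sample rows) are hypothetical. It illustrates the concept only and is not the Designer Cloud API.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model: a row maps column names to values; None means missing.
Row = Dict[str, Optional[str]]
# Hypothetical model: a step takes a list of rows and returns a new list of rows.
Step = Callable[[List[Row]], List[Row]]


def apply_recipe(sample: List[Row], recipe: List[Step]) -> List[Row]:
    """Apply each step in order; each step sees the previous step's output."""
    rows = sample
    for step in recipe:
        rows = step(rows)
    return rows


def trim_market_name(rows: List[Row]) -> List[Row]:
    # Example step: strip surrounding whitespace from MarketName values.
    # New dicts are built, so the original sample is left untouched.
    trimmed = []
    for r in rows:
        name = r.get("MarketName")
        trimmed.append({**r, "MarketName": name.strip() if name else name})
    return trimmed


def drop_missing_website(rows: List[Row]) -> List[Row]:
    # Example step: remove rows that have no Primary_Website_or_URL value.
    return [r for r in rows if r.get("Primary_Website_or_URL")]


sample = [
    {"MarketName": "  Green Acres ", "Primary_Website_or_URL": "http://example.com"},
    {"MarketName": "Hilltop", "Primary_Website_or_URL": None},
]
result = apply_recipe(sample, [trim_market_name, drop_missing_website])
print(result)  # [{'MarketName': 'Green Acres', 'Primary_Website_or_URL': 'http://example.com'}]
```

Because each step returns new rows rather than modifying its input, changing or removing one step only requires replaying the list from the start, which mirrors how a recipe is re-applied to the sample.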
1 - Visualize
The walkthrough below uses the Transformer page to display a sample from the USDA-recognized Farmer's Markets in the United States as of the year 2014.
The source of this data is an Avro file, which in its raw form does not look like this nice tabular representation. When a dataset is loaded into the Transformer page, the Designer Cloud application attempts to represent it in tabular form, so that it is easy to navigate and analyze.
A few key things to notice:
- Initial Data: This button indicates the current sample. Unless your dataset is small, the data grid in the Transformer page displays a sample taken from the first set of rows in the first file in your dataset. Be sure to read the section on sampling below.
- The Initial Data button is part of the menu bar, which provides access to key application features and capabilities. The menu bar is described later.
- Below the menu bar is the toolbar, which contains a set of tools that you can select to apply to your data. The toolbar is described later.
- The recipe is empty: This area is the recipe panel, where you can see the steps that you have added to your recipe. A recipe is the set of steps that you create to transform your data.
- Primary_Website_or_URL: Above the data in each column you can see two bars.
- Histogram: The lower bar is a histogram of values in the column.
- Data quality bar: Above the histogram is the data quality bar, which displays the relative percentages of valid (green), invalid (red), and missing (black) values in the column. Whether a value is valid is determined by the column's data type.
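To make the two bars concrete, the sketch below computes a value histogram and the valid/invalid/missing percentages for one column. It assumes that a missing value is `None` or an empty string, and it uses a hypothetical `looks_like_url` check as a stand-in for the column's data type. The application's own type validation is not reproduced here, and the sample values are made up.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional


def is_missing(value: Optional[str]) -> bool:
    # Assumption: None or an empty string counts as a missing value.
    return value is None or value == ""


def looks_like_url(value: str) -> bool:
    # Hypothetical stand-in for a URL data type check.
    return value.startswith(("http://", "https://"))


def histogram(values: List[Optional[str]]) -> Counter:
    # Lower bar: how often each non-missing value occurs.
    return Counter(v for v in values if not is_missing(v))


def quality_bar(values: List[Optional[str]],
                is_valid: Callable[[str], bool]) -> Dict[str, float]:
    # Upper bar: share of valid, invalid, and missing values, in percent.
    total = len(values)
    if total == 0:
        return {"valid": 0.0, "invalid": 0.0, "missing": 0.0}
    missing = sum(1 for v in values if is_missing(v))
    valid = sum(1 for v in values if not is_missing(v) and is_valid(v))
    invalid = total - missing - valid
    return {
        "valid": 100 * valid / total,
        "invalid": 100 * invalid / total,
        "missing": 100 * missing / total,
    }


column = ["http://a.example", None, "", "not a url", "http://a.example"]
print(quality_bar(column, looks_like_url))  # {'valid': 40.0, 'invalid': 20.0, 'missing': 40.0}
print(histogram(column).most_common())      # [('http://a.example', 2), ('not a url', 1)]
```

Note that "not a url" is invalid only relative to the assumed URL type; against a plain string type, the same value would count as valid.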
2 - Interact
The data grid is not just for display; you can interact with it to select data elements of interest to you. For example, in the Farmer's Market dataset, you may have noticed that the Primary_Website_or_URL column has a large black bar, which indicates many missing values in the column. A reasonable interpretation of these missing values is that these Farmer's Markets do not have a known website. To explore these markets, click the black bar.
A couple of things to notice:
- The right panel now displays a list of suggestions. These suggestions correspond to transformations that you can apply to the current selection. In this case, the selected data are the missing values in the Primary_Website_or_URL column.
- The first suggestion is selected by default. In the data grid, you can see a preview of what would happen if this step were added to the recipe. The highlighted rows would be removed.
Things you can select:
- Column(s): Click a column header. Hold COMMAND while clicking to select multiple columns.
- Row(s): To select a row, click the dot on the left side of the data grid.
- Cell values: Click and drag to select part of a cell value. Double-click to select the whole cell value.
- Histogram values: You can select one or more values in the histogram. Based on your selection(s), a set of suggested changes is presented to you. These changes apply to the rows where the selected values occur.
- Data quality bar: You can select bars in the data quality bar.
3 - Predict
In the above example, predictive transformation predicted that based on your selection, you wanted to delete the rows where the website URL column was empty. This prediction is based on multiple factors, including:
- Platform algorithms that interpret the meaning of user selections
- Tracking of previous user interactions of a similar nature within your project or workspace
So, the prediction makes sense: in most cases, if data is missing, you don't want the rest of the incomplete row. However, suppose your interest in the data is different. Suppose that you are a website builder looking for farmer's markets without a website as potential customers. In this case, the second suggested transformation, which keeps the rows with missing values, makes more sense.
When the second suggestion is selected, the affected rows are highlighted in green, indicating that they will be retained.
Tip: To assist in reviewing these rows, select the Show only affected checkbox at the bottom of the screen. The data grid then displays only the rows to which the suggested transformation applies.
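The two suggestions differ only in which side of the selection they retain. In the minimal sketch below, rows are dictionaries, a missing value is assumed to be `None` or an empty string, and the function names and market rows are hypothetical. Deleting and keeping the rows with a missing Primary_Website_or_URL split the sample into two complementary sets.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def is_missing(value: Optional[str]) -> bool:
    # Assumption: None or an empty string counts as missing.
    return value is None or value == ""


def delete_rows_where_missing(rows: List[Row], column: str) -> List[Row]:
    # First suggestion: drop the selected rows (those with no value).
    return [r for r in rows if not is_missing(r.get(column))]


def keep_rows_where_missing(rows: List[Row], column: str) -> List[Row]:
    # Second suggestion: retain only the selected rows, e.g. prospects without a website.
    return [r for r in rows if is_missing(r.get(column))]


markets = [
    {"MarketName": "Green Acres", "Primary_Website_or_URL": "http://example.com"},
    {"MarketName": "Hilltop", "Primary_Website_or_URL": None},
    {"MarketName": "Riverside", "Primary_Website_or_URL": ""},
]
col = "Primary_Website_or_URL"
prospects = keep_rows_where_missing(markets, col)
print([r["MarketName"] for r in prospects])  # ['Hilltop', 'Riverside']
```

In both cases the rows affected by the step are the same selected rows; only the outcome (removed or retained) differs.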
4 - Present
If the selected suggestion looks good, click Add.
The transformation has been added to your recipe. Note that the recipe panel at right, which was previously empty, now contains a step. You've added a recipe step, and your sampled data is transformed.
The above represents the basic cycle of using the Transformer page:
- Visualize & Interact: Locate data of interest and select it.
- Predict: Review the suggested transformations. Click one to preview its results. You can also click Edit to modify a suggested transformation before you add it.
- Present: Add the selected (and optionally modified) suggestion to your recipe. The sample of data in the data grid is transformed.
Repeat the above steps.
Explore the various tools. Take chances. Make mistakes. You can always edit, disable, or delete steps that you have added through the Recipe panel.
Other Useful Tools
The above predictive transformation cycle presents the simplest and most visual method of transforming your data in Designer Cloud. There are other ways to begin building a transformation.
The tools in the toolbar represent shortcuts to application views, transformations, or visual filters to be applied to the data grid.
Each column header has a drop-down menu of context-relevant features, such as:
- Column Details opens a different interface for exploring and transforming data based on the values and patterns of values within the column.
- Standardize enables you to standardize similar values in the column by grouping them together.
Because datasets can be very wide (many columns) and long (many rows), you may need to visually filter the columns and rows that are displayed.
Tip: These tools hide rows and columns from display; they do not remove data. If you use them regularly, remember to unhide columns and clear filters to verify that your data transformations are complete.
- Column View on the left side of the toolbar allows you to review the columns and their histograms in a simpler interface. Use the context menu to hide the display of a column.
- Find column on the right side of the toolbar searches the dataset for a column by name.
- Filters on the right side of the toolbar allows you to design column- or row-based visual filters to narrow the scope of what is displayed in the data grid.
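The difference between hiding and removing can be sketched as a display-only projection over the data. In this hypothetical model, `filtered_view` and the sample rows are illustrative assumptions, not part of Designer Cloud; the point is that the view is computed from the data and never modifies it.

```python
from typing import Callable, Dict, List, Optional, Set

Row = Dict[str, Optional[str]]


def filtered_view(rows: List[Row], hidden_columns: Set[str],
                  row_filter: Callable[[Row], bool]) -> List[Row]:
    """Return a display-only projection; the input rows are never modified."""
    return [
        {k: v for k, v in r.items() if k not in hidden_columns}  # hide columns
        for r in rows
        if row_filter(r)  # hide rows that do not match the filter
    ]


data = [
    {"MarketName": "Green Acres", "State": "Vermont", "Primary_Website_or_URL": None},
    {"MarketName": "Hilltop", "State": "Ohio", "Primary_Website_or_URL": "http://example.com"},
]
view = filtered_view(data, {"Primary_Website_or_URL"}, lambda r: r["State"] == "Ohio")
print(view)                      # [{'MarketName': 'Hilltop', 'State': 'Ohio'}]
print(len(data), len(data[0]))   # 2 3  (the underlying data is unchanged)
```

A recipe step, by contrast, would replace the rows themselves, which is why hidden columns and active filters can make an incomplete transformation look finished.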
A Word about Sampling
If your dataset is small enough, the data grid displays the entire dataset. In most cases, the data grid displays a sample of your dataset for performance reasons.
When a dataset is first loaded in the Transformer page, the Initial Data sample is displayed. This sample is the first set of rows from the first file or table in your dataset, up to the sample limit. This sample is downloaded into your browser.
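The sketch below models how an Initial Data style sample is drawn: rows are read from the first file only, stopping at the limit. For illustration, the sample limit is modeled as a row count and files as lazy row iterators; both are assumptions, and the product's actual limit may be defined differently. The helper names `initial_sample` and `rows_from` are hypothetical.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional

Row = Dict[str, Optional[str]]


def initial_sample(files: List[Iterable[Row]], max_rows: int) -> List[Row]:
    """Take up to max_rows rows from the first file only."""
    if not files:
        return []
    # islice stops after max_rows, so the rest of the first file is never read.
    return list(islice(files[0], max_rows))


def rows_from(prefix: str, count: int) -> Iterator[Row]:
    # Hypothetical generator standing in for a file reader.
    for i in range(count):
        yield {"MarketName": f"{prefix}-{i}"}


files = [rows_from("file1", 1_000_000), rows_from("file2", 500)]
sample = initial_sample(files, max_rows=3)
print([r["MarketName"] for r in sample])  # ['file1-0', 'file1-1', 'file1-2']
```

No row from the second file can ever appear in this sample, which is one reason a different sample may be needed to see representative data.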
You may need to take new samples from your dataset for the following reasons:
- Your entire dataset may contain meaningful data that isn't represented in the Initial Data sample.
- The steps that you add to your recipe are applied to the sample displayed in your browser, so you can quickly see the effects of your transformations. However, because the steps must be re-applied whenever you add or modify a transformation, a growing number of complex steps can begin to affect performance.