Based on academic research, Predictive Transformation refers to a set of design and interface principles that serve as the foundation for how users interact with their data. Predictive Transformation is the linchpin of the platform. This section provides an overview of the concepts and links to locations where these concepts are surfaced in the interface.
In essence, Predictive Transformation seeks to bring closer together:
In data wrangling, the former knowledge set resides with domain experts who understand the meaning of the data, while the latter often requires involvement of IT, which may have no contextual understanding of the data to inform their solution designs.
This process of rendering data from one format into another is generally called data transformation, which breaks down into a set of programming-type tasks, with an emphasis on structure, meaning, and the statistical properties of the data. These tasks include:
Across large, distributed datasets, these tasks can be technically challenging to properly execute. To move them out of the IT domain, Predictive Transformation seeks to deliver the following capabilities:
The above cycle is repeated over and over until the set of transformations is defined and executed to satisfaction.
Based on your selections, Predictive Transformation guides you toward possible next steps, yet allows you to decide which step to take and (if necessary) to refine the step definition. The guide/decide loop at the core of Predictive Transformation consists of the following iterative phases. When a step is selected, the visualizations are updated, and the cycle repeats.
|Visualize||visualizations||A critical component of Predictive Transformation is the visual representation of the data, including items of interest for selection. In larger data sets, the visual cues around items of interest and the tools for interacting with them provide information on the meaning of each type of interaction and are critical for a productive and pleasant user experience.|
|Interact||selections||You interact directly with the visualizations to select values, columns, or other items of interest.|
|Predict||predictive model & suggestions||User selections automatically trigger queries into the predictive model. The data, its metadata, and your selections effectively define these queries. The model returns a set of suggested transforms. The suggestions guide you toward recommended actions on items that you have decided, through selection, are interesting. You can then decide which suggestion to apply, including modifying the specific parameters of the suggestion. Or, you can define a completely different step to take.|
|Present||previews||Whenever the step to take is selected or subsequently modified, the anticipated results of that step are displayed as a preview overlay on top of the data. This method allows for easy development, rapid undoing, and a clearer understanding of the impacts of each step.|
In Predictive Transformation, visualizations must be carefully designed to surface selectable data or metadata of interest. In the application, the Transformer page has been designed to represent the underlying dataset while guiding you with selectable items.
Transformer Page contains a rich overlay of information and selection cues
Specific visualization cues:
In this manner, this visualization lifts user interaction from the domains of data and code into a more visual representation.
You must still specify via selection; the syntax of the specification is lifted into the visual domain, and the details of crafting the technical query are managed by the application.
Exploration: By design, this interaction model supports both detailed specificity and ambiguity. You make a selection, preview the results, and then determine if the preview meets expectations. Additionally, all steps can be undone and removed from the recipe, so that you can explore different steps and entire approaches for transforming data. Solutions that demand more technical interactions often suffer from an intolerance of ambiguity, which limits your ability to express intent without significant experience or training. See Transformer Page.
As you review the visualization, a change in the cursor indicates the items that are available for selection.
Selection cursor changes on hover of selectable items
The following types of selections trigger the subsequent phases:
Selecting a single column in the data grid triggers a visual profile of the column data, as well as a set of suggestions. Selecting multiple columns triggers a different set of suggestions to apply across your selected columns.
Columns and values can be multi-selected.
You are still obligated to make selections in the data, thereby bringing domain-specific expertise to the problem of transforming it. This selection in turn triggers a more complex query through the application to the prediction service.
Based on the set of selections, an inference algorithm attempts to interpret the data transformation intent of the selection and generates a ranked set of suggestions and patterns for the selections to match. For example, if you select the first three characters in a cell, the algorithm may produce two transform suggestions for data removal: one to remove the rows containing the specific text and one to keep all rows containing that pattern of text in the column.
As part of the returned results of the predictive model, matching values for the selection(s) are highlighted in the table.
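The selection example above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the product's inference algorithm: the `Suggestion` class, the function names, and the ranking scores are all hypothetical, and the pattern inference is reduced to a single rule (anchor the selected text to the start of the value when the selection begins at the first character).

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Suggestion:
    action: str    # hypothetical labels: "delete" or "keep"
    pattern: str   # regular expression that rows are matched against
    score: float   # made-up ranking score, for illustration only


def suggest_from_selection(cell: str, start: int, end: int) -> List[Suggestion]:
    """Turn a character selection in one cell into ranked suggestions."""
    selected = cell[start:end]
    # Assumption: a selection starting at position 0 means "values that
    # begin with this text"; otherwise match the text anywhere.
    anchor = "^" if start == 0 else ""
    pattern = anchor + re.escape(selected)
    suggestions = [
        Suggestion("delete", pattern, 0.6),
        Suggestion("keep", pattern, 0.4),
    ]
    return sorted(suggestions, key=lambda s: s.score, reverse=True)


def matching_rows(values: List[str], pattern: str) -> List[int]:
    """Indices of the values that would be highlighted in the grid."""
    return [i for i, v in enumerate(values) if re.search(pattern, v)]


def apply_suggestion(values: List[str], suggestion: Suggestion) -> List[str]:
    """Return a new column with the suggestion applied; input is untouched."""
    hits = set(matching_rows(values, suggestion.pattern))
    if suggestion.action == "delete":
        return [v for i, v in enumerate(values) if i not in hits]
    return [v for i, v in enumerate(values) if i in hits]


column = ["ERR-001", "OK-002", "ERR-003"]
suggestions = suggest_from_selection(column[0], 0, 3)  # user selects "ERR"
for s in suggestions:
    print(s.action, s.pattern, apply_suggestion(column, s))
```

Selecting the first three characters of the first value yields one suggestion to delete the matching rows and one to keep them, and `matching_rows` identifies the values that would be highlighted. A real predictive model would weigh far more signals than this single rule.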
The predictive model interprets selection to identify intent. Possible intentions are surfaced as one or more suggested transforms in a visual manner that minimizes exposure to the transformation language.
The set of probable next steps is computed by the predictive model from user interaction, selected data, historical information, and other sources, and is rendered as a set of suggestions. Since these steps are essentially predictions of user intent, they are surfaced as browsable cards, which you can explore to resolve any ambiguity about the intent behind your data selections.
Suggestion cards - selection guides suggestion
Suggestion cards are specific enough for immediate execution. If additional specification or guidance is needed, you can modify the transform and its parameters.
In a suggestion card, you may see multiple variants of the selected transformation.
The first variant is the most specific one applicable to the current selection in the data grid. Mouse over the variants to see different versions of the transform. As you mouse over secondary variants, the variants typically become more specific in their changes to the dataset or rarer in their usage.
When you mouse over a different transform variant in the suggestion card, the preview popup is automatically updated to reflect the variation. When you select the variant, the Preview pane is updated. You can always modify the transform to review the detailed differences.
Optionally, you can enable the surfacing of collaborative suggestions, which aggregate the transformation steps from users in your workspace to provide an additional category of Recently used suggestion cards. As workspace members continue to transform data that is often related, the set of Recently used suggestions becomes more relevant to the data on which workspace users are working. This form of data-dependent predictive transformation allows the application to improve its understanding of the types of tasks that workspace users are trying to accomplish.
NOTE: This feature requires the machine learning service, which is enabled by default. For more information, see Miscellaneous Configuration.
Workspace administrators can choose to enable this feature and can configure whether data is aggregated from individual workspace users' transformations or from all workspace users' transformations. See Workspace Settings Page.
When this feature is enabled, collaborative suggestions appear as cards under a new Recently used category in the suggestions panel.
When the feature is enabled, individual users can choose to opt out of sharing their data with the feature. See User Profile Page.
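The aggregation behind Recently used suggestions can be pictured as a frequency count over transformation history. The sketch below is an assumption-laden illustration, not the machine learning service itself: the function name, the `(user, transform)` history format, and the `only_user` and `opted_out` parameters are hypothetical stand-ins for the workspace scope setting and the per-user opt-out described above.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set, Tuple


def recently_used(
    history: Iterable[Tuple[str, str]],
    opted_out: Set[str],
    only_user: Optional[str] = None,
    top_n: int = 3,
) -> List[str]:
    """Rank transforms by how often they appear in the shared history.

    history   -- (user, transform name) pairs, oldest first
    opted_out -- users who declined to share their transformations
    only_user -- if set, aggregate from that single user only
    """
    counts = Counter(
        transform
        for user, transform in history
        if user not in opted_out and (only_user is None or user == only_user)
    )
    return [name for name, _ in counts.most_common(top_n)]


history = [
    ("ana", "split"),
    ("ben", "replace"),
    ("ana", "split"),
    ("cy", "delete"),
    ("cy", "delete"),
    ("cy", "delete"),
]
print(recently_used(history, opted_out=set()))
print(recently_used(history, opted_out={"cy"}))
print(recently_used(history, opted_out=set(), only_user="ana"))
```

Excluding an opted-out user changes the ranking, which is the practical effect of the opt-out. A plain count is the simplest possible stand-in here; the actual service may weigh recency, data similarity, or other signals.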
When a suggestion card is selected, the results of the selected transform are previewed in the data grid, so that you can see in advance the changes to the dataset.
Previewed effects of transform
When the transform is added to the recipe, the transform is rendered into the data transformation language and applied in real-time to the dataset, so that you can immediately begin working on the next step of the process.
When a transform is selected, the selected transform and any additional guidance that you provide is translated into a specific, programmatic step in the transformation language. This step, in turn, is rendered into a complex and potentially distributed query that is applied across the dataset. In this manner, additional technical details and the knowledge required to master them are removed from user requirements.
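As a rough mental model of preview, add, and undo, a recipe can be treated as an ordered list of steps that is replayed over the data. The following sketch assumes that model; the `Recipe` class and the `drop_rows_starting_with` helper are hypothetical, and the in-memory list of rows stands in for what would really be a distributed query over the dataset.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]


class Recipe:
    """Ordered list of transformation steps (illustrative only)."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def run(self, rows: List[Row]) -> List[Row]:
        # Work on copies so the source rows are never mutated.
        out = [dict(r) for r in rows]
        for step in self.steps:
            out = step(out)
        return out

    def preview(self, rows: List[Row], candidate: Step) -> List[Row]:
        # Show what the data would look like if the candidate were added,
        # without changing the recipe.
        return candidate(self.run(rows))

    def add(self, step: Step) -> None:
        self.steps.append(step)

    def undo(self) -> Optional[Step]:
        # Remove the most recent step, if there is one.
        return self.steps.pop() if self.steps else None


def drop_rows_starting_with(column: str, prefix: str) -> Step:
    """Build a step that removes rows whose value begins with prefix."""
    def step(rows: List[Row]) -> List[Row]:
        return [r for r in rows if not r[column].startswith(prefix)]
    return step


rows = [{"code": "ERR-001"}, {"code": "OK-002"}]
recipe = Recipe()
candidate = drop_rows_starting_with("code", "ERR")

print(recipe.preview(rows, candidate))  # preview only; recipe is unchanged
recipe.add(candidate)
print(recipe.run(rows))                 # step is now part of the recipe
recipe.undo()
print(recipe.run(rows))                 # back to the original data
```

Because the source rows are never modified and steps are only appended or popped, previewing is free of side effects and undoing a step is just a matter of replaying the shorter recipe.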
As needed, any selected transform can be modified, so that you can adjust its parameters to refine your intent and reach a specific outcome. In the application, you can click Edit to adjust individual transformations in the Transform Builder.
Modifying a transform in the Transform Builder
The actual steps of transformation are authored in the transformation language, which includes the following characteristics:
For a list of available transforms and functions, see Language Index.
For more information, see Wrangle Language.