
This documentation applies to Trifacta Wrangler. Download this free product.
Registered users of this product or Trifacta Wrangler Enterprise should log in to Product Docs through the application.



Based on academic research, Predictive Transformation refers to a set of design and interface principles that serve as the foundation for how Trifacta® users interact with their data. Predictive Transformation is the linchpin of the platform. This section provides an overview of the concepts and links to locations within the Trifacta Application where these concepts are surfaced in the interface.

Overview

In essence, Predictive Transformation seeks to bring closer together:

  1. the domain knowledge about the data, and 
  2. the technical knowledge of the sometimes complex operations required to render data into its final usable format.  

In data wrangling, the former knowledge set resides with domain experts who understand the meaning of the data, while the latter often requires the involvement of IT, which may have no contextual understanding of the data to inform its solution designs.

This process of rendering data from one format into another is generally called data transformation, which breaks down into a set of programming-type tasks, with an emphasis on structure, meaning, and the statistical properties of the data. These tasks include:

  • statistical manipulation (profiling, outliers, imputation)
  • restructuring (data extraction, nesting, pivot/unpivot)
  • cleaning (standardization, deduplication, data removal)
  • enrichment (join with other data, lookups of reference data)
  • distillation (sampling, filtering, aggregation, windowing) 
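As a rough illustration of what a few of these tasks look like when written by hand, the following Python sketch uses only the standard library on a small made-up list of records. The field names, values, and function names are hypothetical, and the code is not how the Trifacta platform implements these operations.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sample records; None marks a missing value.
rows = [
    {"city": " Boston", "sales": 120},
    {"city": "boston", "sales": None},
    {"city": "Austin", "sales": 80},
    {"city": "Austin", "sales": 80},
]

def standardize(rows):
    """Cleaning: trim whitespace and normalize capitalization of city names."""
    return [{**r, "city": r["city"].strip().title()} for r in rows]

def impute_median(rows, field):
    """Statistical manipulation: replace missing values with the column median."""
    fill = median(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

def deduplicate(rows):
    """Cleaning: drop exact duplicate rows, keeping the first occurrence."""
    seen, out = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out

def total_by(rows, key, field):
    """Distillation: sum a numeric field per group."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[key]] += r[field]
    return dict(totals)

cleaned = deduplicate(impute_median(standardize(rows), "sales"))
totals = total_by(cleaned, "city", "sales")
```

Even in this toy form, each task requires the author to know both what the data means (that " Boston" and "boston" are the same city) and how to express the fix in code, which is the gap described above.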

Across large, distributed datasets, these tasks can be technically challenging to properly execute. To move them out of the IT domain, Predictive Transformation seeks to deliver the following capabilities:

  1. Features & Visualizations - innovative methods to display and select data of interest to the user 
  2. Suggestions - based on user selection, suggested transforms are presented to the user for selection and configuration
  3. Previews - for the selected suggestion, previews of the anticipated change are available for review prior to inclusion in the transformations on the dataset

This cycle repeats until the set of transformations has been defined and executed to the user's satisfaction.

Phases

Based on user selection, Predictive Transformation guides the user toward possible next steps yet allows the user to decide the step to take and (if necessary) refine the step definition. The core of the guide/decide loop of Predictive Transformation fits into the following iterative phases. When steps are selected, visualizations are updated, and the cycle repeats.

| Phase | UI Element | Description |
|---|---|---|
| Visualize | visualizations | A critical component of Predictive Transformation is the visual representation of the data, including items of interest for selection. In larger data sets, the visual cues around items of interest and the tools for interacting with them provide information on the meaning of each type of interaction and are critical for a productive and pleasant user experience. |
| Interact | selections | Users interact directly with the visualizations to select values, columns, or other items of interest. |
| Predict | predictive model & suggestions | Automatically, user selections trigger queries into the predictive model. Data, metadata, and the selection of it effectively define queries of the predictive model. The model returns a set of suggested transforms. The suggestions guide the user toward recommended actions on items that the user has decided, through selection, are interesting. The user can then decide which suggestion to undertake, including modification of the specific parameters around the suggestion. Or, the user can define a completely different step to take. |
| Present | previews | Whenever the step to take is selected or subsequently modified, the anticipated results of that step are displayed as a preview overlay on top of the data. This method allows for easy development, rapid undoing, and a clearer understanding of the impacts of each step. |

Visualizations

In Predictive Transformation, visualizations must be carefully designed to surface selectable data or metadata of interest to the user. In Trifacta Wrangler, the Transformer page has been designed to represent the underlying dataset while guiding the user with selectable items.

Figure: Transformer Page contains a rich overlay of information and selection cues

Specific visualization cues:

  1. Data rendered into familiar grid format, regardless of underlying structure
    1. selectable values and columns
  2. Color-coded data quality bars:
    1. green: valid
    2. black: missing
    3. red: invalid (checked against data type)
    4. Select a color to select all corresponding values
  3. Histograms for individual columns:
    1. Selecting one or more values in the histogram highlights corresponding values in other column histograms for easy visual comparisons
  4. Metadata on the entire dataset, plus type and statistical information for individual columns. See Column Details Panel.
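The data quality bar and histogram cues above can be thought of as simple per-column summaries. The following Python sketch is a minimal illustration under stated assumptions: the type names ("integer", "date"), the date format, and all function names are hypothetical, and the real application's type checks are certainly richer than this.

```python
from collections import Counter
from datetime import datetime

def is_valid(value, data_type):
    """Return True if the text parses as the given (hypothetical) type name.

    Any type name other than 'integer' or 'date' is treated as free text,
    for which every non-empty value is valid.
    """
    try:
        if data_type == "integer":
            int(value)
        elif data_type == "date":
            datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False

def quality_category(value, data_type):
    """Classify one cell as 'missing', 'valid', or 'invalid'."""
    if value is None or value.strip() == "":
        return "missing"
    return "valid" if is_valid(value, data_type) else "invalid"

def quality_bar(column, data_type):
    """Count cells per category: the numbers behind a data quality bar."""
    return Counter(quality_category(v, data_type) for v in column)

def select_category(column, data_type, category):
    """Return the row indexes in one category, as if the user clicked that color."""
    return [i for i, v in enumerate(column)
            if quality_category(v, data_type) == category]

ages = ["34", "", "forty", "27", None, "34"]
bar = quality_bar(ages, "integer")
invalid_rows = select_category(ages, "integer", "invalid")
# Histogram of the valid values only.
histogram = Counter(v for v in ages if quality_category(v, "integer") == "valid")
```

Here `select_category` mirrors the "select a color to select all corresponding values" interaction: one click on the red segment stands in for a filter expression the user would otherwise have to write.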

In this manner, this visualization lifts the user's interaction from the domains of data and code into a more visual representation.

Users must still specify via selection; the syntax of the specification is lifted into the visual domain, and the details of crafting the technical query are managed by the application.

Exploration: By design, this interaction model supports both detailed specificity and ambiguity. The user selects, previews the results, and then determines if the preview meets expectations. Additionally, all steps can be undone and removed from the recipe, so that users can explore different steps and entire approaches for transforming data. Solutions that demand more technical interactions from the user often suffer from an intolerance of ambiguity, which limits a user's ability to express intent without significant experience and/or training.

Selections

As the user reviews the visualization, a change in the cursor indicates the items that are available for selection. 

Figure: Selection cursor changes on hover of selectable items

The following types of selections trigger the subsequent phases:

  • cell values and values within a cell
  • columns
  • values in a data histogram
  • categories of values (valid, invalid, missing) within a data quality bar

All values can be multi-selected. 

The user is still obligated to make selections in the data, thereby bringing domain-specific expertise to the problem of transforming it. This selection in turn triggers a more complex query through the application to the prediction service. 

Predictive Model

Based on the set of selections, an inference algorithm attempts to interpret the data transformation intent of the selection and generates a ranked set of suggestions and patterns for the selections to match. For example, if you select the first three characters in a cell, the algorithm may produce two transform suggestions for data removal: one to remove the rows containing the specific text and one to keep all rows containing that pattern of text in the column.

  • As part of the returned results of the predictive model, matching values for the selection(s) are highlighted in the table.
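To make the idea of inferring intent from a selection more concrete, here is a small Python sketch of the example above, in which the user selects the first three characters of a cell. It is an illustration only: the `Suggestion` class, the `generalize` heuristic, and the ordering (most literal pattern first) are assumptions made for this sketch, not the product's actual inference algorithm or ranking.

```python
import re
import string
from dataclasses import dataclass
from itertools import groupby

@dataclass
class Suggestion:
    action: str    # "delete" or "keep" the rows that match
    pattern: str   # regular expression tested against each value
    matches: int   # number of rows the pattern matches

def char_class(ch):
    """Map a character to a coarse regex class."""
    if ch in string.digits:
        return r"\d"
    if ch in string.ascii_uppercase:
        return "[A-Z]"
    if ch in string.ascii_lowercase:
        return "[a-z]"
    return re.escape(ch)

def generalize(text):
    """Turn literal text into a pattern of character classes, e.g. 'ERR' -> '[A-Z]{3}'."""
    return "".join(f"{cls}{{{len(list(run))}}}"
                   for cls, run in groupby(text, key=char_class))

def matching_rows(column, pattern):
    """Row indexes whose value matches; these would be highlighted in the grid."""
    return [i for i, v in enumerate(column) if re.search(pattern, v)]

def suggest(column, row, start, end):
    """Propose delete/keep suggestions for the selection column[row][start:end]."""
    selected = column[row][start:end]
    anchor = "^" if start == 0 else ""
    candidates = [anchor + re.escape(selected), anchor + generalize(selected)]
    suggestions = []
    for pattern in dict.fromkeys(candidates):  # drop duplicates, keep order
        hits = len(matching_rows(column, pattern))
        for action in ("delete", "keep"):
            suggestions.append(Suggestion(action, pattern, hits))
    return suggestions

logs = ["ERR-001 disk full", "OK-002 done", "ERR-003 timeout", "WRN-004 slow"]
suggestions = suggest(logs, row=0, start=0, end=3)
highlighted = matching_rows(logs, suggestions[0].pattern)
```

For the selection "ERR", this sketch yields four suggestions: delete or keep rows starting with the literal text, and delete or keep rows starting with any three uppercase letters. The user picks among them instead of writing the pattern.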

The predictive model interprets selection to identify intent. Possible intentions are surfaced as one or more suggested transforms in a visual manner that minimizes exposure to the transformation language.

Suggestions and Their Variants

The set of probable next steps is computed by the predictive model from the user interaction, selected data, historical information, and other sources and rendered as a set of suggestions. Since these steps are essentially predictions of user intent, they are surfaced as browsable cards, which the user can explore to resolve the ambiguity in the intent behind their data selections.

Figure: Transform cards - selection guides suggestion

Transform cards are specific enough for immediate execution. The user can choose to modify the transform and its parameters, if additional specification and guidance is needed.

At the bottom of each transform card, you can see one or more dots. Each dot represents a variant of the selected transformation.

The first variant is the most specific one applicable to the current selection in the data grid. Mouse over the variants to see different versions of the transform. As you mouse over variants further to the right in the transform card, the variants typically become more specific in their changes to the dataset or rarer in their usage.

When you mouse over a different transform variant in the transform card, the card is automatically updated to reflect the variation. When you select the variant, the Preview is updated. You can always modify the transform to review the detailed differences.

Previews

When a transform card is selected, the results of the selected transform are previewed in the data grid, so that the user can see in advance the changes to the dataset. 

Figure: Previewed effects of transform

When the transform is added to the recipe, the transform is rendered into the data transformation language and applied in real-time to the dataset, so that the user can immediately begin working on the next step of the process.

When a transform is selected, the selected transform and any additional guidance from the user are translated into a specific, programmatic step in the transformation language. This step, in turn, is rendered into a complex and potentially distributed query that is applied across the dataset. In this manner, additional technical details and the knowledge required to master them are removed from the user's requirements.
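The preview-then-commit behavior described here can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the `Recipe` class, its method names, and the two example steps are hypothetical, and a real implementation would translate steps into distributed queries instead of calling local functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Rows = List[dict]
Step = Callable[[Rows], Rows]

@dataclass
class Recipe:
    source: Rows                                   # original data, never modified
    steps: List[Step] = field(default_factory=list)

    def run(self) -> Rows:
        """Apply every step in order to a copy of the source rows."""
        rows = [dict(r) for r in self.source]
        for step in self.steps:
            rows = step(rows)
        return rows

    def preview(self, candidate: Step) -> dict:
        """Show the effect of a candidate step without adding it to the recipe."""
        before = self.run()
        after = candidate([dict(r) for r in before])
        return {"rows_before": len(before), "rows_after": len(after), "result": after}

    def add(self, step: Step) -> None:
        """Commit a step to the recipe."""
        self.steps.append(step)

    def undo(self) -> None:
        """Remove the most recently added step, if any."""
        if self.steps:
            self.steps.pop()

def drop_missing_amount(rows: Rows) -> Rows:
    return [r for r in rows if r["amount"] is not None]

def uppercase_code(rows: Rows) -> Rows:
    return [{**r, "code": r["code"].upper()} for r in rows]

source = [{"code": "a1", "amount": 10}, {"code": "b2", "amount": None}]
recipe = Recipe(source)
preview = recipe.preview(drop_missing_amount)   # inspect first, nothing committed
recipe.add(drop_missing_amount)
recipe.add(uppercase_code)
result = recipe.run()
recipe.undo()                                    # removes uppercase_code again
after_undo = recipe.run()
```

Because the recipe is only an ordered list of steps replayed against an untouched source, previewing, undoing, and reordering are all cheap, which is what makes exploratory use practical.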

Additional Steps - Modification

Modification via Transform Builder

As needed, any selection can be modified, such that the user may tweak parameters to further refine intention to reach a specific outcome. In the Trifacta Application, users can click Modify to tweak individual transforms in the Transform Builder.

Figure: Modifying a transform in the Transform Builder

Modification via Transform Editor

The user may reach a level of complexity in their interactions that the interface cannot directly support without adding unnecessary complexity to the majority of workflows. In these cases, the user can be guided into the underlying language, where queries can be modified to address the requirements of more complex interactions. For finer-grained control, you can switch to the Transform Editor, which enables access to the exact text of the transform step:

Figure: Transform Editor and Recipe Panel

The Transform Editor supports type-ahead modification and creation of transform steps, and content can be copied and pasted as regular text. All transforms are auto-previewed in the data grid. See Transform Editor Panel.

As needed, users can interact directly with the set of transforms for the dataset. Here again, in the Recipe panel on the right side of the screen, users can interact and manipulate individual steps. Users can select any step in the Recipe panel and re-order it, remove it, or resume editing in the Transform Editor. See Recipe Panel.

Wrangle

The actual steps of transformation are authored in Wrangle (a domain-specific language for data transformation). Wrangle includes the following characteristics:

  • Single-source transformations, with results rendered without modification to the original source data
  • General cleaning and transformation operations on numerical and textual data of varying and custom data types
  • Structural transformations for managing nested data like JSON and XML
  • Multi-dataset transformations such as lookups, joins, and unions
  • Transformation of data to metadata, such as pivot and unpivot operations
  • Text selection patterns, including regular expressions, as a macro-type set of references. See Text Matching.
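The phrase "transformation of data to metadata" in the list above may be easier to follow with an example. The sketch below is written in Python, not in Wrangle syntax, and the function names and sample data are hypothetical. Unpivot turns column names (metadata) into cell values, and pivot does the reverse.

```python
def unpivot(rows, id_field, key_name="key", value_name="value"):
    """Turn every non-id column into a (key, value) row: metadata becomes data."""
    return [
        {id_field: r[id_field], key_name: col, value_name: val}
        for r in rows
        for col, val in r.items()
        if col != id_field
    ]

def pivot(rows, id_field, key_name="key", value_name="value"):
    """Turn the values of key_name into column names: data becomes metadata."""
    wide = {}
    for r in rows:
        record = wide.setdefault(r[id_field], {id_field: r[id_field]})
        record[r[key_name]] = r[value_name]
    return list(wide.values())

wide_rows = [
    {"store": "north", "q1": 10, "q2": 12},
    {"store": "south", "q1": 7, "q2": 9},
]
long_rows = unpivot(wide_rows, "store", key_name="quarter", value_name="sales")
round_trip = pivot(long_rows, "store", key_name="quarter", value_name="sales")
```

In this example the column names `q1` and `q2` become values of a new `quarter` column, and pivoting the long form restores the original wide layout.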

For a list of available transforms and functions, see Language Index.

For more information, see Wrangle Language.

