
Terminology applicable to Trifacta® Wrangler.

NOTE: This list is not comprehensive.



Object Model Terms

These terms apply to the objects that you import, create, and generate in Trifacta Wrangler.

flow

A container for holding a set of related imported datasets, recipes, and output objects. Flows are managed in the Flow View page.

imported dataset

A reference to an object that contains data to be wrangled in Trifacta Wrangler. An imported dataset is created when you specify the file(s) or table(s) that you wish to read through a connection. 

job

A job is the sequence of processing steps that applies each step of your recipe, in order, across the entire dataset to generate the desired set of results.

macro

A macro is a sequence of one or more reusable recipe steps. Macros can be configured to accept parameterized inputs, so that their functionality can be tailored to the recipe in which they are referenced.

output

Associated with a recipe, an output is a user-defined set of files or tables, formats, and locations where results are written after a job run on the recipe has completed.  

output destinations

An output may contain one or more destinations, each of which defines a file type, filename, and location where the results of the output are written.



recipe

A sequence of steps that transforms one or more datasets into a desired output. Recipes are built in the Transformer page using a sample of the dataset or datasets. When a job is executed, the steps of the recipe are applied in the listed order to the imported dataset or datasets to generate the output.

reference

A pointer to the output of a recipe. A reference can be used in other flows, so that those flows get the latest version of the output from the referenced recipe.

reference dataset

A reference that has been imported into another flow.

 

results

A set of generated files or tables containing the results of processing a selected recipe, its datasets, and all upstream dependencies.

results profile

Optionally, you can create a profile of your generated results. This profile is available through the Trifacta application and may assist in analyzing or troubleshooting issues with your dataset. See Overview of Visual Profiling.

sample

When you review and interact with your data in the data grid, you are seeing the current state of your recipe applied to a sample of the dataset. If the entire dataset is smaller than the defined limit, you are interacting with the entire dataset.

You can create new samples using one of several supported sampling techniques. See Overview of Sampling.

target

A set of columns, their order, and their formats to which you are attempting to wrangle your dataset. A target represents the schema to which you are attempting to wrangle. You can assign a target to your recipe, and the schema can be superimposed on the columns in the data grid, allowing you to make simple selections to transform your dataset to match the column names, order, and formats of the target. See Overview of RapidTarget.

Application Terms

These terms apply to the Trifacta application, a web-based application for interacting with your datasets, flows, and recipes.

Column Browser panel

Browse the columns of your dataset, then select one or more columns and perform operations on them. See Column Browser Panel.

Column Details panel

Examine details and profile of the data in the selected column. See Column Details Panel.

Column menu

Perform transformation operations on the selected column from a list of menu options, including changing the column data type. See Column Menus.

Column Histogram

At the top of the column, review the counts of values in the column. Select one or more values in the column through the histogram. See Column Histograms.


Data Grid

In the Transformer page, the data grid displays a sample of the dataset at the currently selected step in the recipe. Make selections in the dataset to prompt suggestions for transformations to add to your recipe. See Data Grid Panel.

Data Quality bars

Review color-coded counts of valid, missing, and mismatched values in your column based on the column's data type. Select a color bar to be prompted with suggestions for transformations on the relevant rows. See Data Quality Bars.

Data Type menu

Change the data type for the column from the icon to the left of the column header. See Column Menus.

Dataset Details page

Examine details about your dataset, including source of data and other information. See Dataset Details Page.

Flows page

Create, manage, and export your flows. See Flows Page.

Flow View page

Build your flow objects, including recipes, outputs, and references. See Flow View Page.

Home page

Landing page after login. See Home Page.

Import Data page

Import data from a valid connection as an imported dataset. See Import Data Page.

Library page

Manage your imported datasets and reference objects. See Library Page.

Jobs page

Review the list of jobs that you have launched. View status, explore job details, and export results. See Jobs Page.

Job Details page

Review the details of your job, including an optional profile of the resulting data. See Job Details Page.

RapidTarget

Feature that enables matching of columns and data types of your dataset with a pre-defined target schema. 

Recipe panel

Add, edit, and remove steps from your current recipe. Apply changes and see updates immediately in the data grid sample.

Run Job page

Configure the job, visual profiling, and job outputs before launching the job. See Run Job Page.

Samples panel

Review, create, and delete samples for the current recipe.

Search panel

Search for transformations to build as the next step in your recipe. See Search Panel.

Settings page

Review and modify settings. See Settings Page.

Standardize page

Standardize similar column values using multiple matching techniques in a simple interface. See Standardize Page.

Selection Details panel

Based on selections you make in the data grid, you can review profiling information and a set of suggested transformations to add to your recipe. See Selection Details Panel.

Transformer toolbar

Select from common transformations in a toolbar across the top of the data grid. See Transformer Toolbar.

Transform Builder

Review and customize transformation steps. See Transform Builder.

Transformer page

Review sampled data, explore suggestions and previews, and build transformation steps. See Transformer Page.

User Profile page

Review and modify settings applicable to your user account. See User Profile Page.

Visible Columns panel

Review and toggle the visibility of the columns in your dataset. See Visible Columns Panel.

Recipe Development Terms

These terms pertain to building recipes in Wrangle in the Transformer page.

argument

An input to a function. See Wrangle Language.

binning

Several functions can be used to group values in a column into bins, which can assist in preparing your data for downstream use. See Prepare Data for Machine Processing.
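As a rough illustration outside of Wrangle, fixed-width binning can be sketched in Python. The helper name bin_value and the bin width of 10 are assumptions made for this example; they are not product functions or defaults.

```python
import math

def bin_value(value: float, width: float) -> float:
    """Return the lower edge of the fixed-width bin that contains value."""
    if width <= 0:
        raise ValueError("width must be positive")
    # floor() rounds toward negative infinity, so negative values also
    # land in the bin whose lower edge is at or below them.
    return math.floor(value / width) * width

# Hypothetical column of ages grouped into bins that are 10 wide.
ages = [3, 17, 25, 42, 58]
binned_ages = [bin_value(age, 10) for age in ages]
```

Each value is replaced by the lower edge of its bin, so 17 and 12 would both fall into the bin that starts at 10.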

data type

A data type is the set of constraints on expected values in a column. When you specify the data type for a column, you provide a means for the platform to identify the values in the column that do not match the selected type, which assists in wrangling the mismatched values. See Supported Data Types.

Data types can be selected from the column menus. See Column Menus.

dependency

An input to a recipe that is not the primary datasource for the recipe. For example, if your recipe includes a join step, the dataset that is joined into your recipe is an upstream dependency. Recipe steps and changes outside of the Trifacta application can create dependency errors, in which an upstream object can no longer be found and the reference to it cannot be resolved. These issues must be fixed before a job can be executed successfully. For more information, see Fix Dependency Issues.

file encoding

A file's encoding defines the set of characters that are in use in the file and how they are stored as bytes. There are many different encoding systems in use around the world. To represent the English language, which uses a 26-character alphabet, a small character set such as ASCII is sufficient. However, to represent Asian character sets, which may contain thousands of characters, an encoding that covers a broader set of characters, such as UTF-8, is required. See Supported File Encoding Types.

When a file is imported, Trifacta Wrangler assumes that the file is in the default encoding type. As needed, you can change the encoding type that is used to import the file. See Change File Encoding.
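The effect of reading a file with the wrong encoding can be sketched in Python. This example assumes, purely for illustration, that the default encoding is UTF-8 and that the file was written in Latin-1; the helper name decode_with is hypothetical.

```python
def decode_with(data: bytes, encoding: str) -> str:
    """Decode bytes, replacing undecodable bytes with U+FFFD."""
    return data.decode(encoding, errors="replace")

original = "café"
# Bytes as a producer might have written them, using Latin-1.
stored = original.encode("latin-1")

# Reading with the assumed default (UTF-8) garbles the accented character.
assumed_default = decode_with(stored, "utf-8")

# Reading with the encoding that was actually used recovers the text.
corrected = decode_with(stored, "latin-1")
```

The same bytes produce different text depending on the encoding used to read them, which is why the encoding may need to be changed on import.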

function

A function in Wrangle is an action that is applied to a set of values as part of a transformation step. A function can take zero or more parameters as inputs, yielding a single output of a specific data type. For a list of supported functions, see Language Index.

initial structure

When a file-based dataset is imported, Trifacta Wrangler attempts to detect the format and structure of the data and then to apply a set of initial parsing steps to transform the data for display in tabular form in the data grid. These steps may vary depending on the file format. See Initial Parsing Steps.

These steps do not appear in the recipe. As needed, you can disable the detection of structure on import. When disabled, these steps are added as the first steps of the recipe, where you can edit or remove them as needed. See Remove Initial Structure.

join

This database concept can be applied to datasets. In a join, two datasets are merged into one, based on a set of key columns. Values in these columns that match across the datasets are used to determine the values from each dataset to include in the joined dataset. See Join Types.

Joins are created as steps in your recipe. See Join Panel.
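The matching on key columns described above can be sketched in Python as an inner join over lists of rows. This is a simplified illustration, not how the product executes joins; the function name inner_join and the sample data are assumptions.

```python
def inner_join(left, right, key):
    """Merge each left row with every right row that has the same key value."""
    # Index the right-hand rows by key value for quick matching.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for left_row in left:
        # Rows without a match in the other dataset are dropped (inner join).
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined

orders = [
    {"cust_id": 1, "total": 20},
    {"cust_id": 2, "total": 35},
    {"cust_id": 4, "total": 5},
]
customers = [
    {"cust_id": 1, "name": "Ana"},
    {"cust_id": 2, "name": "Bo"},
    {"cust_id": 3, "name": "Cy"},
]
joined = inner_join(orders, customers, "cust_id")
```

Only the rows whose cust_id appears in both datasets are kept. Other join types differ in how they treat the unmatched rows.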

lookup

A retrieval of a row of values from another dataset based on common values in columns in each dataset. A lookup is useful for bringing in reference information based on values in one of the columns of your dataset. See Lookup Wizard.

mismatched

Values in a column that do not conform to the range or format of expected values for the column's data type.

missing

Cell values in the dataset that are empty.
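The distinction between valid, mismatched, and missing values can be sketched in Python for a column that is expected to hold integers. The function name classify and the integer check are assumptions for this example; the product's type validation rules are more extensive.

```python
def classify(value: str) -> str:
    """Label a cell as 'missing', 'mismatched', or 'valid' for an Integer column."""
    if value is None or value.strip() == "":
        return "missing"          # empty cell
    try:
        int(value)
    except ValueError:
        return "mismatched"       # present, but not an integer
    return "valid"

column = ["12", "", "abc", "7", "3.5"]
counts = {"valid": 0, "mismatched": 0, "missing": 0}
for cell in column:
    counts[classify(cell)] += 1
```

Counts like these correspond to the categories that are summarized for each column in the data quality bars.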

multi-dataset operation

A multi-dataset (MDS) operation refers to any step in your recipe that uses two or more datasets. Joins and unions are examples of multi-dataset operations.

nested expression

An expression that is inside another expression. Example:

POWER(ABS(colA),colB)

Trifacta Wrangler supports the use of nested expressions in your recipe steps. See Wrangle Language.
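For readers more familiar with general-purpose languages, the nested expression above can be sketched in Python. This only mirrors the evaluation order (the inner expression is evaluated first); the function name power_of_abs and the sample rows are assumptions.

```python
def power_of_abs(col_a: float, col_b: float) -> float:
    """Evaluate the inner expression abs(col_a) first, then raise it to col_b."""
    return pow(abs(col_a), col_b)

# Hypothetical (colA, colB) pairs, one per row.
rows = [(-2, 3), (4, 0.5)]
results = [power_of_abs(a, b) for a, b in rows]
```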

null

A value that does not exist in the dataset. See Manage Null Values.

operator

A symbol or keyword that represents an arithmetic, logical, or comparison function. For example, the plus sign (+) represents the add function.

Operator Category      Description
Logical Operators      and, or, and not operators
Numeric Operators      Add, subtract, multiply, and divide
Comparison Operators   Compare two values with greater than, equals, not equals, and less than operators
Ternary Operators      Use ternary operators to create if/then/else logic in your transforms.

outliers

In statistics, an outlier refers to a value that is unusually far above or below the mean. In Trifacta Wrangler, an outlier is a value that is 4 standard deviations away from the mean.

You can review outliers for column values. See Column Statistics Reference.
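The 4-standard-deviation rule can be sketched in Python. This example assumes the population standard deviation and a strict "more than 4" comparison; the source does not specify either detail, and the function name find_outliers is hypothetical.

```python
import statistics

def find_outliers(values, threshold=4.0):
    """Return values that lie more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # assumption: population standard deviation
    if stdev == 0:
        return []                      # all values identical, so no outliers
    return [v for v in values if abs(v - mean) > threshold * stdev]

# Hypothetical column: 29 typical values and one extreme value.
values = [10] * 29 + [500]
outliers = find_outliers(values)
```

With a threshold this strict, small datasets rarely contain outliers, because a single extreme value also inflates the standard deviation.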

parameter (language)

An input to a transform in Wrangle. See Wrangle Language.

pattern

In Trifacta Wrangler, a pattern is an object that describes a sub-string within a value. Patterns can be described using regular expressions, a common standard, or Trifacta patterns, a proprietary simplification of regular expressions. See Text Matching.

Patterns are widely used in the product for identifying and extracting values from data, validating data types, and supporting pattern-based suggestions.

  • See Trifacta pattern.
  • See regular expression pattern.

regular expression pattern

Regular expressions are a powerful yet complex method of describing patterns of values for matching purposes. See Text Matching.
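As an illustration of matching with a regular expression, the following Python sketch extracts the first sub-string made of three digits, a hyphen, and four digits. It uses Python's re module, whose syntax may differ in details from the regular expression flavor supported in the product; the pattern and the function name extract_first are assumptions.

```python
import re

# Three digits, a hyphen, then four digits, for example "555-1234".
PATTERN = re.compile(r"\d{3}-\d{4}")

def extract_first(value: str):
    """Return the first sub-string that matches PATTERN, or None if nothing matches."""
    match = PATTERN.search(value)
    return match.group(0) if match else None

found = extract_first("Call 555-1234 today")
not_found = extract_first("no digits here")
```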

source row number

The row number for a record as it appeared in the original dataset. Source row number information can be obtained by using the SOURCEROWNUMBER function. This function may return a null value if multi-dataset operations, such as union and join, have been performed on the dataset. See SOURCEROWNUMBER Function.

source metadata reference

A source metadata reference is a programmatic reference to some aspect of the source file for your dataset. Using these programmatic references, you can write information about your original datasource into your dataset for future reference. For more information, see Source Metadata References.

standardize

Trifacta Wrangler provides multiple mechanisms to standardize column values using patterns, clustering algorithms, or functions. See Overview of Standardization.


string collation

String collation refers to a method of comparison of strings based on a set of rules. Trifacta Wrangler includes the following functions to perform string collation-based comparisons:

transformation

A transformation is the unit of action in a recipe step. A transformation applies one or more actions on a set of rows or columns. Transformations are specified in the Transformer page through the Transform Builder. See Transform Builder.

For a list of available transformations, see Transformation Reference.

transform

A transform in Wrangle is an action that is applied to rows or columns of your dataset. A transform can take zero or more parameters as inputs. A parameter may contain a reference to a column, a literal value, or a function.

NOTE: Transforms are not available through the Trifacta application. Instead, you build transformations, which are more complex steps that reference transforms from the underlying language.

For a list of supported transforms, see Language Index.

Trifacta pattern

A simplification of regular expressions, Trifacta patterns are custom selectors for patterns in your data and provide a simpler and more readable alternative to regular expressions. See Text Matching.

union

A union combines two or more datasets such that the rows of the second and later datasets are appended to the end of the first dataset. In a union operation, the columns must be matched up, or the results are a ragged dataset. 

Unions are created as steps in your recipe. See Union Page.
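The effect of unmatched columns in a union can be sketched in Python. Here a "ragged" result is illustrated as rows with gaps (None) where a dataset had no matching column; this interpretation, the function name union, and the sample data are assumptions for the example.

```python
def union(datasets):
    """Append the rows of later datasets to the first, aligning columns by name."""
    # Collect every column name in first-seen order.
    columns = []
    for dataset in datasets:
        for row in dataset:
            for name in row:
                if name not in columns:
                    columns.append(name)

    # Columns that a row does not have are filled with None.
    return [
        {name: row.get(name) for name in columns}
        for dataset in datasets
        for row in dataset
    ]

first = [{"id": 1, "name": "Ana"}]
second = [{"id": 2, "full_name": "Bo"}]
combined = union([first, second])
```

Because name and full_name were not matched up, each row has a gap in one of the two columns. Mapping full_name to name before the union would avoid this.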

wrangling

An informal term for the process of data preparation. Data wrangling was invented by the co-founders of Trifacta.



Miscellaneous Terms

Epoch/Unix time

Unix time (a.k.a. POSIX time or Epoch time) is a system for describing instants in time, defined as the number of seconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970, not counting leap seconds.
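The definition above can be sketched with Python's standard datetime module. The helper names from_unix and to_unix are assumptions for this example.

```python
from datetime import datetime, timezone

def from_unix(seconds: int) -> datetime:
    """Convert seconds since 1970-01-01 00:00:00 UTC to a timezone-aware datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def to_unix(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since the epoch."""
    return int(moment.timestamp())

epoch_start = from_unix(0)            # the epoch itself
one_day_later = from_unix(86400)      # 24 * 60 * 60 seconds after the epoch
y2k = to_unix(datetime(2000, 1, 1, tzinfo=timezone.utc))
```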
