Glossary

Terminology applicable to Designer Cloud Powered by Trifacta Enterprise Edition.

Note

This list is not comprehensive.

Object Terms

These terms apply to the objects that you import, create, and generate in Designer Cloud Powered by Trifacta Enterprise Edition.

author

In an application role, the author privilege allows the highest level of access, except for ownership, to application objects. This privilege can be applied to object types within an assignable role. See Overview of Authorization.

collaborator

Anyone who has been provided editor- or author-level access to an object. See Overview of Sharing.

connection

A configuration object that defines the integration between the product and a datastore, through which data is read from and optionally written to the store. A connection can be read-only or read-write, depending on the type. Some connections are provided by default.

Other connections are created through the Trifacta Application.

connector

A configuration object and related driver that creates the actual connection between the product and a type of datastore. A connector provides the basic definition for connections of a specific vendor type.

dataset with parameters

An imported dataset that has been created with parameterized references, typically used to collect multiple assets stored in similar locations or filenames containing identical structures. For example, if you stored orders in individual files for each week in a single directory, you could create a dataset with parameters to capture all of those files in a single object, even if more files are added at a later time.

The path to the asset or assets is specified with one or more of the following types of parameters: Datetime, Wrangle, regular expression, wildcard, or variable.
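Outside the product, the effect of a wildcard parameter can be sketched with shell-style globbing; the file paths below are hypothetical:

```python
import fnmatch

# Hypothetical directory of weekly order files, all sharing one structure.
files = [
    "/orders/orders-2023-W01.csv",
    "/orders/orders-2023-W02.csv",
    "/orders/readme.txt",
]

# A wildcard parameter such as "orders-*.csv" captures every weekly file,
# including files added later, while skipping unrelated assets.
matched = [f for f in files if fnmatch.fnmatch(f, "/orders/orders-*.csv")]
print(matched)
```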

dataset with custom SQL

A dataset that is created by applying a custom SELECT statement to a relational datasource. You can use custom SQL statements to narrow the scope of your imported dataset, rather than importing a single, entire table.
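For illustration only, a custom SELECT against a hypothetical orders table narrows what is imported, compared with reading the entire table:

```python
import sqlite3

# Hypothetical relational source with a single orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "east", 10.0), (2, "west", 20.0), (3, "east", 30.0)],
)

# Default import: the single, entire table.
full_table = conn.execute("SELECT * FROM orders").fetchall()

# Custom SQL import: only the rows and columns the SELECT returns.
subset = conn.execute(
    "SELECT id, total FROM orders WHERE region = 'east'"
).fetchall()
print(len(full_table), subset)
```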

data type

A data type refers to the expected class of values for a column of data. A data type defines the types of information that are expected and can include specific formatting of that information. Column values that do not meet the expectations of the column data type are determined to be invalid for the data type.
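A minimal sketch of this idea, assuming an Integer column type: values that cannot be parsed as the declared type are flagged as invalid.

```python
# Classify each cell against the column's declared data type (Integer here).
def classify(value):
    if value == "":
        return "missing"
    try:
        int(value)
        return "valid"
    except ValueError:
        return "mismatched"

column = ["42", "7", "oops", "", "100"]
print([classify(v) for v in column])
```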

deployment

A mechanism for versioning publication of your flows. In Deployment Manager, packages are imported as new releases assigned to a deployment. Within a deployment, you can choose which release is active, allowing you to version the publication of your flows. See Overview of Deployment Manager.

editor

In an application role, the editor privilege allows viewing and modifying application objects. This privilege can be applied to object types within an assignable role. See Overview of Authorization.

flow

A container for holding a set of related imported datasets, recipes, and output objects. Flows are managed in the Flow View page.

flow parameter

A named reference that you can apply in your recipe steps. When applied, the flow parameter is replaced with its corresponding value, which may be the default value or an override value. See Overview of Parameterization.

imported dataset

A reference to an object that contains data to be wrangled in Designer Cloud Powered by Trifacta Enterprise Edition. An imported dataset is created when you specify the file(s) or table(s) that you wish to read through a connection.

job

A job is the sequence of processing steps that apply each step of your recipe in sequence across the entire dataset to generate the desired set of results.

macro

A macro is a sequence of one or more reusable recipe steps. Macros can be configured to accept parameterized inputs, so that their functionality can be tailored to the recipe in which they are referenced.

output

Associated with a recipe, an output is a user-defined set of files or tables, formats, and locations where results are written after a job run on the recipe has completed.

output destinations

An output may contain one or more destinations, each of which defines a file type, filename, and location where the results of the output are written.

output parameter

You can create variable or timestamp parameters that can be applied to parts of the file or table paths of your outputs. Variable values can be specified at the time of job execution.

package

A flow imported into a Production instance of the platform. A package contains a JSON-based definition of the flow in a ZIP file. On import, rules must be created to modify any mappings to connections or paths to datasets that may have changed from those of the platform instance from where the package was exported. See Overview of Deployment Manager.

plan

A plan is a sequence of triggers and tasks that can be applied across multiple flows. For example, you can schedule plans to execute sequences of flows at a specified frequency. For more information, see Overview of Operationalization.

privilege

A privilege determines the level of access to a type of Alteryx object. Privileges are assigned using roles. For more information, see Overview of Authorization.

publication

The delivery of a set of results generated by Designer Cloud Powered by Trifacta Enterprise Edition to another system. See Publishing Dialog.

parameter override

A value that is applied instead of the default or inherited value for a parameter. A parameter override may be applied at the flow level or at the time of job execution.

recipe

A sequence of steps that transforms one or more datasets into a desired output. Recipes are built in the Transformer page using a sample of the dataset or datasets. When a job is executed, the steps of the recipe are applied in the listed order to the imported dataset or datasets to generate the output.

reference

A pointer to the output of a recipe. A reference can be used in other flows, so that those flows get the latest version of the output from the referenced recipe.

reference dataset

A reference that has been imported into another flow.

release

A specific instance of a package that has been imported into the Deployment Manager. See Overview of Deployment Manager.

results

A set of generated files or tables containing the results of processing a selected recipe, its datasets, and all upstream dependencies. See Job Details Page.

results profile

Optionally, you can create a profile of your generated results. This profile is available through the Trifacta Application and may assist in analyzing or troubleshooting issues with your dataset. See Overview of Visual Profiling.

role

A role is a set of privileges that governs access levels to one or more types of objects. For more information, see Overview of Authorization.

sample

When you review and interact with your data in the data grid, you are seeing the current state of your recipe applied to a sample of the dataset. If the entire dataset is smaller than the defined limit, you are interacting with the entire dataset.

You can create new samples using one of several supported sampling techniques. See Overview of Sampling.

schedule

You can associate a single schedule with a flow. A schedule is a combination of one or more trigger times and one or more scheduled destinations, which are generated when a trigger fires. A schedule must have at least one trigger and at least one scheduled destination in order to work.

scheduled destination

When a schedule's trigger is fired, each recipe that has a scheduled destination associated with it is queued for execution. When the job completes, the outputs specified in the scheduled destination are generated. A recipe may have only one scheduled destination, and a scheduled destination may have multiple outputs (publishing actions) associated with it.

schema

A schema defines the column names, data types, and ordering of your dataset. Schemas apply to relational datasources and some file types, such as Avro or Parquet. For more information, see Overview of Schema Management.

snapshot

When a plan is triggered, a snapshot of all tasks in the plan is taken. The tasks of the plan are executed against this snapshot. Subsequent revisions to these objects may impact the execution of the plan. For more information, see Overview of Operationalization.

target

A set of columns, their order, and their formats to which you are attempting to wrangle your dataset. A target represents the schema to which you are attempting to wrangle. You can assign a target to your recipe, and the schema can be superimposed on the columns in the data grid, allowing you to make simple selections to transform your dataset to match the column names, order, and formats of the target. See Overview of Target Schema Mapping.

task

A task is an executable action that is part of a plan. For example, when a plan is triggered, the first task in the plan is queued for execution, which may be to execute all of the recipes and their dependencies in a flow. For more information, see Overview of Operationalization.

trigger

A trigger is a periodic time associated with a schedule. When a trigger's time occurs, all flows associated with the trigger are queued for execution.

  • A schedule can have multiple triggers. See also schedule and scheduled destination.

  • For more information on flow-based triggers, see Overview of Scheduling.

variable (dataset)

A replacement for the parts of a file path to data that change with each refresh. A variable can be overwritten as needed at job runtime.

viewer

In an application role, the viewer privilege allows read-only access to application objects. This privilege can be applied to object types within an assignable role. See Overview of Authorization.

Application Terms

These terms apply to the Trifacta Application, a web-based application for interacting with your datasets, flows, and recipes.

Add Schedule dialog

Create or modify scheduled executions of your flow.

Canvas

The main panel of Flow View where you can add, arrange, and remove flow objects. See Flow View Page. Plan View also contains a canvas area. See Plan View Page.

Cluster Clean

Standardize values in a column by clustering similar values together. See Overview of Cluster Clean.

Column By Example

Creates a new column of data by providing example values from an existing column. See Overview of TBE.

Column Browser panel

Review sampled data across multiple columns through the column browser. You can also use the Column Browser panel to toggle the display of individual columns. See Column Browser Panel.

Column Details panel

Examine details and a profile of the data in a selected column. See Column Details Panel.

Column menu

Perform transformation operations on the selected column from a list of menu options, including changing the column data type. See Column Menus.

Column histogram

At the top of the column, review the counts of values in the column. Select one or more values in the column through the histogram. See Column Histograms.

Connections page

Create or edit connections to external storage. See Connections Page.

Data Grid

In the Transformer page, the data grid displays a sample of the dataset at the currently selected step in the recipe. Make selections in the dataset to prompt suggestions for transformations to add to your recipe. See Data Grid Panel.

Data Quality bars

Review color-coded counts of valid, missing, and mismatched values in your column based on the column's data type. Select a color bar to be prompted with suggestions for transformations on the relevant rows. See Data Quality Bars.

Data Type menu

Change the data type for the column from the icon to the left of the column header. See Column Menus.

Deployment Manager

Deploy Production versions of your flows through the Deployment Manager. See Overview of Deployment Manager.

Dataset Details page

Examine details about your dataset, including source of data and other information. See Dataset Details Page.

email notifications

By default, the Trifacta Application sends notifications to users on the success or failure of their jobs and plans. The delivery of these notifications can be disabled as needed. See Email Notifications Page.

Flag for Review

The Flag for review feature enables flow users to flag recipe steps for others to review, provide inputs, and sign off on the changes before jobs are permitted to execute. See Flag for Review.

Flows page

Create, manage, and export your flows. See Flows Page.

Flow View page

Build your flow objects, including recipes, outputs, and references. See Flow View Page.

Home page

Landing page after login. See Home Page.

Import Data page

Import data from a valid connection as an imported dataset. See Import Data Page.

Library page

Manage your imported datasets and reference objects. See Library Page.

Job Details page

Review the details of your job, including an optional profile of the resulting data. See Job Details Page.

Job History page

Review the list of jobs that you have launched. View status, explore job details, and export results. See Job History Page.

Job monitoring

Job monitoring enables users of the Trifacta Application to monitor job progress through each phase of its execution. See Overview of Job Monitoring.

Plan View

Plan View page enables you to build, arrange, and execute a sequence of one or more tasks in an orchestrated plan. Plans can be scheduled for execution or run on an ad-hoc basis. See Plan View Page.

Publishing dialog

Publish results to an external system. See Publishing Dialog.

Recipe panel

Add, edit, and remove steps from your current recipe. Apply changes and see updates immediately in the data grid sample.

Run Job page

Configure job, visual profiling, and job outputs before launching. See Run Job Page.

Samples panel

Review, create, and delete samples for the current recipe.

Sample Jobs page

Review status of all samples that you have initiated. Administrators can access the samples of all users. For more information, see Sample Jobs Page.

Scheduling

Feature that enables automated execution of flows according to user-defined schedules. See Overview of Scheduling.

Search panel

Search for transformations to build as the next step in your recipe. See Search Panel.

Settings page

Review and modify settings. See Preferences Page.

Share Flow dialog

Share your flow or send a copy of it to other users.

Selection Details panel

Based on selections you make in the data grid, you can review profiling information and a set of suggested transformations to add to your recipe. See Selection Details Panel.

Support bundle

Contains log files for each phase of the job execution. See Support Bundle Contents.

Target schema mapping

Feature that enables matching of columns and data types of your dataset with a pre-defined target schema.

Transformer toolbar

Select from common transformations in a toolbar across the top of the data grid. See Transformer Toolbar.

Transform Builder

Review and customize transformation steps. See Transform Builder.

Transformer page

Review sampled data, explore suggestions and previews, and build transformation steps. See Transformer Page.

User Profile page

Review and modify settings applicable to your user account. See User Profile Page.

Visible Columns panel

Review and toggle the visibility of the columns in your dataset. See Visible Columns Panel.

Concepts

base storage layer

The primary storage layer of the Designer Cloud Powered by Trifacta Enterprise Edition platform. During initial installation, you define the base storage layer for the platform, where uploads, samples, and temp files are stored. See Set Base Storage Layer.

orchestration

Orchestration refers to the sequencing, execution, and monitoring of a series of tasks in the platform. In the Designer Cloud Powered by Trifacta platform, orchestration is defined using plans. See Overview of Operationalization.

sample checkpointing

As you build more complex recipes and flows, it's a good idea to create samples periodically in your recipe steps. All steps between the currently displayed sample and the currently displayed recipe step are executed in the browser, so this type of checkpointing with samples can improve performance. For more information on best practices in sampling, see Overview of Sampling.

schema drift

Feature that detects changes to the schema of your imported datasets before or during job execution. See Overview of Schema Management.

schema refresh

Feature that refreshes the schema of your imported datasets within the Designer Cloud Powered by Trifacta Enterprise Edition based on changes to the schemas of their datasources. See Overview of Schema Management.

type system

The system within the Designer Cloud Powered by Trifacta platform for managing data types.

  • The Designer Cloud Powered by Trifacta platform can read data types from a variety of source systems. These types are then mapped to internal Alteryx data types.

  • During recipe development, data types may be re-inferred by the Designer Cloud Powered by Trifacta platform, as the data within your columns changes.

  • During job execution and publishing to a target system, Alteryx data types may be mapped to a different set of data types, depending on the target.

Recipe Development Terms

These terms pertain to building recipes in Wrangle in the Transformer page.

argument

An input to a function. See Wrangle Language.

binning

Several functions can be used to group values in a column into bins, which can assist in preparing your data for downstream use. See Prepare Data for Machine Processing.

data type

A data type is the set of constraints on expected values in a column. When you specify the data type for a column, you provide a means for the platform to identify the values in the column that do not match the selected type, which assists in wrangling the mismatched values. See Supported Data Types.

Data types can be selected from the column menus. See Column Menus.

dependency

An input to a recipe that is not the primary datasource for the recipe. For example, if your recipe includes a join step, the dataset that is joined into your recipe is an upstream dependency. Recipe steps and changes outside of the Trifacta Application can create dependency errors, in which an upstream object can no longer be found and the reference to it cannot be resolved. These issues must be fixed prior to successful execution of a job. For more information, see Fix Dependency Issues.

file encoding

A file's encoding defines the set of characters that are in use in the file. There are many different encoding systems in use around the world. To represent the English language, which uses a 26-character alphabet, UTF-8 is sufficient. However, to represent Asian character sets, which may contain thousands of characters, a different and broader set of characters is required. See Supported File Encoding Types.

When a file is imported, Designer Cloud Powered by Trifacta Enterprise Edition assumes that the file is in the default encoding type. As needed, you can change the encoding type that is used to import the file. See Change File Encoding.
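A quick sketch of why the encoding matters: the same bytes decode to different characters under different encoding assumptions.

```python
# UTF-8 bytes for a string containing a non-ASCII character.
data = "café".encode("utf-8")

print(data.decode("utf-8"))    # correct: café
print(data.decode("latin-1"))  # wrong encoding yields mojibake: cafÃ©
```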

full scan

A full scan sample is generated across the entire dataset on the default running environment. Full scan samples are more representative of the total dataset. However, they can take a while to generate. For more information, see Samples Panel.

function

A function in Wrangle is an action that is applied to a set of values as part of a transformation step. A function can take 0 or more parameters as inputs, yielding a single output of a specific data type. For a list of supported functions, see Language Index.

initial structure

When a file-based dataset is imported, Designer Cloud Powered by Trifacta Enterprise Edition attempts to detect the format and structure of the data and then to apply a set of initial parsing steps to transform the data for display in tabular form in the data grid. These steps may vary depending on the file format. See Initial Parsing Steps.

These steps do not appear in the recipe. As needed, you can disable the detection of structure on import. When disabled, these steps are added as the first steps of the recipe, where you can edit or remove them as needed. See Remove Initial Structure.

join

This database concept can be applied to datasets. In a join, two datasets are merged into one, based on a set of key columns. Values in these columns that match across the datasets are used to determine the values from each dataset to include in the joined dataset. See Join Types.

Joins are created as steps in your recipe. See Join Window.
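The key-matching behavior can be sketched as an inner join over two small, hypothetical datasets:

```python
# Two datasets sharing a key column (the first field in each row).
left = [("A", 1), ("B", 2), ("C", 3)]
right = [("A", "x"), ("B", "y"), ("D", "z")]

# Index the joined-in dataset by key, then keep only matching keys.
right_by_key = dict(right)
joined = [(key, n, right_by_key[key]) for key, n in left if key in right_by_key]
print(joined)
```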

lookup

A retrieval of a row of values from another dataset based on common values in columns in each dataset. A lookup is useful for bringing in reference information based on values in one of the columns of your dataset. See Lookup Wizard.

mismatched

Values in a column that do not conform to the range or format of expected values for the column's data type.

missing

Cell values in the dataset that are empty.

multi-dataset operation

A multi-dataset (MDS) operation refers to any step in your recipe that uses two or more datasets. Joins and unions are examples of multi-dataset operations.

nested expression

An expression that is inside another expression. Example:

POWER(ABS(colA),colB)

Designer Cloud Powered by Trifacta Enterprise Edition supports the use of nested expressions in your recipe steps. See Wrangle Language.
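Evaluated inside-out, the example above computes the absolute value first and then the power; the equivalent in Python is:

```python
# Equivalent of POWER(ABS(colA), colB): the inner ABS runs first.
colA, colB = -3, 2
result = abs(colA) ** colB
print(result)
```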

null

A value that does not exist in the dataset. See Manage Null Values.

operator

A single character that represents an arithmetic function or comparison. For example, the Plus sign (+) represents the add function.

Operator categories:

  • Logical operators: and, or, and not

  • Numeric operators: add, subtract, multiply, and divide

  • Comparison operators: compare two values with greater than, equals, not equals, and less than

  • Ternary operators: create if/then/else logic in your transforms
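The four operator categories can be illustrated with ordinary expressions (the column names here are hypothetical):

```python
colA, colB = 15, 4

total = colA + colB                   # numeric operator: add
is_big = colA > 10                    # comparison operator: greater than
flag = is_big and colB < 5            # logical operator: and
label = "big" if is_big else "small"  # ternary-style if/then/else logic
print(total, is_big, flag, label)
```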

outliers

In statistics, an outlier is a value that lies unusually far above or below the mean. In Designer Cloud Powered by Trifacta Enterprise Edition, an outlier is a value that is more than 4 standard deviations from the mean.

You can review outliers for column values. See Column Statistics Reference.
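A small sketch of the 4-standard-deviation rule on hypothetical data:

```python
import statistics

# Tightly clustered values plus one extreme value.
values = [10] * 99 + [1000]
mean = statistics.mean(values)
sd = statistics.pstdev(values)

# Flag values more than 4 standard deviations from the mean.
outliers = [v for v in values if abs(v - mean) > 4 * sd]
print(outliers)
```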

parameter (language)

An input to a transform in Wrangle. See Wrangle Language.

pattern

In Designer Cloud Powered by Trifacta Enterprise Edition, a pattern is an object that describes a sub-string within a value. Patterns can be described using regular expressions, a common standard, or Wrangle, a proprietary simplification of regular expressions. See Text Matching.

Patterns are widely used in the product for identifying and extracting values from data, validating data types, and supporting pattern-based suggestions.

  • See regular expression.

plan metadata reference

A plan metadata reference is a programmatic reference to some aspect of a plan, its tasks, or results of the execution. These metadata references can be inserted into the requests and responses of tasks in the plan for delivery to other systems. For more information, see Plan Metadata References.

quick scan

A quick scan sample is generated using an appropriate selection of rows from the dataset. Since these samples are generated in Trifacta Photon, they are faster to produce. For more information, see Samples Panel.

range join

A range join is a type of join in which key values may be matched with a range of values in the joined-in dataset. For example, you can create a range join based on the source key value being greater than values in the key column of the joined-in dataset. A range join can explode the size of your resulting dataset. For more information, see Configure Range Join.

Joins are created as steps in your recipe. See Join Window.

regular expression

Regular expressions are a powerful yet complex method of describing patterns of values for matching purposes. See Text Matching.
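For example, a regular expression can describe and extract a sub-string pattern, such as a hypothetical order ID:

```python
import re

# Match strings of the form "ORD-" followed by four digits.
text = "Shipped ORD-1234 and ORD-5678 yesterday"
order_ids = re.findall(r"ORD-\d{4}", text)
print(order_ids)
```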

source row number

The row number for a record as it appeared in the original dataset. Source row number information can be obtained by the SOURCEROWNUMBER function. This function may return a null value if multi-dataset operations, such as union and join, have been performed on the dataset. See SOURCEROWNUMBER Function.

source metadata reference

A source metadata reference is a programmatic reference to some aspect of the source file for your dataset. Using these programmatic references, you can write source information for your original datasource into your dataset for future reference. For more information, see Source Metadata References.

standardize

Designer Cloud Powered by Trifacta Enterprise Edition provides multiple mechanisms to standardize column values using patterns, clustering algorithms, or functions. See Overview of Standardization.

string collation

String collation refers to a method of comparing strings based on a set of rules. Designer Cloud Powered by Trifacta Enterprise Edition includes functions to perform string collation-based comparisons.

transformation

A transformation is the unit of action in a recipe step. A transformation applies one or more actions on a set of rows or columns. Transformations are specified in the Transformer page through the Transform Builder. See Transform Builder.

For a list of available transformations, see Transformation Reference.

transform

A transform in Wrangle is an action that is applied to rows or columns of your dataset. A transform can take zero or more parameters as inputs. A parameter may contain a reference to a column, a literal value, or a function.

Note

Transforms are not available through the Trifacta Application. Instead, you build transformations, which are more complex steps that reference transforms from the underlying language.

For a list of supported transforms, see Language Index.

Alteryx pattern

A simplification of regular expressions, Alteryx patterns are custom selectors for patterns in your data, providing a simpler and more readable alternative to regular expressions. See Text Matching.

union

A union combines two or more datasets such that the rows of the second and later datasets are appended to the end of the first dataset. In a union operation, the columns must be matched up, or the results are a ragged dataset.

Unions are created as steps in your recipe. See Union Page.
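The append behavior, and the ragged result of mismatched columns, can be sketched with row tuples:

```python
# Union: rows of the second dataset are appended to the first.
week1 = [("A", 1), ("B", 2)]
week2 = [("C", 3)]
union = week1 + week2
print(union)

# Unmatched columns produce a ragged dataset: rows of differing widths.
ragged = week1 + [("D", 4, "extra")]
print({len(row) for row in ragged})
```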

wrangling

An informal term for the process of data preparation. Data wrangling was invented by the co-founders of Alteryx.

Connectivity

Connect string options

When you create a connection to a datastore, you may be able to specify a set of one or more options, which are appended to the connection string passed to the datastore for access. Connect string options are specified as part of the definition of each connection object within the Trifacta Application. See Create Connection Window.

Long loading

Long loading refers to the process by which large datasets can be asynchronously loaded into the Trifacta Application. From relational sources or sources that require conversion, larger datasets can be queued for loading and conversion for use. While these datasets are being loaded, you can continue to use the Trifacta Application for other tasks.

Admin Terms

These terms apply to administration of your project or workspace and the underlying platform.

Admin Settings page

A page in the Trifacta Application where administrators can configure platform users, settings, and other configuration options. See Admin Settings Page.

Deployment Manager page

A page in the Trifacta Application where provisioned users can manage their deployments for a Production instance. Users must have the Deployment role in their account, or the entire instance must be configured as a Production instance. See Deployment Manager Page.

workspace

A logical structure for organizing the management of a set of users and their data.

Workspace Settings page

A page in the Trifacta Application for managing a workspace's features and other configuration options. See Workspace Settings Page.

Platform Terms

These terms apply to the underlying Designer Cloud Powered by Trifacta platform.

access token

Individual users can be provisioned an access token to enable interaction with the REST API endpoints. Access tokens must be submitted with each request to the Designer Cloud Powered by Trifacta platform. For more information, see Manage API Access Tokens.

Artifact Storage service

A platform service for managing the storage of user-specified data, such as value mappings.

Authorization service

A platform service for managing access levels for Trifacta Application objects such as flows, connections, and plans.

Avro

A data serialization format for Hadoop. For more information, see Supported File Formats.

API

Short for Application Programming Interface, the platform APIs give developers programmatic access to platform actions from outside of the application interface. For more information, see API Reference.

Batch job runner

A platform service for queuing and managing the execution of jobs through external running environments. For more information, see Configure Batch Job Runner.

BZIP

A file format for compression and decompression. For more information, see Supported File Formats.

Chrome

The Trifacta Application can be served through a supported version of Google Chrome. For more information, see Browser Requirements.

Configuration service

A platform service for managing system, edition, and workspace/project levels of configuration.

Connector Configuration service

A platform service for managing the configuration of platform-level connectors, their defaults, and their overrides. See Configure Connector Configuration Service.

Conversion service

A platform service for converting binary, relational or interpreted datasources into formats that are natively understood by the Designer Cloud Powered by Trifacta platform.

cron

Time-based job scheduling format. The Designer Cloud Powered by Trifacta platform supports a modified form of cron. For more information, see cron Schedule Syntax Reference.

Data service

A platform service for managing connections and interactions with relational storage. For more information, see Configure Data Service.

Firefox

The Trifacta Application can be served through a supported version of Mozilla Firefox. For more information, see Browser Requirements.

GZIP

A file format for compression and decompression. For more information, see Supported File Formats.

Hyper

A native format for the Tableau data visualization platform. The Designer Cloud Powered by Trifacta platform can generate results in Hyper format. For more information, see Supported File Formats.

ingestion

The process by which relational datasources can be retrieved from their origin and transferred to the backend datastore of the platform, which improves performance in sampling and job execution. For more information, see Configure JDBC Ingestion.

Java UDF service

A platform service for managing the deployment and execution of user-defined functions (UDFs) authored in Java. See Java UDFs.

Java VFS service

A Java-based platform service for managing the connectivity with file-based storage systems through a virtual file system (VFS). See Configure Java VFS Service.

Job Metadata service

A platform service for storing metadata related to job execution.

JSON

JavaScript Object Notation (JSON) is a human-readable format for transmitting data objects. For more information, see Supported File Formats.
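A JSON round-trip in Python shows the human-readable representation:

```python
import json

# Serialize a data object to JSON text, then parse it back.
record = {"id": 7, "tags": ["a", "b"], "active": True}
text = json.dumps(record)
print(text)
print(json.loads(text) == record)
```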

Microsoft Excel

Microsoft Excel workbooks and worksheets can be used as imported datasets in the platform. For more information, see Import Excel Data.

ML service

A platform service that processes user activity to improve platform recommendations.

MySQL

An open-source relational database management system. MySQL can host the Alteryx databases. For more information, see System Requirements.

machine learning

The process by which computer systems use data as inputs for algorithms and statistical models to make decisions and perform tasks.

operationalization

The process by which actions in the platform can be applied and scheduled in production environments.

Optimizer service

A platform service for managing optimizations of flow execution within supported relational datastores through SQL query. See Configure Optimizer Service.

Orchestration service

A platform service for managing the execution of plans. See Configure Orchestration Service.

Trifacta Photon

An in-memory running environment for executing jobs. Embedded in the Trifacta node, Trifacta Photon is fast and best suited for small- to medium-sized jobs.

Trifacta Photon client

An in-browser client for managing the sampling and transformation of data on the web client. For more information, see Configure Photon Client.

PostgreSQL

An open-source relational database management system. PostgreSQL can host the Alteryx databases. For more information, see System Requirements.

predictive transformation

Specific to the Designer Cloud Powered by Trifacta platform, predictive transformation serves as the foundation of design principles for how users interact with their data. For more information, see Overview of Predictive Transformation.

profile job

When a job is executed against a dataset, users can optionally choose to generate a visual profile of the results, which is processed as a separate job after the transformation job has completed. For more information, see Run Job Page.

running environment

One of several environments where transformation, profiling, and sampling jobs can be executed. The platform integrates with these environments and manages the queuing and monitoring of the jobs asynchronously, minimizing performance impacts on the Trifacta node. For more information, see Running Environment Options.

Scheduling service

A platform service for managing the execution of schedules based on defined triggers. See Configure Scheduling.

Secure Token service

A platform service for managing the use of secure tokens for access to third-party systems. See Configure Secure Token Service.

sharing

Users can optionally share flows and connections with other users. For more information, see Overview of Sharing.

Spark Job service

A platform service for managing job execution on Spark-based systems. See Configure for Spark.

SSO

Short for Single Sign-On, SSO enables users to access multiple systems within the enterprise domain through one set of credentials. The Designer Cloud Powered by Trifacta platform can integrate with multiple types of SSO.

Snappy

A fast compression and decompression format. For more information, see Supported File Formats.

Time-based trigger service

A platform service for monitoring triggers for executing scheduled jobs. See Configure Scheduling.

transform job

The process by which a recipe is applied across the entire dataset to generate results at the specified output locations. For more information, see Run Job Page.

trifacta-conf.json

The primary configuration file of the Designer Cloud Powered by Trifacta platform. This file is stored in JSON format on the Trifacta node.

Note

Administrators should perform platform configuration operations through the Admin Settings page, where possible. See Admin Settings Page.

For more information, see Platform Configuration Methods.

UDF

Short for user-defined function, a UDF is an externally developed function that can be used in your recipes to apply custom transformation logic. Building UDFs requires developer skills. For more information, see User-Defined Functions.

VFS service

A JavaScript-based platform service for managing connectivity with file-based storage systems through a virtual file system (VFS). See Configure VFS Service.

This service has been superseded by the Java VFS Service.

visual profiler

A platform service that can be optionally invoked to generate visual profiles on generated results for display in the Trifacta Application. For more information, see Overview of Visual Profiling.

Webapp service

A platform service for loading data through connections into the Trifacta Application for user interaction.

webhook

A webhook is a message sent over HTTP via a REST API request from one application to another. In the Designer Cloud Powered by Trifacta platform, you can configure webhooks to be sent to a third-party application based on the success or failure of a job execution. For more information, see Create Flow Webhook Task.

Hadoop Terms

Here are a few terms that are specific to Hadoop and Hadoop-based clusters.

Cloudera

A Hadoop-based platform storing large volumes of data and performing analytics on them. For more information, see Supported Deployment Scenarios for Cloudera.

cluster

With respect to the platform, a cluster is a remote collection of nodes for processing platform jobs and returning results. The platform supports integration with multiple types of clusters for job processing.

Hadoop

An open-source framework of utilities for managing analytics and data processing jobs across a network of many nodes in a cluster. Hadoop is scalable and extensible and well-suited for processing very large data volumes.

HDFS

Short for Hadoop Distributed File System, HDFS is a backend datastore for Hadoop-based clusters. Files are stored in large blocks distributed across many nodes of the cluster. Applications and users can interact with the files through a virtual file browser. For more information, see Using HDFS.

high availability

High availability refers to a general concept of automated redundancy and failover to backup servers when a primary server is down. The platform can integrate with high availability functions of a Hadoop-based cluster. For more information, see Enable Integration with Cluster High Availability.

HttpFS

One of two supported communications protocols between the platform and HDFS, HttpFS utilizes HTTP protocol and is required in some deployments. For more information, see Enable HttpFS.

Kerberos

Kerberos provides secure protocols for authentication across a variety of platforms. For more information, see Configure for Kerberos Integration.

KMS

Short for Key Management System, KMS for Hadoop clusters is supported by the platform. For more information, see Configure for KMS.

Sentry

An authorization service for Hadoop clusters. Sentry is an Apache project and is supported by Cloudera. For more information, see Configure for KMS for Sentry.

WebHDFS

WebHDFS is the default protocol for communicating between the platform and HDFS. For more information, see Prepare Hadoop for Integration with the Platform.

YARN

Short for Yet Another Resource Negotiator, YARN is the resource management layer for Hadoop clusters, responsible for allocating cluster resources to running applications.

AWS Terms

These terms apply to Amazon Web Services, where the Designer Cloud Powered by Trifacta platform can be hosted.

AWS

Short for Amazon Web Services, AWS is a cloud-based platform for developing and deploying applications. For more information, see Configure for AWS.

AWS SSO

Single Sign-On service provided by AWS. For more information, see "SSO" in the Glossary.

EC2

Elastic Compute Cloud (Amazon EC2) is a web-based service for running applications in the Amazon Web Services (AWS) public cloud. The Designer Cloud Powered by Trifacta platform can be deployed from an EC2 instance.

EMR

Short for Elastic Map Reduce, EMR is a Hadoop-based platform purpose-built to manage large datasets on AWS. See Configure for EMR.

Glue

Metastore for Hive datasets, which can be used as a source of imported datasets. See AWS Glue Access.

IAM

Identity and access management (IAM) defines sets of permissions for making AWS requests. Roles are assumed by trusted entities, such as IAM users, applications, or AWS services.

IAM role

An IAM role is an IAM entity that defines a set of permissions that define the scope of service requests. The Designer Cloud Powered by Trifacta platform can use IAM roles for enabling access to AWS-based resources controlled by the enterprise. For more information, see Configure for EC2 Role-Based Authentication.

Policy

A policy is a statement that maps a set of permissions to a set of one or more resources. A policy statement can be assigned to a role, which enables users who are assigned the role to access the resources based on the policy definition. These policies are evaluated when an IAM principal (role or user) makes a request for services.
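For illustration, a minimal policy document in standard IAM JSON syntax might look like the following (the bucket name is hypothetical, and the actions granted are examples only):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ]
    }
  ]
}
```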

RDS

Amazon Relational Database System (RDS) is a relational database management system available in the AWS cloud. The databases required by the Designer Cloud Powered by Trifacta platform can be installed on Amazon RDS. See Install Databases on Amazon RDS.

Redshift

A hosted data warehouse solution available through AWS. The Designer Cloud Powered by Trifacta platform can connect to Redshift databases. See Amazon Redshift Connections.

S3

Simple Storage Service (S3) is an online storage service provided by AWS. The Designer Cloud Powered by Trifacta platform can use S3 as the backend storage system or can integrate with it as secondary storage.

Secrets Manager

AWS Secrets Manager is a secure and convenient storage system for API keys, passwords, certificates, and other sensitive data. See Configure for AWS Secrets Manager.

Azure Terms

These terms apply to Microsoft Azure, where the Designer Cloud Powered by Trifacta platform can be hosted, and its available datastores and services.

ADLS

Azure Data Lake Store (ADLS) is a scalable big data repository.

Azure

Microsoft Azure is a cloud computing service for building, managing, and deploying applications. See Configure for Azure.

Azure Databricks

Spark-based analytics running environment built specifically for Microsoft Azure. See Configure for Azure Databricks.

WASB

Windows Azure Storage Blob (WASB) is an abstraction layer on top of HDFS for storage across multiple clusters.

Miscellaneous Terms

Epoch/Unix time

Unix time (a.k.a. POSIX time or Epoch time) is a system for describing instants in time, defined as the number of seconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970, not counting leap seconds.
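A minimal sketch in Python, showing that the epoch itself maps to timestamp 0 and that later instants are expressed as seconds elapsed since it:

```python
from datetime import datetime, timezone

# The Unix epoch: 00:00:00 UTC on 1 January 1970.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
print(epoch.timestamp())  # 0.0

# A Unix timestamp is the number of seconds elapsed since the epoch.
moment = datetime(2021, 1, 1, tzinfo=timezone.utc)
seconds = (moment - epoch).total_seconds()
print(int(seconds))  # 1609459200
```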