
This documentation applies to Trifacta Wrangler.

Datasets

Within the Trifacta® Application, your fundamental area of work is the dataset. There are two types of datasets:

Imported dataset (Editable: No; Executable: No)

An imported dataset is a reference to a source of data. This source can be a file, multiple files, a database table, or another type of data.

NOTE: An imported dataset is a pointer to an external source of data. It cannot be modified within the Trifacta Application.

Wrangled dataset (Editable: Yes; Executable: Yes)

A wrangled dataset is an editable object in which you build recipes to transform the source data. It contains:

  • A reference to another dataset (imported or wrangled).
  • A recipe of sequential steps that transform your data into the desired output.
  • Any number of recipe executions, which produce generated results on success or information about the failure otherwise.

For additional information on the distinctions between dataset types, see Imported vs Wrangled Dataset.
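To make the distinction concrete, here is a minimal Python sketch of the two dataset types. The class and field names (`ImportedDataset`, `WrangledDataset`, `Execution`, `source_uri`, and so on) are hypothetical and are not part of any Trifacta API; the sketch only models the rules stated above: an imported dataset is a read-only pointer, while a wrangled dataset references another dataset and carries an editable recipe and a history of executions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass(frozen=True)
class ImportedDataset:
    """Read-only pointer to external data; frozen, so it cannot be edited."""
    name: str
    source_uri: str  # hypothetical location; the data itself is not copied


@dataclass
class Execution:
    """One recipe run: a results location on success, error details on failure."""
    succeeded: bool
    results_location: Optional[str] = None
    error_message: Optional[str] = None


@dataclass
class WrangledDataset:
    """Editable object: a source reference, an ordered recipe, and run history."""
    name: str
    source: Union[ImportedDataset, "WrangledDataset"]
    recipe: List[str] = field(default_factory=list)  # placeholder step descriptions
    executions: List[Execution] = field(default_factory=list)

    def add_step(self, step: str) -> None:
        self.recipe.append(step)


raw = ImportedDataset("orders", "/data/orders.csv")   # hypothetical source
clean = WrangledDataset("orders-clean", raw)
audit = WrangledDataset("orders-audit", raw)          # same imported dataset reused
summary = WrangledDataset("orders-summary", clean)    # built on a wrangled dataset

clean.add_step("split the raw column into fields")
clean.executions.append(Execution(succeeded=True, results_location="/results/orders-clean"))
```

Freezing `ImportedDataset` mirrors the "not editable" rule: assigning to one of its fields raises an error, while a `WrangledDataset` can accumulate recipe steps and executions.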

The following diagram illustrates the component objects of a dataset and how they are created during dataset development in the application:

Figure: Objects in a Dataset

Data that is imported to the platform is referenced in an imported dataset. The imported dataset is simply a reference to the original data; the data is not modified or stored within the platform.

  • An imported dataset can be used in multiple wrangled datasets.
  • Imported datasets are created through the Import Dataset Page.
  • For more information on the process, see Import Basics.

Create a wrangled dataset: To begin wrangling data, you must create a wrangled dataset from your imported dataset. You can do this as part of the import process or later, whenever an imported or wrangled dataset is added to a flow.

  • To create a wrangled dataset, you must add it to an existing flow or create a new flow. A flow is a container for holding imported and wrangled datasets.
  • For more information, see Flows below.

Open in Transformer page: When the wrangled dataset is first opened in the Transformer page, the following dataset-related objects become available:

  • A recipe identifies the sequential set of steps that you define to cleanse and transform your data.
    • When the recipe is created, it may contain a set of steps that perform initial parsing of the data into rows and columns. These steps may vary depending on the type of source data. See Initial Parsing Steps.
    • Recipes are interpreted by Trifacta Wrangler and turned into commands that can be executed against the wrangled dataset.
    • Recipes are created using the various visual tools in the Transformer Page.
    • For more information on the process, see Transform Basics.
    • The Transformer page provides multiple interfaces for quickly selecting and building recipe steps. Your selections are converted into steps written in Wrangle (a domain-specific language for data transformation). For details on the syntax of this language, see Wrangle Language.
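Conceptually, a recipe is an ordered list of steps, each of which takes the current data and returns transformed data. The following Python sketch models that idea with plain functions over a list of rows. It is not Wrangle syntax and not how Trifacta Wrangler executes recipes; the step functions and `run_recipe` are hypothetical. It also shows the source rows left untouched, echoing the rule that an imported dataset is never modified.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]


def split_raw_line(rows: List[Row]) -> List[Row]:
    """Initial parsing: split a single 'raw' field (exactly one comma) into columns."""
    out = []
    for row in rows:
        name, city = row["raw"].split(",")
        out.append({"name": name, "city": city})
    return out


def trim_whitespace(rows: List[Row]) -> List[Row]:
    """Cleanse: strip surrounding spaces from every value."""
    return [{k: v.strip() for k, v in row.items()} for row in rows]


def drop_empty_city(rows: List[Row]) -> List[Row]:
    """Filter: remove rows that have no city."""
    return [row for row in rows if row["city"]]


def run_recipe(source: List[Row], recipe: List[Step]) -> List[Row]:
    """Apply each step in order; each step receives the previous step's output."""
    data = [dict(row) for row in source]  # work on a copy; the source is never modified
    for step in recipe:
        data = step(data)
    return data


source = [{"raw": " Ada , London"}, {"raw": "Linus,  "}]
recipe = [split_raw_line, trim_whitespace, drop_empty_city]
result = run_recipe(source, recipe)
```

Because the steps run in sequence, reordering them changes the outcome: for instance, `drop_empty_city` placed before `trim_whitespace` would keep the row whose city is only spaces.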

Generate Results: When you are satisfied with the recipe that you have created in the Transformer page, you can generate results. Results may be composed of one or both of the following types:

  • Transform: Executes the set of recipe steps that you have defined against your dataset, generating the transformed set of results for export.
  • Profile: Optionally, you can choose to generate a visual profile of the results. This visual profile can provide important feedback on data quality and can be key to further refining your recipe.
  • See Generate Results Dialog.
  • When the results are generated, you can review them and identify data that still needs fixing. See Results Summary Page.
  • For more information on the process, see Generating Results Basics.
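The two result types can be pictured as a job that always runs the transform and optionally computes a profile of the output. In this hypothetical Python sketch, the "profile" is reduced to per-column counts of missing and distinct values; the actual visual profile in the application is much richer, and `generate_results`, its `profile` flag, and `uppercase_state` are illustrative names only.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]
Step = Callable[[List[Row]], List[Row]]
Profile = Dict[str, Dict[str, int]]


def profile_rows(rows: List[Row]) -> Profile:
    """Per column: count missing (None or empty) values and distinct non-missing values."""
    columns = sorted({col for row in rows for col in row})
    report: Profile = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in (None, "")]
        report[col] = {"missing": len(values) - len(present),
                       "distinct": len(set(present))}
    return report


def generate_results(rows: List[Row], recipe: List[Step],
                     profile: bool = False) -> Tuple[List[Row], Optional[Profile]]:
    """Transform: apply every step in order. Profile: optional, computed on the output."""
    output = rows
    for step in recipe:
        output = step(output)
    return output, (profile_rows(output) if profile else None)


def uppercase_state(rows: List[Row]) -> List[Row]:
    """Example step: normalize the 'state' column to upper case."""
    return [{**row, "state": (row.get("state") or "").upper()} for row in rows]


rows = [{"id": "1", "state": "CA"}, {"id": "2", "state": ""}, {"id": "3", "state": "ca"}]
output, report = generate_results(rows, [uppercase_state], profile=True)
```

Here the profile reports one missing `state` value after the transform, the kind of data-quality signal that would prompt another round of recipe refinement.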

Flows

A flow is a container for holding one or more datasets and their associated objects. A wrangled dataset must be contained in a flow.

The following diagram illustrates the flexibility of object relationships within a flow. In this example, the first four datasets feed into W-Dataset 3, from which the final outputs are generated.

Figure: Flow Example

  • W-Dataset 1: Results of the job are used to create a new imported dataset (I-Dataset 2).
  • W-Dataset 2: I-Dataset 2 is added directly to W-Dataset 2 (wrangled dataset 2) through the Transformer Page. See Transformer Page.
  • W-Dataset 3: W-Dataset 2 is included in W-Dataset 3. This step could be a join, union, or similar data blending statement in Recipe 3.
  • W-Dataset 4: Although stored in the same flow, W-Dataset 4 is independent of the other datasets.
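The relationships in the flow example can be expressed as a small dependency graph. This Python sketch encodes the edges listed above and uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to derive a valid build order and to find everything upstream of W-Dataset 3. Two details are assumptions not stated in the text: W-Dataset 1 is given a source named "I-Dataset 1", and W-Dataset 4 is modeled with no recorded source because its source is not described.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set

# Each dataset maps to the datasets it directly depends on.
flow: Dict[str, Set[str]] = {
    "W-Dataset 1": {"I-Dataset 1"},  # assumed source; not stated in the text
    "I-Dataset 2": {"W-Dataset 1"},  # created from W-Dataset 1's job results
    "W-Dataset 2": {"I-Dataset 2"},  # I-Dataset 2 added to W-Dataset 2
    "W-Dataset 3": {"W-Dataset 2"},  # e.g. a join or union in Recipe 3
    "W-Dataset 4": set(),            # independent; source not described
}


def upstream(graph: Dict[str, Set[str]], node: str) -> Set[str]:
    """Collect every dataset that directly or indirectly feeds `node`."""
    seen: Set[str] = set()
    stack = list(graph.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, ()))
    return seen


# Dependencies always appear before the datasets that use them.
build_order = list(TopologicalSorter(flow).static_order())
feeds_w3 = upstream(flow, "W-Dataset 3")
```

Under these assumptions, `feeds_w3` contains exactly four datasets, while `upstream(flow, "W-Dataset 4")` is empty, matching the description of W-Dataset 4 as independent even though it lives in the same flow.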

Flows are created in the Flows page. See Flows Page.
