This documentation applies to Trifacta Wrangler. Download this free product.
Registered users of this product or Trifacta Wrangler Enterprise should log in to Product Docs through the application.

Datasets

Within Trifacta® Wrangler, your fundamental area of work is the dataset. There are two types of datasets:

Type: Imported
Editable? N
Executable? N
Description:

An imported dataset is a reference to a source of data. This source can be a file, multiple files, database table, or other type of data.

NOTE: An imported dataset is a pointer to a source of data. It cannot be modified within Trifacta Wrangler.

NOTE: External sources of data are not supported in Trifacta Wrangler. All sources must be uploaded files.

Type: Wrangled
Editable? Y
Executable? Y
Description:

A wrangled dataset is an editable object for which you build your recipes to transform the source data. It contains:

  • A reference to another dataset (imported or wrangled).
  • A recipe of sequential steps that transform your data into the desired output.
  • Any number of recipe executions that result in generated results on success or screen information on failure.

For additional information on the distinctions between dataset types, see Imported vs Wrangled Dataset.
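The distinction between the two dataset types can be sketched as a small data model. This is only an illustration in plain Python, not Trifacta Wrangler's internal representation or API: the class names, field names, and the sample file name are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class ImportedDataset:
    """Hypothetical model: a read-only pointer to an uploaded source file."""
    name: str
    source_path: str          # reference only; the data itself is not copied
    editable: bool = False
    executable: bool = False


@dataclass
class WrangledDataset:
    """Hypothetical model: an editable object that carries a recipe."""
    name: str
    # A wrangled dataset references another dataset, imported or wrangled.
    source: Union[ImportedDataset, "WrangledDataset"]
    recipe: list = field(default_factory=list)       # sequential steps
    executions: list = field(default_factory=list)   # past recipe executions
    editable: bool = True
    executable: bool = True


# Assumed sample objects for illustration.
orders = ImportedDataset("orders", "orders.csv")
orders_clean = WrangledDataset("orders_clean", source=orders)
orders_summary = WrangledDataset("orders_summary", source=orders_clean)
```

Marking the imported type as frozen mirrors the rule that an imported dataset cannot be modified, while a wrangled dataset stays mutable so that steps can be added to its recipe.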

The following diagram illustrates the component objects of a dataset and how they are created during dataset development in the application:

Figure: Objects in a Dataset

Data that is imported to the platform is referenced in an imported dataset. An imported dataset is simply a reference to the original data, which is not modified or stored within the platform.

  • An imported dataset can be used in multiple wrangled datasets.
  • Imported datasets are created through the Import Dataset Page.
  • For more information on the process, see Import Basics.

Add to flow: After you create an imported dataset, you can use it only once it has been added to a flow. A flow is a container for holding imported and wrangled datasets. For more information on flows, see Flows below.

  • You can do this as part of the import process or later.

Create recipe and wrangled dataset: After an imported dataset has been added to a flow, you can create a wrangled dataset and a recipe for the wrangled dataset.

  • A wrangled dataset is a set of metadata about the imported dataset.
    • When you are building your recipes, you are applying them to the wrangled dataset, which leaves the source (imported dataset) untouched.
  • A recipe identifies the sequential set of steps that you define to cleanse and transform your data.
    • When the recipe is created, it may contain a set of steps that perform initial parsing of the data into rows and columns. These steps may vary depending on the type of source data. See Initial Parsing Steps.
    • Recipes are interpreted by Trifacta Wrangler and turned into commands that can be executed against the dataset.
    • Recipes are created using the various visual tools in the Transformer Page.
    • For more information on the process, see Transform Basics.
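The idea of a recipe as an ordered list of steps that leaves the source untouched can be sketched as follows. This is a minimal illustration in plain Python, not the Wrangle language and not how Trifacta Wrangler interprets recipes; the function names, column names, and sample rows are assumptions.

```python
import copy


def apply_recipe(source_rows, recipe):
    """Apply each step in order to a copy of the source rows."""
    rows = copy.deepcopy(source_rows)   # the source itself is never modified
    for step in recipe:
        rows = step(rows)
    return rows


# Hypothetical recipe steps: each takes a list of rows and returns a new list.
def trim_names(rows):
    return [{**r, "name": r["name"].strip()} for r in rows]


def drop_missing_amount(rows):
    return [r for r in rows if r["amount"] is not None]


# Assumed sample data standing in for an imported dataset.
source = [
    {"name": " Ada ", "amount": 10},
    {"name": "Bob", "amount": None},
]

recipe = [trim_names, drop_missing_amount]
result = apply_recipe(source, recipe)
```

Because the steps run against a copy, `source` still holds the original two rows after the recipe has been applied, and the order of the steps in `recipe` determines the order of execution.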

Open in Transformer page: The Transformer page provides multiple methods for quickly selecting and building recipe steps. Your selections are converted into steps written in Wrangle (a domain-specific language for data transformation).

Generate Results: When you are satisfied with the recipe that you have created in the Transformer page, you can generate results. Results may be composed of one or both of the following types:

  • Transform: Executes the set of recipe steps that you have defined against your dataset, generating the transformed set of results for export.
  • Profile: Optionally, you can choose to generate a visual profile of the results. This visual profile can provide important feedback on data quality and can be a key for further refinement of your recipe.
  • See Generate Results Dialog.
  • When the results are generated, you can review them and identify data that still needs fixing. See Results Summary Page.
  • For more information on the process, see Generating Results Basics.
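The two result types can be pictured with a short sketch: a transform that always runs, and a profile that is produced only on request. This is an illustration under stated assumptions, not the product's job engine; `generate_results`, the profile fields, and the sample rows are hypothetical, and a real visual profile contains far more than a missing-value count.

```python
def generate_results(rows, recipe, profile=False):
    """Run the recipe steps in order; optionally summarize the output."""
    transformed = rows
    for step in recipe:
        transformed = step(transformed)

    results = {"transform": transformed}
    if profile:
        # A very rough stand-in for a profile: per-column missing counts.
        columns = sorted({col for row in transformed for col in row})
        results["profile"] = {
            col: {
                "rows": len(transformed),
                "missing": sum(
                    1 for row in transformed if row.get(col) in (None, "")
                ),
            }
            for col in columns
        }
    return results


# Hypothetical step and sample data.
def upper_city(rows):
    return [{**r, "city": r["city"].upper()} for r in rows]


rows = [{"city": "paris", "zip": "75001"}, {"city": "lyon", "zip": ""}]
results = generate_results(rows, [upper_city], profile=True)
```

In this sketch the profile flags one missing `zip` value, which is the kind of feedback that would send you back to refine the recipe.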

Flows

A flow is a container for holding one or more datasets and their associated objects. A wrangled dataset must be contained in a flow.

The following diagram illustrates the flexibility of object relationships within a flow. In this example, the first four imported datasets feed into wrangled dataset 3 (W-Dataset 3), from which are generated the final outputs. Result 4 is generated from a single imported dataset/wrangled dataset combination.

Figure: Flow Example

Dataset: Description

W-Dataset 1: Results of the job are used to create a new imported dataset (I-Dataset 2).

W-Dataset 2: I-Dataset 2 is added directly to W-Dataset 2 (wrangled dataset 2) through the Transformer Page. See Transformer Page.

W-Dataset 3: W-Dataset 2 is included in W-Dataset 3. This step could be a join, union, or similar data blending statement in Recipe 3.

W-Dataset 4: Although stored in the same flow, W-Dataset 4 is independent of the other datasets.
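The relationships described for this example flow can be modeled as a small dependency graph. The sketch below encodes only the links stated in the descriptions above; the imported sources of W-Dataset 1 and W-Dataset 4 appear only in the figure and are left out. The `flow` mapping and the `upstream` helper are hypothetical and not part of the product.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each key depends on the datasets in its value set.
flow = {
    "I-Dataset 2": {"W-Dataset 1"},   # created from the results of W-Dataset 1
    "W-Dataset 2": {"I-Dataset 2"},   # I-Dataset 2 is added to W-Dataset 2
    "W-Dataset 3": {"W-Dataset 2"},   # W-Dataset 2 is included in W-Dataset 3
    "W-Dataset 4": set(),             # independent of the other datasets
}


def upstream(name, deps):
    """Return every dataset that the named dataset depends on, directly or not."""
    seen = set()
    stack = list(deps.get(name, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(deps.get(node, ()))
    return seen


# One valid order in which the datasets could be produced.
run_order = list(TopologicalSorter(flow).static_order())
```

Here `upstream("W-Dataset 3", flow)` collects W-Dataset 2, I-Dataset 2, and W-Dataset 1, while `upstream("W-Dataset 4", flow)` is empty, which matches the statement that W-Dataset 4 is independent even though it lives in the same flow.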

Flows are created in the Flows page. See Flows Page.
