
NOTE:  Trifacta Wrangler is a free product with limitations on its features. Some features in the documentation do not apply to this product edition. See Product Limitations.

   



NOTE: This feature is in Beta release.

When you create or modify an imported dataset, you can configure various aspects of how the data is interpreted by the Trifacta® application.

During import, click Edit Settings from the card for an individual dataset in the right panel.

NOTE: In some cases, you may be required to refresh the schema from the source before editing settings.

Figure: Dataset Settings

After import, select Settings from the context menu for the imported dataset in:

Prerequisites

A workspace administrator must enable this feature in your workspace. 

Limitations

  • Converted file types, such as JSON, Excel, and PDF, are not supported for this configuration workflow. For those file types, see File Import Settings.
  • You cannot select or deselect columns to import. This capability will be available in a future release.

File Settings

The following settings apply to file-based datasources.

Structuring settings

By default, Trifacta Wrangler attempts to interpret the structure of your data during import. This structuring applies an initial tabular structure to the dataset.

  • Unless you have specific problems with the initial structure, you should leave the Detect structure setting enabled. Recipes created from these imported datasets automatically include the structuring as the first, hidden steps. These steps are not available for editing, although you can remove them through the Recipe panel. See Recipe Panel.
  • When structure detection is disabled, imported datasets whose schema has not been detected are labeled as unstructured datasets. When recipes are created for these unstructured datasets, the structuring steps are added into the recipe and can be edited as needed.

The structuring settings control how the structure of the dataset is interpreted during import:

  • Automatic: By default, Trifacta® Wrangler automatically detects the structuring settings of your dataset during import.
  • None: If you select this option, the dataset is unstructured.

    NOTE: The Header and Encoding drop-down lists are not available if you select the None option.

Tip: You can use the Preview tab to view the structure of the selected dataset.

For more information, see Initial Parsing Steps.

Header

You can apply the column headers to your datasets during import. Select the required option from the drop-down list:

  • Infer header: (default) When selected, the Trifacta application infers the header based on the data in the import. 
  • Use first row as header: When selected, the first row is used as the column headers.

  • No header: When selected, header inference is skipped, and the columns are assigned generic names.
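
The difference between the Use first row as header and No header options can be sketched outside the product. The following Python example is a minimal illustration only, not the Trifacta implementation: the function name `apply_header`, the mode strings, and the `columnN` naming pattern for generic headers are all assumptions made for this sketch.

```python
import csv
import io


def apply_header(rows, mode):
    """Return (headers, data_rows) for a hypothetical header mode.

    mode "first_row": the first row becomes the column headers.
    mode "none": generic names are generated and every row stays as data.
    """
    if not rows:
        return [], []
    if mode == "first_row":
        return list(rows[0]), rows[1:]
    if mode == "none":
        # The "columnN" pattern is an assumption for illustration.
        width = max(len(row) for row in rows)
        return [f"column{i}" for i in range(1, width + 1)], rows
    raise ValueError(f"unknown header mode: {mode}")


sample = "id,name\n1,Ada\n2,Grace\n"
rows = list(csv.reader(io.StringIO(sample)))

print(apply_header(rows, "first_row"))
print(apply_header(rows, "none"))
```

With the first mode, the row `id,name` is consumed as headers and two data rows remain. With the second mode, all three rows remain as data under generated names.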

If replacing a file: 

  • If you replace a dataset in a flow and select the Use first row as header option, then the existing header row labels are updated with the new headers.
  • Subsequent steps in a pre-existing recipe may be broken if the headers are changed by a replaced file.

Tip: After the dataset is imported, you can rename columns manually or using any row in the dataset. For more information, see Rename Columns.

  1. From the Import Data page, click Edit settings from the card of an individual dataset in the right panel.
  2. From the Data settings dialog box, select the required options and modify the settings.
  3. To save the changes, click Save.

Encoding

By default, Trifacta Wrangler applies a specified encoding type to the imported file. In some cases, the data preview panel may contain garbled data due to a mismatch in encodings. In the Data Preview dialog, you can select a different encoding for the file. When the correct encoding is selected, the preview displays the data as expected.

NOTE: Assessing the file encoding type based on parsing an input file is not an accurate method. Instead, Trifacta Wrangler assumes that the file is encoded in the default encoding. If it is not, you should change the encoding type for the file.
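
To see why an encoding mismatch produces garbled preview data, consider the following minimal Python sketch. It is illustrative only: the helper name `preview` and the choice of UTF-8 and Latin-1 as the two encodings are assumptions for this example, not a statement of which default encoding the product uses.

```python
def preview(raw: bytes, encoding: str) -> str:
    """Decode raw file bytes with the selected encoding (hypothetical helper).

    Undecodable bytes are replaced instead of raising an error, so the
    result can always be displayed.
    """
    return raw.decode(encoding, errors="replace")


# Bytes as they would be stored in a UTF-8 encoded file.
raw = "café".encode("utf-8")

print(preview(raw, "latin-1"))  # wrong encoding: the accented character is garbled
print(preview(raw, "utf-8"))    # matching encoding: text is displayed as expected
```

The same bytes yield different text depending on the encoding used to read them, which is why selecting the correct encoding fixes the preview without changing the file.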

NOTE: In some cases, imported files are not properly parsed due to issues with encryption types or encryption keys in the source datastore. For more information, please contact your datastore administrator.

For a list of supported encoding types, see Supported File Encoding Types.

Table Settings

The following settings apply to table-based data imported from a database.

Infer data types

You can select whether to apply Trifacta type inference to table data imported from a database. In the Preview tab, you can see the data type that will be applied after the dataset is imported. This data type may change depending on whether column data type inference is selected for the dataset.

To enable Trifacta type inference, select Infer data type from the drop-down box.

Schema Tab

The Schema tab displays the number of columns in the current dataset. By default, all columns are selected. 

  • Search columns: You can search for specific columns in the dataset.

Columns:

  • Column: Displays the list of columns in the dataset.

  • Sample value: A sample value of the corresponding column is displayed.

Special characters in column names

You can choose whether special characters in column names are removed or retained during import.

  • Remove special characters: If you select this option, special characters are removed during import. Characters that are not alphanumeric or underscores are stripped, and space characters are converted to underscores.
  • Keep special characters: By default, special characters (characters that are not alphanumeric or underscores, including space characters) are retained during import.
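
The Remove special characters rule described above can be approximated as follows. This Python sketch is a hedged illustration, not the product's implementation: the function name `sanitize_column_name` is hypothetical, and treating "alphanumeric" as ASCII letters and digits only is an assumption.

```python
import re


def sanitize_column_name(name: str) -> str:
    """Approximate the described rule for one column name (hypothetical helper)."""
    # Space characters are converted to underscores.
    with_underscores = name.replace(" ", "_")
    # Anything that is not a letter, digit, or underscore is stripped.
    # Assumption: "alphanumeric" means ASCII A-Z, a-z, and 0-9.
    return re.sub(r"[^A-Za-z0-9_]", "", with_underscores)


for original in ["first name", "unit-price", "Order Total ($)"]:
    print(repr(original), "->", repr(sanitize_column_name(original)))
```

Under these assumptions, two different source names can collapse to the same sanitized name, so check the renamed columns in the Schema tab after import.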

For more information, see Sanitize Column Names.

Rename the column

You can rename a column by clicking the column name and editing it. After you rename the column, the previous column name is displayed in strikethrough next to the new column name.

Change the data type

You can change the data type of a column by clicking the caret next to the column name.

NOTE: You can revert both the name and data type changes by clicking the undo icon in the corresponding column.

To save the changes, click Save.

Preview Tab

The Preview tab displays a snapshot of the dataset details.

Tip: You can use the Preview tab to check the result whenever you select an option from a drop-down list.

