

The Trifacta® platform supports a single global file encoding type, which is set to UTF-8 by default. This file encoding type applies to all text files for the following operations:

  • Loading the default sample and any subsequent random samples
  • Running text-based jobs on the Trifacta Server or in Hadoop

NOTE: This setting applies only to text files. Binary types, such as Avro, are not affected by the global file encoding type.

NOTE: If you change this setting, datasets that were imported under the former encoding type are no longer valid. Instructions are provided below for updating them.

Configure Global File Encoding Type

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

  1. Set the following parameter to the appropriate file encoding type:

    "inputFileEncoding": "UTF-8",
  2. Save your changes and restart the platform.

NOTE: After you change the global encoding type, datasets that were imported under the old encoding type must be reloaded to the platform. For more information, see Update Sources.

Supported Global File Encoding Types for Input

  • UTF-8 (default)

  • IBM00858

  • IBM437

  • IBM775

  • IBM850

  • IBM852

  • IBM855

  • IBM857

  • IBM862

  • IBM866

  • ISO-8859-1

  • ISO-8859-2

  • ISO-8859-3

  • ISO-8859-4

  • ISO-8859-5

  • ISO-8859-6

  • ISO-8859-7

  • ISO-8859-8

  • ISO-8859-9

  • ISO-8859-13

  • ISO-8859-15

  • KOI8-R

  • KOI8-U

  • US-ASCII

  • UTF-16

  • UTF-16BE

  • UTF-16LE

  • UTF-32

  • UTF-32BE

  • UTF-32LE

  • windows-1250

  • windows-1251

  • windows-1252

  • windows-1253

  • windows-1254

  • windows-1255

  • windows-1256

  • windows-1257

  • x-IBM737

  • x-IBM874

  • x-UTF-16LE-BOM
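Before editing the setting, it can help to confirm that the value you plan to enter appears in the list above. The following Python sketch is only an illustration and is not part of the Trifacta platform: the names `SUPPORTED_INPUT_ENCODINGS` and `is_supported` are hypothetical, and the exact, case-sensitive match is an assumption, since this page does not say whether other spellings are accepted.

```python
# Names copied from the supported input encoding list on this page.
SUPPORTED_INPUT_ENCODINGS = frozenset({
    "UTF-8",
    "IBM00858", "IBM437", "IBM775", "IBM850", "IBM852",
    "IBM855", "IBM857", "IBM862", "IBM866",
    "ISO-8859-1", "ISO-8859-2", "ISO-8859-3", "ISO-8859-4",
    "ISO-8859-5", "ISO-8859-6", "ISO-8859-7", "ISO-8859-8",
    "ISO-8859-9", "ISO-8859-13", "ISO-8859-15",
    "KOI8-R", "KOI8-U", "US-ASCII",
    "UTF-16", "UTF-16BE", "UTF-16LE",
    "UTF-32", "UTF-32BE", "UTF-32LE",
    "windows-1250", "windows-1251", "windows-1252", "windows-1253",
    "windows-1254", "windows-1255", "windows-1256", "windows-1257",
    "x-IBM737", "x-IBM874", "x-UTF-16LE-BOM",
})


def is_supported(name: str) -> bool:
    """Return True if name exactly matches an entry in the list above.

    Assumption: matching is exact and case-sensitive; the page does not
    state how the platform treats other spellings.
    """
    return name in SUPPORTED_INPUT_ENCODINGS


if __name__ == "__main__":
    for candidate in ("UTF-8", "windows-1252", "Shift_JIS"):
        print(candidate, is_supported(candidate))
```

A check like this only guards against typos in the configured value. It does not tell you whether your source files are actually stored in that encoding.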

Validate

After you have changed the global file encoding type, restart services. See Start and Stop the Platform.

To verify the change, create a dataset from a source file that uses the selected encoding type and confirm that the data loads as expected.
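If a test import looks wrong, one way to narrow down the cause is to check outside the platform whether the source file decodes under the encoding you configured. The Python sketch below is a rough pre-check under stated assumptions, not a Trifacta tool: `decodes_cleanly` is a hypothetical helper, and it relies on Python's own codec registry, which may not recognize every name in the supported list (in that case the function returns False).

```python
import codecs


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if data decodes under encoding without errors.

    Returns False when Python does not know the encoding name or when
    the bytes are not valid in that encoding.
    """
    try:
        codecs.lookup(encoding)  # LookupError for names unknown to Python
        data.decode(encoding, errors="strict")
    except (LookupError, UnicodeDecodeError):
        return False
    return True


if __name__ == "__main__":
    # Illustrative bytes only; in practice, read them from your source file
    # with open(path, "rb").
    sample = "café".encode("utf-8")
    print(decodes_cleanly(sample, "UTF-8"))
    print(decodes_cleanly(sample, "US-ASCII"))
```

A True result is necessary but not sufficient. Single-byte encodings such as ISO-8859-1 accept any byte sequence, so a file can decode without errors and still display the wrong characters.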

Update Sources

After you have changed the global encoding type, datasets that were imported under the former encoding type are no longer valid. 

Steps:

  1. For each dataset imported under the old encoding type, upload a new version.
  2. For each recipe that used the old version of the imported dataset:
    1. Edit the recipe in the Transformer Page. 
    2. Swap the source from the old version to the new one. For more information, see Flow View Page.
  3. Repeat for each imported dataset and recipe combination.

Supported Global File Encoding Types for Output

Output files are written in UTF-8 encoding.
