
Limited Availability: If you are interested in installing or upgrading to this release of Trifacta Self-Managed Enterprise Edition, please contact Alteryx Support.



This section describes how Trifacta® Self-Managed Enterprise Edition manages character encoding on import, within the application, and on export.

Overview of Character Encoding

Character encoding is the mechanism by which numeric values are used to represent characters, including alphanumeric characters and punctuation, in languages around the world. To ensure that different machines display the same character on-screen, each machine references one or more supported file encoding types, which are standards for the representation of characters. For example, a machine in the United Kingdom correctly displays the letter "A" sent from a machine in the United States if both machines are using the same file encoding type.
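As a minimal illustration of why both sides must agree on the encoding, the following Python sketch encodes a character to bytes and decodes it again. It is a generic example, not a description of how the product handles data internally; the variable names are chosen for illustration only.

```python
# A character maps to a number (its code point), which an encoding turns into bytes.
text = "A"
encoded = text.encode("utf-8")         # a single byte with the value 65
same = encoded.decode("utf-8")         # same encoding on both sides: "A" again

# When sender and receiver disagree, the bytes are misread.
accented = "é"
utf8_bytes = accented.encode("utf-8")      # two bytes: 0xC3 0xA9
mismatched = utf8_bytes.decode("latin-1")  # each byte is read as its own character

print(list(encoded), same)             # [65] A
print(utf8_bytes.hex(), mismatched)    # c3a9 Ã©
```

The letter "A" survives because it has the same byte value in most common encodings. The accented character is garbled when it is written as UTF-8 but read as Latin-1.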

Many languages around the world require hundreds or even thousands of distinct characters. As a result, encodings for these languages may require a larger number of bits per character to represent the full character set.
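The difference in storage size can be seen with a short Python sketch that counts the bytes UTF-8 needs for a few sample characters. The sample characters and the helper name utf8_width are illustrative choices, not part of the product.

```python
# Count how many bytes UTF-8 uses for characters from different scripts.
def utf8_width(ch):
    """Return the number of bytes needed to store one character in UTF-8."""
    return len(ch.encode("utf-8"))

samples = ["A", "é", "€", "日", "😀"]
widths = {ch: utf8_width(ch) for ch in samples}

for ch, width in widths.items():
    print(f"U+{ord(ch):04X} {ch!r}: {width} byte(s)")
```

Basic Latin letters fit in one byte, while the CJK character needs three bytes and the emoji needs four.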

The platform supports a global file encoding type. By default, this encoding type is UTF-8. For more information, see Configure Global File Encoding Type.

Character Encoding on Input

By default, Trifacta Self-Managed Enterprise Edition supports UTF-8 on input. As needed, individual users can change the file encoding of input files. For example, when a file uses a double-byte encoding, you can identify that encoding in the file settings during import, so that the data is parsed properly on input.
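To show why identifying the input encoding matters, the following Python sketch writes a small file in a double-byte encoding and then reads it back twice: once with the correct encoding and once with the UTF-8 default. Shift_JIS is used only as an illustrative double-byte encoding, and the read_text helper is hypothetical; neither reflects the product's import settings or its list of supported encodings.

```python
import os
import tempfile

def read_text(path, encoding="utf-8"):
    """Read a whole text file using the stated encoding (UTF-8 by default)."""
    with open(path, "r", encoding=encoding, newline="") as handle:
        return handle.read()

# Create a sample CSV file stored in Shift_JIS (assumed example encoding).
original = "名前,都市\n太郎,東京\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(original.encode("shift_jis"))
    sample_path = tmp.name

try:
    # Declaring the correct encoding recovers the original text.
    parsed = read_text(sample_path, encoding="shift_jis")

    # Relying on the UTF-8 default fails, because the bytes are not valid UTF-8.
    try:
        read_text(sample_path)
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True
finally:
    os.remove(sample_path)

print(parsed == original, utf8_failed)
```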

Character Encoding within the Application

Within the Trifacta application, you can use the following functions to work with character encodings:

BASE64ENCODE Function: Converts an input value to base64 encoding with optional padding with an equals sign (=). Input can be of any type. Output type is String.
BASE64DECODE Function: Converts an input base64 value to text. Output type is String.
UNICODE Function: Generates the Unicode index value for the first character of the input string.
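For readers who want to check expected values outside the application, the following Python sketch approximates what these three functions compute. It is an analogue under stated assumptions, not the product's implementation: the Python function names and the padding parameter are hypothetical, and converting non-string input to its string form before encoding is an assumption.

```python
import base64

def base64_encode(value, padding=True):
    """Approximate BASE64ENCODE: any input type, String output."""
    raw = str(value).encode("utf-8")   # assumption: non-strings use their string form
    encoded = base64.b64encode(raw).decode("ascii")
    return encoded if padding else encoded.rstrip("=")

def base64_decode(value):
    """Approximate BASE64DECODE: base64 text back to a string."""
    padded = value + "=" * (-len(value) % 4)   # restore padding if it was omitted
    return base64.b64decode(padded).decode("utf-8")

def unicode_index(value):
    """Approximate UNICODE: code point of the first character of the input."""
    return ord(value[0])

print(base64_encode("Hello"))                  # SGVsbG8=
print(base64_encode("Hello", padding=False))   # SGVsbG8
print(base64_decode("SGVsbG8"))                # Hello
print(unicode_index("Apple"))                  # 65
```

Results inside the application may differ in edge cases, such as empty input or characters outside the Basic Multilingual Plane, so treat these values as a reference point only.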

Character Encoding on Output

All files are published with UTF-8 encoding.
