As needed, you can deploy custom data types into the Trifacta® platform, in which type validation is performed against regular expressions that you specify. This method is most useful for validating against patterns, as opposed to specific values.
- If your custom data type contains a pre-defined set of values, you can create the custom type using a dictionary file for validation. See Create Custom Data Types.
Custom Types Location
On the server hosting the Trifacta platform, type definitions such as dictionaries and custom data types are stored in the following directory:
This directory is referenced as $CUSTOM_TYPE_DIR in the steps below.
Before you begin creating custom data types, you should back up the type-packs/trifacta directory to a location outside of your Trifacta deployment.
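The backup step can be scripted. The following is a minimal sketch, assuming Python is available on the server; the function name `back_up_type_pack` and both path arguments are hypothetical, and you must supply the actual directory locations from your own deployment.

```python
import shutil
from pathlib import Path


def back_up_type_pack(type_pack_dir: Path, backup_root: Path) -> Path:
    """Copy the type-pack directory into backup_root and return the copy's path.

    Both paths are supplied by the caller; nothing here assumes where the
    platform is installed.
    """
    destination = backup_root / type_pack_dir.name
    # copytree raises FileExistsError if the destination already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(type_pack_dir, destination)
    return destination
```

Choose a `backup_root` outside of the Trifacta deployment so that the copy is not affected by later changes to the platform.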
The trifacta-extras directory in the type-packs directory contains experimental custom data types. These data types are not officially supported. Please use with caution.
The dictionaries sub-directory contains user-defined dictionaries.
NOTE: Please use the user interface to interact with your dictionaries. See Custom Type Dialog.
The types sub-directory contains individual custom data type definitions, each in a separate file.
The manifest.json file contains a JSON manifest of all of the custom dictionaries and types in the system.
Each custom data type is created and stored in a separate file. The following example file contains a regular expression method for validating data against the set of days of the week:
Internal identifier for the custom type. Must be unique across all standard types and custom types.
NOTE: You should verify that your data type's
|Display name for the custom type.|
The category to assign to the type. The current categories are displayed within the data type drop-down for each column.
|Assign a default probability for the custom type. See below.|
|This block contains the regular expression specification to be applied to the column values.|
|When set to |
This array contains a set of regular expressions that are used to validate the column values. For a regex type, a column value is valid if it matches at least one of the expressions in the set.
NOTE: All match types must be double-escaped in the regex expression. For example, to replicate the
|(optional) Assign an incremental change to the probability when a match is found between a value and one of the regular expressions. See Defining probabilities below.|
Tip: In the types sub-directory, you can review the regex-based types that are provided with the Trifacta platform. While you should not edit these files directly, they may provide some guidance and some regex tips on how to configure your own custom data types.
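To try out a set of expressions before placing them in a type definition, a short script can help. The following Python sketch is not the platform's validation code: the JSON array is a hypothetical stand-in for a type's regex array, and the case-insensitive, whole-value matching is an assumption. It illustrates two points from the description above: a value is valid if it matches at least one expression, and backslashes must be doubled when a regex is written as JSON text.

```python
import json
import re

# Hypothetical regex array, written the way it would appear as JSON text.
# Every regex backslash is doubled: "\\d" in JSON decodes to the token \d.
REGEXES_JSON = r'''
[
  "^(mon|tues|wednes|thurs|fri|satur|sun)day$",
  "^(mon|tue|wed|thu|fri|sat|sun)$",
  "^day\\s\\d$"
]
'''

# Assumption: matching ignores case and the anchors cover the whole value.
PATTERNS = [re.compile(p, re.IGNORECASE) for p in json.loads(REGEXES_JSON)]


def is_valid(value: str) -> bool:
    """Return True if the value matches at least one of the expressions."""
    return any(p.search(value) for p in PATTERNS)
```

For example, `is_valid("Monday")` and `is_valid("Day 3")` return `True`, while `is_valid("Someday")` returns `False` because no expression in the set matches it.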
For your custom type, the probability values are used to determine the likelihood that matching values indicate that the entire column is of the custom data type.
- The defaultProbability value specifies the baseline probability that a match between a value and one of the regular expressions indicates that the column is the specified type. Values are expressed on a logarithmic scale and typically range from 1E-15 to 1E-20.
- When a value is matched to one of the regular expressions, the probability value is used to increment the baseline probability that the next matching value is of the specified type. This value should also be expressed on a logarithmic scale (e.g.
- In this manner, a higher number of matching values increases the probability that the column matches the custom type.
Probabilities become important primarily if you are creating a custom type that is a subset of an existing type. For example, the Email Address custom type is a subset of String type. So, matches for the patterns expressed in the Email Address definition should register a higher probability value than the corresponding increment in the String type definition.
Tip: For custom types that are subsets of other, non-String types, you should lower the defaultProbability of the baseline type by a factor of 10 (e.g. 1E-15 to 1E-16) and raise the same probability in the custom type by a factor of 10 (e.g. 1E-14). In this manner, you can give higher probability of matching to these subset types.
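The factor-of-10 adjustment in the tip can be worked through as plain arithmetic. The Python sketch below only computes the adjusted values; it does not model how the platform combines probabilities, and the helper name `shift_by_factor_of_ten` is hypothetical. `Decimal` is used so that values such as 1E-16 are represented exactly.

```python
from decimal import Decimal


def shift_by_factor_of_ten(probability: str, steps: int) -> Decimal:
    """Scale a probability by 10**steps (a negative steps value lowers it)."""
    return Decimal(probability).scaleb(steps)


# Starting value taken from the example in the tip above.
starting = "1E-15"
baseline_type = shift_by_factor_of_ten(starting, -1)  # lowered for the baseline type
custom_type = shift_by_factor_of_ten(starting, 1)     # raised for the custom subset type
```

With these values, `baseline_type` is 1E-16 and `custom_type` is 1E-14, so the custom type's default probability is 100 times that of the baseline type.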
Add custom types to manifest
In the $CUSTOM_TYPE_DIR/manifest.json file, you must add the filenames of any custom types that you have created and stored in the
Enable custom types
To enable use of your custom data types in the Trifacta platform, locate and edit
NOTE: Add your entries to the items that are already present in enabledSemanticTypes. Do not delete and replace entries.
<CustomTypeName1> corresponds to the internal name value for your custom data type.
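If you script this configuration change, the key point is to append to the existing enabledSemanticTypes entries rather than overwrite them. The following Python sketch shows that merge on a plain list of names; the function name and the example type names are hypothetical, and reading and writing the configuration file itself is left out because its location and layout depend on your deployment.

```python
from __future__ import annotations


def merge_enabled_types(existing: list[str], additions: list[str]) -> list[str]:
    """Append new type names, keeping every existing entry and its order."""
    merged = list(existing)  # copy, so the caller's list is not modified
    for name in additions:
        if name not in merged:  # skip names that are already enabled
            merged.append(name)
    return merged


# Hypothetical names, for illustration only.
current = ["ExistingTypeA", "ExistingTypeB"]
updated = merge_enabled_types(current, ["ExistingTypeB", "MyCustomType"])
```

Here `updated` is `["ExistingTypeA", "ExistingTypeB", "MyCustomType"]`: both existing entries are preserved and the duplicate is not added twice.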
Register your custom types
To add your custom types to the Trifacta platform, run the following command from the js-data directory:
Restart services. See Start and Stop the Platform.
Check for the availability of your types in the column drop-down. See Create Custom Data Types.