As needed, you can deploy custom data types into the , in which type validation is performed against regular expressions that you specify. This method is most useful for validating against patterns, as opposed to specific values.
After a custom type has been added, it cannot be removed or disabled. Please verify your regular expression before saving the type.
On the server hosting the , type definitions are stored in the following directory:
/opt/trifacta/node_modules/jsdata/type-packs/trifacta
This directory is referenced as $CUSTOM_TYPE_DIR in the steps below.
Before you begin creating custom data types, you should backup the |
NOTE: The |
Directory contents:
The dictionaries sub-directory contains user-defined dictionaries.
NOTE: Please use the user interface to interact with your dictionaries. See Custom Type Dialog.
The types sub-directory contains individual custom data type definitions, each in a separate file.
The manifest.json file contains a JSON manifest of all standard and custom types in the system.
Each custom data type is created and stored in a separate file. The following example file defines a regular expression type that validates values against the days of the week:
{
  "name": "DayOfWeek",
  "prettyName": "Day of Week",
  "category": "Date/Time",
  "defaultProbability": 1E-15,
  "testCase": {
    "stripWhitespace": true,
    "regexes": [
      "^(monday|tuesday|wednesday|thursday|friday|saturday|sunday)$",
      "^(mon|tue|wed|thu|fri|sat|sun)$"
    ],
    "probability": 0.001
  }
}
Parameters:
Parameter Name | Description
---|---
name | Internal identifier for the custom type. Must be unique across all standard types and custom types.
prettyName | Display name for the custom type.
category | The category to assign to the type. The current categories are displayed within the data type drop-down for each column.
defaultProbability | Default probability for the custom type. See below.
testCase | This block contains the regular expression specification to be applied to the column values.
stripWhitespace | When set to true, whitespace is removed from each value before validation. The original value is unchanged.
regexes | Array of regular expressions used to validate the column values. For a regex type, a column value must match at least one expression in the set.
probability | (optional) Incremental change to the probability when a value matches one of the regular expressions. See Defining probabilities below.
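
Because a custom type cannot be removed once it is added, it can help to check a definition offline before saving it. The following is a minimal sketch, not the platform's own validator: it loads the DayOfWeek definition, confirms that every pattern compiles, and applies the "at least one expression must match" rule to a few sample values. It assumes that stripWhitespace trims only leading and trailing whitespace and that matching is case-sensitive. It also uses Python's re module, whose syntax may differ from the regex engine used by the platform. The helper names compile_regexes and is_valid are hypothetical.

```python
import json
import re

# The DayOfWeek definition from the example file.
DEFINITION = json.loads("""
{
  "name": "DayOfWeek",
  "prettyName": "Day of Week",
  "category": "Date/Time",
  "defaultProbability": 1E-15,
  "testCase": {
    "stripWhitespace": true,
    "regexes": [
      "^(monday|tuesday|wednesday|thursday|friday|saturday|sunday)$",
      "^(mon|tue|wed|thu|fri|sat|sun)$"
    ],
    "probability": 0.001
  }
}
""")


def compile_regexes(definition):
    """Compile every pattern; re.error is raised if a pattern is invalid."""
    return [re.compile(p) for p in definition["testCase"]["regexes"]]


def is_valid(value, definition, compiled):
    """Return True if the value matches at least one compiled pattern."""
    if definition["testCase"].get("stripWhitespace", False):
        # Assumption: only leading and trailing whitespace is removed.
        value = value.strip()
    return any(rx.search(value) for rx in compiled)


compiled = compile_regexes(DEFINITION)
for sample in ["monday", " fri ", "Monday", "someday"]:
    print(repr(sample), is_valid(sample, DEFINITION, compiled))
```

Under these assumptions, "Monday" fails because the patterns are lowercase only. If the platform matches the same way, the patterns would need to cover capitalized forms explicitly.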
Tip: In the |
For your custom type, the probability values are used to determine the likelihood that matching values indicate that the entire column is of the custom data type.
The defaultProbability value specifies the baseline probability that a match between a value and one of the regular expressions indicates that the column is the specified type. On a logarithmic scale, values are typically 1E-15 to 1E-20.
The probability value is used to increment the baseline probability that the next matching value is of the specified type. This value should also be expressed on a logarithmic scale (e.g. 0.001).
Probabilities become important primarily if you are creating a custom type that is a subset of an existing type. For example, the Email Address custom type is a subset of the String type. So, matches for the patterns expressed in the Email Address definition should register a higher probability value than the corresponding increment in the String type definition.
Tip: For custom types that are subsets of other, non-String types, you should lower the |
In the $CUSTOM_TYPE_DIR/manifest.json file, you must add the filenames of any custom types that you have created and stored in the types directory:
{ "types": ["bodies-of-water.json", "dayofweek.json"], "dictionaries": ["oceans", "seas"] }
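
The manifest update can also be scripted. The following is a minimal sketch, assuming the manifest has the shape shown in the example (a top-level "types" array of filenames). The function name add_type_to_manifest is hypothetical and is not part of the product. It adds a filename only if it is not already listed and leaves all other keys untouched. Back up the file before running anything like this against a live installation.

```python
import json
from pathlib import Path


def add_type_to_manifest(manifest_path, type_filename):
    """Append type_filename to the manifest's "types" list if it is missing.

    Returns True if the manifest file was changed, False otherwise.
    """
    path = Path(manifest_path)
    manifest = json.loads(path.read_text(encoding="utf-8"))

    # Assumption: custom type filenames live in a top-level "types" array.
    types = manifest.setdefault("types", [])
    if type_filename in types:
        return False

    types.append(type_filename)
    path.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    return True


# Example call (path is illustrative; substitute your $CUSTOM_TYPE_DIR):
# add_type_to_manifest("/path/to/custom-type-dir/manifest.json", "dayofweek.json")
```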
Steps:
Locate the following property:
"feature.enableCustomTypes": true,
To enable use of your custom data types in the , locate and edit the enabledSemanticTypes property.
NOTE: Add your entries to the items that are already present in |
NOTE: Do not use this parameter to attempt to remove specific data types. Removal of the default types is not supported.
"webapp.enabledSemanticTypes": [ "<CustomTypeName1>", "<CustomTypeName2>", "<CustomTypeNameN>" ]
where:
<CustomTypeName1> corresponds to the internal name value for your custom data type.
To add your custom types to the , run the following command from the js-data directory:
node bin/load-types --manifest ${PATH_TO_MANIFEST_FILE}
Restart services. See Start and Stop the Platform.
Check for the availability of your types in the column drop-down. See Column Menus.