By default, the Trifacta® platform applies its own type inference to datasets when they are imported and again when new steps are applied to the data. This section provides information on how you can configure where type inference is applied in the platform.
Data types are inferred by the Trifacta platform when:
- Imported datasets are initially loaded.
- A new transformation step is added to a recipe.
Data whose type cannot be inferred is imported as String type.
Tip: You can use the Change Column Type transformation to override the data type inferred for a column. However, if a new transformation step is added, the column data type is re-inferred, which may override your specific typing. You should consider applying Change Column Type transformations as late as possible in your recipes.
For more information on how the Trifacta platform applies data types to specific sources of data on import, see Type Conversions.
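To make the idea of inference with a String fallback concrete, here is a minimal sketch in Python. It is not the Trifacta algorithm: the candidate order, the checks, and the function name `infer_column_type` are assumptions chosen for illustration only.

```python
from typing import Callable, List, Tuple


def _parses_as(parser: Callable[[str], object]) -> Callable[[str], bool]:
    """Build a check that reports whether parser(value) succeeds."""
    def check(value: str) -> bool:
        try:
            parser(value)
            return True
        except ValueError:
            return False
    return check


def _is_boolean(value: str) -> bool:
    return value.strip().lower() in {"true", "false"}


# Hypothetical candidate order: most specific type first.
CANDIDATES: List[Tuple[str, Callable[[str], bool]]] = [
    ("Integer", _parses_as(int)),
    ("Decimal", _parses_as(float)),
    ("Boolean", _is_boolean),
]


def infer_column_type(values: List[str]) -> str:
    """Return the first candidate type that every non-empty value satisfies.

    Falls back to "String" when no candidate matches or the column is empty.
    """
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "String"
    for type_name, check in CANDIDATES:
        if all(check(v) for v in non_empty):
            return type_name
    return "String"
```

In this sketch a column such as `["1", "2", "x"]` falls back to String because no candidate type fits every value.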
Configure Type Inference for Schematized Sources
Optionally, you can choose to disable type inference for schematized sources. A schematized source includes column data type information as part of the object definition. The following schematized sources are supported for import into the Trifacta platform:
- All JDBC sources
  NOTE: You cannot disable type inference for Oracle sources. This is a known issue.
- Avro file format
Type inference on schematized sources:

|Setting|Behavior|
|---|---|
|Enabled|Data types for all datasets imported from schematized sources are automatically inferred by the type system in the Trifacta platform. The inferred data types may be different from those in the source. When the dataset is loaded, data types can be applied to individual columns through the application. Users can apply overrides for:|
|Disabled|For schematized data sources, type inference is not automatically applied by the Trifacta platform. Data type information is taken from the source schema and applied where applicable to the dataset. If there is no corresponding data type in the Trifacta platform, the data is imported as String type. Users can apply overrides for:|
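When inference is disabled, source schema types are carried over and anything without a platform equivalent becomes String. The sketch below illustrates that fallback. The mapping table and the name `map_source_schema` are hypothetical; the actual mappings are documented in Type Conversions.

```python
from typing import Dict

# Hypothetical mapping from source schema types to platform types.
# The real mappings depend on the source; see Type Conversions.
SOURCE_TO_PLATFORM: Dict[str, str] = {
    "int": "Integer",
    "long": "Integer",
    "float": "Decimal",
    "double": "Decimal",
    "boolean": "Boolean",
    "string": "String",
}


def map_source_schema(source_schema: Dict[str, str]) -> Dict[str, str]:
    """Map each column's source type to a platform type.

    Source types with no corresponding platform type are imported as String.
    """
    return {
        column: SOURCE_TO_PLATFORM.get(source_type.lower(), "String")
        for column, source_type in source_schema.items()
    }
```

For example, under this assumed mapping a column declared as `bytes` has no entry in the table and would be imported as String.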
To disable type inference of schematized sources at the global level, perform the following configuration change.
- You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
- Change the following configuration setting to
- Save your changes.
In the application, type inference can be applied to your imported data through the following mechanisms.
Define for individual connections
When you create or edit a connection, you can specify whether Trifacta type inference is applied to data imported through that connection.
NOTE: When Default Column Data Type Inference is disabled for an individual connection, Trifacta type inference can still be applied on import of individual datasets.
For more information, see Create Connection Window.
Specify on dataset import
When type inference has been disabled globally for schematized sources, you can choose to enable or disable it for each individual source import.
Tip: To compare the data types taken from the schematized source with those inferred by the Trifacta platform, you can import the same schematized source twice: once with type inference enabled and once with it disabled.
In the Import Data page, click Edit Settings on the data source card.
For more information, see Import Data Page.
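If you follow the tip above and import the same source twice, you may want to list only the columns whose types differ. The helper below is a sketch of that comparison. It assumes you have already collected each import's column types into a plain dictionary; the function name and input shape are illustrative and not part of the platform.

```python
from typing import Dict, Optional, Tuple


def diff_column_types(
    inferred: Dict[str, str],
    from_schema: Dict[str, str],
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return columns whose type differs between the two imports.

    Each value is (type with inference enabled, type with inference disabled).
    A column missing from one import is reported with None on that side.
    """
    differences = {}
    for column in sorted(set(inferred) | set(from_schema)):
        inferred_type = inferred.get(column)
        schema_type = from_schema.get(column)
        if inferred_type != schema_type:
            differences[column] = (inferred_type, schema_type)
    return differences


# Illustrative data: a ZIP code column stored as text in the source.
with_inference = {"id": "Integer", "zip": "Integer", "name": "String"}
without_inference = {"id": "Integer", "zip": "String", "name": "String"}
print(diff_column_types(with_inference, without_inference))
```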
Configure Type Inference in the Data Grid
Type inference is automatically enabled in the data grid. It cannot be disabled.
When a new transformation step is applied, each column is re-inferred for its Trifacta data type.
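The following sketch models why a manual type change placed early in a recipe can be lost. It is a simplified model, not platform code: `infer_type` is a toy rule, and the step tuples and `run_recipe` are hypothetical names. The only behavior taken from this page is that every new transformation step triggers re-inference.

```python
from typing import Callable, List, Tuple, Union

Step = Tuple[str, Union[str, Callable[[str], str]]]


def infer_type(values: List[str]) -> str:
    """Toy inference: all-digit columns are Integer, everything else String."""
    return "Integer" if values and all(v.isdigit() for v in values) else "String"


def run_recipe(values: List[str], steps: List[Step]) -> str:
    """Apply steps in order and return the column's final data type."""
    column_type = infer_type(values)  # inference on initial load
    for kind, arg in steps:
        if kind == "change_type":
            column_type = arg  # manual override of the column type
        else:
            values = [arg(v) for v in values]
            column_type = infer_type(values)  # re-inference after the step
    return column_type


zip_codes = [" 02134", "10001 "]
trim: Step = ("transform", str.strip)
as_string: Step = ("change_type", "String")

early = run_recipe(zip_codes, [as_string, trim])  # override, then transform
late = run_recipe(zip_codes, [trim, as_string])   # transform, then override
print(early, late)  # prints "Integer String"
```

In this model the early override is replaced as soon as the trim step causes re-inference, while the override applied last survives.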
Type Inference on Export
When you generate results, the current data types in the data grid are applied to the generated results.
If the publishing destination is a schematized environment, the generated results are written to the target environment using the data type mappings for that environment type. These data type mappings cannot be modified.
For more information on output types, see Type Conversions.