
Trifacta Dataprep





Import

When data is imported:

  • Supported data types from the source are converted to corresponding data types supported by the application, based upon the conversions listed in this section.
  • Types that are not supported but are recognized by the application are converted to String types.
  • Values of types that cannot be read from the source for technical reasons are converted to null values on import.
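The three import rules above can be illustrated with a minimal Python sketch. The source type names, the mapping table, and the set of "recognized but unsupported" types are all hypothetical placeholders; the real conversions for each source are listed on that source's page, and the handling of completely unrecognized types is an assumption here.

```python
from typing import Optional

# Hypothetical source-to-application type mapping (real tables are per source).
SOURCE_TO_APP = {
    "INTEGER": "Integer",
    "BIGINT": "Integer",
    "DOUBLE": "Decimal",
    "BOOLEAN": "Boolean",
    "TIMESTAMP": "Datetime",
    "VARCHAR": "String",
}

# Hypothetical types that are recognized but not supported.
RECOGNIZED_UNSUPPORTED = {"GEOMETRY", "INTERVAL"}


def map_import_type(source_type: str) -> str:
    """Return the application type that a source column type is imported as."""
    if source_type in SOURCE_TO_APP:
        return SOURCE_TO_APP[source_type]      # supported: mapped conversion
    if source_type in RECOGNIZED_UNSUPPORTED:
        return "String"                        # recognized but unsupported
    # Assumption: the text does not say what happens to unrecognized types.
    raise ValueError(f"unrecognized source type: {source_type}")


def import_value(raw: object, readable: bool) -> Optional[object]:
    """A value that cannot be read for technical reasons becomes null (None)."""
    return raw if readable else None


print(map_import_type("BIGINT"), map_import_type("GEOMETRY"))
```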

Type Inference

By default, the Trifacta application applies type inference to imported data. The application attempts to infer the appropriate data type for each column based on the values in the first rows of the sample.

NOTE: Mapping source data types to Trifacta data types depends on the column containing a sufficient number of values that match the criteria of the internal data type. As a result, how imported types are mapped to internal data types depends on the data itself.

  • Type inference needs a minimum of 25 rows of data in a column to work consistently.

  • If your dataset has fewer than 25 rows, type inference may not have sufficient data to properly infer the column type.
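The idea of inferring a type from only the first rows can be sketched in Python. This is not the application's algorithm: the 90% match threshold, the regular expressions, and the fallback to String when fewer than 25 values are available are all assumptions made for illustration.

```python
import re
from typing import List

MIN_ROWS = 25          # minimum row count stated in the text above
MATCH_THRESHOLD = 0.9  # hypothetical share of values that must match a type

INT_RE = re.compile(r"[+-]?\d+")
DEC_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")  # also matches whole numbers


def share_matching(pattern: "re.Pattern[str]", values: List[str]) -> float:
    """Fraction of values that fully match the given type pattern."""
    return sum(1 for v in values if pattern.fullmatch(v)) / len(values)


def infer_type(values: List[str]) -> str:
    """Infer a column type from the first MIN_ROWS non-empty values only."""
    head = [v for v in values if v != ""][:MIN_ROWS]
    if len(head) < MIN_ROWS:
        # Hypothetical fallback: too few rows to infer anything but String.
        return "String"
    if share_matching(INT_RE, head) >= MATCH_THRESHOLD:
        return "Integer"
    if share_matching(DEC_RE, head) >= MATCH_THRESHOLD:
        return "Decimal"
    return "String"


print(infer_type(["1"] * 30), infer_type(["1"] * 19))
```

Because the Decimal pattern also accepts whole numbers, a column mixing "1" and "2.5" is inferred as Decimal, while a column whose first 25 values are all whole numbers is inferred as Integer regardless of what follows.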

In some datasets, the first 25 rows may be of a data type that is a subset of the best matching type. For example, if the first 25 rows in the initial sample match the Integer data type, the column may be typed as Integer, even if the remaining 2,000 rows match the Decimal data type. If the column data type is unmodified:

  • The data is written out from Dataprep by Trifacta as Integer data type. This works for the first 25 rows.
  • The other 2,000 rows are written out as null values, since they do not match the Integer data type. If the source data used decimal notation (e.g. 3.0 in the source), then those values are written out as null values, too.
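The effect described in this example can be reproduced with a small Python sketch. Here `write_as_integer` is a hypothetical helper standing in for writing a column out as Integer; any value that does not match the Integer pattern, including "3.0", becomes `None` (null).

```python
import re
from typing import List, Optional

INT_RE = re.compile(r"[+-]?\d+")


def write_as_integer(values: List[str]) -> List[Optional[int]]:
    """Write every value out as Integer; non-matching values become None (null)."""
    return [int(v) if INT_RE.fullmatch(v) else None for v in values]


# First 25 rows are whole numbers; the remaining 2,000 rows are decimals.
column = [str(i) for i in range(25)] + ["2.5"] * 1999 + ["3.0"]

# Inference looked only at the first 25 rows, so the column is typed Integer.
assert all(INT_RE.fullmatch(v) for v in column[:25])

out = write_as_integer(column)
print(out[:3], out[-1])             # [0, 1, 2] None
print(sum(v is None for v in out))  # 2000
```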

In this case, it may be easier to disable type inference for this dataset. See below.

Tip: If you are having trouble getting your imported dataset to map to expected data types, you can disable type inference for the individual dataset. For more information, see Import Data Page.

Strong typecasting:

After data is imported, the Trifacta application provides mechanisms for applying stronger typecasting to the data. These mechanisms are useful in cases like the following:

  • If all input values are double-quoted, Dataprep by Trifacta evaluates all columns as String type. As a result, type inference cannot be applied.
  • Because non-String data types cannot be inferred, the first row cannot be detected as anomalous against the inferred type (String). As a result, column headers cannot be automatically detected from double-quoted source files.
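Why double-quoting defeats header detection can be sketched in Python. The detection rule here (treat row 1 as a header if it does not match a column's inferred non-String type) is a simplified assumption based on the description above, and `parse_line` handles only simple CSV lines with no embedded commas.

```python
import re
from typing import List, Tuple

INT_RE = re.compile(r"[+-]?\d+")
Cell = Tuple[str, bool]  # (value, was_double_quoted)


def parse_line(line: str) -> List[Cell]:
    """Split a simple CSV line (no embedded commas) and note double-quoting."""
    cells = []
    for raw in line.split(","):
        quoted = len(raw) >= 2 and raw.startswith('"') and raw.endswith('"')
        cells.append((raw[1:-1] if quoted else raw, quoted))
    return cells


def cell_type(value: str, quoted: bool) -> str:
    """Hypothetical rule: any double-quoted value is evaluated as String."""
    if quoted:
        return "String"
    return "Integer" if INT_RE.fullmatch(value) else "String"


def looks_like_header(lines: List[str]) -> bool:
    """Treat row 1 as a header if it is anomalous against a column's inferred
    non-String type (inferred here from all remaining rows)."""
    rows = [parse_line(line) for line in lines]
    first, rest = rows[0], rows[1:]
    for col, (value, quoted) in enumerate(first):
        body_types = {cell_type(*row[col]) for row in rest}
        inferred = body_types.pop() if len(body_types) == 1 else "String"
        if inferred != "String" and cell_type(value, quoted) != inferred:
            return True
    return False


print(looks_like_header(["id,qty", "1,10", "2,20"]))              # True
print(looks_like_header(['"id","qty"', '"1","10"', '"2","20"']))  # False
```

With quoted input, every column is inferred as String, so the header row "id,qty" looks like any other row and cannot be singled out.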

Solutions:

  • After data has been imported, you can remap individual column types through recipe steps. For more information, see Change Column Data Type.
    In the preceding example, you can apply functions to the column data to parse its String values as specific data types:

    • PARSEINT Function: Evaluates a String input against the Integer data type. If the input matches, the function outputs an Integer value. Input can be a literal, a column of values, or a function returning String values.
    • PARSEFLOAT Function: Evaluates a String input against the Decimal data type. If the input matches, the function outputs a Decimal value. Input can be a literal, a column of values, or a function returning String values.
    • PARSEBOOL Function: Evaluates a String input against the Boolean data type. If the input matches, the function outputs a Boolean value. Input can be a literal, a column of values, or a function returning String values.
    • PARSEDATE Function: Evaluates an input against the default input formats or (if specified) an array of Datetime format strings in their listed order. If the input matches one of the formats, the function outputs a Datetime value.
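The behavior of these parsing functions can be approximated in Python to make the "match, otherwise no value" pattern concrete. These are hypothetical analogues, not the Trifacta functions: returning `None` on a mismatch, the accepted Boolean spellings, and `DEFAULT_FORMATS` are assumptions for illustration only.

```python
import re
from datetime import datetime
from typing import Optional, Sequence

INT_RE = re.compile(r"[+-]?\d+")
DEC_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")
# Hypothetical stand-ins for the application's default Datetime input formats.
DEFAULT_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_int(s: str) -> Optional[int]:
    s = s.strip()
    return int(s) if INT_RE.fullmatch(s) else None


def parse_float(s: str) -> Optional[float]:
    s = s.strip()
    return float(s) if DEC_RE.fullmatch(s) else None


def parse_bool(s: str) -> Optional[bool]:
    # Assumption: only "true"/"false" (any case) count as Boolean here.
    return {"true": True, "false": False}.get(s.strip().lower())


def parse_date(s: str, formats: Sequence[str] = DEFAULT_FORMATS) -> Optional[datetime]:
    # Formats are tried in their listed order; the first match wins.
    for fmt in formats:
        try:
            return datetime.strptime(s.strip(), fmt)
        except ValueError:
            continue
    return None


print(parse_int("42"), parse_int("4.2"))  # 42 None
```

Passing an explicit `formats` list mirrors the optional array of Datetime format strings that PARSEDATE accepts.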

Export

On export from the Trifacta application:

  • The application maps each internal Trifacta data type to the explicit type listed on the appropriate page in this section.
  • Unmapped types are converted to the target's equivalent of the String type.
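The export rule can be sketched as a lookup with a String fallback. The target type names in `EXPORT_MAP` and the choice of `VARCHAR` as the string equivalent are hypothetical; the actual mapping for each target is listed on that target's page.

```python
from typing import Dict

# Hypothetical internal-to-target mapping for an imaginary SQL target.
EXPORT_MAP: Dict[str, str] = {
    "Integer": "BIGINT",
    "Decimal": "DOUBLE",
    "Boolean": "BOOLEAN",
    "Datetime": "TIMESTAMP",
    "String": "VARCHAR",
}
STRING_EQUIVALENT = "VARCHAR"  # assumed string type of the hypothetical target


def export_type(internal_type: str) -> str:
    """Return the explicit target type, or the string equivalent if unmapped."""
    return EXPORT_MAP.get(internal_type, STRING_EQUIVALENT)


print(export_type("Integer"), export_type("Phone Number"))  # BIGINT VARCHAR
```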

Tip: You can import a target schema to assist in lining up your columns with the expected target. For more information, see Overview of RapidTarget.


For more information on the data types that are supported within the Trifacta application, see Supported Data Types.
