This section provides an overview of how data types are managed during import, transformation, and export of data in .

Terms:

data type is a definition of a set of values, including:

A data format is a representation of the underlying type, which has the same meaning and available operations associated with the data type. For example, the following values are all valid for Datetime data type, but each is represented in a different format:

12/31/2021
31-12-2021
December 31, 2021


NOTE: Some data types can be explicitly formatted through functions. Other data types support different formats implicitly through the application.

How Data Types Are Used

In the , data types are used for the following:

Data Types

 supports the following categories of data types:

Logical data types

logical data type is a class of values that is understood by native system representations.

Tip: These types are recognized internally by . Each running environment to which connections natively supports these logical data types.

These data types have no additional specification requirements:

Data TypeDescription
String Data Type

Integer Data Type

Decimal Data Type

Boolean Data Type

Datetime Data Type

Object Data Type

Array Data Type

Formatting differences may apply. For example,  may recognize Y and N as Boolean data type, while other systems may not.

Complex data types

complex data type typically is defined by applying additional restrictions on String data type values to define the class of possible values. For example,  supports a Gender data type, which validates values such as MF and male and female as Gender data type. 

The following are the complex data types supported by .

Data TypeDescription
Social Security Number Data Type

Phone Number Data Type

Email Address Data Type

Credit Card Data Type

Gender Data Type

Zip Code Data Type

State Data Type

IP Address Data Type

URL Data Type

HTTP Code Data Type

Complex data types are typically defined based on a regular expression applied to String values. For example, the following regex is used to define the accepted values for the Email Address data type:

      "regexes": [
        "^[a-z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-z0-9.-]+(?:\\.[a-z0-9-]+)*\\.[a-z]{2,}$"
      ],


Logical and complex data types

Data typeCategoryInternal data typeNotes
String Data TypelogicalStringThe default data type. Any non-empty/non-value is valid for String data type.
Integer Data TypelogicalIntUse NUMFORMAT Function to format these values. Underlying values are not modified.
Decimal Data TypelogicalFloatUse NUMFORMAT Function to format these values. Underlying values are not modified.
Boolean Data TypelogicalBool
Datetime Data TypelogicalDatetimeUse DATEFORMAT Function to format these values. Underlying values are not modified.
Object Data TypelogicalMap/Object
Array Data TypelogicalArray
Social Security Number Data Type

complex

StringString data type constrained by a regular expression.
Phone Number Data Type
complex
StringString data type constrained by a regular expression.
Email Address Data Type
complex
StringString data type constrained by a regular expression.
Credit Card Data Type

complex

StringString data type constrained by a regular expression.
Gender Data Type
complex
StringString data type constrained by a regular expression.
Zip Code Data Type
complex
StringString data type constrained by a regular expression.
State Data Type
complex
StringString data type constrained by a regular expression.
IP Address Data Type
complex
StringString data type constrained by a regular expression.
URL Data Type
complex
StringString data type constrained by a regular expression.
HTTP Code Data Type
complex
StringString data type constrained by a regular expression.


Types in Source Data

Depending on the source system, imported data may be typed to  according to one of the following methods. 

Tip: For each method,  attempts to map the source data to one of the above data types. For schematized sources, however, you may wish to use the original data types in the source. Optionally, you can choose to disable the mapping of source to internal data type. See "Type Inference" below.

Schematized files

Some file formats, such as Avro or Parquet, are stored in a non-readable format. Part of the metadata associated with the file is information identifying the schema of the file. A schema represents the data types and other constraints of individual columns. It can be read independently of the data in the source table.  

Schematized tables

For relational sources, schema information is typically stored with the table. This schema information defines data type validation within the datastore and can be read independently from the source table.

Tip: Database schemas can be used to define a class of tables to ensure consistency within a database.

Inferred data types

In most cases, an imported data source is assigned a data type for each column based on a review of a subset of the data. For example, a CSV file contains no information about the data types of individual columns. The data types for each column must be assigned by . This process is called type inference. For more information, see "Type Inference" below.

Type Inference

By default, the  applies type inference for imported data. The application attempts to infer a column's appropriate data type in the application based on a review of the first lines in the sample.

Tip: In many programming languages, a column must be explicitly "type cast" to a data type as part of a functional operation. handles this typecasting for you through the process of type inference.


NOTE: Mapping source data types to depends on a sufficient number of values that match the criteria of the internal data type. The mapping of import types to internal data types depends on the data.

In some datasets, the first 25 rows may be of a data type that is a subset of the best matching type. For example, if the first 25 rows in the initial same match the Integer data type, the column may be typed as Integer, even if the other 2,000 rows match for the Decimal data type. If the column data type is unmodified:

In this case, it may be easier to disable type inference for this dataset.

Tip: If you are having trouble getting your imported dataset to map to expected data types, you can disable type inference for the individual dataset. For more information, see Import Data Page.

After data is imported, the  provides some mechanisms for applying stronger typecasting to the data. Example:

Tip: The default data type is String. If the is unable to evaluate a column's data type, the type is mapped to String data type. Within the application, you can use functions to remap the data type or to parse values according to a specified type.

For more information, see "Working with Data Types" below.

Disable type inference

For schematized files or tables, the  inference of data type from the source may result in incorrect initial typing of a dataset's columns in the application. As needed, you can disable type inference for the following:

NOTE: When type inference is disabled for imported datasets, it is not disabled within the . For more information, see "Type inference in the application" below.




Type inference in the application

Within the , column data types may be re-inferred based on your activities in the Transformer page: 

NOTE: Disabling type inference does not disable the re-inference of types in the Transformer page.

The following general actions may result in column data types being re-inferred:

Working with Data Types

After data has been imported, you can remap individual column types through recipe steps. For more information, see Change Column Data Type.

Data types in the grid

When a sample is loaded, the data types and their formats for each column are inferred by default. Data types and formatting information is displayed for each column in the Transformer page.


Column header example

At the top of each column, you can review graphical representations of type information:

For more information, see Data Grid Panel

For more information, see Column Menus.

Changing the data type

Change the data type through the data type menu at the top of a column.

Tip: For some types, such as Datetime type, you must select a data format when you are selecting the type. See below.

NOTE: Changing the data type may not change the underlying logical type. For example, if you change a String column to a Gender column, the underlying data is still stored as String values.

See Change Column Data Type.

Changing type across multiple columns

To change the data type for multiple columns, you can a transformation similar to the following, which changes the data type from the reqProdId column to the prodC column and all columns in between:


Changing the data format

You can use the following functions to apply formatting on top of a column of a specified data type.  For example, depending on your locale, numbers may require different formatting for use of the decimal point and the digit separator.

NOTE: When you apply a formatting function to a column, the data appears in the specified format in the , but the underlying data is unmodified. Formatting changes appear as a step in your recipe and are applied to the generated results.


Formatting FunctionApplicable Data TypeDescription
NUMFORMAT FunctionInteger, Decimal

DATEFORMAT FunctionDatetime

UNIXTIMEFORMAT FunctionDatetime

Type functions

The  provides a set of functions for managing types. 

Validation functions

These functions can be used to test for valid or invalid values against a specific data type.

FunctionDescription
VALID Function

ISMISMATCHED Function

IFVALID Function

IFMISMATCHED Function


Parsing functions

These functions can be used to parse String values against a specific data type.

FunctionDescription
PARSEINT Function

PARSEFLOAT Function

PARSEBOOL Function

PARSEDATE Function

PARSEARRAY Function

PARSEOBJECT Function

PARSESTRING Function

Managing null and empty values

These functions allow you to generate or test for missing or null values.

FunctionDescription
ISNULL Function

NULL Function

IFNULL Function

ISMISSING Function

IFMISSING Function

Type Conversions on Export

The  attempts to map the data types that you have specified to match data types in the target platform.

NOTE: Values that do not match the data type of the target system for a column are subject to the method by which the target system handles mismatches. Rows could be dropped. Values can be rendered as null values. You should attempt to verify that all columns have valid values before generating results.


NOTE: Missing or null values may be treated differently between target systems. Additionally, if these systems feed downstream systems, those systems may have independent rules for managing missing or null values.

For more information, see Type Conversions.