Page tree


Contents:

   

Contents:


This section covers data type conversions between the Dataprep by Trifacta application and BigQuery.

NOTE: The Dataprep by Trifacta data types listed in this page reflect the raw data type of the converted column. Depending on the contents of the column, the Transformer Page may re-infer a different data type, when a dataset using this type of source is loaded.

NOTE: JSON data stored in BigQuery is not supported for ingest or publication.

Access/Read

When a BigQuery data type is imported, its data type is remapped according to the following table.

Tip: Data precision may be lost during conversion. You may want to generate min and max values and compute significant digits for values in your BigQuery tables and then compute the same in Dataprep by Trifacta.

Source Data TypeSupported?

Dataprep by Trifacta Data Type

Notes
StringYString
BytesYString
IntegerYInteger
FloatYDecimal
BooleanYBoolean
TimestampYDatetimeMillisecond and microsecond precision in BigQuery is dropped during the import process.
DateYDatetime

If imported data contains microsecond precision, values are handled differently depending on the area of the product:

  • Initial sampling, import: Uses the BigQuery API, which can return microseconds.
  • All other areas of the product, including job execution: Uses the Dataflow SDK, which returns milliseconds.
TimeYDatetime
DatetimeYDatetime
ArrayYArray


RecordYObject

Date values:

When type inference is disabled, data for the following BigQuery types may be formatted according to the table:

Imported data typeFormatting
DATE
['Datetime', 'yy-mm-dd', 'yyyy*mm*dd']
TIME
['Datetime', 'hh:mm:ss', 'HH:MM:SS']
DATETIME
['Datetime', 'yy-mm-dd hh:mm:ss', 'yyyy*mm*dd*HH:MM:SS']
TIMESTAMP
['Datetime', 'yy-mm-dd hh:mm:ss', 'yyyy*mm*dd*HH:MM:SSX']


Publish/Write

NOTE: Dataprep by Trifacta data types not listed here are written as string types in the target database.

Dataprep by Trifacta Data Type
BigQuery Data TypeNotes
StringString
IntegerInteger
DecimalFloat
BooleanBoolean
Datetime
See below.
Datetime (time data only)
See below.
Array

Array for primitive data types

String for other data types

 See below.
ObjectRecord

Publishing errors

Strict data type matching is enabled by default when publishing to BigQuery. When strictly matching, job runs result in the following:

Big Query data type matchingResult

INVALID Dataprep by Trifacta type matching with BigQuery data type

Error
Data is appended or truncated on publicationWarning
Data is written as-is to BigQueryValid

Publishing Datetime type columns to BigQuery

NOTE: Publishing of microseconds to BigQuery is not supported.

Depending on several factors, Datetime values in  Dataprep by Trifacta are published to BigQuery in different ways. Factors include:

  • Method of publication:
    • create/drop, which creates a new table
    • append/truncate/merge, which updates an existing table
  • Date format of the Dataprep by Trifacta source

NOTE: The following publishing behaviors assume that the default behavior of strict rules for type matching has been enabled on the publishing action for the target BigQuery table. Otherwise, Datetime values are always published as Datetime. For more information on strict type matching, see BigQuery Table Settings.

Datetime for creating new tables

Dataprep by Trifacta Datetime value

BigQuery target table typeCreate and drop operations
Datetime with date information onlyDateDate
Datetime with time information onlyTimeTime
DatetimeDatetimeDatetime
Datetime with timestamp information onlyTimestampTimestamp
Datetime with date information only and no timezoneTimestamp (no timezone)Timestamp appends 1970-01-01

Datetime for updating existing tables

For append, truncate, and merge operations, these updates depend on the target table type in BigQuery

Dataprep by Trifacta Datetime value

Date table typeTime table typeDatetime table typeTimestamp table type
Datetime with date information onlyDateINVALIDAppends 00:00:00Appends UTC, 00:00:00
Datetime with time information onlyINVALIDTimeAppends 1970-01-01Appends UTC, 1970-01-01
DatetimeTruncates Time infoTruncates Date infoDatetimeAppends UTC
Datetime with timestamp information onlyTruncates Time and Timezone infoTruncates Date and Timezone infoTruncates Timezone infoTimestamp
Datetime with date information only and no timezoneTimestamp appends 1970-01-01

Truncates Timezone

Appends Date and truncates Timezone infoAppends UTC, 1970-01-01

Publishing limitations for Trifacta Photon:

When publishing to BigQuery for a job executed on Trifacta Photon, the following limitations apply:

  • A staging CSV file is created for the job, and publication to BigQuery happens through a separate, internal Publish job. Since a CSV has no declared schema, the schema must be retrieved from the target table or tables, where possible.
  • When writing to multiple BigQuery target tables: 
    • The  Dataprep by Trifacta application  assumes that the schema of all tables is the same.
    • The  Dataprep by Trifacta application  selects one table to represent the schema for all, including representative datetime formats.
    • If a target table does not exist, then  Dataprep by Trifacta typecasting is applied:
      • For create and drop actions, a new table is creating using  Dataprep by Trifacta typecasting.
      • For append, truncate, and merge actions, the table may or may not exist.

Supported Datetime formats

Dataprep by Trifacta supports the following Datetime formats for writing dates to BigQuery:

yy-dd-mm
yy-mm-dd
dd-mm-yy
mm-dd-yy
dd-mm
mm-dd
mm-yy

For the last three formats in the above list, additional information is written to BigQuery, where appropriate:

FormatWritten as
dd-mm1970-dd-mm
mm-dd1970-mm-dd
mm-yyyy-mm-01

Publishing Array type columns to BigQuery

NOTE: Publishing complex data types that are nested more than 15 levels is not supported by BigQuery.

You can publish Dataprep by Trifacta array type column in BigQuery as an array type. 

In BigQuery,  an array is defined by the field Type along with the Mode. The following is the schema of array columns in BigQuery. You can publish the following primitive type columns as Arrays in BigQuery: 

  • Integer

  • Float 

  • Boolean

  • String

Field NameTypeMode

int_array

INTEGERREPEATED
float_arrayFLOATREPEATED
string_arraySTRINGREPEATED
bool_arrayBOOLEANREPEATED

Publishing Objects and Array of Objects to BigQuery

NOTE: Publishing complex data types that are nested more than 15 levels is not supported by BigQuery.

You can publish Dataprep by Trifacta Objects and Arrays of Objects as complex types in BigQuery. 

Field NameTypeMode
objectsRECORDNULLABLE
array_objectsRECORDREPEATED

Nested Arrays can be published in native format as long as the following conditions are met:

  • Data is homogeneous. Heterogeneous objects are not supported by BigQuery. 
  • Nesting is not more than 15 levels deep.
Field NameTypeMode
array_arraysRECORDREPEATED

Limitations:

  • Pushdown execution on BigQuery for array of arrays are not supported. Please disable any flow optimizations before running jobs on nested data.

For more information, see BigQuery Table Settings.

This page has no comments.