Contents:
NOTE: The Trifacta® data types listed in this page reflect the raw data type of the converted column. Depending on the contents of the column, the Transformer Page may re-infer a different data type, when a dataset using this type of source is loaded.
Access/Read
When a BigQuery data type is imported, its data type is remapped according to the following table.
Tip: Data precision may be lost during conversion. You may want to generate min and max values and compute significant digits for values in your BigQuery tables and then compute the same in Dataprep by Trifacta.
Source Data Type | Supported? | Trifacta Data Type | Notes |
---|---|---|---|
String | Y | String | |
Bytes | Y | String | |
Integer | Y | Integer | |
Float | Y | Decimal | |
Boolean | Y | Boolean | |
Timestamp | Y | Datetime | Millisecond and microsecond precision in BigQuery is dropped during the import process. |
Date | Y | Datetime | If imported data contains microsecond precision, values are handled differently depending on the area of the product:
|
Time | Y | Datetime | |
Datetime | Y | Datetime | |
Array | Y | Array | |
Record | Y | Object |
Publish/Write
NOTE: Trifacta data types not listed here are written as string types in the target database.
Trifacta Data Type | BigQuery Data Type | Notes |
---|---|---|
String | String | |
Integer | Integer | |
Decimal | Float | |
Boolean | Boolean | |
Datetime | See below. | |
Datetime (time data only) | See below. | |
Array | Array for primitive data typesString for other data types | See below. |
Object | String |
Publishing Datetime values to BigQuery
NOTE: Publishing of microseconds to BigQuery is not supported.
Depending on several factors, Datetime values in Dataprep by Trifacta are published to BigQuery in different ways. Factors include:
- Method of publication:
- create/drop, which creates a new table
- append/truncate/merge, which updates an existing table
- Date format of the Trifacta source
NOTE: The following publishing behaviors assume that the default behavior of strict rules for type matching has been enabled on the publishing action for the target BigQuery table. Otherwise, Datetime values are always published as Datetime. For more information on strict type matching, see BigQuery Table Settings.
Datetime for creating new tables
Trifacta Datetime value | BigQuery target table type | Create and drop operations |
---|---|---|
Datetime with date information only | Date | Date |
Datetime with time information only | Time | Time |
Datetime | Datetime | Datetime |
Datetime with timestamp information only | Timestamp | Timestamp |
Datetime with date information only and no timezone | Timestamp (no timezone) | Timestamp appends 1970-01-01 |
Datetime for updating existing tables
For append, truncate, and merge operations, these updates depend on the target table type in BigQuery
Trifacta Datetime value | Date table type | Time table type | Datetime table type | Timestamp table type |
---|---|---|---|---|
Datetime with date information only | Date | INVALID | Appends 00:00:00 | Appends UTC, 00:00:00 |
Datetime with time information only | INVALID | Time | Appends 1970-01-01 | Appends UTC, 1970-01-01 |
Datetime | Truncates Time info | Truncates Date info | Datetime | Appends UTC |
Datetime with timestamp information only | Truncates Time and Timezone info | Truncates Date and Timezone info | Truncates Timezone info | Timestamp |
Datetime with date information only and no timezone | Timestamp appends 1970-01-01 | Truncates Timezone | Appends Date and truncates Timezone info | Appends UTC, 1970-01-01 |
Publishing limitations for Trifacta Photon:
When publishing to BigQuery for a job executed on Trifacta Photon, the following limitations apply:
- A staging CSV file is created for the job, and publication to BigQuery happens through a separate, internal Publish job. Since a CSV has no declared schema, the schema must be retrieved from the target table or tables, where possible.
- When writing to multiple BigQuery target tables:
- The Trifacta application assumes that the schema of all tables is the same.
- The Trifacta application selects one table to represent the schema for all, including representative datetime formats.
- If a target table does not exist, then Dataprep by Trifacta typecasting is applied:
- For create and drop actions, a new table is creating using Dataprep by Trifacta typecasting.
- For append, truncate, and merge actions, the table may or may not exist.
Supported Datetime formats
Dataprep by Trifacta supports the following Datetime formats for writing dates to BigQuery:
yy-dd-mm yy-mm-dd dd-mm-yy mm-dd-yy dd-mm mm-dd mm-yy
For the last three formats in the above list, additional information is written to BigQuery, where appropriate:
Format | Written as |
---|---|
dd-mm | 1970-dd-mm |
mm-dd | 1970-mm-dd |
mm-yy | yy-mm-01 |
Publishing errors
Strict data type matching is enabled by default when publishing to BigQuery. When strictly matching, job runs result in the following:
Big Query data type matching | Result | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Publishing Array type columns to BigQueryYou can publish Dataprep by Trifacta array type column in BigQuery as an array type. For more information, see BigQuery Table Settings. NOTE: Dataprep by Trifacta array type columns in datasets that were imported before Release 9.2 are still published as String type. This is a known issue. As a workaround, you can create a new imported dataset from the same source to publish those columns as BigQuery arrays. In BigQuery, an array is defined by the field Type along with the Mode. The following is the schema of array columns in BigQuery:
You can publish the following primitive type columns as Arrays in BigQuery:
Other complex types of Arrays are published as String values. INVALID Trifacta type matching with BigQuery data type | Error | |||||||||||||||
Data is appended or truncated on publication | Warning | |||||||||||||||
Data is written as-is to BigQuery | Valid |
This page has no comments.