Trifacta Dataprep




Feature Availability: This feature is available in the following editions:

  • Dataprep Enterprise Edition by Trifacta
  • Dataprep Professional Edition by Trifacta
  • Dataprep Starter Edition by Trifacta
  • Dataprep Premium by Trifacta
  • Dataprep Standard by Trifacta

BigQuery is a scalable cloud data warehouse integrated with the Google Cloud Platform for storage of a wide range of datasets. In some use cases, your transformation jobs can be executed completely in BigQuery. If all of your source datasets and outputs are in BigQuery locations, then transferring the execution steps from the Trifacta node to BigQuery yields the following benefits:

  • A minimum of data (recipe steps and associated metadata) is transferred between systems. Everything else remains in BigQuery.
  • Recipe steps are converted into SQL that is native to BigQuery, so the work runs where the data resides and execution is typically much faster.
  • Depending on your environment, total cost of executing the job may be lower in BigQuery.

In this scenario, the recipe steps are converted to SQL statements, which are executed in sequence against the BigQuery datasets and tables that hold your source data. Intermediate results are stored in temporary tables, and the results that you have defined for your output are written from those tables.
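The staged execution described above can be pictured with a small sketch. The step representation, the table names, and the `stage_sql` helper are all hypothetical. This is not the SQL that the Trifacta application actually generates, only an illustration of chaining each step's SQL through temporary tables and writing the last one to the output.

```python
from typing import List


def stage_sql(source: str, steps: List[str], output: str) -> List[str]:
    """Chain SELECT statements through temporary tables (illustrative only).

    Each entry in `steps` is a SELECT statement containing a `{src}`
    placeholder for the table produced by the previous stage.
    """
    statements = []
    current = source
    for i, step in enumerate(steps, start=1):
        temp = f"_stage_{i}"  # hypothetical temporary table name
        statements.append(
            f"CREATE TEMP TABLE {temp} AS {step.format(src=current)};"
        )
        current = temp
    # The last temporary table feeds the output defined for the job.
    statements.append(
        f"CREATE OR REPLACE TABLE {output} AS SELECT * FROM {current};"
    )
    return statements


# Hypothetical source and output tables, both located in BigQuery.
plan = stage_sql(
    source="`my_project.sales.orders`",
    steps=[
        "SELECT * FROM {src} WHERE amount > 0",
        "SELECT customer_id, SUM(amount) AS total FROM {src} GROUP BY customer_id",
    ],
    output="`my_project.sales.order_totals`",
)
for stmt in plan:
    print(stmt)
```

Only the generated statements cross system boundaries in this sketch. The rows themselves are read and written entirely inside BigQuery.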

Tip: When running a job in BigQuery, your data never leaves BigQuery.

Tip: For jobs that are executed in BigQuery, you can optionally enable the execution of the visual profile in BigQuery, too. This option is enabled for individual flows. For more information, see Flow Optimization Settings Dialog.

Requirements

  • This feature must be enabled by the project owner. See Configure Running Environments.
  • In your flow, you must enable all general and BigQuery-specific flow optimizations. When all of these optimizations are enabled, the job can be pushed down to BigQuery for execution. For more information, see Flow Optimization Settings Dialog.

If the requirements and limitations are met, the Trifacta application automatically executes the job in BigQuery.

Limitations

BigQuery as a running environment requires that pushdowns be enabled for the project and for the specific flow for which the job is executed. If the flow and the project are properly configured, the job is automatically executed in BigQuery.

NOTE: BigQuery is not a running environment that you explicitly select or specify as part of a job. If all of the requirements are met, then the job is executed in BigQuery when you select Dataflow.


  • All data sources and all outputs specified in a job must be located within BigQuery.
  • Dataflow must be selected as the running environment.
  • Custom SQL datasets are not supported.
  • All recipe steps, including all Wrangle functions in the recipe, must be translatable to SQL. 

    NOTE: When attempting to execute a job in BigQuery, the Trifacta application executes each recipe in BigQuery until it reaches a step that cannot be executed there. At that point, data is transferred to Dataflow, where the remainder of the job is executed.

  • Some transformations and functions are not currently supported for execution in BigQuery. See below.
  • Upserts, merges, and deletes are not supported for full execution in BigQuery.
  • Sampling jobs are not supported for execution in BigQuery.
  • If your recipe includes data quality rules, the job cannot be fully executed in BigQuery.
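The fallback behavior described in the NOTE above amounts to finding the first step that cannot be translated to SQL. The following is a minimal sketch under stated assumptions: the `split_for_pushdown` helper is hypothetical, recipe steps are reduced to their transform names, and `filter`, `derive`, and `rename` are assumed to be translatable purely for illustration.

```python
from typing import List, Set, Tuple


def split_for_pushdown(
    steps: List[str], untranslatable: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (steps run in BigQuery, steps run in Dataflow).

    Execution stays in BigQuery until the first step that cannot be
    translated to SQL; that step and everything after it go to Dataflow.
    """
    for index, step in enumerate(steps):
        if step in untranslatable:
            return steps[:index], steps[index:]
    return steps, []  # every step can be pushed down


# A subset of the unsupported transforms listed later on this page.
UNSUPPORTED_TRANSFORMS = {"unnest", "flatten", "unpivot", "nest"}

in_bigquery, in_dataflow = split_for_pushdown(
    ["filter", "derive", "unpivot", "rename"], UNSUPPORTED_TRANSFORMS
)
print(in_bigquery)  # ['filter', 'derive']
print(in_dataflow)  # ['unpivot', 'rename']
```

Note that `rename` ends up in Dataflow in this sketch even though it is assumed to be translatable, because it follows the step that forced the handoff.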

Unsupported Wrangle for BigQuery Execution

The following transformations and functions are not currently supported for execution in BigQuery. 

NOTE: If your recipe contains any of the following transformations or functions, full job execution in BigQuery is not possible at this time. These transformations are expected to be supported and removed from this list in future releases.

General limitations

  • Regex patterns used must be valid RE2. Operations on non-RE2 regex patterns are not pushed down.
  • For more information on limitations on specific push-downs, see Flow Optimization Settings Dialog.
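As a rough pre-check for the RE2 requirement above, a pattern can be scanned for a few constructs that RE2 does not accept, such as lookahead, lookbehind, and backreferences. The sketch below is a heuristic, not an RE2 parser: the `non_re2_constructs` helper is hypothetical, the list of constructs is deliberately incomplete, and escaped characters can produce false positives. An empty result does not guarantee that a pattern is valid RE2.

```python
import re

# Constructs that RE2 rejects. This list is deliberately incomplete.
_NON_RE2 = {
    "lookahead": re.compile(r"\(\?[=!]"),
    "lookbehind": re.compile(r"\(\?<[=!]"),
    "backreference": re.compile(r"\\[1-9]"),
}


def non_re2_constructs(pattern: str) -> list:
    """Return the names of unsupported constructs found in `pattern`."""
    return [name for name, probe in _NON_RE2.items() if probe.search(pattern)]


print(non_re2_constructs(r"^\d{3}-\d{4}$"))  # []
print(non_re2_constructs(r"(\w+)\s+\1"))     # ['backreference']
print(non_re2_constructs(r"foo(?=bar)"))     # ['lookahead']
```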

Unsupported data types

The following data types are not supported for execution in BigQuery.

  • Arrays
  • Objects (Maps)

Unsupported transformations

The following transformations are not supported for execution in BigQuery.

Legend:

  • Search term: the value you enter in the Transform Builder
  • Transform: name of the underlying transform
Search term                  | Transform
Unnest elements              | unnest
Expand Array to rows         | flatten
Extract between delimiters   | extractbetweendelimiters
Unpivot                      | unpivot
Standardize column           | standardize
Nest columns                 | nest
Extract matches to Array     | extractlist
Replace between delimiters   | replacebetweenpatterns
Scale to min max             | scaleminmax
Scale to mean                | scalestandardize
Convert key/value to Object  | extractkv

For more information, see Transformation Reference.

Unsupported functions

The following Wrangle functions are not currently supported for execution in BigQuery.

Aggregate functions

KTHLARGEST
KTHLARGESTIF
KTHLARGESTUNIQUE
KTHLARGESTUNIQUEIF
LIST
LISTIF
MODE
MODEIF
UNIQUE
QUARTILE
APPROXIMATEMEDIAN
APPROXIMATEPERCENTILE
APPROXIMATEQUARTILE

For more information, see Aggregate Functions.

Math functions

LCM

Partially supported:

NUMFORMAT: Only supported when used for rounding.

For more information, see Math Functions.

Date functions

WEEKNUM
NETWORKDAYS
NETWORKDAYSINTL
MODEDATE
WORKDAY
WORKDAYINTL
CONVERTFROMUTC
CONVERTTOUTC
CONVERTTIMEZONE
MODEDATEIF
KTHLARGESTDATE
KTHLARGESTUNIQUEDATE
KTHLARGESTUNIQUEDATEIF
KTHLARGESTDATEIF
EOMONTH
SERIALNUMBER

Partially supported:

DATEDIF: Only day, hour, minute, second and millisecond are supported as units.

For more information, see Date Functions.

String functions

SUBSTITUTE
PROPER
REMOVESYMBOLS
RIGHTFIND
EXACT
STRINGGREATERTHAN
STRINGGREATERTHANEQUAL
STRINGLESSTHAN
STRINGLESSTHANEQUAL
DOUBLEMETAPHONE
DOUBLEMETAPHONEEQUALS
TRANSLITERATE

For more information, see String Functions.

Nested functions

ARRAYCONCAT
ARRAYCROSS
ARRAYINTERSECT
ARRAYLEN
ARRAYSTOMAP
ARRAYUNIQUE
ARRAYZIP
FILTEROBJECT
KEYS
ARRAYELEMENTAT
LISTAVERAGE
LISTMAX
LISTMIN
LISTMODE
LISTSTDEV
LISTSUM
LISTVAR
ARRAYSORT
ARRAYINDEXOF
ARRAYMERGEELEMENTS
ARRAYRIGHTINDEXOF
ARRAYSLICE

For more information, see Nested Functions.

Window functions

SESSION

For more information, see Window Functions.

Other functions

IPTOINT
IPFROMINT

For more information, see Other Functions.
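The function lists above can be turned into a quick check of a recipe expression before a job is run. The following is a minimal sketch: the `unsupported_calls` helper is hypothetical, and it assumes that a function call appears as an uppercase name followed by an opening parenthesis. The partially supported functions (NUMFORMAT and DATEDIF) are left out because they depend on how they are used.

```python
import re

# Function names copied from the lists above, grouped by category.
UNSUPPORTED = {
    "aggregate": """KTHLARGEST KTHLARGESTIF KTHLARGESTUNIQUE
        KTHLARGESTUNIQUEIF LIST LISTIF MODE MODEIF UNIQUE QUARTILE
        APPROXIMATEMEDIAN APPROXIMATEPERCENTILE APPROXIMATEQUARTILE""".split(),
    "math": ["LCM"],
    "date": """WEEKNUM NETWORKDAYS NETWORKDAYSINTL MODEDATE WORKDAY
        WORKDAYINTL CONVERTFROMUTC CONVERTTOUTC CONVERTTIMEZONE MODEDATEIF
        KTHLARGESTDATE KTHLARGESTUNIQUEDATE KTHLARGESTUNIQUEDATEIF
        KTHLARGESTDATEIF EOMONTH SERIALNUMBER""".split(),
    "string": """SUBSTITUTE PROPER REMOVESYMBOLS RIGHTFIND EXACT
        STRINGGREATERTHAN STRINGGREATERTHANEQUAL STRINGLESSTHAN
        STRINGLESSTHANEQUAL DOUBLEMETAPHONE DOUBLEMETAPHONEEQUALS
        TRANSLITERATE""".split(),
    "nested": """ARRAYCONCAT ARRAYCROSS ARRAYINTERSECT ARRAYLEN ARRAYSTOMAP
        ARRAYUNIQUE ARRAYZIP FILTEROBJECT KEYS ARRAYELEMENTAT LISTAVERAGE
        LISTMAX LISTMIN LISTMODE LISTSTDEV LISTSUM LISTVAR ARRAYSORT
        ARRAYINDEXOF ARRAYMERGEELEMENTS ARRAYRIGHTINDEXOF ARRAYSLICE""".split(),
    "window": ["SESSION"],
    "other": ["IPTOINT", "IPFROMINT"],
}

# Assumption: a call looks like an uppercase name followed by "(".
_CALL = re.compile(r"\b([A-Z][A-Z0-9]*)\s*\(")


def unsupported_calls(expression: str) -> dict:
    """Map each unsupported function used in `expression` to its category."""
    found = {}
    for name in _CALL.findall(expression):
        for category, names in UNSUPPORTED.items():
            if name in names:
                found[name] = category
    return found


print(unsupported_calls("IF(WEEKNUM(d) > 10, UPPER(name), LOWER(name))"))
# {'WEEKNUM': 'date'}
```

An empty result only means that none of the listed functions was found. The other requirements and limitations on this page still apply.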
