Snowflake provides cloud-based data storage and analytics as a service. Among other infrastructures, Snowflake runs on Amazon S3. If all of your source datasets and outputs are in Snowflake locations and other conditions are met, then the entire execution of the transformations can occur in Snowflake.

Transferring the execution steps from the  to Snowflake yields the following benefits:

In this scenario, the recipe steps are converted to SQL, which is sequentially executed your source data in temporary tables, from which the results that you have defined for your output are written.

Tip: When running a job in Snowflake, your data never leaves Snowflake.

Tip: Execution on datasets created with custom SQL is supported.

If the requirements and limitations are met, the  automatically executes the job in Snowflake.

Requirements

General

For 

For customer-managed deployments, the following additional requirements apply:

Requirements across multiple Snowflake connections

If you are executing a job on Snowflake that utilizes multiple connections, the following requirements must also be met for execution of the job on Snowflake:

Limitations

Snowflake as a running environment requires that pushdowns be enabled for the workspace and for the specific flow for which the job is executed. If the flow and the workspace are properly configured, the job is automatically executed in Snowflake.

NOTE: Snowflake is not a running environment that you explicitly select or specify as part of a job. If all of the requirements are met, then the job is executed in Snowflake when you select EMR.

Enable

Workspace Settings

The following setting must be enabled in the workspace. Select User menu > Admin console > Workspace settings.

OptimizationDescription
Logical and physical optimization of jobs

When enabled, the attempts to optimize job execution through logical optimizations of your recipe and physical optimizations of your recipes interactions with data.

For more information, see Workspace Settings Page.

Flow Optimizations

You must enable the Snowflake optimizations in your flow. In Flow View, select More menu > Optimization settings.

NOTE: All general optimizations must be enabled for your flow, as well as the following optimizations, which are specific to Snowflake.

OptimizationDescription
Snowflake > Column pruning from source

When enabled, job execution performance is improved by removing any unused or redundant columns from the source database.

Snowflake > Filter pushdown

When this setting is enabled, the  optimizes job performance on this flow by pushing data filters directly on the source database.

Snowflake > Full pushdownWhen this setting is enabled, all supported pushdown operations, including full transformation and profiling job execution, is pushed down to Snowflake, where possible.

For more information, see Flow Optimization Settings Dialog.

Run Job

To execute a job in Snowflake in the :

Tip: After launching the job, you can monitor job execution through the Job Details page, which includes a link to the corresponding job in the Snowflake console.


Unsupported  for Snowflake Execution

The following transformations and functions are not currently supported for execution in Snowflake. 

NOTE: If your recipe contains any of the following transformations or functions, full job execution in Snowflake is not possible at this time. These transformations are expected to be supported and removed from this list in future releases.

General limitations

For more information on limitations on specific push-downs, see Flow Optimization Settings Dialog.

Unsupported input data types

The following Snowflake data types are not supported for input into :

Unsupported 

None.

Unsupported transformations

The following  functions are not currently supported for execution in Snowflake.

Unsupported functions

The following  functions are not currently supported for execution in BigQuery.

Aggregate functions

KTHLARGEST
KTHLARGESTIF
KTHLARGESTUNIQUE
KTHLARGESTUNIQUEIF
APPROXIMATEMEDIAN
APPROXIMATEPERCENTILE
APPROXIMATEQUARTILE
QUARTILE

For more information, see Aggregate Functions.

Math functions

LCM
NUMVALUE

Partially supported:

NUMFORMAT: Only supported when used for rounding.

For more information, see Math Functions.

Date functions

NETWORKDAYS
NETWORKDAYSINTL
WORKDAY
WORKDAYINTL
KTHLARGESTDATE
KTHLARGESTUNIQUEDATE
KTHLARGESTUNIQUEDATEIF
KTHLARGESTDATEIF
EOMONTH
SERIALNUMBER

String functions

DOUBLEMETAPHONEEQUALS
TRANSLITERATE

For more information, see String Functions.

Type functions

Partially supported:

IFMISSING

NOTE: When the IFMISSING function immediately follows the PREV function in your recipe steps, Snowflake generates an incorrect value. This is a known issue and will be fixed in a future Snowflake release.

Window functions

SESSION

For more information, see Window Functions.

Verify Execution

To verify execution in Snowflake, please do the following:

Steps:

  1. In the left nav bar, click the Jobs link. 
  2. In the Jobs page, select the job that you executed. 
  3. In the Overview tab, the value for Environment under the Execution summary should be: Snowflake.

For more information, see Job Details Page.