
...

Tip: Execution on datasets created with custom SQL is supported.
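For example, a custom SQL dataset might be defined with a simple query like the following; the database, schema, and table names here are hypothetical:

    SELECT order_id,
           customer_id,
           order_total
    FROM   MY_DB.MY_SCHEMA.ORDERS        -- hypothetical Snowflake source table
    WHERE  order_date >= '2024-01-01';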

If the requirements and limitations are met, the D s webapp automatically executes the job in Snowflake.

Requirements

D s ed

General

  • This feature must be enabled by the workspace admin. See below.
  • The D s webapp must be integrated with Snowflake. See Snowflake Connections.
    • The permission to execute jobs in Snowflake must be enabled.
  • All sources and outputs must reside in Snowflake.
  • Spark + Snowflake must be selected as the running environment. See Run Job Page.
  • Jobs are executed in the virtual warehouse that is specified as part of the Snowflake connection.

    NOTE: Job execution requires significantly more resources than ingest or publish jobs on Snowflake. Before you begin using Snowflake, you should verify that your Snowflake virtual warehouse has sufficient resources to handle the expected load (a sketch of how to check this follows this list). For more information, see Snowflake Connections.

  • In your flow, you must enable all general and Snowflake-specific flow optimizations. When all of these optimizations are enabled, the job can be pushed down to Snowflake for execution. See "Flow Optimizations" below.
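As a rough illustration of the warehouse-sizing note above, the warehouse backing the Snowflake connection can be inspected and, if necessary, resized from Snowflake itself. The warehouse name below is hypothetical, and an appropriate size depends entirely on your workload:

    -- List the warehouses visible to the connecting role and check their sizes.
    SHOW WAREHOUSES;

    -- If the warehouse used by the connection is undersized for job execution,
    -- an administrator can resize it (hypothetical warehouse name).
    ALTER WAREHOUSE MY_WAREHOUSE SET WAREHOUSE_SIZE = 'MEDIUM';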

Requirements across multiple Snowflake connections

If a job executed on Snowflake uses multiple Snowflake connections, the following requirements must also be met:

  • All Snowflake connections used in the job must connect to the same Snowflake account.
  • All Snowflake connections used in the job must be backed by the same Snowflake primary role (a quick verification query is sketched below). For more information,

...
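If you need to confirm that multiple connections satisfy these requirements, and assuming you can issue ad hoc queries through each of them, the account and primary role that each connection resolves to can be checked with standard Snowflake context functions:

    -- Run through each Snowflake connection used by the job; the results should match.
    SELECT CURRENT_ACCOUNT() AS account_name,
           CURRENT_ROLE()    AS primary_role;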

...

Limitations

Snowflake as a running environment requires that pushdowns be enabled for the workspace and for the specific flow for which the job is executed. If the flow and the workspace are properly configured, the job is automatically executed in Snowflake.

...

  • All datasources and all outputs specified in a job must be located within Snowflake.
  • All recipe steps, including all D s lang functions in the recipe, must be translatable to SQL.

    NOTE: When attempting to execute a job in Snowflake, the D s webapp executes each recipe in Snowflake until it reaches a step that cannot be executed there. At that point, data is transferred to EMR, where the remainder of the job is executed.

  • If the schemas have changed for your datasets, pushdown execution on Snowflake is not supported. D s product falls back to submitting the job through another running environment.

  • Some transformations and functions are not currently supported for execution in Snowflake. See below.
  • Sampling jobs are not supported for execution in Snowflake.
  • If your recipe includes data quality rules, the job cannot be fully executed in Snowflake.
  • Visual profiling is supported under the following conditions and requirements:
    • Visual profiles are unloaded to a stage in an S3 bucket.

    • If a stage is named in the connection, it is used. This stage must point to the default S3 bucket in use. An example of creating such a stage follows this list.
    • If no stage is named, a temporary stage is created in the PUBLIC schema. The connecting user must have write access to PUBLIC.

      NOTE: Creating a temporary stage requires temporary credentials from AWS. These credentials are valid for one hour only. If a job is expected to run longer than one hour, you should define a named stage.

    • For more information, see Snowflake Connections.
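To avoid the one-hour limit on temporary credentials, a named stage that points to the default S3 bucket can be created in Snowflake and referenced in the connection. The following is a sketch only; the stage name, bucket path, storage integration, and role are hypothetical, and your account may use key-based credentials instead of a storage integration:

    -- Hypothetical named stage pointing at the default S3 bucket in use.
    CREATE STAGE MY_DB.MY_SCHEMA.PROFILE_STAGE
      URL = 's3://my-default-bucket/profiles/'
      STORAGE_INTEGRATION = MY_S3_INTEGRATION;

    -- The connecting role needs usage rights on the schema and the stage.
    GRANT USAGE ON SCHEMA MY_DB.MY_SCHEMA TO ROLE MY_CONNECTION_ROLE;
    GRANT USAGE ON STAGE MY_DB.MY_SCHEMA.PROFILE_STAGE TO ROLE MY_CONNECTION_ROLE;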

...

The following D s lang transformations are not currently supported for execution in Snowflake:

  • Standardize
  • Split on multiple delimiters

Unsupported functions

The following D s lang functions are not currently supported for execution in Snowflake:

...

  1. In the left nav bar, click the Jobs link. 
  2. In the Job History page, select the job that you executed. 
  3. In the Overview tab, the value for Environment under the Execution summary should be: Snowflake.
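Optionally, and assuming your Snowflake role has access to query history, you can also confirm activity on the Snowflake side by reviewing recent queries issued against the warehouse:

    -- Most recent queries visible to the current role (covers up to the last 7 days).
    SELECT query_text, warehouse_name, start_time, execution_status
    FROM   TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
    ORDER  BY start_time DESC
    LIMIT  20;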

...