
...

  • This feature must be enabled by the workspace admin. See below.
  • D s webapp
     must be integrated with Snowflake. See Snowflake Connections.
    • The permission to execute jobs in Snowflake must be enabled. 
  • All sources and outputs must reside in Snowflake.
  • Permissions are required to run Snowflake-to-file jobs.
  • Spark + Snowflake must be selected as the running environment. See Run Job Page.
  • Jobs are executed in the virtual warehouse that is specified as part of the Snowflake connection. 

    Info

    NOTE: Job execution requires significantly more resources than ingest or publish jobs on Snowflake. Before you begin using Snowflake, you should verify that your Snowflake virtual warehouse has sufficient resources to handle the expected load (a minimal check is sketched below, after this list). For more information, see Snowflake Connections.

  • In your flow, you must enable all general and Snowflake-specific flow optimizations. When all of these optimizations are enabled, the job can be pushed down to Snowflake for execution. See "Flow Optimizations" below.
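
The note above recommends confirming that the virtual warehouse attached to your Snowflake connection has enough capacity before you rely on pushdown execution. The following is a minimal sketch of such a check using the Snowflake Python connector; the account, credentials, and warehouse name are placeholders, not values supplied by the product.

    # Minimal sketch: inspect the virtual warehouse tied to a Snowflake
    # connection before running pushdown jobs. All connection values below
    # are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account",      # placeholder account identifier
        user="my_user",            # placeholder user
        password="my_password",    # placeholder credential
    )
    try:
        cur = conn.cursor()
        # SHOW WAREHOUSES reports each warehouse's size, state, and scaling
        # settings, which indicate whether it can absorb large job runs.
        cur.execute("SHOW WAREHOUSES LIKE 'MY_WAREHOUSE'")
        for row in cur.fetchall():
            print(row)
    finally:
        conn.close()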

...

The following setting must be enabled in the workspace. Select User menu > Admin console > Workspace settings.

Optimization | Description
Logical and physical optimization of jobs

When enabled, the

D s webapp
attempts to optimize job execution through logical optimizations of your recipe and physical optimizations of your recipe's interactions with data.

...

Optimization | Description
Snowflake > Column pruning from source

When enabled, job execution performance is improved by removing any unused or redundant columns from the data that is read from the source database.

Snowflake > Filter pushdown

When this setting is enabled, the 

D s webapp
 optimizes job performance on this flow by pushing data filters directly down to the source database (see the sketch below, after these settings).

Snowflake > Full pushdown

When this setting is enabled, all supported pushdown operations, including full transformation and profiling job execution, are pushed down to

D s conntype
typesnowflake
, where possible.

Full execution for S3

If requirements are met for data sourced from S3, you can enable execution of your S3 datasources in 

D s conntype
typesnowflake
.

Info

NOTE: Snowflake pushdown is not supported for external S3 connections.

Source to Files

When this setting is enabled, jobs on Snowflake tables that meet all pushdown requirements can be executed through Snowflake and published to S3.

For more information, see Flow Optimization Settings Dialog.
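
To make the filter pushdown setting above more concrete, the sketch below contrasts filtering rows after they are transferred with filtering inside Snowflake. It uses the Snowflake Python connector directly, with hypothetical table and column names; the product generates its own SQL when the optimization is enabled, so this illustrates the idea rather than the product's implementation.

    # Illustration of filter pushdown: the second query moves the filter into
    # Snowflake so only matching rows leave the warehouse. Table and column
    # names are hypothetical.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="my_password",
        warehouse="MY_WAREHOUSE", database="MY_DB", schema="PUBLIC",
    )
    cur = conn.cursor()

    # Without pushdown: every row is read, then filtered after transfer.
    cur.execute("SELECT order_id, state, total FROM orders")
    ca_rows = [r for r in cur.fetchall() if r[1] == "CA"]

    # With pushdown: the filter (and, with column pruning, the column list)
    # is evaluated inside Snowflake.
    cur.execute("SELECT order_id, state, total FROM orders WHERE state = 'CA'")
    ca_rows = cur.fetchall()

    conn.close()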

...

Tip

Tip: After launching the job, you can monitor job execution through the Job Details page, which includes a link to the corresponding job in the Snowflake console.

S3 File Support

D s ed
editionsawsent,awspro,awspr

In addition to

D s conntype
typesnowflake
 sources, you can execute jobs in
D s conntype
typesnowflake
on source files from S3.

Tip

Tip: The

D s conntype
typesnowflake
running environment also supports hybrid sources, so you can use S3 files and
D s conntype
typesnowflake
tables as sources in the same flow.

Requirements

...

S3 or 

D s tfs
is supported as the default storage layer.

...

D s conntype
typesnowflake

...

D s webapp

...

In the Run Job page, the Spark +

D s conntype
typesnowflake
 running environment must be selected.

Tip

Tip: If this option is not available, one or more requirements for S3 file execution on

D s conntype
typesnowflake
have not been met.

...

Execution requirements

Info

NOTE: For execution of S3 jobs in

D s conntype
typesnowflake
, AWS credentials are passed in encrypted format as part of the SQL that is executed within
D s conntype
typesnowflake

...

.
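
The note above refers to AWS credentials travelling inside the SQL that Snowflake executes. Snowflake's own COPY INTO syntax illustrates the general pattern of supplying S3 credentials inline with a statement; the sketch below uses placeholder keys and a hypothetical table and bucket, and the statements the product actually generates (and encrypts) will differ.

    # General Snowflake pattern for reading S3 data with credentials supplied
    # inside the statement. The table, bucket, and keys are placeholders;
    # never hard-code real credentials.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="my_password",
        warehouse="MY_WAREHOUSE", database="MY_DB", schema="PUBLIC",
    )
    cur = conn.cursor()
    cur.execute("""
        COPY INTO staged_orders    -- hypothetical target table
        FROM 's3://my-bucket/exports/orders/'
        CREDENTIALS = (AWS_KEY_ID = '<placeholder>' AWS_SECRET_KEY = '<placeholder>')
        FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
    """)
    conn.close()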

...

Supported file formats from S3

...

CSV: Files that fail to meet the following requirements may cause job failures when executed in

D s conntype
typesnowflake
, even though they can be imported into
D s product
. Requirements:

  • For job execution of CSV files in

    D s conntype
    typesnowflake
    , source CSV files must be well-formatted.

  • Newlines must be inserted at the end of each record.

  • Fields must be demarcated with quotes and commas.

    Info

    NOTE: Escaped quotes in field values must be represented as doubled quotes (""). Escaping quotes with a backslash is not supported.

  • Each row must have the same number of columns. A minimal example of a conforming file appears below.
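
As a minimal illustration of the requirements above, the following sketch writes a CSV file with every field quoted, embedded quotes doubled, newline-terminated records, and a consistent column count. The file name and data are illustrative.

    # Produce a CSV that satisfies the formatting requirements listed above.
    import csv

    rows = [
        ["order_id", "customer", "note"],
        ["1001", "Acme, Inc.", 'Said "ship ASAP"'],  # comma and quotes in values
        ["1002", "Globex", ""],                      # same column count on every row
    ]

    with open("orders.csv", "w", newline="", encoding="utf-8") as f:
        # QUOTE_ALL wraps each field in quotes; the csv module doubles any
        # embedded quote characters ("") rather than backslash-escaping them.
        writer = csv.writer(f, delimiter=",", quotechar='"', quoting=csv.QUOTE_ALL)
        writer.writerows(rows)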

...

TSV

JSON (newline-delimited)

Info

NOTE:

D s conntype
typesnowflake
only supports UTF-8 encoding for JSON files.
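
For reference, newline-delimited JSON means one JSON object per line, and per the note above the file must be UTF-8 encoded. A minimal sketch with illustrative records:

    # Write newline-delimited JSON (one object per line) in UTF-8.
    import json

    records = [
        {"order_id": 1001, "customer": "Acme, Inc."},
        {"order_id": 1002, "customer": "Globex"},
    ]

    with open("orders.json", "w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps non-ASCII characters as literal UTF-8 text.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")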

...

gzip and bz2 compressed formats are supported. 

Info

NOTE: Snappy compression is not supported for S3 execution on

D s conntype
typesnowflake
.

Supported file encodings:

  • UTF-8
  • ISO-8859-1

Supported delimiters:

  • Comma
  • Tab
  • Pipe

Supported quote characters:

...



Uploaded File Support

When a file is uploaded from your desktop, ingested, and stored in a storage layer that is supported for file pushdown, jobs that reference datasets created from that file are eligible for execution in

D s conntype
typesnowflake
. For example, if your base storage layer is S3, then files uploaded from your desktop could be used for jobs that execute like S3 files in
D s conntype
typesnowflake
. The requirements and limitations listed in the previous section apply.

...