This section describes how to integrate the platform with Snowflake databases.

Limitations

NOTE: This integration is supported only for deployments of the platform in customer-managed AWS infrastructures. These deployments must use S3 as the base storage layer. For more information, see Supported Deployment Scenarios for AWS.

Prerequisites

Enable

When relational connections are enabled, this connection type is automatically available. For more information, see Enable Relational Connections.

Configure

To create a Snowflake connection, you must first enable the job manifest feature. This feature creates a manifest file that tracks the set of temporary files written to S3 before publication to Snowflake.

NOTE: To guarantee consistency, you can enable consistent view on the EMR cluster. Consistent view must be enabled when the cluster is created. See Configure for EMR.


Steps:

  1. Locate the following parameter and set it to true:

    "feature.enableJobOutputManifest": true,
  2. Save your changes and restart the platform.
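
The parameter change in the steps above can also be scripted. The following is a minimal Python sketch, not the platform's own tooling. It assumes the configuration is available as JSON, and that the parameter is stored either as a flat dotted key (as shown above) or nested one level under a "feature" object. The function name enable_flag is hypothetical.

```python
import json

# Parameter name taken from the steps above.
FLAG = "feature.enableJobOutputManifest"


def enable_flag(config: dict, dotted_key: str = FLAG) -> dict:
    """Return a copy of `config` with `dotted_key` set to True.

    Assumption: the key is stored either flat ("feature.x": ...) or
    nested one level deep ({"feature": {"x": ...}}). The nested form is
    used only when the parent object already exists in the config.
    """
    # Deep-copy JSON-safe data so the caller's dict is left untouched.
    updated = json.loads(json.dumps(config))
    parent, _, leaf = dotted_key.rpartition(".")
    if parent and isinstance(updated.get(parent), dict):
        updated[parent][leaf] = True
    else:
        updated[dotted_key] = True
    return updated


if __name__ == "__main__":
    sample = {"feature": {"enableJobOutputManifest": False}}
    print(json.dumps(enable_flag(sample), indent=2))
```

After writing the updated configuration back to wherever your deployment stores it, you still need to restart the platform for the change to take effect.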

Create Stage

In Snowflake terminology, a stage is a database object that points to an external location on S3. It must be an external stage containing access credentials.

For more information on stages, see https://docs.snowflake.net/manuals/sql-reference/sql/create-stage.html.

In the platform, the stage location is specified as part of creating the Snowflake connection.

Create Snowflake Connection

For more information, see Create Snowflake Connections.

Testing

Steps:

  1. After you create your connection, load a small dataset based on a table in the connected Snowflake database.

    NOTE: For Snowflake connections, you must have write access to the database from which you are importing.


    See Import Data Page.

  2. Perform a few simple transformations on the data, and then run the job. See Transformer Page.
  3. Verify the results.

For more information, see Verify Operations.