Page tree

Trifacta SaaS



Contents:


   

Contents:


This section describes how to create a connection to your Snowflake datawarehouse. 
  • Snowflake is an S3-based data warehouse service hosted in the cloud. Auto-scaling, automatic failover, and other features simplify the deployment and management of your enterprise's data warehouse. For more information, see https://www.snowflake.com


  • Read: Supported
  • Write: Supported

Prerequisites

If you are connecting Trifacta to any relational source of data, such as Redshift or Oracle, you must add the Trifacta Service to your whitelist for those resources.

For more information, see Getting Started with Trifacta.


  • PUBLIC schema: If you do not create an external staging database:
    • A PUBLIC schema is required in your default database.
    • If you do not provide a stage database, then a temporary stage is created for you under the PUBLIC schema in the default database.
  • S3 bucket: The user-created stage must point to the same S3 bucket as the default bucket in use by Trifacta.
  • Same region:  The Snowflake cluster must be in the same region as the default S3 bucket.
  • IAM role requirements: If you are accessing AWS and Snowflake using IAM roles, please verify that the appropriate permissions have been assigned to the role to access Snowflake and its backing S3 buckets. For more information, see Required AWS Account Permissions.
  • Staging database: Snowflake supports the use of a stage for reading and writing data to S3 during job executions.

NOTE: If a stage is not deployed, then the user must have write permissions to the default database, which is used instead for staging your data in Snowflake. These permissions must be included in the AWS credentials applied to the user account.


Prerequisites for OAuth 2.0

If you are connecting to your Snowflake deployment using OAuth 2.0 authentication, additional configuration is required:

  • OAuth 2.0 must be enabled and configured for use in the product. For more information, see Enable OAuth 2.0 Authentication.
  • OAuth 2.0 requirements:
    • Create a security integration in your Snowflake deployment.
    • Create an OAuth 2.0 client in the Trifacta application that connects using the security integration.
    • For more information, see OAuth 2.0 for Snowflake.

Limitations

  • You cannot perform ad-hoc publication to Snowflake.
  • SSO connections are not supported.
  • The Snowflake cluster with which you are integrating must be hosted in a public subnet.
  • To ingest data from a Snowflake table, one of the following must be enabled:
    • A named stage must be created for the table. For more information, see the Snowflake documentation.
    • Snowflake must be permitted to create a temporary stage, which requires:
      • Write permissions on the table's database, and
      • A schema named PUBLIC must exist and be accessible.
  • No schema validation is performed as part of writing results to Snowflake.
  • Credentials and permissions are not validated when you are modifying the destination for a publishing job. 
  • For Snowflake, no validation is performed to determine if the target is a view and is therefore not a supported target.

Create Connection

You can create Snowflake connections through the following methods.

Create through application

Any user can create a Snowflake connection through the application.

Steps:

  1. Login to the application.
  2. In the left nav bar, click the Connections icon.
  3. In the Create Connection page, click the Snowflake connection card.
  4. Specify the properties for your Snowflake database connection. The following parameters are specific to Snowflake connections:

    NOTE: In Snowflake connections, property values are case-sensitive. Snowflake-related locations are typically specified in capital letters.

PropertyDescription
Account Name

Snowflake account to use. Suppose your hostname is the following:

mycompany.snowflakecomputing.com

Your account name is the following:

mycompany

NOTE: Your full account name might include additional segments that identify the region and cloud platform where your account is hosted.

Warehouse

The name of the warehouse to use when connected. This value can be an empty string.

If specified, the warehouse should be an existing warehouse for which the default role has privileges.


Stage

If you have deployed a Snowflake stage for managing file conversion to tables, you can enter its name here. A stage is a database object that points to an external location on S3. It must be an external stage containing access credentials.

If a stage is used, then this value is typically the schema and the name of the stage. Example value:

MY_SCHEMA.MY_STAGE	

If a stage is not specified, a temporary stage is created using the current user's AWS credentials.

NOTE: Without a defined stage, you must have write permissions to the database from which you import. This database is used to create the temporary stage.


For more information on stages, see https://docs.snowflake.net/manuals/sql-reference/sql/create-stage.html.

Credential Type

Select the type of credentials to provide with the connection:

  • Basic - username and password are used by the connection to authenticate to Snowflake.
  • OAuth 2.0 - use OAuth 2.0 client connect to Snowflake. The client must already be defined in the Trifacta application and then selected in the connection configuration.

    NOTE: After you have specified the connection to use OAuth 2.0, click Authenticate to validate the connection with the target datastore. If you have modified the connection, click Re-authenticate to validate the new connection definition. You must re-authenticate if you receive an expired tokens message. For more information, see Enable OAuth 2.0 Authentication.

    For more information, see OAuth 2.0 for Snowflake.

Database for Stage

(optional) If you are using a Snowflake stage, you can specify a database other than the default one to host the stage.

NOTE: If you are creating a read-only connection to Snowflake, this field is required. The accessing user must have write permission to the specified database.

If no value is specified, then your stage must be in the default database.

For more information, see Create Connection Window.

Connection URL

The properties that you provide are inserted into the following URL, which connects  Trifacta to the connection:

jdbc:snowflake://<account_name>.snowflakecomputing.com/?db=<database>&warehouse=<warehouse><connect-string-options>

where:

  • <database> = name of the default database to which to connect. This value can be empty.

Connect string options

The connect string options are optional. If you are passing additional properties and values to complete the connection, the connect string options must be structured in the following manner:

&<prop1>=<val1>&<prop2>=<val2>...

where:

  • <prop> : the name of the property
  • <val> : the value for the property

Delimiters:

  • & : any set of connect string options must begin with an ampersand (&).
  • = : property names and values must be separated with an equal sign (=).

Disable SSL connections

By default, connections to Snowflake use SSL. To disable, please add the following string to your Connect String Options:

;ssl=false


Connect through proxy

If you require connection to Snowflake through a proxy server, additional Connect String Options are required. For more information, see https://docs.snowflake.net/manuals/user-guide/jdbc-configure.html#specifying-a-proxy-server-in-the-jdbc-connection-string.

Driver Information

This connection uses the following driver:

Create via API

For more information, see  Trifacta API Reference docs: Enterprise | Professional | PremiumAPI: API Reference

  • Type: snowflake
  • vendor: snowflake

Troubleshooting

Error MessageDescription
Null values in some columns for all rows

When there are spaces/special characters in columns names, null values can be inserted for all rows in the column. The workaround is to remove any special characters and spaces from column names.

Testing

Import a dataset from Snowflake. Add it to a flow, and specify a publishing action back to Snowflake. Run a job.

Using Snowflake Connections

Uses of Snowflake

Trifacta can use Snowflake for the following tasks:

  1. Create datasets by reading from Snowflake tables.
  2. Write to Snowflake tables with your job results.

Before you begin using Snowflake

  • Read Access: Your Snowflake administrator must configure read permissions. Your administrator should provide a database for upload to your Snowflake data warehouse.

    • Read-only Access: If you are creating a read-only connection to Snowflake, you must provide a database for staging. The accessing user must have write permission to the specified database. 

  • Write Access: You can write and publish jobs results to Snowflake. 

Secure access

SSL is the default connection method.

Storing data in Snowflake

Your Snowflake administrator should provide database access for storing datasets. Users should know where shared data is located and where personal data can be saved without interfering with or confusing other users. 

NOTE: Trifacta does not modify source data in Snowflake. Datasets sourced from Snowflake are read without modification from their source locations.

Reading from Snowflake

You can create a Trifacta dataset from a table stored in Snowflake.

NOTE: The Snowflake cluster must be in the same region as the default S3 bucket.

Writing to Snowflake

You can write back data to Snowflake using one of the following methods:

  • Job results can be written directly to Snowflake as part of the normal job execution. Create a new publishing action to write to Snowflake. See Run Job Page.

  • For more information on how data is converted to Snowflake, see Snowflake Data Type Conversions.


Data Validation issues:

  • No validation is performed for the connection and any required permissions during job execution. So, you can be permitted to launch your job even if you do not have sufficient connectivity or permissions to access the data. The corresponding publish job fails at runtime.
  • Prior to publication, no validation is performed on whether a target is a table or a view, so the job that was launched fails at runtime.


This page has no comments.