- Snowflake is an S3-based data warehouse service hosted in the cloud. Auto-scaling, automatic failover, and other features simplify the deployment and management of your enterprise's data warehouse. For more information, see https://www.snowflake.com.
Trifacta Self-Managed Enterprise Edition
|Read|Not supported|Supported|Not supported|
|Write|Not supported|Supported|Not supported|
- S3 base storage layer: Snowflake access requires installation of Trifacta software in the AWS infrastructure and use of S3 as the base storage layer, which must be enabled. See Set Base Storage Layer.
- Integration: Your Trifacta instance is connected to an EMR cluster. See Configure for EMR.
- Deployment: Trifacta platform is deployed in EC2.
Integration with Snowflake requires deployment of the Trifacta platform within a customer-managed AWS infrastructure. For more information, see Snowflake Access.
- PUBLIC schema: If you do not create an external staging database, a PUBLIC schema is required in your default database.
- If you do not provide a stage database, then a temporary stage is created for you under the PUBLIC schema in the default database.
- S3 bucket: The user-created stage must point to the same S3 bucket as the default bucket in use by Trifacta Self-Managed Enterprise Edition.
- Same region: The Snowflake cluster must be in the same region as the default S3 bucket.
- IAM role requirements: If you are accessing AWS and Snowflake using IAM roles, please verify that the appropriate permissions have been assigned to the role to access Snowflake and its backing S3 buckets. For more information, see Required AWS Account Permissions.
- Staging database: Snowflake supports the use of a stage for reading and writing data to S3 during job executions.
NOTE: If a stage is not deployed, then the user must have write permissions to the default database, which is used instead for staging your data in Snowflake. These permissions must be included in the AWS credentials applied to the user account.
For more information, see Snowflake Access.
Prerequisites for OAuth 2.0
If you are connecting to your Snowflake deployment using OAuth 2.0 authentication, additional configuration is required:
- OAuth 2.0 must be enabled and configured for use in the product. For more information, see Enable OAuth 2.0 Authentication.
- OAuth 2.0 requirements:
- Create a security integration in your Snowflake deployment.
- Create an OAuth 2.0 client in the Trifacta application that connects using the security integration.
- For more information, see OAuth 2.0 for Snowflake.
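As a rough illustration of the security integration step, the following Python sketch builds a `CREATE SECURITY INTEGRATION` statement for a custom OAuth client. The integration name and redirect URI are hypothetical placeholders; the values that your Trifacta OAuth 2.0 client actually requires are described in OAuth 2.0 for Snowflake.

```python
def security_integration_sql(name: str, redirect_uri: str) -> str:
    """Build a CREATE SECURITY INTEGRATION statement for a custom OAuth client.

    `name` and `redirect_uri` are hypothetical placeholders; use the values
    that your Trifacta OAuth 2.0 client configuration requires.
    """
    # Double any single quotes so the URI is a valid SQL string literal.
    uri = redirect_uri.replace("'", "''")
    return (
        f"CREATE SECURITY INTEGRATION {name}\n"
        "  TYPE = OAUTH\n"
        "  ENABLED = TRUE\n"
        "  OAUTH_CLIENT = CUSTOM\n"
        "  OAUTH_CLIENT_TYPE = 'CONFIDENTIAL'\n"
        f"  OAUTH_REDIRECT_URI = '{uri}'\n"
        "  OAUTH_ISSUE_REFRESH_TOKENS = TRUE;"
    )


print(security_integration_sql("TRIFACTA_OAUTH", "https://trifacta.example.com/oauth2/callback"))
```

The generated statement is run in Snowflake by a role that is allowed to create integrations, such as ACCOUNTADMIN.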
- You cannot perform ad-hoc publication to Snowflake.
- SSO connections are not supported.
- To ingest data from a Snowflake table, one of the following must be enabled:
- A named stage must be created for the table. For more information, see the Snowflake documentation.
- Snowflake must be permitted to create a temporary stage, which requires:
- Write permissions on the table's database, and
- A schema named PUBLIC must exist and be accessible.
- No schema validation is performed as part of writing results to Snowflake.
- Credentials and permissions are not validated when you are modifying the destination for a publishing job.
- For Snowflake, no validation is performed to determine whether the target is a view, which is not a supported target.
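The ingest requirement above reduces to a simple rule: either a named stage exists, or both write permission on the table's database and an accessible PUBLIC schema are available. A minimal sketch of that rule follows, using a hypothetical `IngestContext` record whose fields you would fill in from your own permission checks.

```python
from dataclasses import dataclass


@dataclass
class IngestContext:
    """Hypothetical record of what is known about a source table's environment."""
    has_named_stage: bool           # a named stage exists for the table
    can_write_database: bool        # write permissions on the table's database
    public_schema_accessible: bool  # a schema named PUBLIC exists and is accessible


def can_ingest(ctx: IngestContext) -> bool:
    """Return True if at least one documented ingest path is available."""
    # Path 1: a named stage was created for the table.
    if ctx.has_named_stage:
        return True
    # Path 2: Snowflake creates a temporary stage, which needs both conditions.
    return ctx.can_write_database and ctx.public_schema_accessible


print(can_ingest(IngestContext(False, True, False)))  # False: PUBLIC schema missing
```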
You can create Snowflake connections through the following methods.
Create through application
Any user can create a Snowflake connection through the application.
- Log in to the application.
- In the left nav bar, click the Connections icon.
- In the Create Connection page, click the Snowflake connection card.
Specify the properties for your Snowflake database connection. The following parameters are specific to Snowflake connections:
NOTE: In Snowflake connections, property values are case-sensitive. Snowflake-related locations are typically specified in capital letters.
Snowflake account to use. Suppose your hostname is the following:
Your account name is the following:
NOTE: Your full account name might include additional segments that identify the region and cloud platform where your account is hosted.
The name of the warehouse to use when connected. This value can be an empty string.
If specified, the warehouse should be an existing warehouse for which the default role has privileges.
If you have deployed a Snowflake stage for managing file conversion to tables, you can enter its name here. A stage is a database object that points to an external location on S3. It must be an external stage containing access credentials.
If a stage is used, then this value is typically the schema and the name of the stage. Example value:
If a stage is not specified, a temporary stage is created using the current user's AWS credentials.
NOTE: Without a defined stage, you must have write permissions to the database from which you import. This database is used to create the temporary stage.
For more information on stages, see https://docs.snowflake.net/manuals/sql-reference/sql/create-stage.html.
Select the type of credentials to provide with the connection:
|Database for Stage|
(optional) If you are using a Snowflake stage, you can specify a database other than the default one to host the stage.
NOTE: If you are creating a read-only connection to Snowflake, this field is required. The accessing user must have write permission to the specified database.
If no value is specified, then your stage must be in the default database.
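To illustrate the account name property, the following sketch derives the account value from a Snowflake hostname. It assumes the standard `<account>.snowflakecomputing.com` hostname form; `account_from_hostname` is a hypothetical helper, not part of the product. Any region or cloud segments before the suffix are kept, because they can be part of your full account name, and the case of the result is left unchanged.

```python
SNOWFLAKE_SUFFIX = ".snowflakecomputing.com"


def account_from_hostname(hostname: str) -> str:
    """Return the account portion of a hostname such as <account>.snowflakecomputing.com."""
    host = hostname.strip()
    # Tolerate a pasted URL by dropping any scheme and path.
    if "://" in host:
        host = host.split("://", 1)[1]
    host = host.split("/", 1)[0]
    # Compare the suffix case-insensitively without changing the result's case.
    if not host.lower().endswith(SNOWFLAKE_SUFFIX):
        raise ValueError(f"not a Snowflake hostname: {hostname!r}")
    return host[: -len(SNOWFLAKE_SUFFIX)]


print(account_from_hostname("xy12345.us-east-2.aws.snowflakecomputing.com"))  # xy12345.us-east-2.aws
```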
For more information, see Create Connection Window.
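The stage property expects an external stage on S3 that contains access credentials, and the stage must point to the same S3 bucket as the product's default bucket. As a sketch only, the following builds a Snowflake `CREATE STAGE` statement for such a stage; the schema, stage name, bucket path, and credential values are hypothetical placeholders.

```python
def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_external_stage_sql(schema: str, stage: str, s3_url: str,
                              aws_key_id: str, aws_secret_key: str) -> str:
    """Build a CREATE STAGE statement for an external S3 stage that carries credentials."""
    return (
        f"CREATE STAGE {schema}.{stage}\n"
        f"  URL = {sql_literal(s3_url)}\n"
        f"  CREDENTIALS = (AWS_KEY_ID = {sql_literal(aws_key_id)} "
        f"AWS_SECRET_KEY = {sql_literal(aws_secret_key)});"
    )


schema, stage = "PUBLIC", "TRIFACTA_STAGE"
print(create_external_stage_sql(schema, stage, "s3://my-bucket/staging/", "<key-id>", "<secret>"))
# The stage value entered in the connection would then be "<schema>.<stage>".
print(f"{schema}.{stage}")
```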
The properties that you provide are inserted into the following URL, which connects Trifacta Self-Managed Enterprise Edition to Snowflake:
<database>: name of the default database to which to connect. This value can be empty.
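The exact URL template used by the product is not reproduced here. As an assumption-labeled sketch, the following assembles a generic Snowflake JDBC URL of the form `jdbc:snowflake://<account>.snowflakecomputing.com/?db=<database>`, leaving out `db` when the database value is empty. The `warehouse` parameter is included only as an example of another standard Snowflake JDBC parameter; the product's own URL may differ.

```python
from urllib.parse import urlencode


def snowflake_jdbc_url(account: str, database: str = "", warehouse: str = "") -> str:
    """Assemble a generic Snowflake JDBC URL; empty values are left out of the query string."""
    params = {}
    if database:
        params["db"] = database
    if warehouse:
        params["warehouse"] = warehouse
    url = f"jdbc:snowflake://{account}.snowflakecomputing.com/"
    # urlencode keeps insertion order and escapes special characters in values.
    return url + ("?" + urlencode(params) if params else "")


print(snowflake_jdbc_url("xy12345", database="SALES"))
# jdbc:snowflake://xy12345.snowflakecomputing.com/?db=SALES
```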
Connect string options
The connect string options are optional. If you are passing additional properties and values to complete the connection, the connect string options must be structured in the following manner:
<prop>: the name of the property
<val>: the value for the property
&: any set of connect string options must begin with an ampersand (&).
=: property names and values must be separated with an equal sign (=).
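The structure rules above can be expressed as a small formatter. In this sketch, `build_connect_string_options` is a hypothetical helper and `prop1`/`val1` are placeholders; it prefixes each pair with `&`, joins names and values with `=`, and rejects input that would break those delimiters.

```python
from typing import Mapping


def build_connect_string_options(options: Mapping[str, str]) -> str:
    """Format properties as Connect String Options: '&<prop1>=<val1>&<prop2>=<val2>...'."""
    parts = []
    for prop, val in options.items():
        # '&' separates pairs and '=' separates name from value, so reject them where ambiguous.
        if not prop or "&" in prop or "=" in prop or "&" in val:
            raise ValueError(f"invalid option: {prop!r}={val!r}")
        parts.append(f"&{prop}={val}")
    return "".join(parts)


print(build_connect_string_options({"prop1": "val1", "prop2": "val2"}))  # &prop1=val1&prop2=val2
```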
Disable SSL connections
By default, connections to Snowflake use SSL. To disable SSL, add the following string to your Connect String Options:
Connect through proxy
If you require connection to Snowflake through a proxy server, additional Connect String Options are required. For more information, see https://docs.snowflake.net/manuals/user-guide/jdbc-configure.html#specifying-a-proxy-server-in-the-jdbc-connection-string.
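As a sketch, the following builds proxy-related Connect String Options using the `useProxy`, `proxyHost`, and `proxyPort` parameter names from the Snowflake JDBC proxy documentation. Confirm the names against the Snowflake page linked above for your driver version; the host and port are hypothetical.

```python
def proxy_connect_options(proxy_host: str, proxy_port: int) -> str:
    """Connect String Options that route the Snowflake JDBC driver through a proxy."""
    if not 0 < proxy_port < 65536:
        raise ValueError(f"invalid port: {proxy_port}")
    # Parameter names as described in the Snowflake JDBC proxy documentation.
    opts = [("useProxy", "true"), ("proxyHost", proxy_host), ("proxyPort", str(proxy_port))]
    return "".join(f"&{name}={value}" for name, value in opts)


print(proxy_connect_options("proxy.example.com", 8080))
# &useProxy=true&proxyHost=proxy.example.com&proxyPort=8080
```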
This connection uses the following driver:
- Driver version:
- Driver documentation: https://docs.snowflake.com/en/user-guide/jdbc.html
Create via API
For more information, see https://api.trifacta.com/ee/8.7/index.html#operation/createConnection
API: API Reference
|Null values in some columns for all rows|
When column names contain spaces or special characters, null values may be inserted for all rows in those columns. As a workaround, remove any spaces and special characters from column names.
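The workaround can be sketched as a renaming rule. This example keeps only ASCII letters, digits, and underscores; treating `_` as safe is an assumption, and non-ASCII letters are removed as well. Because stripping characters can make two names identical or leave a name empty, the hypothetical `sanitize_columns` helper refuses such results instead of merging columns silently.

```python
import re
from typing import List


def sanitize_column_name(name: str) -> str:
    """Drop spaces and special characters; keeping '_' is an assumption."""
    return re.sub(r"[^A-Za-z0-9_]", "", name)


def sanitize_columns(names: List[str]) -> List[str]:
    """Sanitize every name and refuse results that are empty or collide."""
    cleaned = [sanitize_column_name(n) for n in names]
    if "" in cleaned or len(set(cleaned)) != len(cleaned):
        raise ValueError(f"sanitized names are empty or not unique: {cleaned}")
    return cleaned


print(sanitize_columns(["Order Date", "Amount ($)", "customer_id"]))
# ['OrderDate', 'Amount', 'customer_id']
```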
Import a dataset from Snowflake. Add it to a flow, and specify a publishing action back to Snowflake. Run a job. For more information, see Verify Operations.
Using Snowflake Connections
Uses of Snowflake
The Trifacta platform can use Snowflake for the following tasks:
- Create datasets by reading from Snowflake tables.
- Write to Snowflake tables with your job results.
Before you begin using Snowflake
- Enable S3 Sources: Snowflake integration requires the following:
- Installation of the product on a customer-managed AWS infrastructure.
- S3 is set to the base storage layer.
- For more information, see Snowflake Access.
Read Access: Your Snowflake administrator must configure read permissions. Your administrator should provide a database for upload to your Snowflake data warehouse.
Read-only Access: If you are creating a read-only connection to Snowflake, you must provide a database for staging. The accessing user must have write permission to the specified database.
Write Access: You can write and publish job results to Snowflake.
SSL is the default connection method.
Storing data in Snowflake
Your Snowflake administrator should provide database access for storing datasets. Users should know where shared data is located and where personal data can be saved without interfering with or confusing other users.
NOTE: Trifacta Self-Managed Enterprise Edition does not modify source data in Snowflake. Datasets sourced from Snowflake are read without modification from their source locations.
Reading from Snowflake
You can create a Trifacta dataset from a table stored in Snowflake.
NOTE: The Snowflake cluster must be in the same region as the default S3 bucket.
Writing to Snowflake
You can write back data to Snowflake using one of the following methods:
Job results can be written directly to Snowflake as part of the normal job execution. Create a new publishing action to write to Snowflake. See Run Job Page.
- For more information on how data is converted to Snowflake, see Snowflake Data Type Conversions.
Data Validation issues:
- No validation of the connection or required permissions is performed during job execution. As a result, you may be able to launch a job even if you lack the connectivity or permissions needed to access the data. The corresponding publish job then fails at runtime.
- Prior to publication, no validation is performed to determine whether the target is a table or a view. If the target is a view, the launched job fails at runtime.
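Because the target type is not checked before publication, you may want to check it yourself. A minimal sketch, assuming you can query the target database's `INFORMATION_SCHEMA.TABLES` view, whose `TABLE_TYPE` column distinguishes base tables from views. The helper names are hypothetical, and a missing row is treated here as "no existing object", which this check does not block.

```python
from typing import Optional, Tuple


def target_type_query(database: str, schema: str, table: str) -> Tuple[str, Tuple[str, str]]:
    """Return SQL plus bind parameters that look up the target's TABLE_TYPE."""
    sql = (
        f"SELECT TABLE_TYPE FROM {database}.INFORMATION_SCHEMA.TABLES "
        "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s"
    )
    return sql, (schema, table)


def is_supported_target(table_type: Optional[str]) -> bool:
    """Views are not supported targets; None means no matching object was found."""
    if table_type is None:
        return True
    # Covers both 'VIEW' and 'MATERIALIZED VIEW'.
    return not table_type.upper().endswith("VIEW")


sql, params = target_type_query("SALES", "PUBLIC", "ORDERS")
print(sql, params)
print(is_supported_target("VIEW"))  # False
```

With the Snowflake Python connector, you could run the query with `cursor.execute(sql, params)`, fetch a row with `cursor.fetchone()`, and pass `row[0]` (or `None` when no row is returned) to `is_supported_target`. Pass schema and table names as they are stored in Snowflake, typically in capital letters.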