This section provides information on how to enable Redshift connectivity and create one or more connections to Redshift sources.
- Amazon Redshift is a hosted data warehouse available through Amazon Web Services. It is frequently used to host datasets consumed by downstream analytic tools such as Tableau and Qlik. For more information, see https://aws.amazon.com/redshift/.
Before you begin, please verify that your Trifacta® environment meets the following requirements:
NOTE: If you are connecting to any relational source of data, such as Redshift or Oracle, you must add the Trifacta Service to your whitelist for those resources. For more information, see Getting Started with Trifacta Wrangler Pro.
Access to Redshift requires:
- Each user must be able to access S3
- S3 must be the base storage layer
If the credentials used to connect to S3 do not provide access to Redshift, you can create an independent IAM role to provide access from Redshift to S3. If this separate role is available, the Redshift connection uses it instead.
NOTE: There may be security considerations with using an independent role to govern this capability.
- The IAM role must contain the required S3 permissions. See Required AWS Account Permissions.
- The Redshift cluster should be assigned this IAM role. For more information, see https://docs.aws.amazon.com/redshift/latest/mgmt/authorizing-redshift-service.html.
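As an illustration, an existing role can be attached to a cluster with the AWS CLI's modify-cluster-iam-roles command; the cluster identifier, account ID, and role name below are placeholders:

```
aws redshift modify-cluster-iam-roles \
  --cluster-identifier <cluster-id> \
  --add-iam-roles arn:aws:iam::<account-id>:role/<redshift-s3-role>
```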
- You can publish the results of any specific job to Redshift once through the export window. See Publishing Dialog.
- The Redshift cluster with which you are integrating must be hosted in a public subnet.
- When publishing to Redshift through the Publishing dialog, output must be in Avro or JSON format. This limitation does not apply to direct writing to Redshift.
- Management of nulls:
- Nulls are displayed as expected in the Trifacta application.
- When Redshift jobs are run, the UNLOAD SQL command in Redshift converts all nulls to empty strings. Null values appear as empty strings in generated results, which can be confusing. This is a known issue with Redshift.
You can create Redshift connections through the following methods.
Tip: SSL connections are recommended. Details are below.
Create through application
Any user can create a Redshift connection through the application.
- Log in to the application.
- In the menu, click the Connections icon.
- In the Create Connection page, click the Redshift connection card.
Specify the properties for your Redshift database connection:
- Host: Hostname of the Redshift cluster. NOTE: This value must be the full hostname of the cluster, which may include region information.
- Port: Port number used to access the Redshift cluster. Default is 5439.
- Connect String Options: Any additional connection options, entered as a single string. See below.
- Database: The Redshift database to which to connect on the cluster.
- Credential Type: Options:
  - Basic authentication with optional IAM role ARN: Basic authentication credentials specified in this window are used to connect to the Redshift database. Additional permissions may be governed by any ARN specified in the IAM role used for the account. Use this option if you are planning to specify a database username/password combination as part of the connection.
  - IAM Role: Connection to Redshift is governed by the IAM role associated with the user's account.
- Username: Username with which to connect to the Redshift database.
- Password: Password associated with the Redshift username.
- IAM Role ARN for Redshift/S3 connectivity: (Optional) You can specify an IAM role ARN that enables role-based connectivity between Redshift and the S3 bucket that is used as intermediate storage during Redshift bulk COPY/UNLOAD operations. Example (generic ARN format; account ID and role name are placeholders): arn:aws:iam::<account_id>:role/<role_name>
For more information on the other options, see Create Connection Window.
Enable SSL connections
To enable SSL connections to Redshift, you must first enable SSL on your Redshift cluster. For more information, see https://docs.aws.amazon.com/redshift/latest/mgmt/connecting-ssl-support.html.
In your connection to Redshift, please add the following string to your Connect String Options:
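A commonly used option for this purpose in the AWS Redshift JDBC driver is the ssl property; following the connect string option rules described below, it would be entered as (a sketch based on the driver's documented options, not necessarily the product's exact string):

```
;ssl=true;
```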
Save your changes.
The properties that you provide are inserted into the following URL, which connects Trifacta Wrangler Pro to the connection:
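As a sketch of the general form (not necessarily the exact template used by the product), a Redshift JDBC URL places the host, port, and database first, with any connect string options appended after the database name:

```
jdbc:redshift://<host>:<port>/<database>;<connect_string_options>
```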
Connect string options
The connect string options are optional. If you are passing additional properties and values to complete the connection, the connect string options must be structured in the following manner:
<prop>: the name of the property
<val>: the value for the property
;: any set of connect string options must begin and end with a semi-colon.
;: all additional property names must be prefixed with a semi-colon.
=: property names and values must be separated with an equal sign (=).
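These rules can be illustrated with a small Python sketch; the helper function name is hypothetical, not part of the product:

```python
def build_connect_string_options(props):
    """Build a connect string options value from a dict of properties.

    Each property name is prefixed with a semi-colon, names and values
    are separated by an equal sign, and the whole string ends with a
    semi-colon, per the rules above.
    """
    return "".join(f";{prop}={val}" for prop, val in props.items()) + ";"

print(build_connect_string_options({"ssl": "true", "loginTimeout": "30"}))
# ;ssl=true;loginTimeout=30;
```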
Access through AWS key-secret
The following example connection URL uses an AWS key/secret combination (IAM user) to access Redshift:
<redshift_clustername>: the name of the redshift cluster
<region_name>: region identifier where the cluster is located
<port_number>: port number to use to access the cluster
<database_name>: name of the Redshift database to which to connect
<access_key_value>: identifier for the AWS key
<secret_key_value>: identifier for the AWS secret
<database_user_name>: user identifier for connecting to the database
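Assembled from the placeholders above, such a URL might look like the following sketch, which follows AWS's documented jdbc:redshift:iam:// scheme and driver properties (AccessKeyID, SecretAccessKey, DbUser); the exact template used by the product may differ:

```
jdbc:redshift:iam://<redshift_clustername>.<region_name>.redshift.amazonaws.com:<port_number>/<database_name>;AccessKeyID=<access_key_value>;SecretAccessKey=<secret_key_value>;DbUser=<database_user_name>
```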
Access through IAM role and temporary credentials
The following example connection URL uses an AWS key/secret combination with temporary credentials:
- See previous.
<session_token>: the AWS session token retrieved when using temporary credentials. The session token is requested by Trifacta Wrangler Pro when using AWS temporary credentials.
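With temporary credentials, the same sketch gains a SessionToken property (again per the AWS driver's documented options; the exact template used by the product may differ):

```
jdbc:redshift:iam://<redshift_clustername>.<region_name>.redshift.amazonaws.com:<port_number>/<database_name>;AccessKeyID=<access_key_value>;SecretAccessKey=<secret_key_value>;SessionToken=<session_token>;DbUser=<database_user_name>
```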
This connection uses the following driver:
- Driver version:
- Driver documentation: https://docs.aws.amazon.com/redshift/latest/mgmt/configure-jdbc-connection.html
For more information, see https://docs.aws.amazon.com/redshift/latest/mgmt/troubleshooting-connections.html.
For more information, see Redshift Browser.
For more information about interacting with data on Redshift, see Using Redshift.
Import a dataset from Redshift. Add it to a flow, and specify a publishing action. Run a job.