This section provides information on how to enable Redshift connectivity and create one or more connections to Redshift sources.
- Amazon Redshift is a hosted data warehouse available through Amazon Web Services. It is frequently used to host datasets consumed by downstream analytic tools such as Tableau and Qlik. For more information, see https://aws.amazon.com/redshift/.
NOTE: The Admin Settings page contains some deprecated parameters pertaining to Redshift. Ignore these parameters and their settings; they do not apply to this release.
Before you begin, please verify that your Trifacta® environment meets the following requirements:
- S3 base storage layer: Redshift access requires use of S3 as the base storage layer, which must be enabled. See Set Base Storage Layer.
NOTE: After you define your storage deployment options, you cannot change them for your deployment. For more information, see Storage Deployment Options.
- Redshift version: For more information on supported versions, see Connection Types.
- Same region: The Redshift cluster must be in the same region as the default S3 bucket.
- Integration: Your Trifacta instance is connected to a running environment supported by your product edition.
- Hadoop deployment: Hadoop is deployed either on-premises or in EC2.
- Per-user mode: If per-user mode is enabled for S3, each user must have permission to access the manifest file (if used), the LIST permission for the Amazon S3 bucket to which the user's job results are written, and the GET permission for the Amazon S3 objects that are loaded from that bucket.
- When publishing to Redshift, output must be in Avro or JSON format.
- Through the export window, you can publish any specific job to Redshift once. See Export Results Window.
- To publish the same job results again, you must use the command line interface. See CLI for Jobs.
- Management of nulls:
- Nulls are displayed as expected in the Trifacta application.
- When Redshift jobs are run, the UNLOAD SQL command in Redshift converts all nulls to empty strings, so null values appear as empty strings in generated results, which can be confusing. This is a known issue with Redshift.
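If true nulls need to be distinguished downstream, one workaround outside the product is to convert empty strings back to nulls in post-processing. The sketch below assumes CSV results and that the dataset contains no legitimate empty-string values:

```python
import csv
import io

# Sample generated output in which nulls were unloaded as empty strings.
raw = "id,name,score\n1,alice,9\n2,,7\n3,bob,\n"

rows = list(csv.DictReader(io.StringIO(raw)))

# Treat empty strings as nulls (None). This assumes the dataset
# contains no legitimate empty-string values.
cleaned = [{k: (None if v == "" else v) for k, v in row.items()} for row in rows]

print(cleaned[1]["name"])  # prints None
```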
You can create Redshift connections through either of the following methods:
Tip: SSL connections are recommended. Details are below.
- Create a connection through the application
- Create a connection through the command line interface
Create through application
Any user can create a Redshift connection through the application.
- Log in to the application.
- To create an import-only connection:
- In the menu, click Datasets.
- Click Import Data.
- In the left nav bar, click the New icon.
- To create an import-and-publish connection:
- In the menu, click Settings > Connections.
- In the Create Connection page, click the Redshift connection card.
Specify the properties for your Redshift database connection. The following parameters are specific to Redshift connections:
IAM Role ARN for Redshift-S3 Connectivity:
(Optional) You can specify an IAM role ARN that enables role-based connectivity between Redshift and the S3 bucket that is used as intermediate storage during Redshift bulk COPY/UNLOAD operations. Example:
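An illustrative value in the standard IAM role ARN format (the account ID and role name below are placeholders):

```
arn:aws:iam::123456789012:role/MyRedshiftS3AccessRole
```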
NOTE: If a Redshift connection has an invalid iamRoleArn, you can browse, import datasets, and open the data in the Transformer page. However, any jobs executed using this connection fail. If the iamRoleArn is invalid, the only samples that you can generate are Quick Random samples; other sampling jobs fail.
For more information, see Configure for EC2 Role-Based Authentication.
For more information, see Create Connection Window.
Enable SSL connections
To enable SSL connections to Redshift, you must first enable SSL on your Redshift cluster. For more information, see https://docs.aws.amazon.com/redshift/latest/mgmt/connecting-ssl-support.html.
In your Trifacta Wrangler Enterprise connection to Redshift, add the following string to your Connect String Options:
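The option string itself was not preserved on this page. A commonly used setting for the Redshift JDBC driver is shown below as an assumption; verify it against your driver documentation:

```
ssl=true
```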
Save your changes.
Create through CLI
The CLI tools are stored in the following directory:
Example command (all one command):
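The original example command is not reproduced here. The following sketch assembles the parameters described below into a single command; the script name, action name, and flag names (create_connection, --conn_type, and so on) are assumptions that you should verify against your version of the CLI:

```
./trifacta_cli.py create_connection \
    --user_name <admin_username> \
    --password <password> \
    --conn_type redshift \
    --conn_name redshift-01 \
    --conn_description "Redshift connection" \
    --conn_host <your_cluster>.redshift.amazonaws.com \
    --conn_port 5439 \
    --conn_credential_type <credential_type> \
    --conn_credential_location /opt/trifacta/redshift_credentials.json \
    --conn_params_location /opt/trifacta/redshift_params.json \
    --cli_output_path ./conn_create.out
```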
- CLI action type. Leave this value unchanged.
- Username of the Trifacta administrator account to use. Specify the full username. For this connection type, it must be an admin user account.
- Password of the account.
- The type of connection. Set this value to redshift.
- The internal name of this connection.
NOTE: This value must be unique among all of your connections.
- A user-friendly description for this connection, which appears in the application.
- Host of the Redshift instance.
- Port number of the Redshift instance.
- The type of credentials to use.
- Path to the file containing the credentials used to access Redshift. See below.
- Path to the file containing the parameters to pass to Redshift during interactions. See below.
- This flag makes the connection available to all Trifacta users. This flag requires no value.
NOTE: After a connection has been made global, it cannot be made private. You must delete the connection through the CLI and then recreate it. When a Redshift connection is deleted, all publication information associated with it is dropped.
- By default, any connection is tested as part of a create or edit action. Add this flag to skip testing the connection. This flag requires no value.
Tip: After creation, you can test the connection and modify it through the application. See Flows Page.
- The path to the file where results of this command are written.
NOTE: The user issuing the command must also have execute permissions on all parent folders in the specified path.
Example credentials file:
A credentials file for AWS is structured in the following manner. This file must be accessible within the Trifacta deployment directory. Please add a reference to this file in your command.
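The file contents were not preserved on this page. A minimal sketch of such a file is shown below; the key names are assumptions (apart from iamRoleArn, which is discussed next), and the values are placeholders:

```
{
  "awsAccessKeyId": "<your_aws_access_key_id>",
  "awsSecretAccessKey": "<your_aws_secret_access_key>",
  "user": "<redshift_username>",
  "password": "<redshift_password>",
  "iamRoleArn": "<your_iam_role_arn>"
}
```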
iamRoleArn is optional. For more information, see Configure for EC2 Role-Based Authentication.
Example params file:
A parameter file containing the following information must be accessible within the Trifacta deployment directory. Please add a reference to this file in your command.
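The file contents were not preserved on this page. A minimal sketch, assuming the two entries described below (the key names and the specific COPY data-conversion parameters are assumptions):

```
{
  "defaultDatabase": "dev",
  "extraLoadParams": "BLANKSASNULL EMPTYASNULL TRIMBLANKS TRUNCATECOLUMNS"
}
```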
The first parameter defines the default database.
The second parameter is used when you publish results to Redshift. For more information on these values, see http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-conversion.html.
For more information on the Trifacta command line interface, see CLI for Connections.
Create via API
For more information, see API Connections Create v3.
Import a dataset from Redshift. Add it to a flow, and specify a publishing action. Run a job.
NOTE: You can publish job results to Redshift in Avro or JSON format only.
After you have run your job, you can publish the results to Redshift through the Export Results Window. See Export Results Window.
For more information, see Verify Operations.
You can also publish results through the command line interface. See CLI for Jobs.