This section provides information on how to enable Redshift connectivity and create one or more connections to Redshift sources.
NOTE: If you have upgraded to Release 3.1 or later from a version where Redshift connectivity was enabled, your connection must be recreated. All publication information from the earlier version is lost. |
NOTE: In the Admin Settings page are some deprecated parameters pertaining to Redshift. Please ignore these parameters and their settings. They do not apply to this release. |
Before you begin, please verify that your meets the following requirements:
S3 base storage layer: Redshift access requires use of S3 as the base storage layer, which must be enabled. See Set Base Storage Layer.
NOTE: After you define your storage deployment options, you cannot change them for your deployment. For more information, see Storage Deployment Options. |
When Redshift jobs are run, the UNLOAD SQL command in Redshift converts all nulls to empty strings. Null values appear as empty strings in generated results, which can be confusing. This is a known issue with Redshift.
You can create Redshift connections through either of the following methods:
Tip: SSL connections are recommended. Details are below. |
Through the command line interface
NOTE: You cannot create Redshift connections through the APIs. |
Any user can create a Redshift connection through the application.
Steps:
Specify the properties for your Redshift database connection. The following parameters are specific to Redshift connections:
Property | Description | ||
---|---|---|---|
IAM Role ARN for Redshift-S3 Connectivity | (Optional) You can specify an IAM role ARN that enables role-based connectivity between Redshift and the S3 bucket that is used as intermediate storage during Redshift bulk COPY/UNLOAD operations. Example:
For more information, see Configure for EC2 Role-Based Authentication. |
For more information, see Create Connection Window.
Click Save.
To enable SSL connections to Redshift, you must enable them first on your Redshift cluster. For more information, see https://docs.aws.amazon.com/redshift/latest/mgmt/connecting-ssl-support.html.
In your connection to Redshift, please add the following string to your Connect String Options:
;ssl=true |
Save your changes.
NOTE: In Release 5.0, you cannot create a Redshift connection through the CLI. Please create Redshift connections through the application. This known issue will be addressed in a future release. |
The CLI tools are stored in the following directory:
/opt/trifacta/bin/ |
Example command (all one command):
./trifacta_cli.py create_connection --user_name <platform_admin_username> --password <platform_admin_password> --conn_type amazon_redshift --conn_name aRedshiftConnection --conn_description "This is my Redshift connection." --conn_host example.com --conn_port 10000 --conn_credential_type aws --conn_credential_location ~/.trifacta/config_conn.json --conn_params_location ~/.trifacta/p.json --conn_is_global --cli_output_path ./conn_create.out |
Parameter | Description | |
---|---|---|
create_connection | CLI action type. Please leave this value. | |
--user_name | Username of the For this connection type, it must be an admin user account. | |
--password | Password of the account. | |
--conn_type | The type of connection. Set this value to amazon_redshift . | |
--conn_name | The internal name of this connection.
| |
--conn_description | A user-friendly description for this connection appears in the application. | |
--conn_host | Host of the Redshift instance. | |
--conn_port | Port number of the Redshift instance. | |
--conn_credential_type | The type of credentials to use. Set this value to aws for Redshift. | |
--conn_credential_location | Path to the file containing the credentials used to access Redshift. See below. | |
--conn_params_location | Path to the file containing the parameters to pass to Redshift during interactions. See below. | |
--conn_is_global | This flag makes the connection available to all
| |
--conn_skip_test | By default, any connection is tested as part of a create or edit action. Add this flag to skip testing the connection. This flag requires no value.
| |
--cli_output_path | The path to the file where results of this command are written.
|
Example credentials file:
A credentials file for AWS is structured in the following manner. This file must be accessible within the . Please add a reference to this file in your command.
{ "username": "<your_user>", "password": "<your_password>" "iamRoleArn": "<your_IAM_role_ARN>" } |
NOTE: |
Example params file:
A parameter file containing the following information must be accessible within the . Please add a reference to this file in your command.
Example:
{ "defaultDatabase":"<your_database>", "extraLoadParams": "BLANKSASNULL EMPTYASNULL TRIMBLANKS TRUNCATECOLUMNS" } |
The first parameter defines the default database.
The second parameter is used when you publish results to Redshift. For more information on these values, see http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-conversion.html.
For more information on the , see CLI for Connections.
NOTE: In Release 5.0, you cannot create a Redshift connection through the API. Please create Redshift connections through the application. This known issue will be addressed in a future release. |
For more information, see API Connections Create v3.
Testing
Import a dataset from Redshift. Add it to a flow, and specify a publishing action. Run a job.
NOTE: To Redshift, you can publish job results in JSON or Avro format. |
After you have run your job, you can publish the results to Redshift through the Export Results Window. See Export Results Window.
You can also publish results through the command line interface. See CLI for Jobs.