Skip to main content

Amazon Redshift Connections

This section provides information on how to enable Amazon Redshift connectivity and create one or more connections to Amazon Redshift sources.

  • Amazon Redshift is a hosted data warehouse available through Amazon Web Services. It is frequently used for hosting of datasets used by downstream analytic tools such as Tableau and Qlik. For more information, see https://aws.amazon.com/redshift/.

  • When exporting results, you can choose to write to a Redshift database. See Publishing Dialog.

Supported Environments:

Note

S3 must be set as the base storage layer. See Set Base Storage Layer.

Operation

Designer Cloud Powered by Trifacta platform

Amazon

Microsoft Azure

Read

Not supported

Supported

Not supported

Write

Not supported

Supported

Not supported

Prerequisites

Before you begin, please verify that your Alteryx environment meets the following requirements:

Note

If you are connecting to any relational source of data, such as Amazon Redshift or Oracle Database, you must add the Alteryx Service to your whitelist for those resources.

Tip

If the credentials used to connect to S3 do not provide access to Redshift, you can create an independent IAM role to provide access from Amazon Redshift to S3. If this separate role is available, the Amazon Redshift connection uses it instead. There may be security considerations.

  1. S3 base storage layer: Amazon Redshift access requires use of S3 as the base storage layer, which must be enabled. See Set Base Storage Layer.

  2. Same region: The Amazon Redshift cluster must be in the same region as the default S3 bucket.

  3. Integration:Your Alteryx instance is connected to a running environment supported by your product edition.

  4. Deployment: The Designer Cloud Powered by Trifacta platformis deployed either on-premises or in EC2.

Permissions

Access to Amazon Redshift requires:

  • Each user is able to access S3

  • S3 is the base storage layer

If the credentials used to connect to S3 do not provide access to Amazon Redshift, you can create an independent IAM role to provide access from Amazon Redshift to S3. If this separate role is available, the Amazon Redshift connection uses it instead.

Note

There may be security considerations with using an independent role to govern this capability.

Steps:

  1. The IAM role must contain the required S3 permissions. See Required AWS Account Permissions.

  2. The Amazon Redshift cluster should be assigned this IAM role. For more information, see https://docs.aws.amazon.com/redshift/latest/mgmt/authorizing-redshift-service.html.

Limitations

  • You can publish any specific job once to Amazon Redshift through the export window. See Publishing Dialog.

  • When publishing to Redshift through the Publishing dialog, output must be in Avro or JSON format. This limitation does not apply to direct writing to Amazon Redshift.

  • Management of nulls:

    • Nulls are displayed as expected in the Trifacta Application.

    • When Amazon Redshift jobs are run, the UNLOAD SQL command in Redshift converts all nulls to empty strings. Null values appear as empty strings in generated results, which can be confusing. This is a known issue with Amazon Redshift.

  • No schema validation is performed as part of writing results to Amazon Redshift.

  • Credentials and permissions are not validated when you are modifying the destination for a publishing job.

  • For Amazon Redshift, no validation is performed to determine if the target is a view and is therefore not a supported target.

Create Connection

You can create Amazon Redshift connections through the following methods.

Tip

SSL connections are recommended. Details are below.

Create through Designer Cloud application

Any user can create a Redshift connection through the application.

Steps:

  1. Login to the application.

  2. In the menu, click the Connections icon.

  3. In the Create Connection page, click the Amazon Redshift connection card.

  4. Specify the properties for your Amazon Redshift database connection:

    Property

    Description

    Host

    Hostname of the Amazon Redshift cluster

    Note

    This value must be the full hostname of the cluster, which may include region information.

    Port

    Port number used to access the Amazon Redshift cluster. Default is 5439.

    Connect String Options

    Please insert any connection options as a string here. See below.

    Database

    The Amazon Redshift database to which to connect on the cluster

    Credential Type

    Options: Basic authentication with optional IAM role ARN: Basic authentication credentials specified in this window are used to connect to the Amazon Redshift database. Additional permissions may be governed by any ARN specified in the IAM role used for the account. Use this option if you are planning to specify a database username/password combination as part of the connection. IAM Role: Connection to Amazon Redshift is governed by the IAM role associated with the user's account.

    Username

    Username with which to connect to the Amazon Redshift database

    Password

    Password associated with the Amazon Redshift username

    IAM Role ARN for Redshift/S3 connectivity

    (Optional) You can specify an IAM role ARN that enables role-based connectivity between Amazon Redshift and the S3 bucket that is used as intermediate storage during Amazon Redshift bulk COPY/UNLOAD operations. Example:

    arn:aws:iam::1234567890:role/MyRedshiftRole

    For more information, seeConfigure for EC2 Role-Based Authentication.

    For more information on the other options, see Create Connection Window.

  5. Click Save.

Enable SSL connections

To enable SSL connections to Amazon Redshift, you must enable them first on your Amazon Redshift cluster. For more information, see https://docs.aws.amazon.com/redshift/latest/mgmt/connecting-ssl-support.html.

In your connection to Amazon Redshift, please add the following string to your Connect String Options:

;ssl=true

Save your changes.

Connection URL

The properties that you provide are inserted into the following URL, which connects Designer Cloud Powered by Trifacta platform to the connection:

jdbc:redshift://<host>:<port>/<database><connect-string-options>
Connect string options

The connect string options are optional. If you are passing additional properties and values to complete the connection, the connect string options must be structured in the following manner:

;<prop1>=<val1>&<prop2>=<val2>;...

where:

  • <prop> : the name of the property

  • <val> : the value for the property

Delimiters:

  • ; : any set of connect string options must begin and end with a semi-colon.

  • ; : all additional property names must be prefixed with a semi-colon.

  • = : property names and values must be separated with an equal sign (=).

Example

Access through AWS key-secret

The following example connection URL uses an AWS key/secret combination (IAM user) to access Amazon Redshift:

jdbc:redshift:iam://<redshift_clustername:region_name>:<port_number>/<database_name>?AccessKeyID=<access_key_value>&SecretAccessKey=<secret_key_value>&DBUser=<database_user_name>

where:

  • <redshift_clustername>: the name of the Amazon Redshift cluster

  • <region_name>: region identifier where the cluster is located

  • <port_number>: port number to use to access the cluster

  • <database_name>: name of the Redshift database to which to connect

  • <access_key_value>: identifier for the AWS key

  • <secret_key_value>: identifier for the AWS secret

  • <database_user_name>: user identifier for connecting to the database

Access through IAM role and temporary credentials

The following example connection URL uses an AWS/Key secret combination using temporary credentials:

jdbc:redshift:iam://<redshift_clustername:region_name>:<port_number>/<database_name>?AccessKeyID=<access_key_value>&SecretAccessKey=<secret_key_value>&SessionToken=<session_token>&DBUser=<database_user_name>

where:

  • See previous.

  • <session_token>: the AWS session token retrieved when using temporary credentials. The session token is requested by Designer Cloud Powered by Trifacta platform when using AWS temporary credentials.For more information, seeConfigure AWS Per-User Auth for Temporary Credentials.

Driver Information

This connection uses the following driver:

Create via API

For more information, see https://api.trifacta.com/ee/9.7/index.html#operation/createConnection

API:

  • Type: redshift

  • vendor: redshift

Troubleshooting

For more information, see https://docs.aws.amazon.com/redshift/latest/mgmt/troubleshooting-connections.html.

Testing

Import a dataset from Amazon Redshift.

Add it to a flow, and specify a publishing action. Run a job.

Note

When publishing to Amazon Redshift through the Publishing dialog, output must be in Avro or JSON format. This limitation does not apply to direct writing to Amazon Redshift.

For more information, see Verify Operations.

After you have run your job, you can publish the results to Amazon Redshift through the Job Details page. See Publishing Dialog.

Using Redshift Connections

Uses of Redshift

Designer Cloud Powered by Trifacta platform can use Redshift for the following tasks:

  1. Create datasets by reading from Redshift tables.

  2. Write to Redshift tables with your job results.

  3. Ad-hoc publication of data to Redshift.

Before you begin using Redshift

  • Enable S3 Sources: Redshift integration requires the following:

    • S3 is set to the base storage layer.

    • For more information, see S3 Access.

  • Read Access: Your Redshift administrator must configure read permissions. Your administrator should provide a database for upload to your Redshift datastore.

  • Write Access: You can write and publish jobs results to Redshift.

Secure access

SSL is required.

Storing data in Redshift

Your Redshift administrator should provide database access for storing datasets. Users should know where shared data is located and where personal data can be saved without interfering with or confusing other users.

Note

does not modify source data in Redshift. Datasets sourced from Redshift are read without modification from their source locations.

Reading from Redshift

You can create a Alteryx dataset from a table or view stored in Redshift.

Note

The Redshift cluster must be in the same region as the default S3 bucket.

Note

If a Redshift connection has an invalid iamRoleArn, you can browse and import datasets. However, any jobs executed using this connection fail. If the iamRoleArn is invalid, the only samples that you can generate are Quick Random samples; other sampling jobs fail.

For more information, see Database Browser.

Writing to Redshift

Note

You cannot publish to a Redshift database that is empty. The database must contain at least one table.

You can write back data to Redshift using one of the following methods:

  • Job results can be written directly to Redshift as part of the normal job execution.

  • As needed, you can publish results to Redshift for previously executed jobs.

    Note

    You cannot re-publish results to Redshift if the original job published to Redshift. However, if the base job succeeded but publication to Redshift failed, you can publish from the Publishing dialog.

    Note

    To publish to Redshift, the source results must be in Avro or JSON format.

  • For more information on how data is converted to Redshift, see Redshift Data Type Conversions.

Data Validation issues:

  • No validation is performed for the connection and any required permissions during job execution. So, you can be permitted to launch your job even if you do not have sufficient connectivity or permissions to access the data. The corresponding publish job fails at runtime.

  • Prior to publication, no validation is performed on whether a target is a table or a view, so the job that was launched fails at runtime.

Reference

Supported Versions: n/a

Supported Environments:

Note

S3 must be set as the base storage layer. See Set Base Storage Layer.

Operation

Amazon

Microsoft Azure

Read

Not supported

Supported

Not supported

Write

Not supported

Supported

Not supported