This documentation applies to installation from a supported Marketplace. Please use the installation instructions provided with your deployment.
If you are installing or upgrading a Marketplace deployment, please use the available PDF content. You must use the install and configuration PDF available through the Marketplace listing.
This guide steps through the requirements and process for installing Data Preparation for Amazon Redshift and S3 through the AWS Marketplace.
Product Limitations
- Connectivity to sources other than S3 and Redshift is not supported.
- Jobs must be executed on the Alteryx Server. No other running environment integrations are supported.
- Anomaly and stratified sampling are not supported in this deployment.
- When publishing single files to S3, you cannot apply an append publishing action.
- Data Preparation for Amazon Redshift and S3 must be deployed into an existing Virtual Private Cloud (VPC).
- The EC2 instance, S3 buckets, and any connected Redshift databases must be located in the same Amazon region. Cross-region integrations are not supported at this time.
NOTE: HDFS integration is not supported for Amazon AMI installations.
- The S3 bucket automatically created by the Marketplace CloudFormation template is not automatically deleted when you delete the stack in CloudFormation. You must empty the bucket and delete it, which can be done through the AWS Console.
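The same-region requirement above can be checked before deployment. The following is a minimal sketch, not part of the product: it assumes you have already looked up the region of each resource yourself, and the function name, resource names, and regions shown are hypothetical.

```python
def cross_region_resources(resource_regions, expected_region):
    """Return the names of resources that are NOT in expected_region.

    resource_regions maps a label of your choosing (hypothetical names
    below) to the AWS region in which that resource lives.
    """
    return sorted(
        name
        for name, region in resource_regions.items()
        if region != expected_region
    )


if __name__ == "__main__":
    # Hypothetical inventory; replace with your own resources.
    inventory = {
        "ec2-instance": "us-west-2",
        "s3-bucket": "us-west-2",
        "redshift-db": "us-east-1",
    }
    offenders = cross_region_resources(inventory, "us-west-2")
    if offenders:
        print("Cross-region resources (not supported):", ", ".join(offenders))
    else:
        print("All resources are in the same region.")
```

An empty result means that every listed resource is in the expected region.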
Install and Upgrade Methods
You can install the software using either of the following methods:
- CloudFormation Template: This method of installation uses an Amazon CloudFormation template to install and configure a working system that includes:
  - EC2 instance
  - Alteryx node and software
  - S3 bucket with supporting policies
- EC2 installation: This method allows you to set up the EC2 instance according to your enterprise requirements, including its sizing and policies.
Data Preparation for Amazon Redshift and S3 on AWS | Through EC2 with AMI ID | Through CloudFormation template |
---|---|---|
Install | If you know the AMI ID for Data Preparation for Amazon Redshift and S3, you can install the product through EC2. NOTE: Please verify that the additional pre-requisites have been met. See below. | Supported. Instructions are provided below. Tip: Using the CloudFormation template is the recommended method of installing the product. |
Upgrade | Supported. See Upgrade for AWS Marketplace. | NOTE: This method of upgrading the product is not supported. Using the CloudFormation template will overwrite all security groups and policies. |
Pre-requisites
This document assumes that you are setting up the product to use Amazon's preferred EC2 role-based authentication for access to AWS resources.
Tip: Using EC2 role-based authentication is recommended by AWS. For more information, see https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#use-roles-with-ec2.
This product also supports the following authentication methods:
- System mode - All users of the product use the same key and secret combination to access resources.
- User mode - Each user has a separately specified key and secret combination to access resources.
If you are using one of these two access methods, please do the following:
- Specify an EC2 role without any permissions. Specifically, it should have no data access permissions, as this role cannot be changed at a later time.
- Complete the following sequence through the Install Steps. Specify the above EC2 role as part of the configuration.
- When you launch the product, you can specify the appropriate access mode through the platform. For more information, see Configure for AWS in the Install Guide. This content is also available through the online Documentation referenced at the end of this document.
- Complete any related configuration through AWS as needed.
Tip: If you want to use EC2 roles at a later time, you can just apply AWS policies to the empty role you created here. Additional configuration is required in the platform to use this role.
Internet access
From AWS, the Designer Cloud Powered by Trifacta platform requires Internet access for the following services:
NOTE: Depending on your AWS deployment, some of these services may not be required.
NOTE: If the Designer Cloud Powered by Trifacta platform is hosted in a VPC where Internet access is restricted, access to S3, KMS, and STS services must be provided by creating a VPC endpoint. If the platform is accessing an EMR cluster, a proxy server can be configured to provide access to the AWS ElasticMapReduce regional endpoint.
SELinux
By default, Data Preparation for Amazon Redshift and S3 is installed on a server with SELinux enabled. Security-enhanced Linux (SELinux) provides a set of security features for, among other things, managing access controls.
Tip: The following may be applied to other deployments of the Designer Cloud Powered by Trifacta platform on servers where SELinux has been enabled.
In some cases, SELinux can interfere with normal operations of platform software. If you are experiencing connectivity problems related to SELinux, you can do either one of the following:
- Disable SELinux on the server. For more information, please see the CentOS documentation.
- Apply the following commands on the server, as root:
  - Open ports on the server for listening. By default, the Designer Cloud application listens on port 3005. The following opens that port when SELinux is enabled:
    semanage port -a -t http_port_t -p tcp 3005
  - Repeat the above step for any other ports that you wish to open on the server.
  - Permit nginx, the proxy on the Alteryx node, to open websockets:
    setsebool -P httpd_can_network_connect 1
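The port-opening command above is repeated once per port. As a minimal sketch (not part of the product), the helper below only builds the command strings shown in this section for a list of ports, so that they can be reviewed before being run as root. The function name `selinux_commands` and the extra port 8080 are illustrative assumptions.

```python
import shlex


def selinux_commands(ports):
    """Build (but do not run) the SELinux commands from this section.

    ports is an iterable of TCP port numbers to open for listening.
    Returns a list of command strings to review and then run as root.
    """
    commands = []
    for port in ports:
        number = int(port)
        if not 1 <= number <= 65535:
            raise ValueError(f"invalid TCP port: {port}")
        # Same command as documented above, one per port.
        commands.append(
            shlex.join(
                ["semanage", "port", "-a", "-t", "http_port_t",
                 "-p", "tcp", str(number)]
            )
        )
    # Permit nginx, the proxy on the node, to open websockets.
    commands.append(
        shlex.join(["setsebool", "-P", "httpd_can_network_connect", "1"])
    )
    return commands


if __name__ == "__main__":
    # 3005 is the documented default port; 8080 is a hypothetical extra.
    for command in selinux_commands([3005, 8080]):
        print(command)
```

Printing the commands instead of executing them keeps the sketch safe to run on any machine, including one without SELinux.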
Install
Desktop Requirements
- All desktop users of the platform must have the latest version of Google Chrome installed on their desktops.
- Google Chrome must have the PNaCl client installed and enabled.
  - PNaCl Version: 0.50.x.y or later
- All desktop users must be able to connect to the EC2 instance through the enterprise infrastructure.
Sizing Guide
NOTE: The following guidelines apply only to Data Preparation for Amazon Redshift and S3.
Use the following guidelines to select your instance size:
NOTE: Data Preparation for Amazon Redshift and S3 enforces a maximum limit of 30 users.
Instance Type | Max users | Avg. size of jobs on Alteryx Server (GB) |
---|---|---|
m4.4xlarge | 5 | 20 GB |
Pre-requisites
Before you install the platform, please verify that the following steps have been completed.
- EULA. Before you begin, please review the End-User License Agreement. See End-User License Agreement.
- SSH Key-pair. Please verify that there is an SSH key pair available to assign to the Alteryx server.
Additional pre-requisites for EC2 installation
If you are installing the product through EC2, please verify that the following additional requirements are met:
- IAM policies. Create IAM policies for access to the S3 bucket. Required permissions are the following:
  - The system account or individual user accounts must have full permissions for the S3 bucket:
    Delete*, Get*, List*, Put*, Replicate*, Restore*
  - These policies must apply to the bucket and its contents. Example:
    "arn:aws:s3:::my-trifacta-bucket-name"
    "arn:aws:s3:::my-trifacta-bucket-name/*"
  - See https://console.aws.amazon.com/iam/home#/policies
- EC2 instance role. Create an EC2 instance role for this policy. See https://console.aws.amazon.com/iam/home#/roles.
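The permissions and resources listed above can be expressed as an IAM policy document. The following is a minimal sketch, assuming a single Allow statement is acceptable for your environment. The function name is hypothetical, the `s3:` prefix is the standard IAM namespace for S3 actions, and the bucket name is the example from this section. Review the result against your own security requirements before creating the policy.

```python
import json

# Action patterns listed in the requirements above.
S3_ACTIONS = ["Delete*", "Get*", "List*", "Put*", "Replicate*", "Restore*"]


def build_s3_policy(bucket_name):
    """Return an IAM policy document (as a dict) for one S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [f"s3:{action}" for action in S3_ACTIONS],
                # The bucket itself and every object inside it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    # Example bucket name taken from this section.
    policy = build_s3_policy("my-trifacta-bucket-name")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the JSON editor of the IAM policies page referenced above.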
Install Steps - CloudFormation template
This install process creates the following:
- Alteryx node on an EC2 instance
- S3 bucket
- IAM roles and policies to access the S3 bucket from the Alteryx node
Steps:
- In the Marketplace listing, click Deploy into an existing VPC.
- Select Template: The template path is automatically populated for you.
- Specify Details:
  - Stack Name: Display name of the application.
    NOTE: Each instance of the Designer Cloud Powered by Trifacta platform should have a separate name.
  - Instance Type: Please select the appropriate instance depending on the number of users and data volumes of your environment. For more information, see the Sizing Guide above.
  - Key Pair: Select the SSH key pair to use for Alteryx instance access.
  - Allowed HTTP Source: Please specify the IP address or range of addresses from which HTTP/HTTPS connections to the application are permitted.
  - Allowed SSH Source: Please specify the IP address or range of addresses from which SSH connections to the application are permitted.
- Options: None of these is required for installation. Specify your options as needed for your environment.
- Review: Review your installation and configured options.
- Select the checkbox at the end of the page.
- To launch the configured instance, click Create.
- In the Stacks list, select the name of your application. Click the Outputs tab and collect the following information. Instructions on how to use this information are provided later.
Parameter | Description | Use |
---|---|---|
TrifactaUrl value | URL and port number to which to connect to the Alteryx application | Users must connect to this IP address and port number to access. |
TrifactaBucket | The address of the default S3 bucket | This value must be applied through the application. |
TrifactaInstanceId | The identifier for the instance of the platform | This value is the default password for the admin account. NOTE: This password must be changed immediately. |
- When the instance is spinning up for the first time, performance may be slow. When the instance is up, please navigate to the TrifactaUrl location: http://<public_hostname>:3005
- When the login screen appears, enter the following:
  - Username: admin@trifacta.local
  - Password: (the TrifactaInstanceId value)
  NOTE: As soon as you log in as an admin for the first time, you should immediately change the password. In the left side bar, click the Settings menu at the bottom. Then, click Settings > User Profile. Change the password and click Save to restart the platform.
- From the application menu, select the Settings menu. Then, click Settings > Admin Settings.
- In the Admin Settings page, you can configure many aspects of the platform, including user management tasks, and perform restarts to apply the changes.
  - In the Search bar, enter the following: aws.s3.bucket.name
  - Set the value of this setting to be the TrifactaBucket value that you collected from the Outputs tab.
- The following setting must be specified.
  "aws.mode":"system",
  You can set the above value to either of the following:

aws.mode value | Description |
---|---|
system | Set the mode to system to enable use of EC2 instance-based authentication for access. |
user | Set the mode to user to utilize user-based credentials. This mode requires additional configuration. |

Details on the above configuration are described later.
Click Save.
When the platform restarts, you can begin using the product.
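The values collected from the Outputs tab feed directly into the first login and the two settings above. The following is a minimal sketch of that mapping, assuming you have the stack description in the response shape returned by the CloudFormation DescribeStacks API (`Stacks[0].Outputs`, each with `OutputKey` and `OutputValue`). The function names and all sample values are hypothetical. The sketch does not call AWS or the platform; it only organizes the values.

```python
REQUIRED_OUTPUTS = ("TrifactaUrl", "TrifactaBucket", "TrifactaInstanceId")


def stack_outputs(describe_stacks_response):
    """Flatten the first stack's Outputs into {OutputKey: OutputValue}."""
    stack = describe_stacks_response["Stacks"][0]
    return {
        item["OutputKey"]: item["OutputValue"]
        for item in stack.get("Outputs", [])
    }


def first_login_plan(outputs):
    """Map the stack outputs to the login and settings steps above."""
    missing = [key for key in REQUIRED_OUTPUTS if key not in outputs]
    if missing:
        raise KeyError(f"missing stack outputs: {', '.join(missing)}")
    return {
        "url": outputs["TrifactaUrl"],
        "username": "admin@trifacta.local",
        # Default admin password; change it immediately after login.
        "initial_password": outputs["TrifactaInstanceId"],
        "settings": {
            "aws.s3.bucket.name": outputs["TrifactaBucket"],
            "aws.mode": "system",
        },
    }


if __name__ == "__main__":
    # Hypothetical response; real values come from your own stack.
    sample = {
        "Stacks": [{
            "Outputs": [
                {"OutputKey": "TrifactaUrl",
                 "OutputValue": "http://ec2-host.example.com:3005"},
                {"OutputKey": "TrifactaBucket",
                 "OutputValue": "example-stack-bucket"},
                {"OutputKey": "TrifactaInstanceId",
                 "OutputValue": "i-0123456789abcdef0"},
            ]
        }]
    }
    plan = first_login_plan(stack_outputs(sample))
    print("Open:", plan["url"])
    print("Settings to apply:", plan["settings"])
```

Raising an error on a missing output makes an incomplete stack visible before you attempt the login steps.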
Install Steps - EC2 instance
- Launch Data Preparation for Amazon Redshift and S3 using the AWS AMI ID for the product.
- In the EC2 Console:
  - Instance size: Select the instance size.
  - Network: Configure the VPC, subnet, firewall and other configuration settings necessary to communicate with the instance.
  - Auto-assigned Public IP: You must create a public IP to access the Designer Cloud Powered by Trifacta platform.
  - EC2 role: Select the EC2 role that you created.
  - Local storage: Select a local EBS volume. The default volume includes 100GB storage.
    NOTE: The local storage environment contains the Alteryx databases, the product installation, and its log files. No source data is ever stored within Data Preparation for Amazon Redshift and S3.
  - Security group: Use a security group that exposes access to port 3005, which is the default port for the platform.
  - Create an AWS key-pair for access: This key is used to provide SSH access to the platform, which may be required for some admin tasks. Save the key file to your local computer for later use.
  - Save your changes.
- Launch the configured version of Data Preparation for Amazon Redshift and S3.
  NOTE: From the EC2 Console, please acquire the instanceId, which is needed in a later step.
- When the instance is spinning up for the first time, performance may be slow. When the instance is up, please navigate to the following:
  http://<public_hostname>:3005
- When the login screen appears, enter the following:
  - Username: admin@trifacta.local
  - Password: (the instanceId value)
  NOTE: As soon as you log in as an admin for the first time, you should immediately change the password. Select the User Profile menu item in the upper-right corner. Change the password and click Save to restart the platform.
- From the application menu, select Settings menu > Admin Settings.
- In the Admin Settings page, you can configure many aspects of the platform, including user management tasks, and perform restarts to apply the changes.
  - In the Search bar, enter the following: aws.s3.bucket.name
  - Set the value of this setting to be the bucket that you created for Data Preparation for Amazon Redshift and S3.
- The following setting must be specified.
  "aws.mode":"system",
  You can set the above value to either of the following:

aws.mode value | Description |
---|---|
system | Set the mode to system to enable use of EC2 instance-based authentication for access. |
user | Set the mode to user to utilize user-based credentials. This mode requires additional configuration. |

Details on the above configuration are described later.
Click Save.
When the platform restarts, you can begin using the product.
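Before you click Save, it can help to double-check the two settings described above. The following is a minimal sketch of such a pre-check. It is not the platform's own validation, and the function name and sample bucket name are hypothetical. It only confirms that a bucket name is present and that `aws.mode` is one of the two documented values.

```python
ALLOWED_AWS_MODES = ("system", "user")


def check_aws_settings(settings):
    """Return a list of problems found in the two settings above.

    An empty list means both documented settings look plausible.
    """
    problems = []
    bucket = str(settings.get("aws.s3.bucket.name", "")).strip()
    if not bucket:
        problems.append("aws.s3.bucket.name is not set")
    mode = settings.get("aws.mode")
    if mode not in ALLOWED_AWS_MODES:
        problems.append(
            f"aws.mode must be one of {ALLOWED_AWS_MODES}, got {mode!r}"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical values; use the bucket that you created.
    candidate = {"aws.s3.bucket.name": "example-bucket", "aws.mode": "system"}
    issues = check_aws_settings(candidate)
    print("OK" if not issues else "\n".join(issues))
```

Returning a list instead of raising on the first error lets you see every problem in one pass.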
SSH Access
If you need to SSH to the Alteryx node, you can use the following command:
ssh -i <path_to_key_file> <userId>@<tri_node_DNS_or_IP>
Parameter | Description |
---|---|
<path_to_key_file> | Path to the key file stored on your local computer. |
<userId> | The user ID is always centos. |
<tri_node_DNS_or_IP> | DNS or IP address of the Alteryx node |
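The parameters in the table above combine into a single command. The following is a minimal sketch that assembles the argument list from a key file path and a host. The function name, the key path, and the IP address (taken from a range reserved for documentation) are hypothetical. The sketch prints the command and does not open a connection.

```python
import shlex

# Per the table above, the user ID is always centos.
SSH_USER = "centos"


def ssh_command(key_file, host):
    """Return the ssh argument list for connecting to the node."""
    if not key_file or not host:
        raise ValueError("both key_file and host are required")
    return ["ssh", "-i", key_file, f"{SSH_USER}@{host}"]


if __name__ == "__main__":
    # Hypothetical key path and address; substitute your own values.
    args = ssh_command("/home/me/keys/my-key.pem", "203.0.113.10")
    print(shlex.join(args))
```

Keeping the command as a list means that it can be passed unchanged to `subprocess.run` if you later want to start the session from a script.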
Upgrade
For more information, see Upgrade for AWS Marketplace.
Documentation
You can access complete product documentation online and in PDF format. From within the product, select Help menu > Product Docs.