This install process applies to installing Designer Cloud Enterprise Edition on an AWS infrastructure that you manage.
AWS Marketplace deployments:
NOTE: Content in this section does not apply to deployments from the AWS Marketplace, which provide fewer deployment and configuration options. For more information, see the AWS Marketplace.
Scenario Description
NOTE: All hardware in use for supporting the platform is maintained within the enterprise infrastructure on AWS.
- Installation of Designer Cloud Enterprise Edition on an EC2 server in AWS
- Installation of Alteryx databases on AWS
- Integration with a supported EMR cluster.
- Base storage layer and backend datastore of S3
NOTE: When the above installation and configuration steps have been completed, the platform is operational. Additional configuration may be required, which is referenced at the end of this section.
For more information on deployment scenarios, see Supported Deployment Scenarios for AWS.
Product Limitations
The following limitations apply to installations of Designer Cloud Enterprise Edition on AWS:
- No support for Hive integration
- No support for secure impersonation or Kerberos
- No support for high availability and failover
- Job cancellation is not supported on EMR.
- When publishing single files to S3, you cannot apply an append publishing action.
Pre-requisites
Desktop Requirements
- All desktop users of the platform should have a supported version of Google Chrome installed on their desktops.
- For more information, see Desktop Requirements.
- If a supported browser is not available within your enterprise, desktop users can install the Alteryx enterprise application as a separate application. For more information, see Install for Wrangler Enterprise Application.
- All desktop users must be able to connect to the EC2 instance through the enterprise infrastructure.
AWS Pre-requisites
Depending on which of the following AWS components you are deploying, additional pre-requisites and limitations may apply. Please review these sections as well.
Prep
Before you begin, please verify that you have completed the following:
- Review Planning Guide: Please review and verify Install Preparation and sub-topics.
- Limitations: For more information on limitations of this scenario, see Product Limitations in the Install Preparation area.
- Read: Please read this entire document before you create the EMR cluster or install the Designer Cloud Powered by Trifacta platform.
- Acquire Assets: Acquire the installation package for your operating system and your license key. For more information, contact Alteryx Support.
- If you are completing the installation without Internet access, you must also acquire the offline versions of the system dependencies. See Install Dependencies without Internet Access.
- VPC: Enable and deploy a working AWS VPC.
- S3: Enable and deploy an AWS S3 bucket to use as the base storage layer for the platform. In the bucket, the platform stores metadata in the following location:
<S3_bucket_name>/trifacta
- IAM Policies: Create IAM policies for access to the S3 bucket. Required permissions are the following:
The system account or individual user accounts must have full permissions for the S3 bucket:
Delete*, Get*, List*, Put*, Replicate*, Restore*
These policies must apply to the bucket and its contents. Example:
"arn:aws:s3:::my-trifacta-bucket-name",
"arn:aws:s3:::my-trifacta-bucket-name/*"
- See https://console.aws.amazon.com/iam/home#/policies
- EC2 instance role: Create an EC2 instance role for your S3 bucket policy. See https://console.aws.amazon.com/iam/home#/roles.
- EC2 instance: Deploy an AWS EC2 instance with SELinux where the Alteryx software can be installed.
- The required set of ports must be enabled for listening. See System Ports.
- This node should be dedicated to Alteryx use.
NOTE: The EC2 node must meet the system requirements. For more information, see System Requirements.
- EMR cluster: An existing EMR cluster is required.
- Cluster sizing: Before you begin, allocate sufficient resources to the cluster. For sizing guidance, please contact your Alteryx representative.
- See Deploy the Cluster below.
- Databases:
- The platform utilizes a set of databases that must be accessed from the Alteryx node. Databases are installed as part of the workflow described later.
- For more information on the supported databases and versions, see System Requirements.
- For more information on database installation requirements, see Install Databases.
- If you are installing databases on Amazon RDS, an admin account for RDS is required. For more information, see Install Databases on Amazon RDS.
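The S3 permissions listed above can be expressed as an IAM policy document. The following Python sketch only builds the JSON and does not call AWS. The helper name `build_s3_bucket_policy` is hypothetical. The `2012-10-17` policy language version is an assumption; it is the current IAM policy version. Substitute your own bucket name and review the output before you attach it to the EC2 instance role.

```python
import json

# Action prefixes listed in the prerequisites above; IAM expands each
# wildcard to every S3 action whose name starts with that prefix.
S3_ACTION_PREFIXES = ["Delete", "Get", "List", "Put", "Replicate", "Restore"]


def build_s3_bucket_policy(bucket_name: str) -> dict:
    """Return an identity-based IAM policy granting the listed S3 permissions.

    Hypothetical helper: the statement covers both the bucket itself and
    every object in it, as the prerequisites require.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [f"s3:{prefix}*" for prefix in S3_ACTION_PREFIXES],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # the bucket
                    f"arn:aws:s3:::{bucket_name}/*",  # its contents
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_s3_bucket_policy("my-trifacta-bucket-name"), indent=2))
```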
AWS Information
Before you begin installation, please acquire the following information from AWS:
- EMR:
- AWS region for the EMR cluster, if it exists.
- ID for the EMR cluster, if it exists.
- If you are creating an EMR cluster as part of this process, please retain the ID.
- The EMR cluster must allow access from the Alteryx node. This configuration is described later.
- Subnet: Subnet within your virtual private cloud (VPC) where you want to launch the Designer Cloud Powered by Trifacta platform.
- This subnet should be in the same VPC as the EMR cluster.
- Subnet can be private or public.
- If it is private and it cannot access the Internet, additional configuration is required. See below.
- S3:
- Name of the S3 bucket that the platform can use
- Path to resources on the S3 bucket
- EC2:
- Instance type for the Alteryx node
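Before installation, you may find it useful to record these values in one place and check their format. The following Python sketch defines a hypothetical checklist record. Its regular expressions are approximations written for this example, not official AWS validators. They check that EMR cluster IDs begin with `j-`, that subnet IDs begin with `subnet-`, and that S3 bucket names use only lowercase letters, digits, dots, and hyphens.

```python
import re
from dataclasses import dataclass
from typing import List

# Approximate format checks (assumptions, not official AWS validators).
EMR_CLUSTER_ID = re.compile(r"^j-[A-Z0-9]+$")
AWS_REGION = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")
S3_BUCKET = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


@dataclass
class AwsInstallInfo:
    """Hypothetical record of the AWS values collected before installation."""
    emr_region: str
    emr_cluster_id: str
    subnet_id: str
    s3_bucket: str
    s3_resource_path: str
    ec2_instance_type: str

    def problems(self) -> List[str]:
        """Return human-readable format problems; an empty list means none found."""
        issues = []
        if not AWS_REGION.match(self.emr_region):
            issues.append(f"unexpected region format: {self.emr_region!r}")
        if not EMR_CLUSTER_ID.match(self.emr_cluster_id):
            issues.append(f"unexpected EMR cluster ID format: {self.emr_cluster_id!r}")
        if not self.subnet_id.startswith("subnet-"):
            issues.append(f"unexpected subnet ID format: {self.subnet_id!r}")
        if not S3_BUCKET.match(self.s3_bucket):
            issues.append(f"invalid S3 bucket name: {self.s3_bucket!r}")
        if not self.s3_resource_path:
            issues.append("S3 resource path is empty")
        if not self.ec2_instance_type:
            issues.append("EC2 instance type is empty")
        return issues
```

An empty result means only that no format problems were detected. It does not confirm that the cluster, subnet, or bucket actually exists.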
Internet access
From AWS, the Designer Cloud Powered by Trifacta platform requires Internet access for the following services:
NOTE: Depending on your AWS deployment, some of these services may not be required.
NOTE: If the Designer Cloud Powered by Trifacta platform is hosted in a VPC where Internet access is restricted, access to the S3, KMS, and STS services must be provided by creating VPC endpoints. If the platform accesses an EMR cluster, a proxy server can be configured to provide access to the AWS ElasticMapReduce regional endpoint.
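If you create VPC endpoints for S3, KMS, and STS, the endpoint service names follow a region-specific pattern. The following Python sketch derives those names for a given region, along with the ElasticMapReduce regional endpoint hostname that a proxy would need to reach. It assumes a commercial AWS region. Other partitions, such as the China regions, may use different names or DNS suffixes.

```python
from typing import List

# Services that need VPC endpoints when Internet access is restricted (per the NOTE above).
VPC_ENDPOINT_SERVICES = ("s3", "kms", "sts")


def vpc_endpoint_service_names(region: str) -> List[str]:
    """Return endpoint service names in the com.amazonaws.<region>.<service> form."""
    return [f"com.amazonaws.{region}.{service}" for service in VPC_ENDPOINT_SERVICES]


def emr_regional_endpoint(region: str) -> str:
    """Return the ElasticMapReduce regional endpoint hostname (commercial regions assumed)."""
    return f"elasticmapreduce.{region}.amazonaws.com"
```

For example, a proxy allowlist for `us-west-2` would need to include `emr_regional_endpoint("us-west-2")`, which is `elasticmapreduce.us-west-2.amazonaws.com`.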
Deploy the Cluster
In your AWS infrastructure, you must deploy a supported version of EMR across a recommended number of nodes to support the expected data volumes of your Alteryx jobs.
- For more information on suggested sizing, see Sizing Guidelines in the Install Preparation area.
For more information on the supported EMR distributions, see Supported Deployment Scenarios for AWS.
When you configure the platform to integrate with the cluster, you must acquire some information about the cluster resources. For more information on the set of information to collect, see Pre-Install Checklist in the Install Preparation area.
Deploy the EC2 Node
An EC2 instance must be deployed to host the Designer Cloud Powered by Trifacta platform software. For more information on the requirements of this node, see System Requirements.
Here are some guidelines for deploying the EC2 instance from the EC2 console:
- Instance size: Select the instance size.
- Network: Configure the VPC, subnet, firewall and other configuration settings necessary to communicate with the instance.
- Auto-assigned Public IP: You must assign a public IP address to access the Designer Cloud Powered by Trifacta platform.
- EC2 role: Select the EC2 role that you created.
- Local storage: Select a local EBS volume. The default volume includes 100 GB of storage.
NOTE: The local storage environment contains the Alteryx databases, the product installation, and its log files. No source data is ever stored within the product.
- Security group: Use a security group that exposes access to port 3005, which is the default port for the platform.
- Create an AWS key-pair for access: This key is used to provide SSH access to the platform, which may be required for some admin tasks.
Save your changes.
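The security group guideline above can be expressed as ingress rules. The following Python sketch builds rule dictionaries in the `IpPermissions` shape that the EC2 `AuthorizeSecurityGroupIngress` API accepts, for example through boto3's `authorize_security_group_ingress`. It does not call AWS. The allowed CIDR block and the helper name `ingress_permissions` are assumptions for illustration. Including port 22 for SSH access with your key pair is also an assumption.

```python
import ipaddress
from typing import List

PLATFORM_PORT = 3005  # default platform port, per the guidelines above
SSH_PORT = 22         # SSH access using the AWS key pair (assumed)


def ingress_permissions(allowed_cidr: str) -> List[dict]:
    """Build IpPermissions entries that open the platform and SSH ports to one CIDR block."""
    # ip_network raises ValueError if the CIDR block is malformed or has host bits set.
    network = ipaddress.ip_network(allowed_cidr)
    if network.version != 4:
        raise ValueError("this sketch only handles IPv4 CIDR blocks")
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": str(network), "Description": description}],
        }
        for port, description in (
            (PLATFORM_PORT, "Designer Cloud web application"),
            (SSH_PORT, "Admin SSH access"),
        )
    ]
```

Restricting `allowed_cidr` to your enterprise network range, rather than `0.0.0.0/0`, limits who can reach the platform and SSH ports.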
Install Workflow
NOTE: These steps are covered in greater detail later in this section.
After you have completed the above, complete the following steps in order:
- Install software: Install the Designer Cloud Powered by Trifacta platform software on the EC2 node you created. See Install Software.
- Install databases: The platform requires several databases for storing metadata.
NOTE: The software assumes that you are installing the databases on a PostgreSQL server on the same node as the software. If you are not, or if you are changing database names or ports, additional configuration is required as part of this installation process.
For more information, see Install Databases.
- Start the platform: For more information, see Start and Stop the Platform.
- Log in to the application: After the software and databases are installed, you can log in to the application to complete configuration:
- See Login.
As soon as you log in, you should change the password on the admin account. In the left menu bar, select Settings > Admin Settings. Scroll down to Manage Users. For more information, see Change Admin Password.
Tip: At this point, you can access the online documentation through the application. In the left menu bar, select Help menu > Product Docs. All of the following content, plus updates, is available online. See Documentation below.
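Before you start the platform or log in, a quick TCP check can confirm that the database server and the application port are listening. The following Python sketch assumes a local PostgreSQL server on its default port, 5432, and the default platform port, 3005. Adjust both values if you changed them. The check tests only TCP reachability. It does not verify database credentials or application health.

```python
import socket

POSTGRES_DEFAULT_PORT = 5432  # PostgreSQL default; the install note assumes a local server
PLATFORM_DEFAULT_PORT = 3005  # default platform port


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and socket timeouts
        return False
```

For example, on the EC2 node, `port_is_open("localhost", POSTGRES_DEFAULT_PORT)` should return `True` after the databases are installed.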
Configure for EMR
NOTE: If you are creating a new EMR cluster as part of this installation process, please skip this section. That workflow is covered later in the document. For more information, see Configure for EMR.
Please complete the following configuration to enable access to your pre-existing EMR cluster from the Designer Cloud Powered by Trifacta platform.
IAM and Security Group updates
You must update your IAM and security group settings so that the Alteryx instance can communicate with your existing EMR cluster and your EMR cluster can read from and write to the Alteryx data bucket. Below are the requirements and suggested implementation details. Please adapt these suggestions to fit your environment, as long as the requirements are satisfied.
For additional documentation around these changes:
- https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-iam-roles.html
- https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-additional-sec-groups.html
Requirement | Example |
---|---|
Alteryx EC2 instance role must be permitted to use your EMR cluster. | { "Version": "2008-10-17", "Statement": [ { "Action": [ "elasticmapreduce:DescribeStep", "elasticmapreduce:ListBootstrapActions", "elasticmapreduce:ListClusters", "elasticmapreduce:DescribeCluster", "elasticmapreduce:AddJobFlowSteps", "elasticmapreduce:DescribeJobFlows", "elasticmapreduce:ListInstanceGroups" ], "Resource": "*", "Effect": "Allow" } ] } |
EMR EC2 instance role must be permitted to use the Alteryx data bucket. | { "Version": "2008-10-17", "Statement": [ { "Action": [ "elasticmapreduce:Describe*", "elasticmapreduce:List*", "s3:ListAllMyBuckets", "ec2:Describe*" ], "Resource": "*", "Effect": "Allow" }, { "Action": [ "s3:PutObject", "s3:ListBucket", "s3:GetObject", "s3:DeleteObject" ], "Resource": [ "arn:aws:s3:::YOUR-TRIFACTA-BUCKET", "arn:aws:s3:::YOUR-TRIFACTA-BUCKET/*" ], "Effect": "Allow" } ] } |
Your EMR Service Role should permit access to the Alteryx bucket. NOTE: This example is not a complete policy. You should update your existing policy with these statements. | { "Action": [ "s3:HeadBucket", "s3:ListAllMyBuckets" ], "Resource": "*", "Effect": "Allow" }, { "Action": [ "s3:PutObject", "s3:GetObject", "s3:ListBucket", "s3:DeleteObject" ], "Resource": [ "arn:aws:s3:::YOUR-TRIFACTA-BUCKET", "arn:aws:s3:::YOUR-TRIFACTA-BUCKET/*" ], "Effect": "Allow" }, |
Your EMR cluster master node must permit the Alteryx EC2 instance to access it. | |
Additional configuration must be applied within the platform. These steps are described later.
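The EMR Service Role row above is a fragment that must be merged into an existing policy, and you can script that merge. The following Python sketch uses hypothetical helper names. It appends the bucket statements from the table to a copy of an existing policy document and skips any statements that are already present. Applying the result, for example through the IAM console, is left to you.

```python
import copy
from typing import List


def emr_service_role_bucket_statements(bucket: str) -> List[dict]:
    """Return the statements from the EMR Service Role row above, for the given bucket."""
    return [
        {"Action": ["s3:HeadBucket", "s3:ListAllMyBuckets"], "Resource": "*", "Effect": "Allow"},
        {
            "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket", "s3:DeleteObject"],
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            "Effect": "Allow",
        },
    ]


def merge_statements(policy: dict, statements: List[dict]) -> dict:
    """Return a copy of policy with any statements not already present appended."""
    merged = copy.deepcopy(policy)  # never modify the caller's policy
    current = merged.setdefault("Statement", [])
    if isinstance(current, dict):  # IAM also allows a single statement object
        current = [current]
        merged["Statement"] = current
    for statement in statements:
        if statement not in current:  # exact-match duplicates are skipped
            current.append(copy.deepcopy(statement))
    return merged
```

Because duplicates are skipped, running the merge twice produces the same policy as running it once.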
Additional Configuration for AWS Installs
Apply license key to EC2 node
Steps:
Acquire the license.json license key file that was provided to you by your Alteryx representative.
Transfer the license key file to the EC2 node that is hosting the Designer Cloud Powered by Trifacta platform. Navigate to the directory where you stored it.
Make the Alteryx user the owner of the file:
sudo chown trifacta:trifacta license.json
Make sure that the Alteryx user has read permissions on the file:
sudo chmod 644 license.json
Copy the license key file to the proper location:
sudo cp -p license.json /opt/trifacta/license/
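After you copy the file, you can confirm its ownership and permissions and check that it parses as JSON. The following Python sketch is Unix-only because it uses the `pwd` module. It assumes the `trifacta` user and the destination path from the steps above. It does not validate the license contents, whose structure is not documented here.

```python
import json
import os
import pwd
import stat
from typing import List, Optional

LICENSE_PATH = "/opt/trifacta/license/license.json"  # destination used in the steps above


def check_license_file(path: str = LICENSE_PATH, owner: Optional[str] = "trifacta") -> List[str]:
    """Return a list of problems with the license file; an empty list means none found."""
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return [f"{path} does not exist"]

    problems = []
    mode = stat.S_IMODE(info.st_mode)
    if mode != 0o644:  # matches the chmod 644 step above
        problems.append(f"mode is {oct(mode)}, expected 0o644")

    if owner is not None:  # pass owner=None to skip the ownership check
        try:
            owner_uid = pwd.getpwnam(owner).pw_uid
        except KeyError:
            problems.append(f"user {owner!r} does not exist on this host")
        else:
            if info.st_uid != owner_uid:
                problems.append(f"file is not owned by {owner!r}")

    try:
        with open(path, encoding="utf-8") as handle:
            json.load(handle)  # checks JSON syntax only, not license contents
    except (OSError, ValueError) as exc:
        problems.append(f"cannot parse license as JSON: {exc}")
    return problems
```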
Launch the platform
For more information on how to launch the platform, see Start and Stop the Platform.
When the instance is spinning up for the first time, performance may be slow. When the instance is up, navigate to the following:
http://<public_hostname>:3005
When the login screen appears, enter the default admin credentials provided to you.
NOTE: As soon as you log in as an admin for the first time, you should immediately change the password. From the left nav bar, select Settings > Settings > User Profile. Change the password, and click Save to restart the platform.
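The first start can be slow, so you may prefer to poll the login URL rather than repeatedly refresh a browser. The following Python sketch treats any HTTP status below 500 as "up". This rests on the assumption that a login page, redirect, or authentication error means the web server is answering. The timeout and polling interval are arbitrary example values. The `probe` parameter exists so that the polling logic can be tested without a network.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

PLATFORM_PORT = 3005  # default platform port


def login_url(public_hostname: str, port: int = PLATFORM_PORT) -> str:
    """Build the login URL described above."""
    return f"http://{public_hostname}:{port}"


def http_status(url: str) -> Optional[int]:
    """Return the HTTP status for url, or None if no HTTP response was received."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # the server answered with a 4xx/5xx status
        return err.code
    except (OSError, http.client.HTTPException):  # refused, timed out, or malformed reply
        return None


def wait_for_platform(url: str, timeout_s: float = 600.0, interval_s: float = 10.0,
                      probe: Callable[[str], Optional[int]] = http_status) -> bool:
    """Poll url until it returns a status below 500 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = probe(url)
        if status is not None and status < 500:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_for_platform(login_url("<public_hostname>"))` returns `True` once the login screen is being served. Replace the placeholder with your instance's public hostname.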
Configure for EMR clusters
The following steps apply to configure the platform to integrate with the EMR cluster:
- From the application menu, select the Settings menu. Then, click Settings > Admin Settings.
- In the Admin Settings page, you can configure many aspects of the platform, including user management tasks, and perform restarts to apply the changes.
In the Search bar, enter the following:
aws.s3.bucket.name
Set the value of this setting to the name of your S3 bucket.
Check the following setting. Verify that it is set to 2.3.0:
"spark.version": "2.3.0",
The following setting must be specified.
"aws.mode":"system",
You can set the above value to either of the following:
aws.mode value | Description |
---|---|
system | Set the mode to system to enable use of EC2 instance-based authentication for access. |
user | Set the mode to user to utilize user-based credentials. This mode requires additional configuration. |
Details on the above configuration are described later.
Set the following parameter to true, which instructs the Designer Cloud application to run jobs on the integrated EMR cluster:
"webapp.runinEMR": true,
In the Admin Settings page, locate the External Service Settings section.
- AWS EMR Cluster ID: Paste the value for the EMR Cluster ID for the cluster to which the platform is connecting.
- AWS Region: Enter the region where your EMR cluster is located.
- Resource Bucket: Enter the name of the S3 bucket to use.
- Resource Path: Enter the path to resources on the S3 bucket, for example EMRLOGS.
Click Save underneath the External Service Settings section.
Set base storage layer
The platform requires that one backend datastore be configured as the base storage layer. This base storage layer is used for storing uploaded data and writing results and profiles.
NOTE: By default, the base storage layer for Designer Cloud Enterprise Edition is set to HDFS. You must change this value to S3. After this base storage layer is defined, it cannot be changed again.
Verify Operations
NOTE: You can try to verify operations using the Trifacta Photon running environment at this time. While you can also try to run a job in the Spark running environment, additional configuration may be required to complete the integration. These steps are listed under Next Steps below.
Prepare Your Sample Dataset
To complete this test, you should locate or create a simple dataset. Your dataset should be created in the format that you wish to test.
Characteristics:
Store Your Dataset
If you are testing an integration, you should store your dataset in the datastore with which the product is integrated.
Tip: Uploading datasets is always available as a means of importing datasets.
Verification Steps
Steps:
Troubleshooting: At this point, you should have read access to your datastore from the platform. If not, please check the logs, permissions, and your Alteryx® configuration.
If options are presented, select the defaults.
Troubleshooting: Later, you can re-run this job on a different running environment. Some formats are not available across all running environments.
Checkpoint: You have verified importing from the selected datastore and transforming a dataset. If your job was successfully executed, you have verified that the product is connected to the job running environment and can write results to the defined output location. Optionally, you may have tested profiling of job results. If all of the above tasks completed successfully, the product is operational end-to-end.
Documentation
Tip: You should access online documentation through the product. Online content may receive updates that are not present in PDF content.
You can access complete product documentation online and in PDF format. From within the Designer Cloud application, select Help menu > Product Docs.
Next Steps
After you have accessed the documentation, the following topics are relevant to AWS enterprise infrastructure deployments.
NOTE: These materials are located in the Configuration Guide.
Please review them in order.
Topic | Description |
---|---|
Required Platform Configuration | This section covers the following topics, some of which should already be completed: |
 | Set up for a new EMR cluster. Some content may apply to existing EMR clusters. |
Enable Integration with Compressed Clusters | If the Hadoop cluster uses compression, additional configuration is required. |
Enable Integration with Cluster High Availability | If you are integrating with high availability on the Hadoop cluster, please complete these steps. |
Enable Relational Connections | Enable integration with relational databases, including Redshift. |
Configure for KMS | Integration with the Hadoop cluster's key management system (KMS) for encrypted transport. Instructions are provided for distribution-specific versions of Hadoop. |
Configure Security | A list of topics on applying additional security measures to the Designer Cloud Powered by Trifacta platform and how it integrates with Hadoop. |
Configure SSO for AD-LDAP | Please complete these steps if you are integrating with your enterprise's AD/LDAP Single Sign-On (SSO) system. |
Upgrade
For more information on upgrading your Designer Cloud Enterprise Edition on AWS, please contact Alteryx Customer Success and Services.