This install process applies to installing the platform on AWS infrastructure that you manage.

AWS Marketplace deployments:

NOTE: Content in this section does not apply to deployments from the AWS Marketplace, which provide fewer deployment and configuration options. For more information, see the AWS Marketplace.

Scenario Description

NOTE: All hardware in use for supporting the platform is maintained within the enterprise infrastructure on AWS.

NOTE: When the above installation and configuration steps have been completed, the platform is operational. Additional configuration may be required, which is referenced at the end of this section.

For more information on deployment scenarios, see Supported Deployment Scenarios for AWS.

Product Limitations

The following limitations apply to installations of the platform on AWS:


Desktop Requirements

AWS Pre-requisites

Depending on which of the following AWS components you are deploying, additional pre-requisites and limitations may apply. Please review these sections as well.


Before you begin, please verify that you have completed the following:


  1. Review Planning Guide: Please review and verify Install Preparation and sub-topics.
    1. Limitations: For more information on limitations of this scenario, see Product Limitations in the Install Preparation area.
  2. Read: Please read this entire document before you create the EMR cluster or install the platform.

  3. Acquire Assets: Acquire the installation package for your operating system and your license key. For more information, contact .
    1. If you are completing the installation without Internet access, you must also acquire the offline versions of the system dependencies. See Install Dependencies without Internet Access.
  4. VPC: Enable and deploy a working AWS VPC.
  5. S3: Enable and deploy an AWS S3 bucket to use as the base storage layer for the platform. In the bucket, the platform stores metadata in the following location:



  6. IAM Policies: Create IAM policies for access to the S3 bucket. Required permissions are the following: 
  7. EC2 instance role: Create an EC2 instance role for your S3 bucket policy. See
  8. EC2 instance: Deploy an AWS EC2 instance with SELinux where the platform can be installed.
    1. The required set of ports must be enabled for listening. See System Ports.

    2. This node should be dedicated to the platform.

      NOTE: The EC2 node must meet the system requirements. For more information, see System Requirements.

  9. EMR cluster: An existing EMR cluster is required. 
    1. Cluster sizing: Before you begin, you should allocate sufficient resources for sizing the cluster. For guidance, please contact your .

    2. See Deploy the Cluster below.
  10. Databases:
    1. The platform utilizes a set of databases that must be accessible from the platform. Databases are installed as part of the workflow described later.
    2. For more information on the supported databases and versions, see System Requirements.
    3. For more information on database installation requirements, see Install Databases.
    4. If you are installing databases on Amazon RDS, an admin account for RDS is required. For more information, see Install Databases on Amazon RDS.

AWS Information

Before you begin installation, please acquire the following information from AWS:

Internet access

Deploy the Cluster

In your AWS infrastructure, you must deploy a supported version of EMR across a recommended number of nodes to support your expected data volumes.

For more information on the supported EMR distributions, see Supported Deployment Scenarios for AWS.

When you configure the platform to integrate with the cluster, you must acquire some information about the cluster resources. For more information on the set of information to collect, see Pre-Install Checklist in the Install Preparation area.
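Some of the cluster details can be looked up with the AWS CLI. The following is a minimal sketch, not a required step. It assumes the AWS CLI is installed and configured, and the cluster ID and region shown are hypothetical placeholders. By default the command is only printed for review; set AWS=aws to execute it.

```shell
#!/bin/sh
# Sketch: look up the master node DNS name of an existing EMR cluster.
# CLUSTER_ID and REGION are hypothetical placeholders; substitute your own.
# The command is only printed unless you set AWS=aws in the environment.
AWS="${AWS:-echo aws}"
CLUSTER_ID="${CLUSTER_ID:-j-EXAMPLE12345}"
REGION="${REGION:-us-west-2}"

describe_cluster() {
  # --query narrows the response to a single field of the Cluster object.
  $AWS emr describe-cluster \
    --cluster-id "$CLUSTER_ID" \
    --region "$REGION" \
    --query 'Cluster.MasterPublicDnsName' \
    --output text
}

describe_cluster
```

Record the cluster ID and region that you used here. They are needed later when you configure the platform for the EMR cluster.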

Deploy the EC2 Node

An EC2 node of the cluster must be deployed to host the platform software. For more information on the requirements of this node, see System Requirements.

Here are some guidelines for deploying the EC2 node from the EC2 console:

  1. Instance size: Select the instance size.
  2. Network: Configure the VPC, subnet, firewall and other configuration settings necessary to communicate with the instance. 
  3. Auto-assigned Public IP: You must create a public IP to access the platform.
  4. EC2 role: Select the EC2 role that you created.
  5. Local storage: Select a local EBS volume. The default volume includes 100 GB of storage.

    NOTE: The local storage environment contains the , the product installation, and its log files. No source data is ever stored within the product.

  6. Security group: Use a security group that exposes access to port 3005, which is the default port for the platform. 
  7. Create an AWS key-pair for access: This key is used to provide SSH access to the platform, which may be required for some admin tasks.
  8. Save your changes.
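If you prefer to script the deployment, the guidelines above map roughly onto a single AWS CLI call. This is a sketch under stated assumptions: every ID, the instance type, and the root device name are hypothetical placeholders, and the instance type you choose must satisfy System Requirements. By default the command is only printed; set AWS=aws to execute it.

```shell
#!/bin/sh
# Sketch: launch the EC2 node with the AWS CLI.
# All values below are hypothetical placeholders; replace them with your own.
# The command is only printed unless you set AWS=aws in the environment.
AWS="${AWS:-echo aws}"
AMI_ID="${AMI_ID:-ami-EXAMPLE}"                      # AMI for a supported OS
INSTANCE_TYPE="${INSTANCE_TYPE:-m5.2xlarge}"         # assumption, see System Requirements
SUBNET_ID="${SUBNET_ID:-subnet-EXAMPLE}"             # subnet in your VPC
SECURITY_GROUP_ID="${SECURITY_GROUP_ID:-sg-EXAMPLE}" # must expose port 3005
INSTANCE_PROFILE="${INSTANCE_PROFILE:-example-ec2-role}" # profile for your EC2 role
KEY_NAME="${KEY_NAME:-example-keypair}"              # key pair for SSH access
ROOT_DEVICE="${ROOT_DEVICE:-/dev/sda1}"              # depends on the AMI
VOLUME_GB="${VOLUME_GB:-100}"                        # default size named above

launch_node() {
  $AWS ec2 run-instances \
    --image-id "$AMI_ID" \
    --instance-type "$INSTANCE_TYPE" \
    --subnet-id "$SUBNET_ID" \
    --security-group-ids "$SECURITY_GROUP_ID" \
    --iam-instance-profile "Name=$INSTANCE_PROFILE" \
    --key-name "$KEY_NAME" \
    --associate-public-ip-address \
    --block-device-mappings "DeviceName=${ROOT_DEVICE},Ebs={VolumeSize=${VOLUME_GB}}"
}

launch_node
```

Each flag corresponds to one guideline: network settings, public IP, EC2 role, local storage, security group, and key pair.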

Install Workflow

NOTE: These steps are covered in greater detail later in this section.

After you have completed the above, please complete these steps in the order listed:

  1. Install software: Install the platform software on the EC2 node you created. See Install Software.

  2. Install databases: The platform requires several databases for storing metadata.

    NOTE: The software assumes that you are installing the databases on a PostgreSQL server on the same node as the software. If you are not, or if you are changing database names or ports, additional configuration is required as part of this installation process.

    For more information, see Install Databases.

  3. Start the platform: For more information, see Start and Stop the Platform.
  4. Log in to the application: After software and databases are installed, you can log in to the application to complete configuration:
    1. See Login.
    2. As soon as you log in, you should change the password on the admin account. In the left menu bar, select Settings > Admin Settings. Scroll down to Manage Users. For more information, see Change Admin Password.

Tip: At this point, you can access the online documentation through the application. In the left menu bar, select Help menu > Product Docs. All of the following content, plus updates, is available online. See Documentation below.

Configure for EMR

NOTE: If you are creating a new EMR cluster as part of this installation process, please skip this section. That workflow is covered later in the document. For more information, see Configure for EMR.

Please complete the following configuration to enable access to your pre-existing EMR cluster from the platform.

IAM and Security Group updates

You must update your IAM and security group settings to enable the platform to communicate with your existing EMR cluster and to enable your EMR cluster to read from and write to the platform's S3 bucket. Below are the requirements and suggested implementation details. You may adapt these suggestions to fit your environment, as long as the requirements are satisfied.

For additional documentation around these changes:


The platform must be permitted to use your EMR cluster.

    {
        "Version": "2008-10-17",
        "Statement": [
            {
                "Action": [ ... ],
                "Resource": "*",
                "Effect": "Allow"
            }
        ]
    }

The EMR EC2 instance role must be permitted to use the platform's S3 bucket.

    {
        "Version": "2008-10-17",
        "Statement": [
            {
                "Action": [ ... ],
                "Resource": "*",
                "Effect": "Allow"
            },
            {
                "Action": [ ... ],
                "Resource": [ ... ],
                "Effect": "Allow"
            }
        ]
    }

Your EMR Service Role should permit access to the platform's S3 bucket.

NOTE: This example is not a complete policy. You should update your existing policy with these statements.

            {
                "Action": [ ... ],
                "Resource": "*",
                "Effect": "Allow"
            },
            {
                "Action": [ ... ],
                "Resource": [ ... ],
                "Effect": "Allow"
            }

Your EMR cluster master node must permit the platform to access it.

  • The platform must be able to communicate with your EMR master node on TCP ports 18080 and 8088.
  • You should create a security group and then associate it with your EMR master node using the "additional security groups" functionality.
  • For future ease of use, you should specify the security group associated with the platform as the source.
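The ingress rules for the two ports can be added with the AWS CLI. The following is a sketch only: both security group IDs are hypothetical placeholders, and it assumes you have already created the additional security group for the master node. By default the commands are only printed; set AWS=aws to execute them.

```shell
#!/bin/sh
# Sketch: allow the platform's security group to reach the EMR master node
# on TCP ports 18080 and 8088.
# Both group IDs are hypothetical placeholders; replace them with your own.
# Commands are only printed unless you set AWS=aws in the environment.
AWS="${AWS:-echo aws}"
EMR_MASTER_SG="${EMR_MASTER_SG:-sg-EMRMASTEREXAMPLE}" # additional group on the master node
PLATFORM_SG="${PLATFORM_SG:-sg-PLATFORMEXAMPLE}"      # group attached to the platform node

allow_platform_to_master() {
  for port in 18080 8088; do
    # Using a source group instead of a CIDR keeps the rule valid
    # if the platform node's IP address changes.
    $AWS ec2 authorize-security-group-ingress \
      --group-id "$EMR_MASTER_SG" \
      --protocol tcp \
      --port "$port" \
      --source-group "$PLATFORM_SG"
  done
}

allow_platform_to_master
```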

Additional configuration must be applied within the platform. These steps are described later.

Additional Configuration for AWS Installs

Apply license key to EC2 node


  1. Acquire the license.json license key file that was provided to you by your .

  2. Transfer the license key file to the EC2 node that is hosting the platform. Navigate to the directory where you stored it.

  3. Make the trifacta user the owner of the file:

    sudo chown trifacta:trifacta license.json

  4. Make sure that the trifacta user has read permissions on the file:

    sudo chmod 644 license.json

  5. Copy the license key file to the proper location:

    cp license.json /opt/trifacta/license/
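After copying the file, you may want to confirm that it is in place with the expected permissions. The helper below is a hypothetical convenience function, not part of the product. It assumes GNU stat (the `-c` option), which is standard on most Linux distributions.

```shell
#!/bin/sh
# Sketch: verify that a license file exists and has mode 644.
# check_license_file is a hypothetical helper, not a product command.
# Assumes GNU stat, which supports the -c format option.
check_license_file() {
  file="$1"
  if [ ! -f "$file" ]; then
    echo "missing: $file"
    return 1
  fi
  mode="$(stat -c '%a' "$file")"
  owner="$(stat -c '%U:%G' "$file")"
  echo "$file mode=$mode owner=$owner"
  # 644 = owner read/write, group and others read-only.
  [ "$mode" = "644" ]
}

# Example usage on the EC2 node:
#   check_license_file /opt/trifacta/license/license.json
```

The function prints the mode and owner so that you can compare the owner against the chown step, and it returns a non-zero status if the file is missing or the mode differs from 644.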

Launch the platform

For more information on how to launch the platform, see Start and Stop the Platform.

When the instance is spinning up for the first time, performance may be slow. When the instance is up, navigate to the following:


When the login screen appears, enter the default admin credentials provided to you.

NOTE: As soon as you log in as an admin for the first time, you should immediately change the password. From the left nav bar, select Settings > Settings > User Profile. Change the password and click Save to restart the platform.
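Because the first startup can be slow, it can help to poll the application port until it responds before opening a browser. The following is a sketch under stated assumptions: the host name is a hypothetical placeholder, port 3005 is the default port named earlier, and plain http is assumed. Adjust the scheme and port if your deployment differs.

```shell
#!/bin/sh
# Sketch: wait until the application answers on its port.
# HOST is a hypothetical placeholder. 3005 is the default platform port.
# The http scheme is an assumption; change it if your deployment uses TLS.
HOST="${HOST:-ec2-host.example.com}"
PORT="${PORT:-3005}"

platform_url() {
  printf 'http://%s:%s/\n' "$1" "$2"
}

wait_for_platform() {
  url="$(platform_url "$1" "$2")"
  attempts="${3:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s: silent, -f: fail on HTTP errors, -o: discard the body,
    # --max-time: bound each attempt to 5 seconds.
    if curl -sf -o /dev/null --max-time 5 "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 10
  done
  echo "not reachable after $attempts attempts: $url"
  return 1
}

# Example usage:
#   wait_for_platform "$HOST" "$PORT"
```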

Configure for EMR clusters

Complete the following steps to configure the platform to integrate with the EMR cluster:

  1. From the application menu, select the Settings menu. Then, click Settings > Admin Settings.
  2. In the Admin Settings page, you can configure many aspects of the platform, including user management tasks, and perform restarts to apply the changes.
    1. In the Search bar, enter the following:

    2. Set the value of this setting to the name of your S3 bucket.

  3. Check the following setting. Verify that it is set to 2.3.0:

    "spark.version": "2.3.0",

  4. The following setting must be specified.


    You can set the above value to either of the following:

    aws.mode value    Description
    system            Set the mode to system to enable use of EC2 instance-based authentication for access.
    user              Set the mode to user to utilize user-based credentials. This mode requires additional configuration.

    Details on the above configuration are described later.

  5. Set the following parameter to true, which instructs the platform to run jobs on the integrated EMR cluster:

    "webapp.runinEMR": true,

  6. In the Admin Settings page, locate the External Service Settings section.

    1. AWS EMR Cluster ID: Paste the value for the EMR Cluster ID for the cluster to which the platform is connecting.

    2. AWS Region: Enter the region where your EMR cluster is located.
    3. Resource Bucket: Enter the name of the S3 bucket to use.
    4. Resource Path: Enter a path within the bucket, such as EMRLOGS.
  7. Click Save underneath the External Service Settings section.

Set base storage layer

The platform requires that one backend datastore be configured as the base storage layer. This base storage layer is used for storing uploaded data and writing results and profiles. 

NOTE: By default, the base storage layer for the platform is set to HDFS. You must change this value to S3. After this base storage layer is defined, it cannot be changed again.

See Set Base Storage Layer.

Verify Operations

NOTE: You can try to verify operations using the running environment at this time. While you can also try to run a job in the Spark running environment, additional configuration may be required to complete the integration. These steps are listed under Next Steps below.



Documentation

Tip: You should access online documentation through the product. Online content may receive updates that are not present in PDF content.

You can access complete product documentation online and in PDF format. From within the application, select Help menu > Product Docs.

Next Steps

After you have accessed the documentation, the following topics are relevant to AWS enterprise infrastructure deployments.

NOTE: These materials are located in the Configuration Guide.

Please review them in order.

Required Platform Configuration

This section covers the following topics, some of which should already be completed:

  • Set Base Storage Layer - The base storage layer must be set once and never changed. Set this value to s3.

  • Create Encryption Key File - If you plan to integrate the platform with any relational sources, including Redshift, you must create an encryption key file and store it on the platform node.
  • Running Environment Options - Depending on your scenario, you may need to perform additional configuration for your available running environment(s) for executing jobs.
  • Profiling Options - In some environments, tweaks to the settings for visual profiling may be required. You can disable visual profiling if needed.
  • Configure for Spark - If you are enabling the Spark running environment, please review and verify the configuration for integrating the platform with the Spark running environment.

Configure for EMR

Set up for a new EMR cluster. Some content may apply to existing EMR clusters.

Enable Integration with Compressed Clusters

If the Hadoop cluster uses compression, additional configuration is required.

Enable Integration with Cluster High Availability

If you are integrating with high availability on the Hadoop cluster, please complete these steps.

  • If you are integrating with high availability on the Hadoop cluster, HttpFS must be enabled in the platform. HttpFS is also required in other, less common cases. See Enable HttpFS.

Enable Relational Connections

Enable integration with relational databases, including Redshift.

Configure for KMS

Integration with the Hadoop cluster's key management system (KMS) for encrypted transport. Instructions are provided for distribution-specific versions of Hadoop.

Configure Security

A list of topics on applying additional security measures to the platform and how it integrates with Hadoop.

Configure SSO for AD-LDAP

Please complete these steps if you are integrating with your enterprise's AD/LDAP Single Sign-On (SSO) system.


For more information on upgrading your  on AWS, please contact .

Related Topics