This documentation applies to installation from a supported Marketplace. Please use the installation instructions provided with your deployment.
If you are installing or upgrading a Marketplace deployment, you must use the install and configuration PDF available through the Marketplace listing.
The Trifacta® platform can be hosted within Amazon and supports integrations with multiple services from Amazon Web Services, including combinations of services for hybrid deployments. This section provides an overview of the integration options, as well as links to related configuration topics.
For an overview of AWS deployment scenarios, see Supported Deployment Scenarios for AWS.
From AWS, the Trifacta platform requires Internet access for the following services:
NOTE: Depending on your AWS deployment, some of these services may not be required.
NOTE: If the Trifacta platform is hosted in a VPC where Internet access is restricted, access to S3, KMS and STS services must be provided by creating a VPC endpoint. If the platform is accessing an EMR cluster, a proxy server can be configured to provide access to the AWS ElasticMapReduce regional endpoint.
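As a rough illustration of the note above, AWS names its VPC endpoint services with the pattern `com.amazonaws.<region>.<service>`. The sketch below only builds those names for S3, KMS, and STS. The function name and the example region are illustrative assumptions, not part of the Trifacta platform; the endpoints themselves are created through the AWS console, CLI, or SDK.

```python
from __future__ import annotations

# Services named in the note above as needing a VPC endpoint
# when Internet access from the VPC is restricted.
REQUIRED_SERVICES = ("s3", "kms", "sts")


def vpc_endpoint_service_names(region: str) -> list[str]:
    """Return the VPC endpoint service names for the given AWS region.

    Hypothetical helper for planning purposes; it does not call AWS.
    """
    if not region:
        raise ValueError("region must be a non-empty AWS region identifier")
    return [f"com.amazonaws.{region}.{service}" for service in REQUIRED_SERVICES]


if __name__ == "__main__":
    # Example region chosen for illustration only.
    for name in vpc_endpoint_service_names("us-west-2"):
        print(name)
```

The resulting names can be checked against the endpoint services that AWS lists for your region before you create the endpoints.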
The following database scenarios are supported.
By default, the Trifacta databases are installed on PostgreSQL instances in the Trifacta node or another accessible node in the enterprise environment. For more information, see Set up the Databases.
For Amazon-based installations, you can install the Trifacta databases on PostgreSQL instances on Amazon RDS. For more information, see Install Databases on Amazon RDS.
Base AWS Configuration
The following configuration topics apply to AWS in general.
Base Storage Layer
NOTE: The base storage layer must be set during initial configuration and cannot be modified after it is set.
S3: Most of these integrations require the use of S3 as the base storage layer, which means that data uploads, the default location for writing results, and sample generation all occur on S3. When the base storage layer is set to S3, the Trifacta platform can:
- read and write to S3
- read and write to Redshift
- connect to an EMR cluster
HDFS: In on-premises installations, it is possible to use S3 as a read-only option for a Hadoop-based cluster when the base storage layer is HDFS. You can configure the platform to read from and write to S3 buckets during job execution and sampling. For more information, see Enable S3 Access.
For more information on setting the base storage layer, see Set Base Storage Layer.
For more information, see Storage Deployment Options.
Configure AWS Region
For Amazon integrations, you can configure the Trifacta node to connect to Amazon datastores located in different regions.
NOTE: This configuration is required under any of the following deployment conditions:
- The Trifacta node is installed on-premises, and you are integrating with Amazon resources.
- The EC2 instance hosting the Trifacta node is located in a different AWS region than your Amazon datastores.
- The Trifacta node or the EC2 instance does not have access to s3.amazonaws.com.
To configure the region:
- In the AWS console, identify the location of your datastores in other regions. For more information, see the Amazon documentation.
- On the Trifacta node, edit the following file:
- Insert the following environment variables:
<regionValue> corresponds to the AWS region identifier (e.g.
- Save the file.
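The three deployment conditions listed above can be read as a simple check. The following is a minimal sketch of that check, assuming a hypothetical `Deployment` record; none of these names come from the Trifacta platform, and the sketch does not set the environment variables themselves.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical description of where the Trifacta node runs."""
    on_premises: bool                # node is installed outside AWS
    node_region: Optional[str]       # region of the EC2 instance, None if on-premises
    datastore_region: str            # region of the Amazon datastores
    can_reach_global_s3: bool        # node can reach s3.amazonaws.com


def region_config_required(d: Deployment) -> bool:
    """Return True if any of the listed conditions applies."""
    if d.on_premises:
        # On-premises node integrating with Amazon resources.
        return True
    if d.node_region != d.datastore_region:
        # EC2 instance and datastores are in different regions.
        return True
    if not d.can_reach_global_s3:
        # No access to s3.amazonaws.com.
        return True
    return False
```

For example, an EC2-hosted node in the same region as its datastores, with access to s3.amazonaws.com, would not need the region configuration under this reading.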
When connecting to AWS, the platform supports the following authentication methods:
System mode: Access to AWS resources is managed through a single account. This account is specified based on the credential provider method.
User mode: An AWS key and secret must be specified for each individual user.
NOTE: Creation and use of custom dictionaries is not supported in user mode.
NOTE: The credential provider must be set to
The Trifacta platform supports the following methods of providing credentialed access to AWS and S3 resources.
|default||This method uses the provided AWS Key and Secret values to access resources.|
|instance||When you are running the Trifacta platform on an EC2 instance, you can leverage your enterprise IAM roles to manage permissions on the instance for the Trifacta platform.|
Default credential provider
Whether the AWS access mode is set to system or user, the default credential provider for AWS and S3 resources is the Trifacta platform.
|system||A single AWS Key and Secret is inserted into platform configuration. This account is used to access all resources and must have the appropriate permissions to do so.|
|user||Each user must specify an AWS Key and Secret in the account user profile to access resources.||For more information on configuring individual user accounts, see Configure Your Access to S3.|
If you are using this method and integrating with an EMR cluster:
- A step that copies the custom credential JAR file must be added as a bootstrap action to the EMR cluster definition. See Configure for EMR.
As an alternative to copying the JAR file, you can use the EMR EC2 instance-based roles to govern access. In this case, you must set the following parameter:
For more information, see Configure for EC2 Role-Based Authentication.
Instance credential provider
When the platform is running on an EC2 instance, you can manage permissions through pre-defined IAM roles.
NOTE: If the Trifacta platform is connected to an EMR cluster, you can force authentication to the EMR cluster to use the specified IAM instance role. See Configure for EMR.
For more information, see Configure for EC2 Role-Based Authentication.
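The combination of access mode (system or user) and credential provider (default or instance) described above can be summarized as a small lookup. The sketch below is an illustration only: the function name and return strings are hypothetical, and because this section does not state how the instance provider interacts with user mode, the sketch assumes the IAM instance role applies regardless of mode.

```python
def credential_source(mode: str, provider: str) -> str:
    """Describe which AWS credentials would be used.

    mode:     "system" or "user" (AWS access mode)
    provider: "default" or "instance" (credential provider)
    """
    if mode not in ("system", "user"):
        raise ValueError(f"unknown access mode: {mode!r}")
    if provider == "instance":
        # Assumption: the EC2 instance's IAM role governs access in either mode.
        return "IAM role attached to the EC2 instance"
    if provider == "default":
        if mode == "system":
            # One key and secret stored in platform configuration.
            return "single AWS key and secret from platform configuration"
        # User mode: each user supplies a key and secret in the user profile.
        return "AWS key and secret from the individual user profile"
    raise ValueError(f"unknown credential provider: {provider!r}")
```

Treat the returned strings as labels for discussion, not as configuration values.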
To integrate with S3, additional configuration is required. See Enable S3 Sources.
You can create connections to one or more Redshift databases, from which you can read database sources and to which you can write job results. Samples are still generated on S3.
NOTE: Relational connections require installation of an encryption key file on the Trifacta node. For more information, see Create Encryption Key File.
For more information, see Create Redshift Connections.
Trifacta Self-Managed Enterprise Edition can integrate with one instance of either of the following.
NOTE: If Trifacta Self-Managed Enterprise Edition is installed through the Amazon Marketplace, only the EMR integration is supported.
When Trifacta Self-Managed Enterprise Edition is installed through AWS, you can integrate with an EMR cluster for Spark-based job execution. For more information, see Configure for EMR.
If you have installed Trifacta Self-Managed Enterprise Edition on-premises or directly into an EC2 instance, you can integrate with a Hadoop cluster for Spark-based job execution. See Configure for Hadoop.