
Release 8.2.1





This documentation applies to installation from a supported Marketplace. Please use the installation instructions provided with your deployment.


If you are installing or upgrading a Marketplace deployment, please use the available PDF content. You must use the install and configuration PDF available through the Marketplace listing.

The Trifacta® platform can be hosted within Amazon and supports integrations with multiple services from Amazon Web Services, including combinations of services for hybrid deployments. This section provides an overview of the integration options, as well as links to related configuration topics.

For an overview of AWS deployment scenarios, see Supported Deployment Scenarios for AWS.

Internet Access

From AWS, the Trifacta platform requires Internet access for the following services:

NOTE: Depending on your AWS deployment, some of these services may not be required.

 

  • AWS S3
  • Key Management Service [KMS] (if SSE-KMS server-side encryption is enabled)
  • Security Token Service [STS] (if a temporary credential provider is used)
  • EMR (if integration with EMR cluster is enabled)

NOTE: If the Trifacta platform is hosted in a VPC where Internet access is restricted, access to S3, KMS and STS services must be provided by creating a VPC endpoint. If the platform is accessing an EMR cluster, a proxy server can be configured to provide access to the AWS ElasticMapReduce regional endpoint.
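
The service list and the restricted-VPC note above can be sketched as a small planning helper. This is a minimal illustration, not part of the Trifacta platform: the function names and flag names are hypothetical, and the naming patterns (`com.amazonaws.<region>.<service>` for VPC endpoint services, `elasticmapreduce.<region>.amazonaws.com` for the EMR regional endpoint) are assumed to match your AWS partition. Verify them in the AWS console before relying on them.

```python
def required_aws_services(sse_kms=False, temporary_credentials=False, emr=False):
    """Return the AWS services the platform must reach, following the list above.

    The flag names are hypothetical; they mirror the conditions in the list.
    """
    services = ["s3"]  # S3 is listed without a condition
    if sse_kms:
        services.append("kms")  # only when SSE-KMS encryption is enabled
    if temporary_credentials:
        services.append("sts")  # only when a temporary credential provider is used
    if emr:
        services.append("elasticmapreduce")  # only when EMR integration is enabled
    return services


def access_plan(region, **features):
    """Group the required services by how a restricted VPC could reach them."""
    plan = {"vpc_endpoint": [], "proxy": []}
    for service in required_aws_services(**features):
        if service == "elasticmapreduce":
            # Per the note above, EMR access can go through a proxy server.
            plan["proxy"].append(f"elasticmapreduce.{region}.amazonaws.com")
        else:
            # Assumed VPC endpoint service naming: com.amazonaws.<region>.<service>
            plan["vpc_endpoint"].append(f"com.amazonaws.{region}.{service}")
    return plan


plan = access_plan("us-west-2", sse_kms=True, emr=True)
print(plan)
```

The result is only a checklist of names to look for when creating VPC endpoints or proxy rules; it does not create any AWS resources.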

Database Installation

The following database scenarios are supported.

Database Host: Description

  • Cluster node: By default, the Trifacta databases are installed on PostgreSQL instances in the Trifacta node or another accessible node in the enterprise environment. For more information, see Install Databases.
  • Amazon RDS: For Amazon-based installations, you can install the Trifacta databases on PostgreSQL instances on Amazon RDS. For more information, see Install Databases on Amazon RDS.

Base AWS Configuration

The following configuration topics apply to AWS in general.

You can apply the configuration changes in this section through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

Base Storage Layer

NOTE: The base storage layer must be set during initial configuration and cannot be modified after it is set.

S3: Most of these integrations require S3 as the base storage layer, which means that data uploads, the default location for writing results, and sample generation all occur on S3. When the base storage layer is set to S3, the Trifacta platform can:

  • read and write to S3
  • read and write to Redshift
  • connect to an EMR cluster

HDFS: In on-premises installations, it is possible to use S3 as a read-only option for a Hadoop-based cluster when the base storage layer is HDFS. You can configure the platform to read from and write to S3 buckets during job execution and sampling. For more information, see Enable S3 Access.

For more information on setting the base storage layer, see Set Base Storage Layer.

For more information, see Storage Deployment Options.

Configure AWS Region

For Amazon integrations, you can configure the Trifacta node to connect to Amazon datastores located in different regions. 

NOTE: This configuration is required under any of the following deployment conditions:

  1. The Trifacta node is installed on-premises, and you are integrating with Amazon resources.
  2. The EC2 instance hosting the Trifacta node is located in a different AWS region than your Amazon datastores.
  3. The Trifacta node or the EC2 instance does not have access to s3.amazonaws.com.

 

Steps:

  1. In the AWS console, identify the regions where your datastores are located. For more information, see the Amazon documentation.
  2. Log in to the Trifacta application.
  3. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
  4. Set the value of the following property to the region where your S3 datastores are located:

    aws.region

    If this value is not set, the Trifacta platform attempts to infer the region from the location of the default S3 bucket.

  5. Save your changes.
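
As a rough sketch of the conditions and the property change described above, the following Python checks whether the region setting is needed and then sets `aws.region` in an in-memory configuration. The helper names are hypothetical, and the assumption that the dotted property `aws.region` maps to nested JSON objects in trifacta-conf.json should be checked against your own file. The Admin Settings Page remains the recommended way to apply the change.

```python
import json


def region_config_required(on_premises, node_region, datastore_region, can_reach_global_s3):
    """Return True if any of the three deployment conditions above holds."""
    return (
        on_premises                          # condition 1: node is on-premises
        or node_region != datastore_region   # condition 2: regions differ
        or not can_reach_global_s3           # condition 3: no access to s3.amazonaws.com
    )


def set_property(conf, dotted_key, value):
    """Set a dotted property such as 'aws.region' in a nested dict.

    Assumption: dotted property names correspond to nested JSON objects.
    """
    *parents, leaf = dotted_key.split(".")
    node = conf
    for part in parents:
        node = node.setdefault(part, {})  # create intermediate objects as needed
    node[leaf] = value
    return conf


# Hypothetical starting configuration; a real file contains many more settings.
conf = json.loads('{"aws": {}}')
if region_config_required(
    on_premises=False,
    node_region="us-east-1",
    datastore_region="us-west-2",
    can_reach_global_s3=True,
):
    set_property(conf, "aws.region", "us-west-2")
print(json.dumps(conf, indent=2))
```

The same helper could set other dotted properties mentioned on this page, such as `aws.s3.endpoint`, under the same nesting assumption.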

Configure region for AWS GovCloud

If your instance of the Trifacta platform is deployed in the AWS GovCloud, additional configuration is required. 

NOTE: GovCloud authentication is completely isolated from Amazon.com. Key-secret combinations for AWS do not apply to AWS GovCloud. 

For more information, see https://docs.aws.amazon.com/govcloud-us/latest/UserGuide/govcloud-differences.html.

NOTE: In AWS GovCloud, the AWS S3 endpoint must be configured in the private subnet or VPC to be able to make outbound requests to S3 resources.

You must configure the region and endpoint settings to communicate with GovCloud S3 resources. 

Steps:

  1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
  2. In the Admin Settings page, set the following parameters:

    Parameter: Description

    • aws.region: Specify the AWS GovCloud region where your S3 resources are located.
    • aws.s3.endpoint: Specify the private subnet or VPC endpoint through which the Trifacta platform can reach S3 resources.


  3. Save your changes.

  4. Log in to the Trifacta node as an administrator.
  5. Edit the following file:

    /opt/trifacta/conf/env.sh
  6. Add the following line:

    AWS_REGION=<aws.region>

    where:
    <aws.region> is the value you inserted for the AWS GovCloud region in the Admin Settings page.

  7. Save the file. 
  8. If the following section on accessing the AWS secret region does not apply to your deployment, restart the platform.

Accessing AWS secret region using an emulation platform:

  1. If you are connecting to the AWS secret region using an emulation platform with custom TLS certificates, you must install and reference a valid TLS/SSL certificate for the emulation platform. 
    1. Add the following line to the env.sh file:

      NODE_EXTRA_CA_CERTS=/home/trifacta/ca-chain.cert.pem

      NOTE: The above certificate must be a valid TLS/SSL certificate for the emulation platform.

    2. For AWS requests from Java services, you must add the custom CA certificate to the CA certificates for the system. For more information, please see the documentation for your emulation platform.

  2. Save your changes.

  3. Restart the platform.
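
The env.sh edits in the two procedures above can be sketched as a small idempotent helper. This is an illustration under stated assumptions, not a Trifacta tool: `set_env_line` is a hypothetical name, it assumes plain `NAME=value` lines as shown above (no `export` prefix), and the demonstration writes to a temporary file rather than to /opt/trifacta/conf/env.sh. The region value `us-gov-west-1` is only an example.

```python
import tempfile
from pathlib import Path


def set_env_line(env_path, name, value):
    """Write NAME=value to the file, replacing an existing NAME= line if present."""
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []
    new_line = f"{name}={value}"
    for i, line in enumerate(lines):
        if line.strip().startswith(f"{name}="):
            lines[i] = new_line  # update in place so the variable is not duplicated
            break
    else:
        lines.append(new_line)  # variable not found, so append it
    path.write_text("\n".join(lines) + "\n")


# Demonstration against a temporary file standing in for env.sh.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "env.sh"
    set_env_line(demo, "AWS_REGION", "us-gov-west-1")
    set_env_line(demo, "NODE_EXTRA_CA_CERTS", "/home/trifacta/ca-chain.cert.pem")
    set_env_line(demo, "AWS_REGION", "us-gov-west-1")  # repeat call does not add a second line
    print(demo.read_text())
```

Replacing an existing line instead of always appending keeps the file clean if the procedure is run more than once. Back up env.sh before editing the real file, and restart the platform afterward as described in the steps.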

AWS Authentication

For more information, see Configure for AWS Authentication.

AWS Storage

S3 Layer

To integrate with S3, additional configuration is required. See Enable S3 Access.

S3 Connections

Users can create secondary connections to specific S3 buckets. For more information, see S3 Connections.

Redshift Connections

You can create connections to one or more Redshift databases, from which you can read database sources and to which you can write job results. Samples are still generated on S3.

NOTE: Relational connections require installation of an encryption key file on the Trifacta node. For more information, see Create Encryption Key File.

For more information, see Create Redshift Connections.

Snowflake Connections

Through your AWS deployment, you can access your Snowflake databases. For more information, see Create Snowflake Connections.

AWS Clusters

Trifacta Wrangler Enterprise can integrate with one instance of either of the following. 

NOTE: If Trifacta Wrangler Enterprise is installed through the Amazon Marketplace, only the EMR integration is supported.

EMR

When Trifacta Wrangler Enterprise is installed through AWS, you can integrate with an EMR cluster for Spark-based job execution. For more information, see Configure for EMR.

Hadoop

If you have installed Trifacta Wrangler Enterprise on-premises or directly into an EC2 instance, you can integrate with a Hadoop cluster for Spark-based job execution. See Configure for Hadoop.
