AWS Deployment Scenarios
The following are the basic AWS deployment scenarios.
Trifacta platform deployed through AWS Marketplace:
Deployment Scenario | Trifacta node installation | Base Storage Layer | Storage - S3 | Storage - Redshift | Cluster | Notes |
---|---|---|---|---|---|---|
Trifacta Data Preparation for Amazon Redshift and S3 AWS install through AWS Marketplace CloudFormation template | EC2 | S3 | read/write | read/write | None | Trifacta Data Preparation for Amazon Redshift and S3 does not support integration with any running environment clusters. All job execution occurs on the Trifacta node in the Trifacta Photon running environment. This scenario is suitable for smaller user groups and data volumes. |
Trifacta Wrangler Enterprise AWS install through AWS Marketplace CloudFormation template - with integration to EMR cluster | EC2 | S3 | read/write | read/write | EMR | This deployment scenario integrates by default with an EMR cluster, which is created as part of the process. It does not support integration with a Hadoop cluster. |
Trifacta platform installed on AWS:
Deployment Scenario | Trifacta node installation | Base Storage Layer | Storage - S3 | Storage - Redshift | Cluster | Notes |
---|---|---|---|---|---|---|
Trifacta Wrangler Enterprise AWS install with S3 read access | EC2 | HDFS | read only | Not supported | EMR | When HDFS is the base storage layer, the only AWS access available is read-only access to S3. |
Trifacta Wrangler Enterprise AWS install with S3 read/write access | EC2 | S3 | read/write | read/write | EMR | |
Trifacta platform installed on-premises and integrated with AWS resources:
Deployment Scenario | Trifacta node installation | Base Storage Layer | Storage - S3 | Storage - Redshift | Cluster | Notes |
---|---|---|---|---|---|---|
Trifacta Wrangler Enterprise on-premises install with S3 read access | On-premises | HDFS | read only | Not supported | Hadoop | When HDFS is the base storage layer, the only AWS access available is read-only access to S3. For more information, see Install Software. |
Microsoft Azure | Integration with AWS-based resources is not supported. See Install for Azure. |
Legend and Notes:
Column | Notes |
---|---|
Deployment Scenario | Description of the AWS-connected deployment |
Trifacta node installation | Location where the Trifacta node is installed in this scenario. All AWS installations are installed on EC2 instances. |
Base Storage Layer | When the Trifacta platform is first installed, the base storage layer must be set. NOTE: After you have begun using the product, you cannot change the base storage layer. NOTE: Read/write access to AWS-based resources requires that S3 be set as the base storage layer. |
Storage - S3 | Trifacta Wrangler Enterprise supports read access to S3 when the base storage layer is set to HDFS. For read/write access to S3, the base storage layer must be set to S3. |
Storage - Redshift | For access to Redshift, the base storage layer must be set to S3. |
Cluster | List of cluster types that are supported for integration and job execution at scale. |
Notes | Any additional notes |
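The storage rules in the legend above can be summarized in a few lines of code. The following Python sketch is illustrative only: the `AwsAccess` class and `aws_access` function are hypothetical names, not part of the Trifacta platform, and the returned strings mirror the values used in the Storage - S3 and Storage - Redshift columns of the tables above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AwsAccess:
    """Access levels implied by a base storage layer (hypothetical helper type)."""
    s3: str        # "read/write" or "read only"
    redshift: str  # "read/write" or "Not supported"


def aws_access(base_storage_layer: str) -> AwsAccess:
    """Return the S3 and Redshift access implied by the base storage layer.

    Encodes the rules from the legend: S3 as the base storage layer permits
    read/write access to S3 and Redshift; HDFS permits read-only access to S3
    and no access to Redshift.
    """
    layer = base_storage_layer.strip().upper()
    if layer == "S3":
        return AwsAccess(s3="read/write", redshift="read/write")
    if layer == "HDFS":
        return AwsAccess(s3="read only", redshift="Not supported")
    raise ValueError(f"Unknown base storage layer: {base_storage_layer!r}")


if __name__ == "__main__":
    for layer in ("S3", "HDFS"):
        print(layer, aws_access(layer))
```

Because the base storage layer cannot be changed after you begin using the product, a check like this is most useful during planning, before installation.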
AWS Installations
Trifacta Data Preparation for Amazon Redshift and S3 on AWS Marketplace (AMI)
Through the AWS Marketplace, you can license and deploy an AMI of Trifacta Data Preparation for Amazon Redshift and S3, which does not require integration with a clustered running environment. All job execution happens within the AMI on the EC2 instance that you deploy. For more information, see the Trifacta Data Preparation for Amazon Redshift and S3 listing for AWS Marketplace.
- For install and configuration instructions, see Install from AWS Marketplace.
Trifacta Wrangler Enterprise on AWS Marketplace with EMR
You can deploy an AMI of the Trifacta platform onto an EC2 instance. For more information, see the Trifacta Wrangler Enterprise listing for AWS Marketplace.
You can deploy it in either of the following ways:
- Auto-create a 3-node EMR cluster. For more information on installation, see Install from AWS Marketplace with EMR.
- Integrate it later with your pre-existing EMR cluster.
For more information on base AWS configuration, see Configure for AWS.
For more information on configuring integration with EMR, see Configure for EMR.
Trifacta Wrangler Enterprise on EC2 Instance
When the Trifacta platform is installed on AWS, it is deployed on an EC2 instance. You must specify a few key parameters through the EC2 console when you create the instance.
NOTE: After you have created the instance, you should retain the instanceId from the console, which must be applied to the configuration in the Trifacta platform.
For more information, see Install.
For more information on base AWS configuration, see Configure for AWS.
For more information on configuring EC2, see Configure for EC2 Role-Based Authentication.
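Before applying the retained instanceId to the platform configuration, it can help to confirm that the value was copied correctly. The sketch below is a format check only, assuming the standard EC2 instance ID shape of `i-` followed by 8 or 17 lowercase hexadecimal characters. The function name `is_ec2_instance_id` is hypothetical, and the check does not verify that the instance exists.

```python
import re

# EC2 instance IDs are "i-" followed by 8 (older format) or 17 (current format)
# lowercase hexadecimal characters.
_INSTANCE_ID = re.compile(r"i-(?:[0-9a-f]{8}|[0-9a-f]{17})")


def is_ec2_instance_id(value: str) -> bool:
    """Return True if value has the shape of an EC2 instance ID."""
    return _INSTANCE_ID.fullmatch(value) is not None


if __name__ == "__main__":
    # Example values are made up for illustration.
    for candidate in ("i-0123456789abcdef0", "ami-12345678"):
        print(candidate, is_ec2_instance_id(candidate))
```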
AWS Integrations
The following table describes the different AWS components that can host or integrate with the Trifacta platform. Each deployment scenario listed under AWS Deployment Scenarios combines one or more of these components.
AWS Service | Description | Base Storage Layer | Other Required AWS Services |
---|---|---|---|
EC2 | Amazon Elastic Compute Cloud (EC2) can be used to host the Trifacta node in a scalable cloud-based environment. The following deployments are supported: | Base storage layer can be S3 or HDFS. If set to HDFS, only read access to S3 is permitted. | |
S3 | Amazon Simple Storage Service (S3) can be used for reading data sources, writing job results, and hosting the Trifacta databases. | Base storage layer can be S3 or HDFS. If set to HDFS, only read access to S3 is permitted. | |
Redshift | Amazon Redshift provides a scalable data warehouse platform, designed for big data analytics applications. The Trifacta platform can be configured to read from and write to Amazon Redshift database tables. | Base Storage Layer = S3 | S3 |
EMR | For more information on supported versions of EMR, see Configure for EMR. | Base Storage Layer = S3 | EC2 instance |
Amazon RDS | Optionally, the Trifacta databases can be installed on Amazon RDS. For more information, see Install Databases on Amazon RDS. | Base Storage Layer = S3 | |
AWS Marketplace integrations:
AWS Service | Description | Base Storage Layer | Other Required AWS Services |
---|---|---|---|
AMI | Through the AWS Marketplace, you can license and install an Amazon Machine Image (AMI) instance of Trifacta Data Preparation for Amazon Redshift and S3. This product is intended for smaller user groups that do not need large-scale processing of Hadoop-based clusters. | Base Storage Layer = S3 NOTE: HDFS is not supported. | EC2 instance |
EMR | Through the AWS Marketplace, you can license and install an AMI specifically configured to work with Amazon Elastic MapReduce (EMR), a Hadoop-based data processing platform. | Base Storage Layer = S3 | AMI |