This install process applies to installing the platform on AWS infrastructure that you manage.
AWS Marketplace deployments:
NOTE: Content in this section does not apply to deployments from the AWS Marketplace, which provide fewer deployment and configuration options. For more information, see the AWS Marketplace.
NOTE: All hardware in use for supporting the platform is maintained within the enterprise infrastructure on AWS.
NOTE: When the above installation and configuration steps have been completed, the platform is operational. Additional configuration may be required, which is referenced at the end of this section.
For more information on deployment scenarios, see Supported Deployment Scenarios for AWS.
The following limitations apply to installations of the platform on AWS:
append publishing action.
Depending on which of the following AWS components you are deploying, additional prerequisites and limitations may apply. Please review these sections as well.
Before you begin, please verify that you have completed the following:
Read: Please read this entire document before you create the EMR cluster or install the platform.
S3: Enable and deploy an AWS S3 bucket to use as the base storage layer for the platform. In the bucket, the platform stores metadata in the following location:
<S3_bucket_name>/trifacta
The system account or individual user accounts must have full permissions for the S3 bucket:
Delete*, Get*, List*, Put*, Replicate*, Restore*
These policies must apply to the bucket and its contents. Example:
"arn:aws:s3:::my-trifacta-bucket-name"
"arn:aws:s3:::my-trifacta-bucket-name/*"
The required set of ports must be enabled for listening. See System Ports.
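The S3 permission requirements listed earlier (the action prefixes, applied to both the bucket ARN and its `/*` contents) can be expressed as a standard IAM policy document. This is a minimal sketch, assuming an identity-based policy attached to the system account's role or to each user. `build_bucket_policy` is a hypothetical helper, and the bucket name is the placeholder from the example.

```python
import json

# Action prefixes from the permission list; IAM expects the "s3:" service prefix.
S3_ACTION_PREFIXES = ["Delete*", "Get*", "List*", "Put*", "Replicate*", "Restore*"]


def build_bucket_policy(bucket_name: str) -> dict:
    """Return an IAM policy document that grants the listed S3 actions
    on the bucket itself and on every object in it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [f"s3:{prefix}" for prefix in S3_ACTION_PREFIXES],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # the bucket
                    f"arn:aws:s3:::{bucket_name}/*",  # its contents
                ],
            }
        ],
    }


def policy_json(bucket_name: str) -> str:
    """Serialize the policy so it can be pasted into the IAM policy editor."""
    return json.dumps(build_bucket_policy(bucket_name), indent=2)
```

For example, `print(policy_json("my-trifacta-bucket-name"))` emits JSON you can review against your own security requirements before attaching it.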
This node should be dedicated to the platform.
NOTE: The EC2 node must meet the system requirements. For more information, see System Requirements.
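Before installing, you may want to confirm on the EC2 node that nothing else is already listening on a port the platform needs. This is a sketch under stated assumptions. `port_is_free` is a hypothetical helper. Port 3005 is the only port named in this section; the full list is in System Ports. The check is local only and does not inspect AWS Security Group rules.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can bind and listen on host:port.

    This is a local check only: Security Group rules must still allow
    inbound traffic on the port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
            sock.listen(1)
        except OSError:
            # Typically EADDRINUSE: another process already owns the port.
            return False
    return True
```

For example, `port_is_free(3005)` should return `True` on a fresh node before the platform is started.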
Cluster sizing: Before you begin, you should allocate sufficient resources for sizing the cluster. For guidance, please contact your .
Before you begin installation, please acquire the following information from AWS:
Path to resources on the S3 bucket
In your AWS infrastructure, you must deploy a supported version of EMR across a recommended number of nodes to support the expected data volumes of your .
For more information on the supported EMR distributions, see Supported Deployment Scenarios for AWS.
When you configure the platform to integrate with the cluster, you must acquire some information about the cluster resources. For more information on the set of information to collect, see Pre-Install Checklist in the Install Preparation area.
An EC2 node of the cluster must be deployed to host the software. For more information on the requirements of this node, see System Requirements.
Here are some guidelines for deploying the EC2 instance:
Local storage: Select a local EBS volume. The default volume includes 100 GB of storage.
NOTE: The local storage environment contains the
Save your changes.
NOTE: These steps are covered in greater detail later in this section.
After you have completed the above, please complete the following steps in order:
NOTE: If you are creating a new EMR cluster as part of this installation process, please skip this section. That workflow is covered later in the document. For more information, see Configure for EMR.
Please complete the following configuration to enable access to your pre-existing EMR cluster from the platform.
You must make IAM and Security Group changes to enable the platform to communicate with your existing EMR cluster, and to enable your EMR cluster to read from and write to the S3 bucket. Below are the requirements and suggested implementation details. Please adapt these suggestions to fit your environment as long as the requirements are satisfied.
For additional documentation around these changes:
Requirement | Example |
---|---|
EMR EC2 instance role must be permitted to use the | |
Your EMR Service Role should permit access to the | |
Your EMR cluster master node must permit the | |
Additional configuration must be applied within the platform. These steps are described later.
Steps:
Acquire the license.json license key file that was provided to you by your .
Transfer the license key file to the EC2 node that is hosting the platform. Navigate to the directory where you stored it.
Make the trifacta user the owner of the file:
sudo chown trifacta:trifacta license.json
Make sure that the trifacta user has read permissions on the file:
sudo chmod 644 license.json
Copy the license key file to the proper location:
cp license.json /opt/trifacta/license/
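The license steps above can be scripted as follows. This is a sketch under these assumptions:

- It runs as root, which the ownership change requires.
- The service user and group are both named `trifacta`, as in the `chown` command.
- `install_license` is a hypothetical helper.

Unlike the manual steps, it sets the owner and mode on the copied file. A plain `cp` creates a new destination file owned by whoever runs it.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def install_license(
    src: str,
    dest_dir: str = "/opt/trifacta/license",
    owner: Optional[str] = "trifacta",
) -> Path:
    """Copy the license key file into dest_dir, then set its owner and mode."""
    dest = Path(dest_dir) / Path(src).name
    shutil.copyfile(src, dest)
    if owner is not None:
        # Equivalent to: chown trifacta:trifacta (normally requires root).
        shutil.chown(dest, user=owner, group=owner)
    # Equivalent to: chmod 644 (owner read/write, everyone else read-only).
    os.chmod(dest, 0o644)
    return dest
```

On the node, `sudo python3` followed by `install_license("license.json")` from the directory holding the file mirrors the three commands. Pass `owner=None` to skip the ownership change when not running as root.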
For more information on how to launch the platform, see Start and Stop the Platform.
When the instance is spinning up for the first time, performance may be slow. When the instance is up, navigate to the following:
http://<public_hostname>:3005
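Because the first startup can be slow, you may prefer to poll the login page instead of retrying by hand. This is a minimal sketch that uses only the Python standard library. `wait_for_platform` is a hypothetical helper, and its default timeout and interval are arbitrary choices, not platform requirements.

```python
import time
import urllib.error
import urllib.request


def wait_for_platform(url: str, timeout: float = 600.0, interval: float = 10.0) -> bool:
    """Poll url until the web server answers, or give up after timeout seconds.

    A successful response, or a 4xx error such as 401, means the server is up.
    5xx errors and connection failures are treated as "still starting".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError as err:
            if err.code < 500:
                return True  # server answered, for example with 401 or 404
        except OSError:
            pass  # refused, unreachable, or timed out (URLError is an OSError)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Call it as `wait_for_platform("http://<public_hostname>:3005")`, substituting your node's public hostname. Server errors are treated as "not ready" on the assumption that they indicate the application is still starting.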
When the login screen appears, enter the default admin credentials provided to you.
NOTE: As soon as you log in as an admin for the first time, you should immediately change the password. From the left nav bar, select Settings > Settings > User Profile. Change the password and click Save to restart the platform.
The following steps apply to configure the platform to integrate with the EMR cluster:
In the Search bar, enter the following:
aws.s3.bucket.name
Set the value of this setting to the name of your S3 bucket.
Check the following setting. Verify that it is set to 2.3.0:
"spark.version": "2.3.0",
The following setting must be specified:
"aws.mode":"system",
You can set the above value to either of the following:
aws.mode value | Description |
---|---|
system | Set the mode to system to enable use of EC2 instance-based authentication for access. |
user | Set the mode to user to utilize user-based credentials. This mode requires additional configuration. |
Details on the above configuration are described later.
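To double-check the three settings above, you could validate an exported copy of the configuration. This section does not describe a configuration file format, so the sketch makes these assumptions:

- Settings are available as a dictionary loaded from JSON.
- Keys are stored either flat, as shown in this section, or nested by dotted path.
- `get_setting` and `check_aws_settings` are hypothetical helpers.

The bucket-name check is a simplified form of the S3 naming rules. AWS applies further rules, such as disallowing names formatted as IP addresses.

```python
import re
from typing import Any, Dict, List

# Simplified S3 bucket-name check: 3-63 characters of lowercase letters,
# digits, dots, and hyphens, beginning and ending with a letter or digit.
BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
EXPECTED_SPARK_VERSION = "2.3.0"
VALID_AWS_MODES = {"system", "user"}


def get_setting(conf: Dict[str, Any], key: str) -> Any:
    """Read a dotted key stored flat ("aws.mode") or nested ({"aws": {"mode": ...}})."""
    if key in conf:
        return conf[key]
    node: Any = conf
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def check_aws_settings(conf: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the settings look right."""
    problems: List[str] = []
    bucket = get_setting(conf, "aws.s3.bucket.name")
    if not isinstance(bucket, str) or not BUCKET_NAME_RE.fullmatch(bucket):
        problems.append(f"aws.s3.bucket.name is not a valid bucket name: {bucket!r}")
    version = get_setting(conf, "spark.version")
    if version != EXPECTED_SPARK_VERSION:
        problems.append(f"spark.version is {version!r}, expected {EXPECTED_SPARK_VERSION!r}")
    mode = get_setting(conf, "aws.mode")
    if not isinstance(mode, str) or mode not in VALID_AWS_MODES:
        problems.append(f"aws.mode is {mode!r}, expected 'system' or 'user'")
    return problems
```

Looking up both the flat and the nested form lets the same check work whether the values are copied from the Admin Settings page or from a nested JSON export.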
Set the following parameter to true, which instructs the platform to run jobs on the integrated EMR cluster:
"webapp.runinEMR": true,
In the Admin Settings page, locate the External Service Settings section.
AWS EMR Cluster ID: Paste the value for the EMR Cluster ID for the cluster to which the platform is connecting.
EMRLOGS.
Click Save underneath the External Service Settings section.
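Because the EMR Cluster ID is pasted by hand, a quick format check can catch stray whitespace or a value copied from the wrong field. This is a loose sketch. It assumes EMR cluster IDs take the form `j-` followed by uppercase letters and digits, and it does not query AWS to confirm the cluster exists. `normalize_cluster_id` is a hypothetical helper, and the sample ID is made up.

```python
import re

# Assumption: EMR cluster IDs look like "j-" followed by uppercase letters
# and digits. This is a shape check only, not a lookup against AWS.
CLUSTER_ID_RE = re.compile(r"j-[0-9A-Z]+")


def normalize_cluster_id(raw: str) -> str:
    """Trim whitespace copied along with the ID and check its shape."""
    cluster_id = raw.strip()
    if not CLUSTER_ID_RE.fullmatch(cluster_id):
        raise ValueError(f"does not look like an EMR cluster ID: {raw!r}")
    return cluster_id
```

For example, `normalize_cluster_id("  j-1ABCDEFGHIJKL \n")` returns the trimmed ID. A cluster name such as `my-cluster` raises `ValueError`.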
The platform requires that one backend datastore be configured as the base storage layer. This base storage layer is used for storing uploaded data and writing results and profiles.
NOTE: By default, the base storage layer for
NOTE: You can try to verify operations using the
Tip: You should access online documentation through the product. Online content may receive updates that are not present in PDF content.
You can access complete product documentation online and in PDF format. From within the application, select Help menu > Product Docs.
After you have accessed the documentation, the following topics are relevant to AWS enterprise infrastructure deployments.
NOTE: These materials are located in the Configuration Guide.
Please review them in order.
Topic | Description |
---|---|
Required Platform Configuration | This section covers the following topics, some of which should already be completed: |
Configure for EMR | Set up for a new EMR cluster. Some content may apply to existing EMR clusters. |
Enable Integration with Compressed Clusters | If the Hadoop cluster uses compression, additional configuration is required. |
Enable Integration with Cluster High Availability | If you are integrating with high availability on the Hadoop cluster, please complete these steps. |
Enable Relational Connections | Enable integration with relational databases, including Redshift. |
Configure for KMS | Integration with the Hadoop cluster's key management system (KMS) for encrypted transport. Instructions are provided for distribution-specific versions of Hadoop. |
Configure Security | A list of topics on applying additional security measures to the platform. |
Configure SSO for AD-LDAP | Please complete these steps if you are integrating with your enterprise's AD/LDAP Single Sign-On (SSO) system. |
For more information on upgrading your on AWS, please contact
.