This install process applies to installing the platform on AWS infrastructure.
AWS Marketplace deployments:
NOTE: Content in this section does not apply to deployments from the AWS Marketplace, which provide fewer deployment and configuration options. For more information, see the AWS Marketplace.
Scenario Description
For more information on deployment scenarios, see Supported Deployment Scenarios for AWS.
Product Limitations
The following limitations apply to installations of the product:
Pre-requisites
Desktop Requirements
AWS Pre-requisites
Depending on which of the following AWS components you are deploying, additional pre-requisites and limitations may apply. Please review these sections as well.
Prep
Before you begin, please verify that you have completed the following:
AWS Information
Before you begin installation, please acquire the following information from AWS:
Internet access
Deploy the Cluster
In your AWS infrastructure, you must deploy a supported version of EMR across a recommended number of nodes to support your expected data volumes.
- For more information on suggested sizing, see Sizing Guidelines in the Install Preparation area.
- For more information on the supported EMR distributions, see Supported Deployment Scenarios for AWS.
When you configure the platform to integrate with the cluster, you must acquire some information about the cluster resources. For more information on the set of information to collect, see Pre-Install Checklist in the Install Preparation area.
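If the cluster already exists, some checklist values can be read back with the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is installed and its credentials allow `elasticmapreduce:DescribeCluster`. The function name `emr_cluster_info` and the `DRY_RUN` switch are hypothetical helpers, not part of the product.

```shell
#!/bin/sh
# Hypothetical helper (not part of the product): print key facts about an
# existing EMR cluster for the Pre-Install Checklist.
# Assumes the AWS CLI is installed and configured with describe permissions.
# Set DRY_RUN=1 to print the AWS CLI command instead of running it.

emr_cluster_info() {
  local cluster_id="${1:-}" region="${2:-${AWS_REGION:-}}"
  if [ -z "$cluster_id" ]; then
    echo "usage: emr_cluster_info <cluster-id> [region]" >&2
    return 2
  fi
  # Cluster ID, EMR release label, cluster state, and master node DNS name.
  set -- aws emr describe-cluster --cluster-id "$cluster_id" \
    --query 'Cluster.[Id,ReleaseLabel,Status.State,MasterPublicDnsName]' \
    --output text
  if [ -n "$region" ]; then
    set -- "$@" --region "$region"
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, `emr_cluster_info <cluster-id> <region>` prints one tab-separated line that you can compare against the supported EMR versions.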
Deploy the EC2 Node
An EC2 node must be deployed to host the platform.
Here are some guidelines for deploying the EC2 node from the EC2 console:
- Instance size: Select the instance size.
- Network: Configure the VPC, subnet, firewall, and other settings necessary to communicate with the instance.
- Auto-assigned Public IP: You must create a public IP to access the platform.
- EC2 role: Select the EC2 role that you created.
- Local storage: Select a local EBS volume. The default volume includes 100 GB of storage.
  NOTE: The local storage environment contains the platform databases, the product installation, and its log files. No source data is ever stored within the product.
- Security group: Use a security group that exposes access to port 3005, which is the default port for the platform.
- Create an AWS key pair for access: This key is used to provide SSH access to the platform, which may be required for some admin tasks.
Save your changes.
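The security group and key pair guidelines can also be applied with the AWS CLI. This is a sketch under stated assumptions: the function names and the `DRY_RUN` switch are hypothetical, and restricting port 3005 to a trusted CIDR (rather than 0.0.0.0/0) is a suggested practice, not a product requirement.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the product) for two EC2 node guidelines:
# open the platform's default port (3005) to a trusted CIDR, and create a key
# pair for SSH access. Set DRY_RUN=1 to print the AWS CLI commands only.

open_platform_port() {
  local sg_id="${1:-}" cidr="${2:-}" port="${3:-3005}"
  if [ -z "$sg_id" ] || [ -z "$cidr" ]; then
    echo "usage: open_platform_port <security-group-id> <cidr> [port]" >&2
    return 2
  fi
  set -- aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port "$port" --cidr "$cidr"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

create_ssh_key_pair() {
  local name="${1:-}"
  if [ -z "$name" ]; then
    echo "usage: create_ssh_key_pair <key-name>" >&2
    return 2
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "aws ec2 create-key-pair --key-name $name --query KeyMaterial --output text > $name.pem"
    return 0
  fi
  # umask 077 in a subshell keeps the private key readable by its owner only.
  (umask 077 && aws ec2 create-key-pair --key-name "$name" \
    --query KeyMaterial --output text > "$name.pem")
}
```

Create the key pair before launching the instance so that you can select it in the launch settings.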
Install Workflow
NOTE: These steps are covered in greater detail later in this section.
After you have completed the above, complete these steps in order:
1 - Install software
Install the platform software on the EC2 node.
2 - Install databases
The platform requires several databases for storing metadata.
NOTE: The software assumes that you are installing the databases on a PostgreSQL server on the same node as the software. If you are not, or if you are changing database names or ports, additional configuration is required as part of this installation process. For more information, see Install Databases in the Databases Guide.
3 - Log in to the application
After the software and databases are installed, you can log in to the application to complete configuration. See Login.
As soon as you log in, change the password on the admin account. In the left menu bar, select Settings > Admin Settings. Scroll down to Manage Users. For more information, see Change Admin Password.
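Before installing the databases, it can help to confirm that the PostgreSQL server is reachable. This is a minimal sketch, assuming the PostgreSQL client tool `pg_isready` is installed and that the server listens on PostgreSQL's standard port 5432; the port the platform actually expects is not stated here, so treat it as an assumption. The function name and `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the product).
# Defaults assume PostgreSQL on the same node, on the standard port 5432.
# Override with PGHOST / PGPORT if your server or port differs.
# pg_isready exit codes: 0 accepting, 1 rejecting, 2 no response, 3 no attempt.

check_postgres() {
  local host="${PGHOST:-localhost}" port="${PGPORT:-5432}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "pg_isready -h $host -p $port"
    return 0
  fi
  if ! command -v pg_isready >/dev/null 2>&1; then
    echo "pg_isready not found; install the PostgreSQL client tools" >&2
    return 127
  fi
  pg_isready -h "$host" -p "$port"
}
```

A non-zero exit status means the databases step is likely to fail until the server is running or the connection settings are corrected.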
Configure for EMR
NOTE: If you are creating a new EMR cluster as part of this installation process, please skip this section. That workflow is covered later in the document. For more information, see Configure for EMR.
Please complete the following configuration to enable access to your pre-existing EMR cluster from the platform.
IAM and Security Group updates
You must update your IAM roles and security groups to enable the platform to access your EMR cluster.
For additional documentation on these changes, see:
- https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-iam-roles.html
- https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-additional-sec-groups.html
Requirements:
- EMR EC2 instance role must be permitted to use the
- Your EMR Service Role should permit access to the
- Your EMR cluster master node must permit the
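Security group rules between the platform node and the EMR master node can be added with the AWS CLI. This sketch assumes you allow traffic by referencing the platform node's security group; because the required ports are not listed in this section, the port is a mandatory argument. The function name and `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of the product): allow inbound TCP traffic on
# one port of the EMR master node's security group from the platform node's
# security group. Set DRY_RUN=1 to print the AWS CLI command only.

allow_from_platform_sg() {
  local target_sg="${1:-}" source_sg="${2:-}" port="${3:-}"
  if [ -z "$target_sg" ] || [ -z "$source_sg" ] || [ -z "$port" ]; then
    echo "usage: allow_from_platform_sg <emr-master-sg> <platform-sg> <port>" >&2
    return 2
  fi
  case "$port" in
    *[![:digit:]]*) echo "port must be numeric: $port" >&2; return 2 ;;
  esac
  set -- aws ec2 authorize-security-group-ingress --group-id "$target_sg" \
    --protocol tcp --port "$port" --source-group "$source_sg"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Referencing a security group instead of an IP range keeps the rule valid if the platform node's private IP changes.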
Additional configuration must be applied within the platform. These steps are described later.
Additional Required Configuration for AWS Installs
Apply license key to EC2 node
Steps:
1. Acquire the license.json license key file that was provided to you by your representative.
2. Transfer the license key file to the EC2 node that is hosting the platform. Navigate to the directory where you stored it.
3. Make the trifacta user the owner of the file:
```
sudo chown trifacta:trifacta license.json
```
4. Make sure that the trifacta user has read permissions on the file:
```
sudo chmod 644 license.json
```
5. Copy the license key file to the proper location, preserving its owner and permissions:
```
sudo cp -p license.json /opt/trifacta/license/
```
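The ownership, permission, and copy steps above can be combined into one command. This is a sketch, assuming GNU coreutils `install` is available on the node; the function name `install_license` is hypothetical, and its defaults (the trifacta user and group, `/opt/trifacta/license`) are taken from the steps above. Run it as root (for example, with sudo) when targeting the real paths.

```shell
#!/bin/sh
# Hypothetical helper (not part of the product): copy the license file into
# place and set its owner, group, and mode (644) in a single step, so the
# ownership is not lost during the copy.

install_license() {
  local src="${1:-license.json}"
  local dest_dir="${2:-/opt/trifacta/license}"
  local owner="${3:-trifacta}"
  local group="${4:-$owner}"
  if [ ! -r "$src" ]; then
    echo "license file not readable: $src" >&2
    return 1
  fi
  if [ ! -d "$dest_dir" ]; then
    echo "destination directory missing: $dest_dir" >&2
    return 1
  fi
  install -o "$owner" -g "$group" -m 644 "$src" "$dest_dir/"
}
```

With the function loaded in a root shell, `install_license license.json` applies the defaults.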
Start the platform
For more information on how to launch the platform, see Start and Stop the Platform.
When the instance is spinning up for the first time, performance may be slow. When the instance is up, navigate to the following:
```
http://<public_hostname>:3005
```
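Because the first start can be slow, you may prefer to poll the login page instead of refreshing the browser. This is a minimal sketch, assuming `curl` is installed; the function names and the `ATTEMPTS` and `DELAY` (seconds) settings are hypothetical, and any 2xx or 3xx response is treated as "up".

```shell
#!/bin/sh
# Hypothetical helpers (not part of the product): build the login URL and
# poll it until the platform answers.

platform_url() {
  local host="${1:-}" port="${2:-3005}"
  if [ -z "$host" ]; then
    echo "usage: platform_url <public_hostname> [port]" >&2
    return 2
  fi
  printf 'http://%s:%s\n' "$host" "$port"
}

wait_for_platform() {
  local url attempts="${ATTEMPTS:-60}" delay="${DELAY:-10}" i=1 code
  url="$(platform_url "$@")" || return 2
  while [ "$i" -le "$attempts" ]; do
    # -s: silent; -o: discard the body; -w: print only the HTTP status code.
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")" || true
    case "$code" in
      2??|3??) echo "platform is up at $url (HTTP $code)"; return 0 ;;
    esac
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "no response from $url after $attempts attempts" >&2
  return 1
}
```

For example, `wait_for_platform <public_hostname>` returns once the login page responds on port 3005.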
When the login screen appears, enter the default admin credentials provided to you.
NOTE: As soon as you log in as an admin for the first time, change the password immediately. From the left nav bar, select Settings > Settings > User Profile. Change the password and click Save to restart the platform.
Configure for EMR clusters
The following steps apply to configure the platform to integrate with the EMR cluster:
- From the application menu, select the Settings menu. Then, click Settings > Admin Settings.
- In the Admin Settings page, you can configure many aspects of the platform, including user management tasks, and perform restarts to apply the changes.
In the Search bar, enter the following:
```
aws.s3.bucket.name
```
Set the value of this setting to the name of your S3 bucket.
Check the following setting and verify that it is set to 2.3.0:
```
"spark.version": "2.3.0",
```
The following setting must be specified:
```
"aws.mode": "system",
```
You can set the above value to either of the following:

| aws.mode value | Description |
|---|---|
| system | Set the mode to system to enable use of EC2 instance-based authentication for access. |
| user | Set the mode to user to utilize user-based credentials. This mode requires additional configuration. |

Details on the above configuration are described later.
Set the following parameter to true, which instructs the web application to run jobs on the integrated EMR cluster:
```
"webapp.runinEMR": true,
```
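To review the values from the steps above together before entering them in Admin Settings, you could print them as a single fragment. This sketch only echoes text and does not change the platform's configuration. The function name is hypothetical, and the bucket check applies the general S3 naming rules (3 to 63 characters of lowercase letters, digits, dots, and hyphens, starting and ending with a letter or digit).

```shell
#!/bin/sh
# Hypothetical helper (not part of the product): print the EMR-related
# settings as a JSON fragment for review. aws.mode accepts only system or user.

emr_settings_fragment() {
  local bucket="${1:-}" mode="${2:-system}"
  if [ -z "$bucket" ]; then
    echo "usage: emr_settings_fragment <s3-bucket-name> [system|user]" >&2
    return 2
  fi
  # Basic S3 bucket-name validation: length and allowed characters.
  if [ "${#bucket}" -lt 3 ] || [ "${#bucket}" -gt 63 ]; then
    echo "invalid S3 bucket name length: $bucket" >&2
    return 2
  fi
  case "$bucket" in
    *[![:lower:][:digit:].-]* | [![:lower:][:digit:]]* | *[![:lower:][:digit:]])
      echo "invalid S3 bucket name: $bucket" >&2
      return 2 ;;
  esac
  case "$mode" in
    system|user) ;;
    *) echo "aws.mode must be system or user, got: $mode" >&2; return 2 ;;
  esac
  cat <<EOF
"aws.s3.bucket.name": "$bucket",
"spark.version": "2.3.0",
"aws.mode": "$mode",
"webapp.runinEMR": true,
EOF
}
```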
In the Admin Settings page, locate the External Service Settings section.
- AWS EMR Cluster ID: Paste the value for the EMR Cluster ID for the cluster to which the platform is connecting.
- AWS Region: Enter the region where your EMR cluster is located.
- Resource Bucket: Enter the name of the S3 bucket to use.
- Resource Path: Use a path such as EMRLOGS.
Click Save underneath the External Service Settings section.
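If you need to look up the cluster ID or double-check the bucket and path, the helpers below may help. They are hypothetical and assume the AWS CLI is installed. The ID check only verifies the general shape of an EMR cluster ID ("j-" followed by uppercase letters and digits), and `resource_location` simply joins the bucket and path into an S3 URI for inspection with `aws s3 ls`; it does not describe how the platform itself uses these values.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the product) for External Service
# Settings values. Set DRY_RUN=1 to print the AWS CLI command only.

# EMR cluster IDs start with "j-" followed by uppercase letters and digits.
is_emr_cluster_id() {
  case "${1:-}" in
    j-?*) ;;
    *) return 1 ;;
  esac
  case "${1#j-}" in
    *[![:upper:][:digit:]]*) return 1 ;;
  esac
  return 0
}

# List active clusters as "<cluster-id><TAB><name>".
list_active_emr_clusters() {
  local region="${1:-${AWS_REGION:-}}"
  set -- aws emr list-clusters --active --query 'Clusters[].[Id,Name]' --output text
  if [ -n "$region" ]; then
    set -- "$@" --region "$region"
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Join Resource Bucket and Resource Path into an S3 URI.
resource_location() {
  local bucket="${1:-}" path="${2:-EMRLOGS}"
  if [ -z "$bucket" ]; then
    echo "usage: resource_location <bucket> [path]" >&2
    return 2
  fi
  path="${path#/}"
  path="${path%/}"
  printf 's3://%s/%s/\n' "$bucket" "$path"
}
```

For example, `aws s3 ls "$(resource_location <bucket> EMRLOGS)"` lists the contents of that location.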
Set base storage layer
The platform requires that one backend datastore be configured as the base storage layer. This base storage layer is used for storing uploaded data and writing results and profiles.
NOTE: By default, the base storage layer for
Verify Operations
NOTE: You can try to verify operations using the
Documentation
Tip: You should access online documentation through the product. Online content may receive updates that are not present in PDF content.
You can access complete product documentation online and in PDF format. From within the web application
Next Steps
After you have accessed the documentation, the following topics are relevant to AWS enterprise infrastructure deployments.
NOTE: These materials are located in the Configuration Guide.
Please review them in order.
| Topic | Description |
|---|---|
| Required Platform Configuration | This section covers the following topics, some of which should already be completed: |
| | Set up for a new EMR cluster. Some content may apply to existing EMR clusters. |
| Enable Integration with Compressed Clusters | If the Hadoop cluster uses compression, additional configuration is required. |
| Enable Integration with Cluster High Availability | If you are integrating with high availability on the Hadoop cluster, please complete these steps. |
| Enable Relational Connections | Enable integration with relational databases, including Redshift. |
| Configure for KMS | Integration with the Hadoop cluster's key management system (KMS) for encrypted transport. Instructions are provided for distribution-specific versions of Hadoop. |
| Configure Security | A list of topics on applying additional security measures to the platform. |
| Configure SSO for AD-LDAP | Please complete these steps if you are integrating with your enterprise's AD/LDAP Single Sign-On (SSO) system. |
Upgrade
For more information on upgrading your product