D s install marketplace

Scenario Description

This scenario assumes the following about the Trifacta platform deployment:

  • The platform is to be deployed using the official Trifacta Wrangler Enterprise AMI in one of the following ways:
    • Via a CloudFormation template
    • Manually from the EC2 Launch Instance wizard
  • It is to be connected to an EMR cluster. Depending on how you install the product, this cluster can be:
    • Created automatically as a three-node cluster via the CloudFormation template
    • Pre-existing in your enterprise infrastructure
  • No security features are applied to the platform and its use of the datastore.
  • You have acquired a Trifacta license key. The license key must be deployed to the Trifacta node before you start the platform.
Info

NOTE: This scenario does not cover installing and configuring optional components, including security features. It is intended to get the Trifacta platform installed, operational, and connected to the EMR cluster.

Install and Upgrade Methods

You can install the software using either of the following methods:

  1. CloudFormation Template: This method uses an Amazon CloudFormation template to install and configure a complete working system that includes:
    1. EC2 instance with supporting policies/roles
    2. Trifacta node and software
    3. S3 bucket with supporting policies
    4. 3-node EMR cluster with supporting policies/roles
  2. Manual installation: This method gives you finer-grained control and the ability to deploy the platform into a pre-existing AWS environment (VPC/EMR/S3 bucket).

Trifacta Wrangler Enterprise on AWS

Through EC2 with AMI ID | Through CloudFormation template
Install

If you know the AMI ID for Trifacta Wrangler Enterprise, you can install the product through EC2.

Tip

Tip: Use this method if you are integrating the product with an existing EMR cluster, need to launch it in an existing VPC, or want to customize your EMR cluster.

Info

NOTE: Please verify that the additional pre-requisites have been met. See below.

 Supported. Instructions are provided below.

Tip

Tip: Using the CloudFormation template is the recommended method of deploying the product if you do not need to integrate it into an existing environment.

Upgrade

Supported. See Upgrade for AWS Marketplace with EMR.
Info

NOTE: This method of upgrading the product is not recommended because it requires replacement of the platform EC2 instance. Instead, you should bring up a new instance and test it next to the existing instance before completely switching over to the new instance.

Pre-requisites

Warning

If you are integrating the Trifacta platform with an EMR cluster, you must acquire a license first. Additional configuration is required. For more information, please contact aws-marketplace@trifacta.com.

Before you begin:

  1. Read: Please read this entire document before you begin.

  2. EULA. Before you begin, please review the End-User License Agreement. See https://docs.trifacta.com/display/PUB/End-User+License+Agreement+-+Trifacta+Wrangler+Enterprise.

  3. Trifacta license file. If you have not done so already, please acquire a Trifacta license file from your Trifacta representative.

Internet access

For more information, see Configure for AWS.

SELinux

By default, Trifacta Wrangler Enterprise is installed on a server with SELinux enabled. Security-enhanced Linux (SELinux) provides a set of security features for, among other things, managing access controls.

Tip

Tip: The following may be applied to other deployments of the Trifacta platform on servers where SELinux has been enabled.

In some cases, SELinux can interfere with normal operations of platform software. If you are experiencing connectivity problems related to SELinux, you can do either one of the following:

  1. Disable SELinux on the server. For more information, please see the CentOS documentation.
  2. Apply the following commands on the server, as root:
    1. Open ports on the server for listening. 
      1. By default, the Trifacta application listens on port 3005. The following opens that port when SELinux is enabled:

        Code Block
        semanage port -a -t http_port_t -p tcp 3005
      2. Repeat the above step for any other ports that you wish to open on the server.
    2. Permit nginx, the proxy on the Trifacta node, to open websockets:

      Code Block
      setsebool -P httpd_can_network_connect 1
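
The two SELinux commands above can be generated for several ports at once. The following is a minimal dry-run sketch, not part of the product: the function name `selinux_port_commands` is hypothetical, and it only prints the commands so that they can be reviewed before being run as root.

```shell
# Dry-run helper: print the SELinux commands for a list of TCP ports.
# Nothing is executed here; review the output, then run it as root.
selinux_port_commands() {
  for port in "$@"; do
    # Skip anything that is not a plain decimal number.
    case "$port" in
      ''|*[!0-9]*)
        echo "skipping invalid port: $port" >&2
        continue
        ;;
    esac
    # Skip numbers outside the valid TCP port range.
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
      echo "skipping out-of-range port: $port" >&2
      continue
    fi
    echo "semanage port -a -t http_port_t -p tcp $port"
  done
  # Allow nginx to open network connections (needed for websockets).
  echo "setsebool -P httpd_can_network_connect 1"
}

# Example: the default application port.
selinux_port_commands 3005
```

Pass any additional ports that you want to open as extra arguments, for example `selinux_port_commands 3005 443`.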

Product Limitations

  • The EC2 instance, S3 buckets, and any connected Redshift databases must be located in the same Amazon region. Cross-region integrations are not supported at this time.
  • No support for Hive integration
  • No support for secure impersonation or Kerberos
  • No support for high availability and failover
  • Job cancellation is not supported on EMR.
  • When publishing single files to S3, you cannot apply an append publishing action.

Install

Desktop Requirements

  • All desktop users of the platform must have the latest version of Google Chrome installed on their desktops.
    • Google Chrome must have the PNaCl client installed and enabled.
    • PNaCl Version:  0.50.x.y or later
  • All desktop users must be able to connect to the EC2 instance through the enterprise infrastructure.

Pre-requisites for EC2 Installations

Info

NOTE: Before you install, you should review the configuration content for specific instructions on setting up the Trifacta node. See below.

If you are manually installing the platform through the Marketplace, please verify that you have the following assets:

  1. EC2 instance:

    1. Create the EC2 instance for the Trifacta platform.

    2. Download and deploy the AMI into the EC2 instance.

  2. EMR Cluster: 

    1. Before you begin, you should allocate sufficient resources for the EMR cluster. For sizing guidance, please contact your Trifacta representative.
    2. If you have a pre-existing EMR cluster, please verify that the cluster is working properly.

  3. S3 bucket. Please create an S3 bucket to store Trifacta assets. In the bucket, the platform stores metadata in the following location:

    Code Block
    <S3_bucket_name>/trifacta

    See https://s3.console.aws.amazon.com/s3/home.

  4. IAM policies. Create IAM policies for access to the S3 bucket. The following permissions are required:
    • The system account or individual user accounts must have full permissions for the S3 bucket:

      Code Block
      Delete*, Get*, List*, Put*, Replicate*, Restore*
    • These policies must apply to the bucket and its contents. Example:

      Code Block
      "arn:aws:s3:::my-trifacta-bucket-name"
      "arn:aws:s3:::my-trifacta-bucket-name/*"
    • See https://console.aws.amazon.com/iam/home#/policies
  5. EC2 instance role. Create an EC2 instance role and attach this policy to it. See https://console.aws.amazon.com/iam/home#/roles.
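
The IAM permissions listed above can be expressed as a policy document. The following sketch prints one for a given bucket. It is an illustration under assumptions, not the policy shipped with the product: the function name `s3_bucket_policy` is hypothetical, and the action patterns are the listed permissions with the `s3:` prefix added.

```shell
# Print an IAM policy document granting the listed S3 permissions
# on one bucket and on the objects inside it.
s3_bucket_policy() {
  bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:Delete*",
        "s3:Get*",
        "s3:List*",
        "s3:Put*",
        "s3:Replicate*",
        "s3:Restore*"
      ],
      "Resource": [
        "arn:aws:s3:::${bucket}",
        "arn:aws:s3:::${bucket}/*"
      ]
    }
  ]
}
EOF
}

# Example with the placeholder bucket name used above.
s3_bucket_policy my-trifacta-bucket-name
```

The output can be saved to a file and pasted into the IAM console, or passed to `aws iam create-policy` with `--policy-document file://<file>`.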

Install Steps - CloudFormation Template

Info

NOTE: If you are integrating the Trifacta platform with a pre-existing cluster, you must meet the EC2 install pre-requisites above and complete the manual install process listed in the section following this one. Please skip this section.

This install process creates the following:

  • Trifacta node on an EC2 instance
    • Associated policies and roles
  • S3 bucket
    • Associated policies and roles
  • Three-node EMR cluster
    • Associated policies and roles

Steps:

 

  1. In the Marketplace listing, click Deploy into a new VPC.
  2. Choose a Template: The template path is automatically populated for you.
  3. Specify Details:
    1. Stack Name: The display name of the stack. It is used in the names of resources created by the stack and as an identifier for the stack.

      Info

      NOTE: Each instance of the Trifacta platform must have a separate name.

    2. Instance Type: Please select the appropriate instance depending on the number of users and data volumes of your environment. For more information, see the Sizing Guide above.

    3. Key Pair: This SSH key pair is used to access the Trifacta Instance and the EMR cluster instances.

    4. Allowed HTTP Source: This range of addresses is permitted access to the Trifacta Instance on ports 80, 443, and 3005.

      1. Ports 80 and 443 do not have any services by default, but you may modify the Trifacta configuration to permit access via these ports.

    5. Allowed SSH Source: This range of addresses is permitted access to port 22 on the Trifacta Instance.

  4. Options: None of these is required for installation. Specify any options as needed for your environment.
  5. Review: Review your installation and configured options.
    1. Select the checkbox at the end of the page.
    2. To launch the stack, click Create.
  6. Please wait while the stack creates all required resources.
  7. In the Stacks list, select the name of your application. Click the Outputs tab and collect the following information. Instructions on how to use this information are provided later.

    Parameter | Description | Use
    Trifacta URL value | URL and port number to which to connect to the Trifacta application | Users must connect to this IP address and port number to access the application. By default, the port is set to 3005. The access port can be moved to 80 or 443 if desired. Please contact us for more details.
    Trifacta Bucket | The address of the default S3 bucket | This value must be applied through the application after it has been deployed.
    Trifacta Instance Id | The identifier for the instance of the platform | This value is the default password for the admin account. NOTE: You must change this password on the first login to the application.

  8. After the Trifacta instance has been created, you must add a license file before starting the Trifacta service. SSH into the server, create the license file, paste in the license content, and then set the ownership and permissions of the file:

    1. SSH into the server as the centos user, using the key pair that you specified.

    2. Change to root user:

      Code Block
      sudo su



    3. Update the license file on the Trifacta node. Edit the following file:

      Code Block
      /opt/trifacta/license/license.json
    4. Into the above file, paste the contents of the license.json file that was provided to you by your Trifacta representative.

    5. Set the ownership and permissions of the file:

      Code Block
      chown trifacta:trifacta /opt/trifacta/license/license.json
      chmod 644 /opt/trifacta/license/license.json
  9. Start the Trifacta service:

    Code Block
    service trifacta start
  10. It may take some time for the server to finish coming online. When it is available, navigate to the Trifacta application.
  11. When the login screen appears, enter the following:
    1. Username: admin@trifacta.local
    2. Password: (the TrifactaInstanceId value)

      Info

      NOTE: After you log in as an admin for the first time, you must change the password.

  12. From the application menu, select Settings menu > Admin Settings.
  13. In the Admin Settings page, you can configure many aspects of the platform, including user management tasks, and perform restarts to apply the changes.

  14. Add the S3 bucket that was automatically created to store Trifacta metadata and EMR content. Search for:

    Code Block
    "aws.s3.bucket.name"

    1. Update the value with the Trifacta Bucket value provided when you created the stack in AWS.
  15. Verify your Spark version. If the cluster was launched from AWS, this value should be set to 2.3.0. Search for:

    Code Block
    "spark.version"

     

    1. Update its value to 2.3.0, if necessary.

  16. Set up the AWS authentication mode:

    1. Search for:

      Code Block
      "aws.mode"
    2. System mode is default and is recommended unless you are doing a custom installation.

      aws.mode value | Description
      system | Set the mode to system to enable use of EC2 instance-based authentication for access.
      user | Set the mode to user to utilize user-based credentials. This mode requires additional configuration.

      Details on the above configuration are described later.

  17. Enable the "Run in EMR" option within the platform. Search for:

    Code Block
    "webapp.runinEMR"

     

    1. Select the checkbox to enable it.
  18. Click Save underneath the Platform Settings section.

  19. In the Admin Settings page, locate the External Service Settings section.

    1. AWS EMR Cluster ID: Paste the value for the EMR Cluster ID for the cluster to which the platform is connecting.

    2. AWS Region: Enter the region where your EMR cluster is located.
    3. Resource Bucket: You can use the Trifacta Bucket that was already created.
    4. Resource Path: Specify a path within the bucket, such as EMRLOGS.
  20. Click Save underneath the External Service Settings section.

  21. When the platform restarts, you can begin using the product.
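
The license steps in step 8 above can be collected into one shell function. This is a minimal sketch under assumptions: the function name `install_license` and its arguments are hypothetical, and it copies an existing license.json into place instead of pasting the content into an editor.

```shell
# install_license SRC DEST_DIR [OWNER]
# Copy a license file into DEST_DIR as license.json and make it readable.
install_license() {
  src="$1"
  dest_dir="$2"
  owner="${3:-}"
  # Stop if the source file is missing or empty.
  if [ ! -s "$src" ]; then
    echo "license file not found or empty: $src" >&2
    return 1
  fi
  # The target directory is expected to exist already.
  if [ ! -d "$dest_dir" ]; then
    echo "target directory not found: $dest_dir" >&2
    return 1
  fi
  cp "$src" "$dest_dir/license.json" || return 1
  chmod 644 "$dest_dir/license.json" || return 1
  # Changing ownership needs root, so it is attempted only when requested.
  if [ -n "$owner" ]; then
    chown "$owner" "$dest_dir/license.json" || return 1
  fi
}

# Example, run as root on the node:
#   install_license ./license.json /opt/trifacta/license trifacta:trifacta
```

The ownership and mode match the chown and chmod commands shown in step 8.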

Deleting CloudFormation stack

Warning

If you must delete the CloudFormation stack, please be aware of the following.

  1. The S3 bucket that was created for the stack is not removed. If you want to delete it, you must empty it first and then delete it.
  2. Any EMR security groups created for the stack cannot be deleted, due to circular references. The stack deletion process informs you of the security groups that it failed to delete. To complete the deletion:
    1. Remove all rules from the security groups.
    2. Delete the security groups manually.
    3. Re-run the stack deletion, which should complete successfully.
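
The cleanup order described above can be written out as AWS CLI commands. The sketch below only prints a plan and executes nothing. The function name `stack_cleanup_plan` and the example stack, bucket, and security group names are hypothetical, and removing the security group rules is left as a manual step.

```shell
# stack_cleanup_plan STACK BUCKET [SECURITY_GROUP_ID ...]
# Print the commands for cleaning up after a failed stack deletion.
stack_cleanup_plan() {
  stack="$1"
  bucket="$2"
  shift 2
  # The bucket must be emptied before it can be deleted.
  echo "aws s3 rm s3://$bucket --recursive"
  echo "aws s3 rb s3://$bucket"
  for sg in "$@"; do
    # Rules that reference other groups block deletion, so clear them first.
    echo "# remove all inbound and outbound rules from $sg first"
    echo "aws ec2 delete-security-group --group-id $sg"
  done
  # Re-run the stack deletion once the groups are gone.
  echo "aws cloudformation delete-stack --stack-name $stack"
}

# Example with placeholder names.
stack_cleanup_plan my-trifacta-stack my-trifacta-bucket-name sg-0123456789abcdef0
```

Omit the two `aws s3` lines if you want to keep the bucket and its contents.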

Install Steps - EC2 Instance

  1. Launch the product.
  2. In the EC2 Console:
    1. Instance size: Select the instance size.
    2. Network: Configure the VPC, subnet, firewall and other configuration settings necessary to communicate with the instance. 
    3. Auto-assigned Public IP: You must create a public IP to access the Trifacta platform.
    4. EC2 role: Select the EC2 role that you created.
    5. Local storage: Select a local EBS volume. The default volume includes 100GB storage.

      Info

      NOTE: The local storage environment contains the Trifacta databases, the product installation, and its log files. No source data is ever stored within the product.

    6. Security group: Use a security group that exposes access to port 3005, which is the default port for the platform. 
    7. Create an AWS key-pair for access: This key is used to provide SSH access to the platform, which may be required for some admin tasks.
    8. Save your changes.
  3. Apply license key:

    1. Acquire the license.json license key file that was provided to you by your Trifacta representative.

    2. Transfer the license key file to the EC2 node that is hosting the Trifacta platform. Navigate to the directory where you stored it.

    3. Make the Trifacta user the owner of the file:

      Code Block
      sudo chown trifacta:trifacta license.json
    4. Make sure that the Trifacta user has read permissions on the file:

      Code Block
      sudo chmod 644 license.json
    5. Copy the license key file to the proper location:

      Code Block
      cp license.json /opt/trifacta/license/
  4. Launch the configured platform.

    Info

    NOTE: From the EC2 Console, please acquire the instanceId, which is needed in a later step.

  5. When the instance is spinning up for the first time, performance may be slow. When the instance is up, navigate to the following:

    Code Block
    http://<public_hostname>:3005
  6. When the login screen appears, enter the following:
    1. Username: admin@trifacta.local
    2. Password: (the instanceId value)

      Info

      NOTE: As soon as you log in as an admin for the first time, you should immediately change the password. Select the User Profile menu item in the upper-right corner. Change the password and click Save to restart the platform.

  7. From the application menu, select Settings menu > Admin Settings
  8. In the Admin Settings page, you can configure many aspects of the platform, including user management tasks, and perform restarts to apply the changes.
  9. In the Search bar, enter the following:

    Code Block
    "aws.s3.bucket.name"

     

    1. Set the value of this setting to be the bucket that you created.

  10. The following setting must be specified.

    Code Block
    "aws.mode":"system",

    You can set the above value to either of the following:

    aws.mode value | Description
    system | Set the mode to system to enable use of EC2 instance-based authentication for access.
    user | Set the mode to user to utilize user-based credentials to access the EMR cluster.

    Details on the above configuration are described later.

  11. Click Save.

  12. When the platform restarts, you can begin using the product.
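
Because the instance can be slow the first time it starts (step 5 above), it can help to poll the application URL before trying to log in. This is a minimal sketch assuming `curl` is installed on your workstation. The function names `platform_url` and `wait_for_platform` are hypothetical, and it treats any successful HTTP response as a sign that the application is up.

```shell
# Build the application URL from a hostname and an optional port.
platform_url() {
  host="$1"
  port="${2:-3005}"
  echo "http://$host:$port"
}

# wait_for_platform URL [ATTEMPTS] [DELAY_SECONDS]
# Poll the URL until it answers or the attempts run out.
wait_for_platform() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors as failure, -o: discard the body.
    if curl -sf -o /dev/null --max-time 5 "$url"; then
      echo "platform is responding at $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "no response from $url after $attempts attempts" >&2
  return 1
}

# Example:
#   wait_for_platform "$(platform_url <public_hostname>)"
```

The default port matches the security group setting described earlier. Pass a different port as the second argument to `platform_url` if you have changed it.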

SSH Access

If you need to SSH to the Trifacta node, you can use the following command:

Code Block
ssh -i <path_to_key_file> <userId>@<tri_node_DNS_or_IP>

Parameter | Description
<path_to_key_file> | Path to the key file stored on your local computer.
<userId> | The user ID is always centos.
<tri_node_DNS_or_IP> | DNS or IP address of the Trifacta node
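
The command above can be wrapped in a small helper that fills in the fixed user ID. This is a sketch only, and the function name `ssh_command` is hypothetical. It prints the commands and does not run them. The `chmod` line is included because ssh refuses private key files that other users can read.

```shell
# ssh_command KEY_FILE NODE
# Print the commands for connecting to the node as the centos user.
ssh_command() {
  key_file="$1"
  node="$2"
  # Restrict the key file to its owner before using it.
  echo "chmod 400 $key_file"
  echo "ssh -i $key_file centos@$node"
}

# Example with placeholder values.
ssh_command ~/keys/trifacta.pem 203.0.113.10
```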

Additional Configuration for Manual Installs

  1. EC2 Role-Based Authentication: If aws.mode was set to system, you are using EC2 role-based authentication. Additional configuration is required to integrate the roles into the platform. For more information, see Configure for EC2 Role-Based Authentication.
  2. EMR: If you are integrating with a pre-existing EMR cluster, additional configuration is required.

    Info

    NOTE: Please review these steps with your Trifacta representative.

    For more information, see Configure for EMR.

  3. S3: For more information on how to modify your S3 integration, see Enable S3 Access.

Verify

Start and Stop the Platform

For more information, see Install Start Platform.

Verify Operations

For more information, see Install Verify.

Upgrade

For more information, see Upgrade for AWS Marketplace with EMR.

Documentation

You can access complete product documentation in online and PDF format. After the platform has been installed, select Help menu > Product Docs from the menu in the Trifacta application.
