Page tree

 

Contents:


This guide takes you through the steps for installing  Trifacta® Wrangler Enterprise software on Ubuntu. 

For more information on supported operating system versions, see Product Support Matrix in the Planning Guide.

Preparation

Before you begin, please complete the following.

NOTE: Except for database installation and configuration, all install commands should be run as the root user or a user with similar privileges. For database installation, you will be asked to switch the database user account.

 

Steps:

  1. Review key sections of the Planning Guide:
    1. Review the System Requirements and verify that all required components have been installed.
    2. Verify that all required System Ports are opened on the node.

    3. Review the System Dependencies in the Planning Guide.

    4. Cluster configuration: Additional steps are required to integrate the Trifacta platform with the cluster. See Prepare Hadoop for Integration with the Platform in the Planning Guide.

  2. Acquire your License Key.
  3. Install and verify operations of the datastore, if used.

    NOTE: Access to the cluster may be required.

  4. Verify access to the server where the Trifacta platform is to be installed.

Installation

Tip: The Python setup tools can be useful for debugging startup issues. To install:

wget https://bootstrap.pypa.io/ez_setup.py -O - | python

1. Install Dependencies

Without Internet access

If you have not done so already, you may download the dependency bundle with your release directly from  Trifacta. For more information, see Install Dependencies without Internet Access.

With Internet access

Use the following to add the hosted package repository for Ubuntu, which will automatically install the proper packages for your environment. 

NOTE: Install curl if not present on your system.

Then, execute the following command: 

NOTE: Run the following command as the root user. In proxied environments, the script may encounter issues with detecting proxy settings.

curl https://packagecloud.io/install/repositories/trifacta/dependencies/script.deb.sh | sudo bash


Special instructions for Ubuntu installs

These steps manually install the correct and supported version of the following:

  • nodeJS
  • nginX
  • Supervisord

Due to a known issue resolving package dependencies on Ubuntu, please complete the following steps prior to installation of other dependencies or software. 

  1. Login to the Trifacta node as an administrator.

  2. Execute the following command to install  nodeJS, nginX, and Supervisor:

    1. Ubuntu 16.04 (Xenial):

      sudo apt-get install supervisor=3.2.4 nginx=1.12.2-1~xenial nodejs=12.16.1-1nodesource1
    2. Ubuntu 18.04 (Bionic Beaver):

      sudo apt-get install supervisor=3.2.4 nginx=1.14.2-1~bionic nodejs=12.16.1-1nodesource1
  3. Continue with the installation process.

2. Install JDK

By default, the  Trifacta node uses OpenJDK for accessing Java libraries and components. In some environments, basic setup of the node may include installation of a JDK. Please review your environment to verify that an appropriate JDK version has been installed on the node.

NOTE: Use of Java Development Kits other than OpenJDK is not currently supported. However, the platform may work with the Java Development Kit of your choice, as long as it is compatible with the supported version(s) of Java. For more information, see System Requirements in the Planning Guide.

Tip: OpenJDK is included in the offline dependencies, which can be used to install the platform without Internet access. For more information, see Install Dependencies without Internet Access.

The following commands can be used to install OpenJDK. These commands can be modified to install a separate compatible version of the JDK.

sudo apt-get install openjdk-8-jre-headless

JAVA_HOME:

By default, the JAVA_HOME environment variable is configured to point to a default install location for the OpenJDK package. 

NOTE: If you have installed a JDK other than the OpenJDK version provided with the software, you must set the JAVA_HOME environment variable on the Trifacta node to point to the correct install location. 

The property value must be updated in the following locations:

  1. Edit the following file: /opt/trifacta/conf/env.sh

  2. Save changes.

3. Install Trifacta package

NOTE: If you are installing without Internet access, you must reference the local repository. The command to execute the installer is slightly different. See Install Dependencies without Internet Access.

NOTE: Installing the Trifacta platform in a directory other than the default one is not supported or recommended.

Install the package with apt, using root:

NOTE: If you encounter errors running the following command, execute the next command anyway. If that command completes without error, the installation is ok.


sudo dpkg -i <deb file>

The previous line may return an error message, which you may ignore. Continue with the following command: 

sudo apt-get -f -y install


4. Verify Install

The product is installed in the following directory: 

/opt/trifacta

JAVA_HOME:


The platform must be made aware of the location of Java.

Steps:

  1. Edit the following file:  /opt/trifacta/conf/trifacta-conf.json
  2. Update the following parameter value:

    "env": {
      "JAVA_HOME": "/usr/lib/jvm/java-1.8.0-openjdk.x86_64"
    },
  3. Save changes.

5. Install License Key

Please install the license key provided to you by Trifacta. See License Key.

6. Install Hadoop dependencies

If you are integrating with a supported Hadoop cluster, you must install the dependencies for the Hadoop cluster on the Trifacta node. See below.

7. Set File Ownership

All files in the Trifacta install directory and sub-directories must be owned by the same user that is used to run the Trifacta platform. Mismatches in ownership and execution permissions can cause services to fail to start.

Steps:

Before you upgrade, please complete the following:

  1. Login to the Trifacta node as the root user.
  2. Execute the following command. The user that is being granted ownership of the install directory is trifacta, which is the default user that runs the platform. If you are using a different user to run your Trifacta deployment, please substitute that name.

    chown -R trifacta:trifacta /opt/trifacta

8. Store install packages

For safekeeping, you should retain all install packages that have been installed with this Trifacta deployment.

Install Hadoop Dependencies

If you are integrating Hadoop cluster, the associated Hadoop dependencies must be installed on the Trifacta® node

Included Dependencies

The Hadoop dependencies for the latest supported version of each Hadoop distribution are included in the Trifacta software distribution.

Supported Versions:


Not required for:

NOTE: If you are integrating with one of the following running environments, please skip installing Hadoop dependencies.

Azure running environments:

  • HDI
  • Azure Databricks

Acquire Other Dependencies

Hadoop dependencies for other versions of the Hadoop distribution can be acquired from the Trifacta FTP site using one of the following methods.

Via a web browser

  1. Log in: https://ftp.trifacta.com/login
  2. Browse to the following directory:

    Releases/Trifacta_x.y/hadoop/

    where: x.y corresponds to the release number that you are installing (e.g. Release 6.8).

  3. Download the following file: hadoop_deps.tar.gz

Via WGET

Example is for Release 6.8:

wget --user CustomerUsername --ask-password  ftps://ftp.trifacta.com/Releases/Trifacta_6.8/hadoop/hadoop-deps.tar.gz

Via SFTP

Example is for Release 6.8:

sftp CustomerUsername@ftp.trifacta.com:Releases/Trifacta_6.8/hadoop/hadoop-deps.tar.gz .

Via CURL

Example is for Release 6.8:

curl -O -C - -u CustomerUsername:CustomerPassword ftps://ftp.trifacta.com/Releases/Trifacta_6.8/hadoop/hadoop-deps.tar.gz

Via FTP/FTPS

  1. Access the FTP server via your preferred FTP client.
  2. Browse to the following directory:

    Releases/Trifacta_x.y/hadoop/

    where: x.y corresponds to the release number that you are installing (e.g. Release 6.8).

  3. Download the following file: hadoop_deps.tar.gz

Install Dependencies

If needed, transfer the download to the Trifacta node.

Extract it to the following directory:

sudo tar -vxf hadoop-deps.tar --directory /opt/trifacta/

NOTE:

After you extract the files to the target directory, verify that the ownership of the new directory (/opt/trifacta/hadoop-deps/) and its subfolders match the ownership settings for the rest of the Trifacta installation in /opt/trifacta.

Next Steps

Install and configure Trifacta databases

The Trifacta platform requires installation of several databases. If you have not done so already, you must install and configure the databases used to store . See Install Databases in the Databases Guide.

Install configuration

After installation is complete, additional configuration is required to make the platform operational. See Install Configuration.

This page has no comments.