This section contains hardware and software requirements for successful installation of Trifacta® Wrangler Enterprise.

Platform Node Requirements

Node Installation Requirements

If the Trifacta platform is installed in a Hadoop environment, the software must be installed on an edge node of the cluster.

  • If it is integrated with a Cloudera cluster, it must be installed on a gateway node that is managed by Cloudera Manager.
  • If it is integrated with a Hortonworks cluster, it must be installed on an Ambari/Hadoop client that is managed by Hortonworks Ambari.


  • If it is integrated with an HDI cluster, it must be installed on an edge node.


  • Customers who originally installed an earlier version on a non-edge node will still be supported. If the software is not installed on an edge node, you may be required to copy over files from the cluster and to synchronize these files after upgrades. In that case, the cluster upgrade process is more complicated.


  • This requirement does not apply to the following cluster integrations:
    • AWS EMR
    • Azure Databricks

NOTE: If you are installing the Trifacta platform into a Docker container, a different set of requirements applies. For more information, see Install for Docker in the Install Guide.


Hardware Requirements

Minimum hardware:

| Item | Required |
|---|---|
| Number of cores | 8 cores, x86_64 |
| RAM | 64 GB. The platform requires 12 GB of dedicated RAM to start and perform basic operations. |
| Disk space to install software | 4 GB |
| Total free disk space | 16 GB. Space requirements by volume: /opt - 10 GB; /var - Remainder |

Recommended hardware:

| Item | Recommended |
|---|---|
| Number of cores | 16 cores, x86_64 |
| RAM | 128 GB. The platform requires 12 GB of dedicated RAM to start and perform basic operations. |
| Disk space to install software | 16 GB |
| Total free disk space | 100 GB. Space requirements by volume: /opt - 10 GB; /var - Remainder |
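
As an illustration of how the minimum and recommended figures above might be checked on a candidate node, here is a minimal Python sketch. It is not part of the Trifacta installer. The names `MINIMUM`, `RECOMMENDED`, `shortfalls`, and `probe_local_node` are hypothetical, and the probe assumes a Linux node where `os.sysconf` exposes the physical page count.

```python
import os
import shutil

# Thresholds copied from the hardware tables above.
# The dictionary layout and key names are illustrative assumptions.
MINIMUM = {"cores": 8, "ram_gb": 64, "free_disk_gb": 16}
RECOMMENDED = {"cores": 16, "ram_gb": 128, "free_disk_gb": 100}


def shortfalls(measured, required):
    """Return the names of resources that fall below the required values."""
    return [name for name, need in required.items() if measured[name] < need]


def probe_local_node(path="/"):
    """Measure cores, physical RAM, and free disk space on a Linux node."""
    gib = 1024 ** 3
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return {
        "cores": os.cpu_count() or 0,
        "ram_gb": ram_bytes / gib,
        "free_disk_gb": shutil.disk_usage(path).free / gib,
    }
```

Calling `shortfalls(probe_local_node(), MINIMUM)` on the node returns an empty list when every measured value meets the minimum. The RAM reported by the operating system is often slightly below the installed amount, so a node with 64 GB installed may need a small tolerance in practice.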

Operating System Requirements

The following operating systems are supported for the Trifacta node. The Trifacta platform requires 64-bit versions of any supported operating system.

CentOS/RHEL versions:

  • CentOS 7.1, 7.2, 7.4 - 7.7, 8.1

    NOTE: MySQL 5.7 Community is not supported on CentOS/RHEL 8.1.

  • RHEL 7.1, 7.2, 7.4 - 7.7, 8.1

Notes on CentOS/RHEL installation:

  • If you are installing on CentOS/RHEL 7.1, you must be connected to an online repository for some critical updates. Offline installation is not supported for these operating system distributions.
  • For security reasons, RHEL 7.3 is not supported for installation of Release 5.0 or later of the Trifacta platform. Please upgrade to RHEL 7.4 or a later supported release.
  • Installation on CentOS/RHEL versions 7.4 or earlier requires an upgrade of the RPM software on the Trifacta node. Details are provided during the installation process.
  • Disabling SELinux on the Trifacta node is recommended. However, if your security policies require SELinux to remain enabled, you may need to apply some changes to the environment.

Ubuntu versions:

  • Ubuntu 18.04 (codename Bionic Beaver)

  • Ubuntu 16.04 (codename Xenial)

Notes on Ubuntu installation:

  • For Ubuntu installations, some packages must be manually installed. Instructions are provided later in the process.

For more information on RPM dependencies, see System Dependencies.
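
The supported operating system versions listed above can be captured as a small lookup, which may be useful in a pre-installation script. The following Python sketch is only an illustration: `SUPPORTED_OS` and `is_supported_os` are hypothetical names, and the range "7.4 - 7.7" is assumed to cover 7.4, 7.5, 7.6, and 7.7. How the distribution name and version string are obtained from the node is left to the caller.

```python
# Versions copied from the lists above; "7.4 - 7.7" is expanded by assumption.
_EL_VERSIONS = {"7.1", "7.2", "7.4", "7.5", "7.6", "7.7", "8.1"}

SUPPORTED_OS = {
    "centos": _EL_VERSIONS,
    "rhel": _EL_VERSIONS,
    "ubuntu": {"16.04", "18.04"},
}


def is_supported_os(distro, version):
    """Return True if the distro/version pair appears in the supported list."""
    return version in SUPPORTED_OS.get(distro.lower(), set())
```

For example, `is_supported_os("RHEL", "7.3")` returns `False`, which matches the note that RHEL 7.3 is not supported.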

Database Requirements

The following database versions are supported by the Trifacta platform for storing metadata and the user's Wrangle recipes.

Supported database versions:

  • PostgreSQL 12.3

    NOTE: PostgreSQL 12.3 is supported on supported versions of CentOS/RHEL 7 only.

  • PostgreSQL 9.6
  • MySQL 5.7 Community

    NOTE: MySQL 5.7 Community is not supported on CentOS/RHEL 8.1.

Notes on database versions:

  • MySQL 5.7 is not supported for installation in Amazon RDS.

    NOTE: If you are installing or upgrading a deployment of Trifacta Wrangler Enterprise that uses or will use a remote database service, such as Amazon RDS, for hosting the Trifacta databases, please contact Trifacta Customer Success Services. For this release, additional configuration may be required.




  • If you are installing the databases into MySQL, you must download and install the MySQL Java driver onto the Trifacta node. For more information, see Install Databases for MySQL in the Databases Guide.
  • The H2 database type is used for internal testing. It is not a supported database.

For more information on installing and configuring the database, see Install Databases in the Databases Guide.
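
The database notes above combine into a few compatibility rules. The sketch below encodes them in Python as a hedged illustration only. `SUPPORTED_DATABASES` and `database_issues` are hypothetical names, and the rules reflect only the restrictions stated in this section.

```python
# Database names copied from the supported list above.
SUPPORTED_DATABASES = {"PostgreSQL 12.3", "PostgreSQL 9.6", "MySQL 5.7 Community"}


def database_issues(database, os_family, os_version, amazon_rds=False):
    """Return reasons the combination is unsupported (empty list if none)."""
    if database not in SUPPORTED_DATABASES:
        return [f"{database} is not a supported database version"]

    issues = []
    enterprise_linux = os_family.lower() in ("centos", "rhel")
    if database == "PostgreSQL 12.3":
        # Supported on supported versions of CentOS/RHEL 7 only.
        if not (enterprise_linux and os_version.startswith("7.")):
            issues.append("PostgreSQL 12.3 is supported on CentOS/RHEL 7 only")
    if database == "MySQL 5.7 Community":
        if enterprise_linux and os_version == "8.1":
            issues.append("MySQL 5.7 Community is not supported on CentOS/RHEL 8.1")
        if amazon_rds:
            issues.append("MySQL 5.7 is not supported for installation in Amazon RDS")
    return issues
```

An empty result means that none of the restrictions listed in this section apply. It does not replace the guidance to contact Trifacta Customer Success Services for remote database services.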

Other Software Requirements

The following software components must be present.

Java

Where possible, you should install the same version of Java on the Trifacta node and on the cluster with which you are integrating.

  • Java 1.8

Notes on Java versions:

  • OpenJDK 1.8 is officially supported. It is installed on the Trifacta node during the installation process.
  • Additional requirements related to the Java JDK are described in the Hadoop Components section below.
  • If you are integrating your Trifacta instance with S3, you must install the Oracle JRE 1.8 onto the Trifacta node. No other version of Java is supported for S3 integration. For more information, see Enable S3 Access in the Configuration Guide.
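
To confirm that a node reports Java 1.8, the banner printed by `java -version` can be parsed. The Python sketch below is an assumption-laden illustration: the function names are hypothetical, it relies on the conventional banner format such as `openjdk version "1.8.0_252"`, and it does not distinguish OpenJDK from the Oracle JRE.

```python
import re
import subprocess


def parse_java_version(banner):
    """Extract (major, minor) from a `java -version` banner, or None."""
    match = re.search(r'version "(\d+)\.(\d+)', banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_java_8(banner):
    """Return True if the banner reports a 1.8.x Java version."""
    return parse_java_version(banner) == (1, 8)


def local_java_banner():
    """Run `java -version`; the JVM writes its banner to stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return result.stderr
```

On a node with Java installed, `is_java_8(local_java_banner())` gives a quick check. Running the same check on the Trifacta node and on a cluster node helps confirm that the versions match.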

Other Software

For Ubuntu installations, the following packages must be manually installed using Ubuntu-specific versions:

  • NginX 1.12.2
  • NodeJS 12.16.1

Instructions and version numbers are provided later in the process.

Root User Access

Installation must be executed as the root user on the Trifacta node.

SSL Access

(Optional) If users are connecting to the Trifacta platform over SSL, an SSL certificate must be created and deployed. See Install SSL Certificate in the Install Guide.

Internet Access

(Optional) Internet access is not required for installation or operation of the platform. However, if the server does not have Internet access, you must acquire additional software as part of the disconnected install. For more information, see Install Dependencies without Internet Access in the Install Guide.

Hadoop Cluster Requirements

The following requirements apply if you are integrating the Trifacta platform with an enterprise Hadoop cluster.

  • For general guidelines on sizing the cluster, see Sizing Guidelines.
  • If you have upgrades to the Hadoop cluster planned for the next year, you should review those plans with Support prior to installation. For more information, please contact Trifacta Support.

Supported Hadoop Distributions

The Trifacta platform supports the following minimum Hadoop distributions.

  • The Trifacta platform only supports the latest major release and its minor releases of each distribution.
  • The Trifacta platform only supports the versions of any required components included in a supported distribution. Use of non-default versions of required components is not supported, even if they are upgraded versions.

Cloudera supported distributions

  • CDH 6.3 (Recommended)

  • CDH 6.2

  • CDH 6.1

    NOTE: CDH 6.x requires that you use the native Spark libraries provided by the cluster. Additional configuration is required. For more information, see Configure for Spark in the Configuration Guide.

  • CDH 5.16 (Recommended)

See Supported Deployment Scenarios for Cloudera in the Install Guide.

Hortonworks supported distributions

  • HDP 3.1 (Recommended)

  • HDP 3.0

    NOTE: HDP 3.x requires that you use the native Spark libraries provided by the cluster. Additional configuration is required. For more information, see Configure for Spark in the Configuration Guide.

  • HDP 2.6

See Supported Deployment Scenarios for Hortonworks in the Install Guide.
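
The Cloudera and Hortonworks lists above can be summarized as a lookup that also flags when the cluster's native Spark libraries are required. This Python sketch is illustrative only. `SUPPORTED_DISTRIBUTIONS`, `NATIVE_SPARK_MAJOR`, and `describe_distribution` are hypothetical names, and EMR, HDInsight, and Azure Databricks are omitted because their versions are documented elsewhere.

```python
# Version -> whether the list above marks it as Recommended.
SUPPORTED_DISTRIBUTIONS = {
    "CDH": {"6.3": True, "6.2": False, "6.1": False, "5.16": True},
    "HDP": {"3.1": True, "3.0": False, "2.6": False},
}

# Major versions that must use the native Spark libraries from the cluster.
NATIVE_SPARK_MAJOR = {"CDH": "6", "HDP": "3"}


def describe_distribution(distribution, version):
    """Return support details for a distribution, or None if unsupported."""
    name = distribution.upper()
    versions = SUPPORTED_DISTRIBUTIONS.get(name, {})
    if version not in versions:
        return None
    major = version.split(".")[0]
    return {
        "recommended": versions[version],
        "native_spark_required": major == NATIVE_SPARK_MAJOR[name],
    }
```

For instance, `describe_distribution("cdh", "6.2")` reports a supported, non-recommended release that requires the native Spark libraries, while an unlisted version such as HDP 2.5 yields `None`.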

EMR supported distributions

See Configure for EMR in the Configuration Guide.

HDInsight supported distributions

See Configure for HDInsight in the Configuration Guide.

Azure Databricks supported distributions

See Configure for Azure Databricks in the Configuration Guide.


Node Requirements

Each cluster node must have the following software:

  • Java JDK 1.8 (some exceptions may be listed below)

Hadoop Component Access

The Trifacta deployment must have access to the following.

Java and Spark version requirements

The following matrix identifies the supported versions of Java and Spark on the Hadoop cluster. Where possible, you should install the same version of Java on the Trifacta node and on the cluster with which you are integrating.

| | Spark 2.3 | Spark 2.4 |
|---|---|---|
| Java 1.8 | Required. | Required. |

Notes:

  • If you are integrating with an EMR cluster, there are specific version requirements for EMR. See Configure for Spark in the Configuration Guide.


Other components

Hadoop System Ports

For more information, see System Ports.

Site Configuration Files

Hadoop cluster configuration files must be copied into the Trifacta deployment. See Configure for Hadoop in the Configuration Guide.

Security Requirements

  • Kerberos supported:
  • If Kerberos and secure impersonation are not enabled:
    • A user [hadoop.user (default=trifacta)] must be created on each node of the Hadoop cluster.
    • A directory [hadoop.dir (default=trifacta)] must be created on the cluster.
    • The user [hadoop.user] must have full access to the directory, which enables storage of the transformation recipe back into HDFS.
    • See Configure for Hadoop in the Configuration Guide.

Cluster Configuration

For more information on integration with Hadoop, see Prepare Hadoop for Integration with the Platform.

User Requirements

Users must access the Trifacta platform through one of the supported browser versions. For more information on user system requirements, see Desktop Requirements.

I/O Requirements

See Supported File Formats in the User Guide.
