This section lists the hardware and software requirements for a successful installation of Trifacta® Wrangler Enterprise.

Edge Node Requirements

If the Trifacta platform is installed in a Hadoop cluster, the platform is typically installed on an Edge Node. 

NOTE: The Trifacta platform does not require edge node installation.

Hardware Requirements

Minimum hardware:

| Item | Required |
| --- | --- |
| Number of cores | 8 cores |
| RAM | 64 GB |
| Disk space to install software | 4 GB |
| Total free disk space | 16 GB |

NOTE: The platform requires 12 GB of dedicated RAM to start and perform basic operations.

Space requirements by volume:

  • /opt - 10 GB
  • /var - Remainder

Recommended hardware:

| Item | Recommended |
| --- | --- |
| Number of cores | 16 cores |
| RAM | 128 GB |
| Disk space to install software | 16 GB |
| Total free disk space | 100 GB |

NOTE: The platform requires 12 GB of dedicated RAM to start and perform basic operations.

Space requirements by volume:

  • /opt - 10 GB
  • /var - Remainder
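
The hardware thresholds above can be checked on a candidate node before installation. The following Python sketch is illustrative only and is not part of the Trifacta installer: the function names `measure` and `shortfalls` are hypothetical, and the check assumes a Linux node. The operating system usually reports slightly less physical RAM than the nominal figure, so treat a small RAM shortfall as a prompt to verify by hand.

```python
import os
import shutil

# Thresholds copied from the hardware tables above
# (cores, RAM in GB, total free disk space in GB).
MINIMUM = {"cores": 8, "ram_gb": 64, "free_disk_gb": 16}
RECOMMENDED = {"cores": 16, "ram_gb": 128, "free_disk_gb": 100}


def shortfalls(measured, required):
    """Return the names of resources where the measured value is below the requirement."""
    return [name for name, need in required.items() if measured[name] < need]


def measure(path="/"):
    """Collect core count, physical RAM, and free disk space (hypothetical helper, Linux only)."""
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return {
        "cores": os.cpu_count() or 0,
        "ram_gb": ram_bytes / 1024**3,
        "free_disk_gb": shutil.disk_usage(path).free / 1024**3,
    }


if __name__ == "__main__":
    node = measure()
    print("Below minimum:", shortfalls(node, MINIMUM) or "nothing")
    print("Below recommended:", shortfalls(node, RECOMMENDED) or "nothing")
```

The per-volume figures for /opt and /var are not checked here; passing a different `path` to `measure` reports free space for that volume.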

Operating System Requirements

The following operating systems are supported for the Trifacta node.

NOTE: The Trifacta platform requires 64-bit versions of any operating system.

  • CentOS 6.4 - 6.x, 7.1, 7.2, 7.4
  • RHEL 6.4 - 6.x, 7.1, 7.2, 7.4

    NOTE: If you are installing on CentOS/RHEL 7.1, you must be connected to an online repository for some critical updates. Offline installation is not supported for these operating system distributions.

    NOTE: For security reasons, RHEL 7.3 is not supported for installation of Release 5.0 or later of the Trifacta platform. Please upgrade to RHEL 7.4 or a later supported release.

    Tip: Disabling SELinux on the Trifacta node is recommended. However, if security policies require SELinux to remain enabled, you may need to apply some changes to the environment. For more information on SELinux, see Install from AWS Marketplace.

  • Ubuntu 14.04 (codename Trusty) and 16.04 (codename Xenial)

    NOTE: For Ubuntu installations, some packages must be manually installed. Instructions are provided later in the process.
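
As an illustration of the list above, the supported releases can be expressed as a small lookup. This Python sketch is an assumption-laden reading of the list, not an official compatibility check: the function name `is_supported_os` is hypothetical, and "6.4 - 6.x" is interpreted as any 6.x release at or above 6.4.

```python
# Supported releases transcribed from the list above.
SUPPORTED_EL7 = {(7, 1), (7, 2), (7, 4)}        # CentOS/RHEL 7.x releases listed
SUPPORTED_UBUNTU = {(14, 4), (16, 4)}           # Ubuntu 14.04 and 16.04


def is_supported_os(distro, version):
    """Return True if distro/version appears in the supported list (hypothetical helper)."""
    parts = version.split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        return False  # a major.minor version is needed to decide
    release = (int(parts[0]), int(parts[1]))
    name = distro.lower()
    if name in ("centos", "rhel"):
        if release[0] == 6:
            return release[1] >= 4  # assumption: "6.4 - 6.x" means 6.4 or later in 6.x
        return release in SUPPORTED_EL7
    if name == "ubuntu":
        return release in SUPPORTED_UBUNTU
    return False
```

For example, `is_supported_os("RHEL", "7.3")` returns False, which matches the note that RHEL 7.3 is not supported.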

NOTE: During normal operations, the platform may maintain a high number of open files, which may exceed the default limit defined by the operating system. Before you begin using the system, you should raise this limit to 64000. For more information on raising the ulimit, see Miscellaneous Configuration.
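
The current open-file limit can be inspected before the platform is started. The following Python sketch uses the standard `resource` module and is only a rough check: the helper name `limit_is_sufficient` is hypothetical, and the result reflects the limit of the user and shell running the script, which may differ from the account that runs the platform.

```python
import resource

REQUIRED_OPEN_FILES = 64000  # value recommended in the note above


def limit_is_sufficient(soft_limit, required=REQUIRED_OPEN_FILES):
    """True if the soft limit is unlimited or at least the required value."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= required


if __name__ == "__main__":
    # RLIMIT_NOFILE is the per-process limit on open file descriptors.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open files: soft={soft} hard={hard}")
    if not limit_is_sufficient(soft):
        print(f"Raise the open-file limit to {REQUIRED_OPEN_FILES} before using the system.")
```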

NOTE: If you are enabling SSO and want to use an Apache Server as a reverse proxy for the Trifacta node, you may need to upgrade to Apache Server. See Configure SSO for AD-LDAP.

For more information on RPM dependencies, see System Dependencies.

Database Requirements

The following database versions are supported by the Trifacta platform for storing metadata and the user's Wrangle recipes.

NOTE: One of these supported versions must be installed on the Trifacta node.

Supported versions:

  • PostgreSQL 9.6
  • MySQL 5.7

    NOTE: If you are installing the databases into MySQL, you must download and install the MySQL Java driver onto the Trifacta node. For more information, see Install the Databases.

    NOTE: MySQL 5.7 is not supported for installation in Amazon RDS.

NOTE: H2 database type is used for internal testing. It is not a supported database.

 

For more information on installing and configuring the database, see Set up the Databases.

Other Software Requirements

The following software components must be present.   

Java

  • Java 1.8

    NOTE: Additional requirements related to the Java JDK are listed in the Hadoop Components section below.

    NOTE: If you are integrating your Trifacta instance with S3, you must install the Oracle JRE 1.8 onto the Trifacta node. No other version of Java is supported for S3 integration. See Enable S3 Access.

    NOTE: OpenJDK 1.8 is officially supported. It must be installed on the Trifacta node during the installation process. See Installation Steps.
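
To confirm which Java version is on the node's path, the banner printed by `java -version` can be parsed. The Python sketch below is a minimal example and not part of the installation process: the helper names `parse_java_version` and `is_java_8` are hypothetical, and the check only reads the version string. It does not distinguish Oracle JRE from OpenJDK.

```python
import re
import subprocess


def parse_java_version(banner):
    """Extract the quoted version string from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', banner)
    return match.group(1) if match else None


def is_java_8(version):
    """Java 8 reports itself with a version string beginning with 1.8."""
    return version is not None and (version == "1.8" or version.startswith("1.8."))


if __name__ == "__main__":
    try:
        # `java -version` prints its banner to stderr, not stdout.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
        version = parse_java_version(result.stderr)
        print(f"Detected Java version: {version}; Java 1.8: {is_java_8(version)}")
    except FileNotFoundError:
        print("No java executable found on the PATH.")
```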

Other Software

NOTE: For Ubuntu installations, the following packages must be manually installed using Ubuntu-specific versions. Instructions and version numbers are provided later in the process.

 

  • NginX 1.12.2
  • NodeJS 6.12.2

Root User Access

Installation must be executed as the root user on the Trifacta node.

SSL Access

(Optional) If users are connecting to the Trifacta platform over SSL, an SSL certificate must be created and deployed. See Install SSL Certificate.

Internet Access

(Optional) Internet access is not required for installation or operation of the platform. However, if the server does not have Internet access, you must acquire additional software as part of the disconnected install. For more information, see Install Dependencies without Internet Access.

Hadoop Cluster Requirements

The following requirements apply if you are integrating the Trifacta platform with your enterprise Hadoop cluster. 

NOTE: For general guidelines on sizing your cluster, see Sizing Guidelines.

NOTE: If you have upgrades to your Hadoop cluster planned for the next year, you should review those plans with Support prior to installation. For more information, please contact Trifacta Support.

Supported Hadoop Distributions

NOTE: The Trifacta platform only supports the latest major release and its minor releases of each distribution.

The Trifacta platform supports only the versions of required components that are included in a supported distribution. Use of non-default versions of required components is not supported, even if they are upgraded versions.

The Trifacta platform supports the following minimum Hadoop distributions:

| Vendor | Supported Versions | Link |
| --- | --- | --- |
| Cloudera | CDH 5.15 (Recommended), CDH 5.14, CDH 5.13 | Supported Deployment Scenarios for Cloudera |
| Hortonworks | HDP 2.6 (Recommended), HDP 2.5 | Supported Deployment Scenarios for Hortonworks |

Node Requirements

Each cluster node must have the following software:

  • Java JDK 1.8 

Hadoop Component Access

The Trifacta deployment must have access to the following.

Java and Spark version requirements

The following matrix identifies the supported versions of Java and Spark on the Hadoop cluster.

Notes:

| | Spark 2.1 | Spark 2.2 | Spark 2.3 |
| --- | --- | --- | --- |
| Java 1.7 | Supported. | Not supported. | Not supported. |
| Java 1.8 | Supported. | Required. | Required. |
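
The matrix above can also be written as a lookup table, which may be convenient in a pre-installation script. This Python sketch simply transcribes the matrix; the names `SUPPORT` and `support_status` are hypothetical, and combinations that do not appear in the matrix are reported as "Not listed" rather than guessed.

```python
# Java/Spark support matrix transcribed from the table above.
# Keys are (Java version, Spark version) pairs.
SUPPORT = {
    ("1.7", "2.1"): "Supported",
    ("1.7", "2.2"): "Not supported",
    ("1.7", "2.3"): "Not supported",
    ("1.8", "2.1"): "Supported",
    ("1.8", "2.2"): "Required",
    ("1.8", "2.3"): "Required",
}


def support_status(java, spark):
    """Return the matrix entry for a Java/Spark pair, or "Not listed" if absent."""
    return SUPPORT.get((java, spark), "Not listed")
```

For example, `support_status("1.7", "2.2")` returns "Not supported", so a cluster running Spark 2.2 needs Java 1.8.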

Other components

Hadoop System Ports

For more information, see System Ports.

Site Configuration Files

Hadoop cluster configuration files must be copied into the Trifacta deployment. This is especially important in a YARN deployment. See Configure for Hadoop.

Security Requirements

  • Kerberos supported: 
  • If Kerberos and secure impersonation are not enabled: 
    • A user [hadoop.user (default=trifacta)] must be created on each node of the Hadoop cluster. 
    • A directory [hadoop.dir (default=trifacta)] must be created on the cluster.
    • The user [hadoop.user] must have full access to the directory, which enables storage of the transformation recipe back into HDFS.
    • See Configure for Hadoop.

Cluster Configuration

For more information on integration with Hadoop, see Prepare Hadoop for Integration with the Platform.

User Requirements

Users must access the Trifacta platform through the Google Chrome browser. For more information on user system requirements, see Desktop Requirements.

I/O Requirements

See Supported File Formats.
