Trifacta node

The node where the Trifacta software is to be installed should meet the following requirements:

Item | Description | Status or Value

Operating System

The operating system on the node must be one of the supported 64-bit versions. For more information, see System Requirements. 

Cores

Minimum of four cores 

RAM

Minimum of 16 GB dedicated 

Disk Space

Minimum of 20 GB 

Internet Access

If your data sources are available over an Internet connection, the platform must be permitted to use that connection. 

Databases

The Trifacta databases can be installed in PostgreSQL or MySQL.

NOTE: By default, the databases are installed on the local server. They can be installed on a remote node as needed.
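The minimums in the table above can be spot-checked on a candidate Linux node before installation. A minimal sketch (the thresholds mirror the table; the path argument is an assumption — point it at whichever mount will hold the install):

```python
import os
import shutil

def check_node(min_cores=4, min_ram_gb=16, min_disk_gb=20, path="/"):
    """Compare a Linux node against the documented minimums.

    Returns a dict of (measured value, meets minimum) pairs.
    The path default is an assumption; check the actual install mount.
    """
    cores = os.cpu_count() or 0
    # Total physical RAM in GB (POSIX sysconf; available on Linux)
    ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024 ** 3
    # Free space on the target mount in GB
    disk_gb = shutil.disk_usage(path).free / 1024 ** 3
    return {
        "cores": (cores, cores >= min_cores),
        "ram_gb": (round(ram_gb, 1), ram_gb >= min_ram_gb),
        "disk_gb": (round(disk_gb, 1), disk_gb >= min_disk_gb),
    }
```

Any item whose second tuple element is False falls short of the corresponding row in the table.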

 

Cloud Infrastructure

Trifacta Wrangler Enterprise can be installed within your enterprise infrastructure or optionally on one of the following cloud-based infrastructures.

Item | Description | Status or Value

AWS

Trifacta Wrangler Enterprise is to be installed on Amazon Web Services (AWS) infrastructure.

Tip: Trifacta Wrangler Enterprise can be installed from the AWS Marketplace with a number of configuration steps automatically performed for you. For more information, please visit Trifacta Wrangler Enterprise.

 

Azure

Trifacta Wrangler Enterprise is to be installed on Microsoft Azure infrastructure.

Tip: Trifacta Wrangler Enterprise can be installed from the Azure Marketplace with a number of configuration steps automatically performed for you. For more information, please visit Trifacta Wrangler Enterprise.

 

Hadoop Cluster

If your instance of Trifacta Wrangler Enterprise must integrate with a Hadoop-based cluster, please verify the following:

NOTE: The number of nodes in your cluster and the number of cores and amount of memory on each data node can affect the performance of running jobs on the cluster. If you have questions about cluster size, please contact Trifacta Customer Success Services.

Item | Description | Status or Value

Cluster Type and Version

Supported types of Hadoop clusters:

 

Number of data nodes

Total number of data nodes. 

Data node - number of cores

Number of cores on each data node. 

Data node - memory

Amount of RAM (GB) on each data node. 

Upgrade plans

If there are planned upgrades to the cluster, please review the list of supported versions to verify that the new version is supported within the timeframe of your upgrade plans. 

Hadoop Cluster Details

During installation and configuration, you may need to specify the following configuration information to successfully integrate Trifacta Wrangler Enterprise with your cluster.

Item | Description | Status or Value

Namenode host

Host name of the namenode on the cluster 

Namenode port

Port number of the namenode on the cluster 

Secondary namenode host

Host name for the secondary namenode for the cluster

NOTE: This value is only required if high availability is enabled on the cluster.


 

Secondary namenode port

Port number for the secondary namenode on the cluster

NOTE: This value is only required if high availability is enabled on the cluster.


 

Namenode Service name

Name for the namenode service

NOTE: This value is only required if high availability is enabled on the cluster.


 

ResourceManager host

Host name for the ResourceManager on the cluster 

ResourceManager port

Port number for the ResourceManager on the cluster 

Secondary ResourceManager host

Host name for the secondary ResourceManager on the cluster

NOTE: This value is only required if high availability is enabled on the cluster.

 

Secondary ResourceManager port

Port number for the secondary ResourceManager on the cluster

NOTE: This value is only required if high availability is enabled on the cluster.


 

Hive host

Host name for the Hive server on the cluster. For more information, see Configure for Hive in the Configuration Guide. 

Hive port

Port number for the Hive server on the cluster. For more information, see Configure for Hive in the Configuration Guide. 

HttpFS host

Host name for the HttpFS server on the cluster.

NOTE: This value is only required if high availability is enabled on the cluster.


 

HttpFS port

Port number for the HttpFS server on the cluster.

NOTE: This value is only required if high availability is enabled on the cluster.
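Most of the host and port values in this table can be read straight from the cluster's client configuration files. An illustrative (not exhaustive) sketch of the relevant properties, with hostnames and default ports as placeholders:

```xml
<!-- core-site.xml: namenode host and port come from fs.defaultFS -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>

<!-- yarn-site.xml: ResourceManager host and port -->
<property>
  <name>yarn.resourcemanager.address</name>
  <value>resourcemanager.example.com:8032</value>
</property>
```

On HA-enabled clusters, look instead for the logical service name under dfs.nameservices and the per-namenode entries keyed by that name.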


 

Hadoop Cluster Security

Item | Description | Status or Value

HDFS Service user

By default, the platform uses the trifacta user to integrate with the cluster.

When Kerberos is enabled, this user is used to impersonate other users on the cluster.

For more information on required users, see Required Users and Groups.

 

HDFS transfer encryption

Optionally, the cluster can be configured to use SSL/TLS on data transfer for HDFS.

On the cluster, this setting is defined using the dfs.encrypt.data.transfer setting.
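To record this item, check the value of that property in the cluster's hdfs-site.xml (shown here set to true as an illustration):

```xml
<!-- hdfs-site.xml on the cluster -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>
```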

 

SSL on HTTP endpoints

Is encryption applied to the WebHDFS/HttpFS endpoints? 

Kerberos

Cluster has been enabled for Kerberos.

For more information on the integration, see Configure for Kerberos Integration in the Configuration Guide.

 

KDC

Name of the Key Distribution Center (KDC) for Kerberos.

For more information on the integration, see Configure for Kerberos Integration in the Configuration Guide.

 

Kerberos realm

Realm of the Key Distribution Center (KDC) for Kerberos.

For more information, see Configure for Kerberos Integration in the Configuration Guide.
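The KDC name and realm recorded above normally appear in the krb5.conf on the Trifacta node. A sketch, with the realm and hostnames as placeholders:

```ini
# /etc/krb5.conf (EXAMPLE.COM and kdc.example.com are placeholders)
[libdefaults]
  default_realm = EXAMPLE.COM

[realms]
  EXAMPLE.COM = {
    kdc = kdc.example.com
    admin_server = kdc.example.com
  }
```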

 

Firewall

Item | Description | Status or Value

Firewall between users and cluster

If a firewall is present between users and the cluster, the default web application port must be opened for user access. See below. 

Web application port

By default, the Trifacta application is available on port 3005.

As needed, this value can be modified. For more information, see System Ports.
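After any firewall changes, reachability of the web application port can be verified from a user's network. A minimal sketch (the hostname in the example is a placeholder):

```python
import socket

def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (placeholder host): port_reachable("trifacta.example.com", 3005)
```

A False result means either the service is not running or a firewall between the user and the node is blocking the port.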

 

Connectivity

Trifacta Wrangler Enterprise supports read-only or read-write integration with the following datastores.

Item | Description | Status or Value

Primary backend storage

The following primary storage environments are supported: HDFS or S3. For more information, see Set Base Storage Layer in the Configuration Guide.

 

Hive

For more information, see Configure for Hive in the Configuration Guide. 

Relational Connections

Supported connections include Oracle, SQL Server, Teradata, Tableau, Salesforce, and more. For more information, see Connection Types.

 

Item | Description | Status or Value

Redshift

For more information, see Create Redshift Connections in the Configuration Guide. 

Desktop Environments

The following requirements apply to end-user desktop environments.

Item | Description | Status or Value

Google Chrome version

The application requires that users connect using a version of Google Chrome.

NOTE: Trifacta Wrangler Enterprise supports the latest stable version of Google Chrome and the two prior versions, at the time that any release of the product is generally available.

For a list of supported versions, see Desktop Requirements.

 

Desktop Application

If Google Chrome is not available, users can connect to the application using a custom desktop application.

NOTE: The desktop application can be deployed to Windows-based desktops only.

For more information, see Install Desktop Application in the Install Guide.

 

Extras

Deployment or use of these features requires additional configuration or development external to the application. Related content may not be available in printed format.

Item | Description | Status or Value

Cluster Compression

The platform can integrate with clusters that are compressed using Bzip2, Gzip, or Snappy. For more information, see Enable Integration with Compressed Clusters in the Configuration Guide. 

Single Sign-On

The application can integrate with the following Single Sign-On solutions:

 

SSL for the platform

You can apply an SSL certificate to the Trifacta node for secure communications. For more information, see Install SSL Certificate in the Install Guide.

 

API

You can manage aspects of your flows, datasets, and connections through publicly available Application Programming Interfaces (APIs). For more information, see API Reference in the Developer's Guide. 
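An authenticated API call is typically a standard HTTPS request with a bearer token. A sketch using only the standard library; the /v4/flows path is an assumption for illustration — consult the API Reference for the endpoints available in your release:

```python
import urllib.request

def build_api_request(host, token, path="/v4/flows"):
    """Build an authenticated GET request for the platform's REST API.

    The /v4/flows path is a hypothetical example endpoint; check the
    API Reference for the actual routes in your release.
    """
    return urllib.request.Request(
        f"https://{host}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )
```

Passing the built request to urllib.request.urlopen would then issue the call against your deployment.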

UDF

You can create custom user-defined functions for deployment into the platform.

For more information on the list of available functions, see Language Index in the Language Reference Guide.

For more information on UDFs, see User-Defined Functions in the Developer's Guide.

 
