This install process applies to installing Trifacta® on an Azure infrastructure that you manage.
Azure Marketplace deployments:
NOTE: Content in this section does not apply to deployments from the Azure Marketplace. For more information, see the Azure Marketplace.
NOTE: All hardware in use for supporting the platform is maintained within the enterprise infrastructure on Azure.
Scenario description:
- Installation of Trifacta on a node in Microsoft Azure
- Installation of Trifacta databases on the same node
- Integration with a supported cluster for running jobs
- Base storage layer and backend datastore of ADLS Gen1, ADLS Gen2, or WASB
- High availability or failover of the Trifacta node is not supported in Azure.
- High availability of cluster components is automatically managed by the HDI cluster.
- Auto-management does not apply to non-Hadoop clusters, such as Azure Databricks.
For more information on deployment scenarios, see Supported Deployment Scenarios for Azure.
The following limitations apply to installations of Trifacta Wrangler Enterprise on Azure:
NOTE: If you are using Azure Databricks as a datasource, verify that OpenJDK v1.8.0_242 or earlier is installed on the Trifacta node. Java 8 is required. If necessary, downgrade the Java version and restart the platform. This requirement is due to a known issue with TLS v1.3.
- The application user credentials are used to access the HDI cluster. Details are provided below.
- ADLS Gen1/Storage Blob access is only for the HDInsight cluster's primary storage. Additional storage accounts are not supported.
- HDFS must be set as the base storage layer of the Trifacta platform. Details are provided later.
- S3 integration and AWS-based integrations such as Redshift are not supported.
- Use of HttpFS is not supported.
- Security features such as Kerberos and secure impersonation are not supported.
For general limitations on Trifacta, see Product Limitations.
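The Java requirement in the Azure Databricks note above can be checked from a script on the Trifacta node. The following is a minimal sketch, not a Trifacta-provided tool: it assumes the usual OpenJDK 8 banner format (`openjdk version "1.8.0_NNN"`), and the function names are hypothetical.

```python
import re
import subprocess

# Hypothetical helpers; not part of the Trifacta product.
# Matches the legacy Java 8 scheme "1.<major>.<minor>_<update>".
_JAVA8_PATTERN = re.compile(r'version "1\.(\d+)\.\d+_(\d+)"')


def parse_java_version(text):
    """Return (major, update) from `java -version` output, e.g. (8, 242).

    Newer version schemes such as "11.0.7" are not matched and return None.
    """
    match = _JAVA8_PATTERN.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_java_requirement(text, max_update=242):
    """True only if the output reports Java 8 at update max_update or earlier."""
    parsed = parse_java_version(text)
    if parsed is None:
        return False
    major, update = parsed
    return major == 8 and update <= max_update


def installed_java_version_output():
    """Run `java -version`; the JVM prints its version banner to stderr."""
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    return result.stderr
```

On the node, `meets_java_requirement(installed_java_version_output())` returns `False` for Java 11 or for Java 8 updates later than 242, which indicates that a downgrade is needed.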
Please acquire the following assets:
- Install Package: Acquire the installation package for your operating system.
- Offline system dependencies: If you are completing the installation without Internet access, you must also acquire the offline versions of the system dependencies. See Install Dependencies without Internet Access.
Azure Desktop Requirements
- All desktop users must be able to connect to the instance through the enterprise infrastructure.
Depending on which of the following Azure components you are deploying, additional prerequisites and limitations may apply:
- Azure AD single sign-on: See Configure SSO for Azure AD in the Configuration Guide.
Before you begin, please verify that you have completed the following:
- Read: Please read this entire document before you create the cluster or install the Trifacta platform.
- Cluster sizing: Before you begin, you should allocate sufficient resources for the cluster. For guidance, please contact your Trifacta representative.
- Node: Review the system requirements for the node hosting the Trifacta platform. See System Requirements in the Planning Guide.
  - The required set of ports must be enabled for listening. See System Ports in the Planning Guide.
  - This node should be dedicated to Trifacta use.
- Databases: The platform uses a set of databases that must be accessible from the Trifacta node. The databases are installed as part of the workflow described later.
Deploy the Cluster
Cluster types: Deploy and provision a cluster of one of the supported types. The Trifacta platform supports integrations with multiple cluster types.
NOTE: Before you deploy, you should review cluster sizing options. For guidance, please contact your Trifacta representative.
Backend storage layer: Primary storage of the cluster may be set to an existing ADLS Gen1, ADLS Gen2, or WASB layer.
For more information, see Supported Deployment Scenarios for Azure.
Prepare the cluster
NOTE: This section applies only if you are using HDI. If not, please skip to the next section.
Create the following HDFS directories. Each directory path is specified by the platform configuration property listed next to it.
|Default HDFS path|Platform configuration property|
|/trifacta| |
|/trifacta/dictionaries|hdfs.pathsConfig.dictionaries|
|/trifacta/libraries|hdfs.pathsConfig.libraries|
|/trifacta/queryResults|hdfs.pathsConfig.batchResults|
|/trifacta/tempfiles|hdfs.pathsConfig.tempFiles|
|/trifacta/uploads|hdfs.pathsConfig.fileUpload|
|/trifacta/.datasourceCache|hdfs.pathsConfig.globalDatasourceCache|
Change the ownership of the above directories to trifacta:trifacta, or to the corresponding user and group values in your environment.
Additional users may be required. For more information, see Required Users and Groups in the Planning Guide.
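The directory setup above can be scripted. The sketch below only assumes the standard `hdfs dfs -mkdir -p` and `hdfs dfs -chown` shell commands; the helper names are hypothetical, and the default owner and group follow the trifacta:trifacta values above. Changing ownership in HDFS normally requires HDFS superuser privileges, so run it as a user that has them.

```python
import subprocess

# Default HDFS paths from the table above. If you changed any of the
# hdfs.pathsConfig.* properties, edit this tuple to match.
DEFAULT_TRIFACTA_DIRS = (
    "/trifacta",
    "/trifacta/dictionaries",
    "/trifacta/libraries",
    "/trifacta/queryResults",
    "/trifacta/tempfiles",
    "/trifacta/uploads",
    "/trifacta/.datasourceCache",
)


def build_hdfs_commands(dirs=DEFAULT_TRIFACTA_DIRS, owner="trifacta", group="trifacta"):
    """Return argv lists that create the directories and set their owner."""
    return [
        # -p creates missing parents and does not fail if a path already exists.
        ["hdfs", "dfs", "-mkdir", "-p", *dirs],
        # Set owner:group on every listed path explicitly.
        ["hdfs", "dfs", "-chown", f"{owner}:{group}", *dirs],
    ]


def run_commands(commands):
    """Execute each command in order, raising on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the command list separately from running it lets you review the exact commands (for example, `print(build_hdfs_commands())`) before calling `run_commands` on a cluster node.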
Deploy the Trifacta node
In your Azure infrastructure, you must deploy a suitable VM for the installation of the Trifacta platform.
The operating system requirements for the VM that hosts the platform vary depending on the type of cluster on which you run jobs.
|Cluster Type|Supported O/S for VM|Notes|
|HDInsight (HDI)|O/S of the HDInsight edge node|Trifacta platform must be installed on an edge node of the HDInsight cluster.|
|Azure Databricks|CentOS and Ubuntu| |
- For more information, see System Requirements in the Planning Guide.
- A set of ports must be opened on the VM for the platform. For more information, see System Ports in the Planning Guide.
- When you configure the platform to integrate with the cluster, you must collect details about the cluster resources. For the set of information to collect, see Product Support Matrix in the Planning Guide.
For more information on the supported cluster distributions, see Supported Deployment Scenarios for Azure.
The installation and configuration process requires the following steps. To continue, see Next Steps below.
NOTE: These steps are covered in greater detail later in this section.
- Install software: Install the Trifacta platform software on the Trifacta node. See Install Software.
- Install databases: The platform requires several databases for storage.
NOTE: The default configuration assumes that you are installing the databases on a PostgreSQL server on the same edge node as the software using the default ports. If you are changing the default configuration, additional configuration is required as part of this installation process.
For more information, see Install Databases in the Databases Guide.
- Start the platform: For more information, see Start and Stop the Platform.
- Log in to the application: After the software and databases are installed, you can log in to the application to complete configuration:
- See Login.
As soon as you log in, you should change the password on the admin account. In the left menu bar, select User menu > Admin console > Admin settings. Scroll down to Manage Users. For more information, see Change Admin Password in the Configuration Guide.
Tip: At this point, you can access the online documentation through the application. In the left menu bar, select Help menu > Documentation. All of the following content, plus updates, is available online. See Documentation below.
- Install configuration: After you have successfully logged in to the Trifacta application, you must configure the product to work with your backend storage layer and the running environment on the cluster. See Install Configuration.
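The default database configuration described in the Install databases step assumes a PostgreSQL server on the same node, listening on its default port. PostgreSQL's default TCP port is 5432. As a rough pre-flight check before starting the platform, the sketch below tests whether anything accepts TCP connections on that host and port. It is a hypothetical helper, not part of the product, and it does not confirm that the listener is PostgreSQL or that the Trifacta databases exist.

```python
import socket

# PostgreSQL listens on TCP port 5432 unless configured otherwise.
POSTGRES_DEFAULT_PORT = 5432


def is_port_open(host="localhost", port=POSTGRES_DEFAULT_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False
```

If you changed the database host or port, pass the non-default values, for example `is_port_open("db-host", 5433)`. The same helper can spot-check the platform ports listed in System Ports.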
To continue, please install the Trifacta software on the Trifacta node.
NOTE: Please complete the installation steps for the operating system version that is installed on the Trifacta node.
See Install Software.