This install process applies to installing the platform on an Azure infrastructure that you manage.

Azure Marketplace deployments:

NOTE: Content in this section does not apply to deployments from the Azure Marketplace, which provide fewer deployment and configuration options. For more information, see the Azure Marketplace.

Scenario Description

NOTE: All hardware in use for supporting the platform is maintained within the enterprise infrastructure on Azure.

  • Installation of the platform on a node in Microsoft Azure
  • Installation of the platform databases on the same node
  • Integration with a supported cluster for running jobs.
  • Base storage layer and backend datastore of ADLS or WASB
  • High availability or failover of the platform is not supported in Azure.
  • High availability of cluster components is automatically managed by the HDI cluster. 
    • Auto-management does not apply to non-Hadoop clusters, such as Azure Databricks.

For more information on deployment scenarios, see Supported Deployment Scenarios for Azure.

Product Limitations

  • The application user credentials are used to access the HDI cluster. Details are provided below.
  • ADLS/Storage Blob access is only for the HDInsight cluster's primary storage. Additional storage accounts are not supported.
  • HDFS must be set as the base storage layer of the platform. Details are provided later.
    • S3 integration and AWS-based integrations such as Redshift are not supported.
  • Use of HttpFS is not supported.
  • Security features such as Kerberos and secure impersonation are not supported.

For more information on the limitations of this deployment scenario, see Product Limitations.

Pre-requisites

Desktop Requirements

  • All desktop users of the platform should have a supported version of Google Chrome installed on their desktops.
  • All desktop users must be able to connect to the Azure VM hosting the platform through the enterprise infrastructure.

Azure Pre-requisites

Depending on which of the following Azure components you are deploying, additional pre-requisites and limitations may apply. Please review these sections as well.

Preparation

Before you begin, please verify that you have completed the following:

  1. Review Planning Guide: Please review and verify Install Preparation and sub-topics.
  2. Read: Please read this entire document before you create the cluster or install the platform.
  3. Acquire Assets: Acquire the installation package for your operating system and your license key. For more information, contact your support representative.
    1. If you are completing the installation without Internet access, you must also acquire the offline versions of the system dependencies. See Install Dependencies without Internet Access.
  4. Cluster sizing: Before you begin, you should allocate sufficient resources for the cluster. For guidance, please contact your product representative.

  5. Node: Review the system requirements for the node hosting the platform. See System Requirements.
    1. The required set of ports must be enabled for listening. See System Ports.

    2. This node should be dedicated to the platform.

  6. Databases:
    1. The platform uses a set of databases that must be accessible from the platform node. Databases are installed as part of the workflow described later.
    2. For more information on the supported databases and versions, see System Requirements.
    3. For more information on database installation requirements, see Install Databases.
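
The port requirement in step 5 can be spot-checked with a short script. The following is a minimal sketch using only the Python standard library. The helper names `is_port_open` and `check_ports`, and the host and port in the usage lines, are placeholders for illustration; the actual list of required ports is in System Ports. A connection test like this succeeds only when a service is already listening on the port, so it is most useful after the platform and databases have been installed and started.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports, timeout=2.0):
    """Map each port to True (accepting connections) or False."""
    return {port: is_port_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Placeholder values: replace with the node's address and the ports
    # listed in System Ports.
    for port, is_open in check_ports("localhost", [5432]).items():
        print(port, "open" if is_open else "closed")
```

A port reported as closed may mean that the service is not running or that a firewall or network security rule is blocking access, so check both before changing the configuration.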

Limitations: For more information on limitations of this scenario, see Product Limitations in the Install Preparation area.


Deploy the Cluster

Deploy and provision a cluster of one of the supported types. The platform supports integrations with multiple cluster types.

NOTE: Before you deploy, you should review cluster sizing options. For guidance, please contact your product representative.

Primary storage of the cluster may be set to an existing Azure Data Lake Store or Blob Storage.

For more information, see Supported Deployment Scenarios for Azure.

Deploy the Platform Node

In your Azure infrastructure, you must deploy a suitable VM for the installation of the platform.

The operating system requirements for the VM vary depending on the type of cluster on which you run jobs.

Cluster Type | Supported O/S for VM | Notes
HDInsight | Ubuntu only | The platform must be installed on an edge node of the HDInsight cluster.
Azure Databricks | CentOS and Ubuntu |

For more information on the supported cluster distributions, see Supported Deployment Scenarios for Azure.

Prepare the cluster

  1. Create the following directories, which are specified by parameter in the platform. 

    Default HDFS path | Platform configuration property
    /user/trifacta |
    /trifacta |
    /trifacta/dictionaries | hdfs.pathsConfig.dictionaries
    /trifacta/libraries | hdfs.pathsConfig.libraries
    /trifacta/queryResults | hdfs.pathsConfig.batchResults
    /trifacta/tempfiles | hdfs.pathsConfig.tempFiles
    /trifacta/uploads | hdfs.pathsConfig.fileUpload
    /trifacta/.datasourceCache | hdfs.pathsConfig.globalDatasourceCache
  2. Change the ownership of the above directories to trifacta:trifacta or the corresponding user and group in your environment.
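
The two steps above can be scripted. The following is a minimal sketch that assumes the `hdfs` command-line client is available on the node and that the default paths are in use. The function name `build_prepare_commands` and its `owner` and `group` parameters are illustrative and not part of the product. The script only builds the commands so that they can be reviewed before anything is run.

```python
import shlex

# Default HDFS paths from the table above. Adjust them if the
# corresponding configuration properties have been changed.
DEFAULT_PATHS = [
    "/user/trifacta",
    "/trifacta",
    "/trifacta/dictionaries",
    "/trifacta/libraries",
    "/trifacta/queryResults",
    "/trifacta/tempfiles",
    "/trifacta/uploads",
    "/trifacta/.datasourceCache",
]


def build_prepare_commands(paths=DEFAULT_PATHS, owner="trifacta", group="trifacta"):
    """Return shell commands that create each path and set its ownership.

    The commands are returned as strings and are not executed here.
    """
    ownership = shlex.quote(f"{owner}:{group}")
    commands = []
    for path in paths:
        quoted = shlex.quote(path)
        commands.append(f"hdfs dfs -mkdir -p {quoted}")
        commands.append(f"hdfs dfs -chown {ownership} {quoted}")
    return commands


if __name__ == "__main__":
    for command in build_prepare_commands():
        print(command)
```

Run the printed commands as a user with permission to create directories and change ownership on the cluster's storage. If your environment uses a different user or group, pass them as `owner` and `group`.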

Additional users may be required. For more information, see Required Users and Groups in the Install Preparation area.

Install Workflow

Please complete these steps in the order listed.

1 - Install Software

Install the platform software on the node you created.

NOTE: You must follow the instructions provided for Ubuntu installation.


See Install Software.

2 - Install Databases

The platform requires several databases for storing metadata.

NOTE: The software assumes that you are installing the databases on a PostgreSQL server on the same node as the software. If you are installing them elsewhere, or are changing database names or ports, additional configuration is required as part of this installation process.

For more information, see Install Databases in the Databases Guide.

3 - Start the platform

For more information, see Start and Stop the Platform.

4 - Log in to the Application

After software and databases are installed, you can log in to the application to complete configuration. See Login.

As soon as you log in, you should change the password on the admin account. In the left menu bar, select Settings > Admin Settings. Scroll down to Manage Users. For more information, see Change Admin Password.

Tip: At this point, you can access the online documentation through the application. In the left menu bar, select Help menu > Product Docs. All of the following content, plus updates, is available online. See Documentation below.

Configuration Workflow

After you have completed the above topics, you can complete the configuration for your deployment below.

NOTE: The following configuration topics are not part of this installation guide. You should log in to the application and access the links below.

  1. Configure for Azure: Configure the platform to work with Azure. 
  2. Integrate with cluster: After the application is up and running, you can configure it to connect to the backend cluster for running jobs. Choose one of the following:
    1. HDInsight
    2. Azure Databricks
  3. Integrate with backend storage:
    1. Set base storage layer: The base storage layer must be set at the time of install and cannot be changed. See Set Base Storage Layer.
    2. ADLS
    3. WASB
  4. Verify operations: At this point, you should be able to run a job. See Verify Operations.
  5. Create additional connections: Through connections, you can access other sources of data and, optionally, publish job results. 

Documentation

You can access complete product documentation online and in PDF format. From within the product, select Help menu > Product Docs.

The following configuration topics are relevant to Azure deployments. Please review them in order.

Topic | Description
Supported Deployment Scenarios for Azure | Matrix of supported Azure components.
Configure for Azure | Top-level configuration topic on integrating the platform with Azure. Tip: You should review this page.
Configure for HDInsight | Review this section if you are integrating the platform with a pre-existing HDI cluster.
Configure for Azure Databricks | Review this section if you are integrating with a pre-existing Azure Databricks cluster.
Enable ADLS Access | Configuration to enable access to ADLS.
Enable WASB Access | Configuration to enable access to WASB.
Verify Operations | You should be able to verify platform operations by running a simple job at this time.
Relational Connections | To enable, see Enable Relational Connections. Azure-specific relational connections:
Configure SSO for Azure AD | How to integrate the platform with Azure Active Directory for Single Sign-On.