This installation process applies to installing the platform on Azure infrastructure that you manage.

Azure Marketplace deployments:

NOTE: Content in this section does not apply to deployments from the Azure Marketplace. For more information, see the Azure Marketplace.

Scenario Description

NOTE: All hardware in use for supporting the platform is maintained within the enterprise infrastructure on Azure.

  • Installation of  on a node in Microsoft Azure
  • Installation of  on the same node
  • Integration with a supported cluster for running jobs.
  • Base storage layer and backend datastore of ADLS Gen1, ADLS Gen2, or WASB
  • High availability or failover of the platform is not supported in Azure.
  • High availability of cluster components is automatically managed by the HDI cluster. 
    • Auto-management does not apply to non-Hadoop clusters, such as Azure Databricks.

For more information on deployment scenarios, see Supported Deployment Scenarios for Azure.

Limitations

Deployment limitations

The following limitations apply to installations of the platform on Azure:

  • The application user credentials are used to access the HDI cluster. Details are provided below.
  • ADLS Gen1/Storage Blob access is only for the HDInsight cluster's primary storage. Additional storage accounts are not supported.
  • HDFS must be set as the base storage layer of the platform. Details are provided later.
    • S3 integration and AWS-based integrations such as Redshift are not supported.
  • Use of HttpFS is not supported.
  • Security features such as Kerberos and secure impersonation are not supported.

Product limitations

For general limitations of the platform, see Product Limitations.

Pre-requisites

Please acquire the following assets:

  • Install Package: Acquire the installation package for your operating system.
    • License Key: As part of the installation package, you should receive a license key file. See License Key for details.
    • For more information, contact .
  • Offline system dependencies: If you are completing the installation without Internet access, you must also acquire the offline versions of the system dependencies. See Install Dependencies without Internet Access.

Azure Desktop Requirements

  • All desktop users must be able to connect to the instance through the enterprise infrastructure.

Azure Pre-requisites

Depending on which of the following Azure components you are deploying, additional pre-requisites and limitations may apply:

Preparation

Before you begin, please verify that you have completed the following:

  1. Read: Please read this entire document before you create the cluster or install the platform.
  2. Cluster sizing: Before you begin, you should allocate sufficient resources for the cluster. For guidance, please contact your .
  3. Node: Review the system requirements for the node hosting the platform. See System Requirements in the Planning Guide.
    1. The required set of ports must be enabled for listening. See System Ports in the Planning Guide.
    2. This node should be dedicated to the platform.

  4. Databases:
    1. The platform uses a set of databases that must be accessible from the node. Databases are installed as part of the workflow described later.
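As a rough illustration of the port requirement in step 3, the following Python sketch tests whether a TCP port on the node accepts connections. The function names and any host or port values passed to them are hypothetical and are not taken from this guide; use the port list from System Ports in the Planning Guide. A connection succeeds only when a service is already listening on the port, so a check like this is most useful after the relevant services have been started.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host: str, ports: list[int]) -> dict[int, bool]:
    """Map each port to whether it currently accepts TCP connections."""
    return {port: is_listening(host, port) for port in ports}
```

For example, `check_ports("127.0.0.1", ports)` returns a dictionary keyed by port number, where `ports` is the list you assembled from the Planning Guide.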


Deploy the Cluster

Cluster types: Deploy and provision a cluster of one of the supported types. The platform supports integration with multiple cluster types.

NOTE: Before you deploy, you should review cluster sizing options. For guidance, please contact your .

Backend storage layer: Primary storage of the cluster may be set to an existing ADLS Gen1, ADLS Gen2, or WASB layer.

For more information, see Supported Deployment Scenarios for Azure.
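To make the three storage options concrete, the sketch below maps the URI scheme of a primary storage location to the storage layer it belongs to. It assumes the standard Hadoop filesystem schemes (adl for ADLS Gen1, abfs/abfss for ADLS Gen2, wasb/wasbs for WASB). The function name and the account and container names in the usage note are hypothetical.

```python
from urllib.parse import urlparse

# Hadoop filesystem URI schemes for the storage layers named above.
SCHEME_TO_STORAGE = {
    "adl": "ADLS Gen1",
    "abfs": "ADLS Gen2",
    "abfss": "ADLS Gen2",
    "wasb": "WASB",
    "wasbs": "WASB",
}


def storage_layer(uri: str) -> str:
    """Return the storage layer name for a primary storage URI."""
    scheme = urlparse(uri).scheme.lower()
    if scheme not in SCHEME_TO_STORAGE:
        raise ValueError(f"Unsupported primary storage URI scheme: {scheme!r}")
    return SCHEME_TO_STORAGE[scheme]
```

With a hypothetical location such as `abfss://mycontainer@myaccount.dfs.core.windows.net/data`, the function returns "ADLS Gen2". An `s3://` location raises `ValueError`, consistent with the limitation that S3 integration is not supported.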

Prepare the cluster

NOTE: This section applies only if you are using HDI. If not, please skip to the next section.

Prepare directories

  1. Create the following directories, which are specified by parameter in the platform. 

    Default HDFS path             Platform configuration property
    /user/trifacta
    /trifacta
    /trifacta/dictionaries        hdfs.pathsConfig.dictionaries
    /trifacta/libraries           hdfs.pathsConfig.libraries
    /trifacta/queryResults        hdfs.pathsConfig.batchResults
    /trifacta/tempfiles           hdfs.pathsConfig.tempFiles
    /trifacta/uploads             hdfs.pathsConfig.fileUpload
    /trifacta/.datasourceCache    hdfs.pathsConfig.globalDatasourceCache
  2. Change the ownership of the above directories to trifacta:trifacta or the corresponding user and group in your environment.

Additional users may be required. For more information, see Required Users and Groups in the Planning Guide.
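The two directory preparation steps above can be sketched in Python as follows. This is a minimal example, assuming the `hdfs` command-line client is available on the machine where it runs and that the default paths and the trifacta:trifacta owner are used unchanged. The helper names `build_commands` and `run` are hypothetical. By default the commands are only printed, so you can review them before executing anything.

```python
import subprocess

# Default HDFS paths from the list above.
HDFS_DIRS = [
    "/user/trifacta",
    "/trifacta",
    "/trifacta/dictionaries",
    "/trifacta/libraries",
    "/trifacta/queryResults",
    "/trifacta/tempfiles",
    "/trifacta/uploads",
    "/trifacta/.datasourceCache",
]


def build_commands(owner="trifacta", group="trifacta", dirs=HDFS_DIRS):
    """Build the hdfs commands: create every directory, then set ownership."""
    cmds = [["hdfs", "dfs", "-mkdir", "-p", d] for d in dirs]
    cmds += [["hdfs", "dfs", "-chown", f"{owner}:{group}", d] for d in dirs]
    return cmds


def run(cmds, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on failure."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `run(build_commands())` prints the sixteen commands. Passing `dry_run=False` executes them as a user with sufficient HDFS permissions. If your environment uses a different user or group, pass them as `owner` and `group`.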

Deploy the node

In your Azure infrastructure, you must deploy a suitable VM for the installation of the platform.

The operating system requirements for this VM vary depending on the type of job execution cluster that you are using.

Cluster Type        Supported O/S for VM    Notes
HDInsight           Ubuntu only             Must be installed on an edge node of the HDInsight cluster.
Azure Databricks    CentOS and Ubuntu

For more information on the supported cluster distributions, see Supported Deployment Scenarios for Azure.
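The operating system requirements above can be expressed as a small lookup, which may be useful as a pre-flight check in a provisioning script. This is only a sketch: the dictionary and function names are hypothetical, and the operating system identifiers are assumed to be the lowercase `ID` values found in `/etc/os-release` (for example, `ubuntu` or `centos`).

```python
# Supported VM operating systems per cluster type, from the table above.
SUPPORTED_OS = {
    "hdinsight": {"ubuntu"},
    "azure databricks": {"centos", "ubuntu"},
}


def is_supported(cluster_type: str, os_id: str) -> bool:
    """Return True if os_id is a supported VM O/S for the given cluster type."""
    key = cluster_type.strip().lower()
    if key not in SUPPORTED_OS:
        raise ValueError(f"Unknown cluster type: {cluster_type!r}")
    return os_id.strip().lower() in SUPPORTED_OS[key]
```

For instance, `is_supported("HDInsight", "centos")` returns `False`, while `is_supported("Azure Databricks", "centos")` returns `True`. The check does not cover operating system versions or the edge node requirement for HDInsight.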

Install Workflow

NOTE: These steps are covered in greater detail later in this section.

Next Steps

To continue, please install the software on the node.

NOTE: Please complete the installation steps for the operating system version that is installed on the node.

See Install Software.