Please complete the following steps in the listed order to configure your installed instance of the Trifacta® platform to integrate with the running environment cluster.

Prerequisites

  1. Deploy running environment cluster and Trifacta node.

    NOTE: The running environment cluster can be deployed as part of the installation process. You can also integrate the platform with a pre-existing cluster. Details are below.

  2. Install Trifacta platform on the node.

For more information, see Install for Azure.

Configure Azure

Create registered application

You must create an Azure Active Directory (AAD) application and grant it the desired access permissions, such as read/write access to the ADLS resource and read/write access to the Azure Key Vault secrets.


NOTE: If you are integrating with Azure Databricks and are using Managed Identities for authentication, please skip this section. That configuration is covered in a later step.

This service principal is used by the Trifacta platform for access to all Azure resources. For more information, see https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-create-service-principal-portal.

After you have registered, acquire the following information:

  • Application ID

    Location: Acquire this value from the Registered app blade of the Azure Portal.

    Use: Applied to Trifacta platform configuration: azure.applicationid.

  • Service User Key

    Location: Create a key for the Registered app in the Azure Portal.

    Use: Applied to Trifacta platform configuration: azure.secret.

    NOTE: If you are using Azure AD to integrate with an Azure Databricks cluster, the Azure AD secret value stored in azure.secret must begin with an alphanumeric character. This is a known issue.

  • Directory ID

    Location: Copy the Directory ID from the Properties blade of Azure Active Directory.

    Use: Applied to Trifacta platform configuration: azure.directoryId.
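The secret-format restriction noted above can be checked before you apply the value. This is an illustrative sketch only; the helper function is hypothetical and not part of the platform.

```python
# Illustrative check (hypothetical helper, not part of the platform):
# when integrating with Azure Databricks, the Azure AD secret stored in
# azure.secret must begin with an alphanumeric character.

def secret_valid_for_databricks(secret: str) -> bool:
    """Return True if the secret starts with an alphanumeric character."""
    return bool(secret) and secret[0].isalnum()

print(secret_valid_for_databricks("aB3xYz9..."))      # True
print(secret_valid_for_databricks("/leading-slash"))  # False
```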

To create an Azure Active Directory (AAD) application, please complete the following steps in the Azure console.

Steps:

  1. Create registered application:

    1. In the Azure console, navigate to Azure Active Directory > App Registrations.

    2. Create a New App. Name it trifacta.

      NOTE: Retain the Application ID and Directory ID for configuration in the Trifacta platform.

  2. Create a client secret:
    1. Navigate to Certificates & secrets.
    2. Create a new Client secret.

      NOTE: Retain the value of the Client secret for configuration in the Trifacta platform.

  3. Add API permissions:
    1. Navigate to API Permissions.
    2. Add Azure Key Vault with the user_impersonation permission.

These properties are applied later in the configuration process.
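As an illustration, the three properties might appear in platform configuration roughly as follows. This sketch assumes the dotted property names map to nested keys in trifacta-conf.json (the platform's configuration file); the placeholder values are hypothetical.

```json
{
  "azure": {
    "applicationid": "<application-id-from-the-registered-app>",
    "secret": "<client-secret-value>",
    "directoryId": "<directory-id-from-azure-ad-properties>"
  }
}
```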

Configure the Platform

Configure for HDI

If you are integrating the Trifacta platform with a pre-existing HDI cluster, additional configuration is required. See Configure for HDInsight.

NOTE: If you created a new HDI cluster as part of the installation, all required configuration is listed below.

Configure for Azure Databricks

You can integrate the Trifacta platform with Azure Databricks. For more information, see Configure for Azure Databricks.

Configure base storage layer

For Azure installations, you can set your base storage layer to WASB, ADLS Gen2, or ADLS Gen1.

NOTE: The base storage layer must be set after installation. After it has been configured, it cannot be modified.

Set the webapp.storageProtocol and hdfs.protocolOverride settings according to your Azure storage:

  • WASB: webapp.storageProtocol = wasbs; hdfs.protocolOverride = (empty)
  • ADLS Gen2: webapp.storageProtocol = abfss; hdfs.protocolOverride = (empty)
  • ADLS Gen1: webapp.storageProtocol = hdfs; hdfs.protocolOverride = adl

See Set Base Storage Layer.
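For example, an ADLS Gen2 base storage layer uses the abfss protocol with no override. A sketch of the corresponding entries, assuming the dotted setting names map to nested keys in trifacta-conf.json:

```json
{
  "webapp": {
    "storageProtocol": "abfss"
  },
  "hdfs": {
    "protocolOverride": ""
  }
}
```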

Configure for Key Vault

For authentication purposes, the Trifacta platform must be integrated with an Azure Key Vault keystore. See Configure Azure Key Vault.

Configure for SSO

If needed, you can integrate the Trifacta platform with Azure AD for Single-Sign On to the platform. See Configure SSO for Azure AD.

Configure for ADLS Gen2

Enable read-only or read-write access to ADLS Gen2. For more information, see Enable ADLS Gen2 Access.

Configure for ADLS Gen1

Enable read-only or read-write access to ADLS Gen1. For more information, see Enable ADLS Access.

Configure for WASB

Enable read-only or read-write access to WASB. For more information on integrating with WASB, see Enable WASB Access.

Configure relational connections

If you are integrating Trifacta Wrangler Enterprise with relational datastores, please complete the following configuration sections.

Create encryption key file

An encryption key file must be created on the Trifacta node. This key file is shared across all relational connections. See Create Encryption Key File.

Create Hive connection

You can create a connection to the Hive instance on the HDI cluster with some modifications.

  • High Availability: Natively, Azure supports high availability for HiveServer2 via Zookeeper. Host and port information in the JDBC URL must be replaced with a Zookeeper quorum.

In addition to the other Hive connection properties, specify the following values:

  • Host: Use your Zookeeper quorum value. For the final node of the list, omit the port number. Example:

    zk1.cloudapp.net:2181,zk2.cloudapp.net:2181,zk3.cloudapp.net

  • Port: Set this value to 2181.

  • Connect String options: In addition to any options required for your environment, include the following option:

    /;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2

  • Database: Enter your Hive database name.

Connections are created through the Connections page. See Connections Page.

For additional details on creating a connection to Hive, see Create Hive Connections.

A Hive connection can also be created using the above property substitutions via programmatic methods.
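For illustration, the field values above can be assembled into a HiveServer2 JDBC URL. The concatenation order below (host, then port, then Connect String options) is an assumption about how the platform joins the form fields; the platform performs this internally.

```python
# Sketch: assemble the Hive connection form fields into a jdbc:hive2 URL.
# The join order (host:port followed by the options string) is an assumption
# for illustration, not the platform's documented implementation.

def build_hive_jdbc_url(host: str, port: int, options: str) -> str:
    """Join the Host, Port, and Connect String options fields into a URL."""
    return f"jdbc:hive2://{host}:{port}{options}"

# Values from the table above: Zookeeper quorum as Host (final node listed
# without a port), 2181 as Port, and the service-discovery options.
url = build_hive_jdbc_url(
    host="zk1.cloudapp.net:2181,zk2.cloudapp.net:2181,zk3.cloudapp.net",
    port=2181,
    options="/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2",
)
print(url)
```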

Create Azure SQL Database connection

For more information, see Create Azure SQL Database Connections.

Create Azure SQL DW connection

For more information, see Create SQL DW Connections.

Testing

  1. Load a dataset from the HDI cluster through either ADLS or WASB.
  2. Perform a few simple steps on the dataset.
  3. Click Run Job in the Transformer page. 
  4. When specifying the job: 
    1. Click the Profile Results checkbox.
    2. Select Spark.
  5. When the job completes, verify that the results have been written to the appropriate location.
