By default, Microsoft Azure deployments integrate with Azure Data Lake Store (ADLS). Optionally, you can configure your deployment to integrate with WASB.

Limitations of ADLS Integration

  • In this release, the Trifacta platform supports integration with the default store only. Extra stores are not supported.

Read-only access

If the base storage layer has been set to WASB, you can follow these instructions to set up read-only access to ADLS. 

NOTE: To enable read-only access to ADLS, do not set the base storage layer to hdfs. For read-only access to ADLS, the base storage layer must remain wasbs.

Pre-requisites

General

  • The Trifacta platform has already been installed and integrated with an Azure Databricks cluster. See Configure for Azure Databricks.
  • HDFS must be set as the base storage layer for the Trifacta platform instance. See Set Base Storage Layer.
  • For each combination of blob host and container, a separate Azure Key Vault Store entry must be created. For more information, please contact your Azure admin. 

Create a registered application

Before you integrate with ADLS, you must create a registered application for the Trifacta platform. See Configure for Azure.

Azure properties

The following properties should already be specified in the Admin Settings page. Verify that each one has been set:

  • azure.applicationId
  • azure.secret
  • azure.directoryId

The above properties are needed for this configuration. For more information, see Configure for Azure.
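
As a quick pre-check, the three properties listed above can be verified against an exported copy of the platform configuration. The following is a minimal sketch, not part of the platform: the helper names `lookup` and `missing_azure_properties` are hypothetical, and it assumes the configuration may store keys either flat (dotted) or nested.

```python
from collections.abc import Mapping
from typing import Any

# Property names taken from the list above.
REQUIRED_AZURE_KEYS = ("azure.applicationId", "azure.secret", "azure.directoryId")


def lookup(config: Mapping[str, Any], dotted_key: str) -> Any:
    """Return the value at a dotted key, or None if any segment is missing."""
    if dotted_key in config:  # flat form: {"azure.secret": "..."}
        return config[dotted_key]
    node: Any = config
    for part in dotted_key.split("."):  # nested form: {"azure": {"secret": "..."}}
        if not isinstance(node, Mapping) or part not in node:
            return None
        node = node[part]
    return node


def missing_azure_properties(config: Mapping[str, Any]) -> list[str]:
    """List the required Azure properties that are absent or empty."""
    return [key for key in REQUIRED_AZURE_KEYS if not lookup(config, key)]


# Placeholder values only; the empty secret is reported as missing.
example = {"azure": {"applicationId": "<app-id>", "secret": "", "directoryId": "<dir-id>"}}
print(missing_azure_properties(example))  # ['azure.secret']
```

An empty list means all three properties are present and non-empty.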

Key Vault Setup

An Azure Key Vault must already be set up and configured for use by the Trifacta platform. For more information, see Configure for Azure.

Configure ADLS Authentication

Authentication to ADLS storage is supported in the following modes, which are described below.

Mode: System

All users authenticate to ADLS using a single system key/secret combination. This combination is specified in the following parameters, which you should have already defined:

  • azure.applicationId
  • azure.secret
  • azure.directoryId

These properties define the registered application in Azure Active Directory. System authentication mode uses the registered application identifier as the service principal for authentication to ADLS. All users have the same permissions in ADLS.

For more information on these settings, see Configure for Azure.

Mode: User

Per-user mode allows individual users to authenticate to ADLS through their Azure Active Directory login.

NOTE: Additional configuration for AD SSO is required. Details are below.

Steps:

Please complete the following steps to specify the ADLS access mode.

  1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
  2. Set the following parameter to the preferred mode (system or user):

    "azure.adl.mode": "<your_preferred_mode>",
  3. Save your changes.
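
If you edit trifacta-conf.json by hand, a typo in the mode value is easy to miss. The sketch below, which is illustrative only, produces the fragment from step 2 and rejects any value other than system or user. The function name `adl_mode_fragment` is hypothetical, and lowercasing the input is an assumption made for convenience.

```python
import json

# The two modes named in the steps above.
VALID_MODES = {"system", "user"}


def adl_mode_fragment(mode: str) -> str:
    """Return the JSON fragment that sets azure.adl.mode, validating the mode first."""
    normalized = mode.strip().lower()
    if normalized not in VALID_MODES:
        raise ValueError(f"azure.adl.mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    return json.dumps({"azure.adl.mode": normalized}, indent=2)


print(adl_mode_fragment("user"))
```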

System mode access

When access to ADLS is requested, the platform authenticates using the combination of Azure directory ID, Azure application ID, and Azure secret.

After defining the properties in the Trifacta platform, system mode access requires no additional configuration.

User mode access

In user mode, a user ID hash is generated from the Key Vault key/secret and the user's AD login. This hash is used to generate the access token, which is stored in the Key Vault.

Set up for Azure AD SSO

NOTE: User mode access to ADLS requires Single Sign On (SSO) to be enabled for integration with Azure Active Directory. For more information, see Configure SSO for Azure AD.

Configure the Trifacta platform

Define default storage location and access key

In platform configuration, you must define the following property:

"azure.adl.store": "<your_value_here>",

This property defines the ADLS storage to which all output data is delivered. Example:

adl://<YOUR_STORE_NAME>.azuredatalakestore.net

Per earlier configuration:

  • webapp.storageProtocol must be set to hdfs.
  • hdfs.protocolOverride must be set to adl.
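
The store URI format and the two protocol settings above can be checked together. This is a minimal sketch under stated assumptions: the settings are held in a flat dictionary keyed by dotted property names, the store name `mystore` is a placeholder, and the helper names `adl_store_uri` and `storage_setting_problems` are hypothetical.

```python
# Values required by the two bullets above.
EXPECTED_PROTOCOLS = {
    "webapp.storageProtocol": "hdfs",
    "hdfs.protocolOverride": "adl",
}
ADLS_SUFFIX = ".azuredatalakestore.net"


def adl_store_uri(store_name: str) -> str:
    """Build the adl:// URI for a bare store name such as 'mystore'."""
    name = store_name.strip()
    if not name or "/" in name or "." in name:
        raise ValueError(f"expected a bare store name, got {store_name!r}")
    return f"adl://{name}{ADLS_SUFFIX}"


def storage_setting_problems(settings: dict) -> list[str]:
    """Return human-readable problems; an empty list means the settings agree."""
    problems = []
    for key, expected in EXPECTED_PROTOCOLS.items():
        actual = settings.get(key)
        if actual != expected:
            problems.append(f"{key} must be {expected!r}, found {actual!r}")
    store = str(settings.get("azure.adl.store", ""))
    if not (store.startswith("adl://") and store.endswith(ADLS_SUFFIX)):
        problems.append(f"azure.adl.store does not look like an ADLS URI: {store!r}")
    return problems


settings = {
    "azure.adl.store": adl_store_uri("mystore"),
    "webapp.storageProtocol": "hdfs",
    "hdfs.protocolOverride": "adl",
}
print(storage_setting_problems(settings))  # []
```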

Configure HDFS properties

In the Trifacta platform, you must configure the following properties for effective communication with HDFS.

 "hdfs": {
  "username": "[hadoop.user]",
  "enabled": true,
  "webhdfs": {
   "httpfs": false,
   "maprCompatibilityMode": false,
   "ssl": {
    "enabled": true,
    "certificateValidationRequired": false,
    "certificatePath": "<YOUR_PATH_HERE>"
   },
   "host": "[ADLS].azuredatalakestore.net",
   "version": "/webhdfs/v1",
   "proxy": {
    "host": "proxy",
    "enabled": false,
    "port": 8080
   },
   "credentials": {
    "username": "[hadoop.user]",
    "password": ""
   },
   "port": 443
  },
  "protocolOverride": "adl",
  "highAvailability": {
   "serviceName": "[ADLS].azuredatalakestore.net",
   "namenodes": {}
  },
  "namenode": {
   "host": "[ADLS].azuredatalakestore.net",
   "port": 443
  }
 }
Property | Description
--- | ---
hdfs.username | Set this value to the name of the user that the Trifacta platform uses to access the cluster.
hdfs.enabled | Set to true.
hdfs.webhdfs.httpfs | Use of HttpFS in this integration is not supported. Set this value to false.
hdfs.webhdfs.maprCompatibilityMode | This setting does not apply to ADLS. Set this value to false.
hdfs.webhdfs.ssl.enabled | SSL is always used for ADLS. Set this value to true.
hdfs.webhdfs.ssl.certificateValidationRequired | Set this value to false.
hdfs.webhdfs.ssl.certificatePath | This value is not used for ADLS.
hdfs.webhdfs.host | Set this value to the address of your ADLS datastore.
hdfs.webhdfs.version | Set this value to /webhdfs/v1.
hdfs.webhdfs.proxy.host | This value is not used for ADLS.
hdfs.webhdfs.proxy.enabled | A proxy is not used for ADLS. Set this value to false.
hdfs.webhdfs.proxy.port | This value is not used for ADLS.
hdfs.webhdfs.credentials.username | Set this value to the name of the user that the Trifacta platform uses to access the cluster.
hdfs.webhdfs.credentials.password | Leave this value empty for ADLS.
hdfs.webhdfs.port | Set this value to 443.
hdfs.protocolOverride | Set this value to adl.
hdfs.highAvailability.serviceName | Set this value to the address of your ADLS datastore.
hdfs.highAvailability.namenodes | Set this value to an empty value.
hdfs.namenode.host | Set this value to the address of your ADLS datastore.
hdfs.namenode.port | Set this value to 443.
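
Because the ADLS address and the Hadoop user each appear in several places, generating the hdfs block from two inputs can help keep the values consistent. The following is a sketch only, assuming the property values described above. The function name `build_hdfs_block`, the example host, and the example user are hypothetical, and the values chosen for properties that ADLS does not use (certificatePath, proxy host, proxy port) are placeholders.

```python
import json


def build_hdfs_block(adls_host: str, hadoop_user: str) -> dict:
    """Assemble the hdfs settings for ADLS from the datastore address and user."""
    return {
        "username": hadoop_user,
        "enabled": True,
        "webhdfs": {
            "httpfs": False,                 # HttpFS is not supported here
            "maprCompatibilityMode": False,  # does not apply to ADLS
            "ssl": {
                "enabled": True,             # SSL is always used for ADLS
                "certificateValidationRequired": False,
                "certificatePath": "",       # not used for ADLS (placeholder)
            },
            "host": adls_host,
            "version": "/webhdfs/v1",
            # A proxy is not used for ADLS; host and port are placeholders.
            "proxy": {"host": "proxy", "enabled": False, "port": 8080},
            "credentials": {"username": hadoop_user, "password": ""},
            "port": 443,
        },
        "protocolOverride": "adl",
        "highAvailability": {"serviceName": adls_host, "namenodes": {}},
        "namenode": {"host": adls_host, "port": 443},
    }


# Example values are assumptions, not real resources.
block = build_hdfs_block("mystore.azuredatalakestore.net", "trifacta")
print(json.dumps({"hdfs": block}, indent=1))
```

Review the printed JSON against your own environment before applying it through the Admin Settings page or trifacta-conf.json.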

Enable

Steps:

  1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
  2. Locate the following parameter and change its value to true:

    "azure.adl.enabled": true,
  3. Configure use of the appropriate Hadoop bundle JAR:

    "hadoopBundleJar": "hadoop-deps/hdp-2.6/build/libs/hdp-2.6-bundle.jar",
  4. Save your changes.

Testing

Restart services. See Start and Stop the Platform.

After the configuration has been specified, an ADLS connection appears in the Import Data page. Select it to begin navigating for data sources.

Try running a simple job from the Trifacta application. For more information, see Verify Operations.
