
D s install marketplace

Please complete the following steps in the listed order to configure your installed instance of the platform to integrate with an HDInsight cluster.

Pre-requisites

  1. Deploy the HDI cluster and the node.

    Info

    NOTE: The HDI cluster can be deployed as part of installation from the Marketplace. You can also integrate the platform with a pre-existing cluster. Details are below.

  2. Install the platform on the node.

For more information, see Install from Azure Marketplace.


Configure Azure

Create registered application

You must create an Azure Active Directory (AAD) application and grant it the desired access permissions, such as read/write access to the ADLS resource and read/write access to the Azure Key Vault secrets. This service principal is used by the platform for access to all Azure resources. For more information, see https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-create-service-principal-portal.

After you have registered, acquire the following information:

Azure Property | Location | Use
Application ID | Acquire this value from the Registered app blade of the Azure Portal. | Applied to platform configuration: azure.applicationid.
Service User Key | Create a key for the Registered app in the Azure Portal. | Applied to platform configuration: azure.secret.
Directory ID | Copy the Directory ID from the Properties blade of Azure Active Directory. | Applied to platform configuration: azure.directoryId.

These properties are applied later in the configuration process.

Configure the Platform

Configure for HDI

If you are integrating the platform with a pre-existing HDI cluster, additional configuration is required. See Configure for HDInsight.

Info

NOTE: If you created a new HDI cluster as part of the installation, all required configuration is listed below.

Configure base storage layer

For Azure installations, you can set your base storage layer to be HDFS or WASB.

Info

NOTE: The base storage layer must be set after installation. After it has been configured, it cannot be modified.

Azure storage | webapp.storageProtocol setting | hdfs.protocolOverride setting
WASB | wasbs (recommended) or wasb | (empty)
ADLS | hdfs | adl

See Set Base Storage Layer.

Configure for Key Vault

For authentication purposes, the platform must be integrated with an Azure Key Vault keystore. For more information, see https://azure.microsoft.com/en-us/services/key-vault/.

Please complete the following sections to create and configure your Azure Key Vault.

Create a Key Vault resource in Azure

  1. Log into the Azure portal.
  2. Go to: https://portal.azure.com/#create/Microsoft.KeyVault
  3. Complete the form for creating a new Key Vault resource:
    1. Name: Provide a reasonable name for the resource. Example:

      Code Block
      <clusterName>-<applicationName>-<group/organizationName>
    2. Location: Pick the location used by the HDI cluster.
    3. For other fields, add appropriate information based on your enterprise's preferences.
  4. To create the resource, click Create.
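
The naming pattern in step 3 can be checked before you fill in the form. The following is a minimal sketch, not part of the product: the helper names and the sample values (hdi01, wrangler, dataeng) are hypothetical, and the check assumes the usual Key Vault naming rules (3-24 characters, letters, digits, and hyphens only, starting with a letter, ending with a letter or digit, no consecutive hyphens).

```shell
# Hypothetical helper: join the three parts of the naming pattern with hyphens.
build_vault_name() {
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: succeed only if the name fits the assumed Key Vault
# naming rules (3-24 characters, letters/digits/hyphens, starts with a letter,
# ends with a letter or digit, no consecutive hyphens).
is_valid_vault_name() {
  name="$1"
  [ "${#name}" -ge 3 ] && [ "${#name}" -le 24 ] || return 1
  case "$name" in
    *--*) return 1 ;;
    *[!A-Za-z0-9-]*) return 1 ;;
    [!A-Za-z]*) return 1 ;;
    *[!A-Za-z0-9]) return 1 ;;
  esac
  return 0
}

# Sample values are made up for illustration.
vault_name=$(build_vault_name "hdi01" "wrangler" "dataeng")
if is_valid_vault_name "$vault_name"; then
  echo "OK: $vault_name"
else
  echo "Invalid Key Vault name: $vault_name" >&2
fi
```

Because the pattern concatenates three names, the 24-character limit is the rule most likely to be hit; shortening the cluster or organization part is usually enough.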

Enable Key Vault access for the platform

In the Azure portal, you must assign access policies for the application principal of the registered application to access the Key Vault.

Steps:

  1. In the Azure portal, select the Key Vault you created. Then, select Access Policies.
  2. In the Access Policies window, select the registered application.
  3. Click Add New.
  4. For Secret permissions, select the following:
    1. Get
    2. Set
    3. Delete
  5. Do not select any other options.
  6. Click OK.
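
If you prefer the command line to the portal, the same three secret permissions can likely be granted with the Azure CLI. The sketch below only prints the command for review and does not run it. It assumes that the Azure CLI is installed, that the vault uses access policies, and that the vault name and application ID shown are placeholders to be replaced with your own values.

```shell
# Placeholder values (assumptions); substitute your own vault name and the
# Application ID of the registered application.
vault_name="my-key-vault"
app_id="00000000-0000-0000-0000-000000000000"

# Build the command as text so it can be reviewed before it is run.
# Only Get, Set, and Delete are requested, matching the portal steps above.
policy_cmd="az keyvault set-policy --name $vault_name --spn $app_id --secret-permissions get set delete"
echo "$policy_cmd"
```

After reviewing the printed command, you can paste it into a shell that is logged in to the correct subscription.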

Create WASB access token

If you are enabling access to WASB, you must create this token within the Azure Portal.

Info

NOTE: Depending on the type of token you create (HTTP & HTTPS or HTTPS only), you must specify the storage protocol (WASB or WASBS) used by the platform.

For more information, see https://docs.microsoft.com/en-us/rest/api/storageservices/delegating-access-with-a-shared-access-signature.

Configure Key Vault key and secret for WASB

In the Key Vault, you can create key and secret pairs for use.

Base Storage Layer | Description
ADLS | The platform creates its own key-secret combinations in the Key Vault. No additional configuration is required. Please skip this section and populate the Key Vault URL into the platform.
WASB | For WASB, you must create key and secret values that match other values in your Azure configuration. Instructions are below.

WASB: To enable access to the Key Vault, you must specify your key and secret values as follows:

Item | Applicable Configuration
key | The value of the key must be specified as the sasTokenId in the platform.
secret | The value of the secret should match the shared access signature for your storage.

Acquire shared access signature value:

In the Azure portal, please do the following:

  1. Open your storage account.
  2. Select Shared Access Signature.
  3. Generate or view existing signatures.
  4. For a new or existing signature, copy the SAS token value. Omit the leading question mark (?).
  5. Paste this value into a text file for safekeeping.
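
Step 4 is easy to get wrong when copying from the portal, so it may help to strip the leading question mark in a script. This is a minimal sketch: the helper name is hypothetical and the token shown is made up for illustration.

```shell
# Hypothetical helper: drop one leading "?" from a SAS token, if present.
strip_leading_question_mark() {
  case "$1" in
    \?*) printf '%s\n' "${1#?}" ;;  # first character is "?", remove it
    *)   printf '%s\n' "$1" ;;      # already clean, leave unchanged
  esac
}

# Made-up token for illustration only.
raw_token='?sv=2017-07-29&ss=b&srt=sco&sp=rwdlac&sig=EXAMPLE'
sas_token=$(strip_leading_question_mark "$raw_token")
echo "$sas_token"
```

The helper leaves a token that has no leading question mark untouched, so it is safe to apply to a value that was already cleaned.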

Create a custom key:

To create a custom key and secret pair for WASB use by the platform, please complete the following steps:

  1. On an existing or newly created Azure Key Vault resource, click Secrets.
  2. At the top of the menu, click Generate/Import.
  3. In the Create a secret menu:
    1. Select Manual for upload options.
    2. Choose an appropriate name for the key.

      Info

      NOTE: Please retain the name of the key for later use, when it is applied through the platform as the sasTokenId value. Instructions are provided later.

    3. Paste the SAS token value for the key into the secret field.
    4. Click Create.

Configure Key Vault location

For ADLS or WASB, the location of the Azure Key Vault must be specified for the platform. The location can be found in the properties section of the Key Vault resource in the Azure portal.

Steps:

  1. Log in to the Azure portal.
  2. Select the Key Vault resource.
  3. Click Properties.
  4. Locate the DNS Name field. Copy the field value.

This value is the location for the Key Vault. It must be applied in the platform.

Steps:

  1. D s config
  2. Specify the URL in the following parameter:

    Code Block
    "azure.keyVaultURL": "<your Key Vault URL>",

Apply SAS token identifier for WASB

If you are using WASB as your base storage layer, you must apply the SAS token identifier, which is the name of the key that you created in the Key Vault, to the configuration of the platform.

Steps:

  1. D s config
  2. Paste the name of the key that you created in the Key Vault for the SAS token as the following value:

    Code Block
    "azure.wasb.defaultStore.sasTokenId": "<your Sas Token Id>",
  3. Save your changes.

Configure Secure Token Service

Access to the Key Vault requires use of the secure token service (STS) from the platform. To use STS with Azure, the following properties must be specified.

Info

NOTE: Except in rare cases, the other properties for secure token service do not need to be modified.

D s config

D s property overflow

 

Property | Description
"secure-token-service.autorestart" | Set this value to true to enable auto-restarting of the secure token service.
"secure-token-service.port" | Set this value to 8090.
"com.trifacta.services.secure_token_service.refresh_token_encryption_key" | Enter a base64 string to serve as your encryption key for the refresh token of the secure token service. NOTE: If a valid base64 string value is not provided here, the platform fails to start. For more information on how to generate an encryption key that is unique to your instance of the platform, see Install from Azure Marketplace.
"secure-token-service.userIdHashingPepper" | Enter a base64 string.

 
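
Two of the secure token service properties expect a base64 string. One generic way to produce such a string on the node is sketched below. This is not necessarily the method described in Install from Azure Marketplace: the helper name is hypothetical and the 32-byte length is an assumption, not a documented requirement.

```shell
# Hypothetical helper: emit N random bytes as a single-line base64 string.
# The byte count (32) used below is an assumption, not a documented requirement.
generate_base64_key() {
  head -c "$1" /dev/urandom | base64 | tr -d '\n'
  echo
}

# Generate separate values for the two properties.
refresh_token_key=$(generate_base64_key 32)
hashing_pepper=$(generate_base64_key 32)
echo "refresh_token_encryption_key: $refresh_token_key"
echo "userIdHashingPepper: $hashing_pepper"
```

Treat the generated values as secrets and avoid leaving them in shell history or shared logs.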

Configure for SSO

If needed, you can integrate the platform with Azure AD for Single-Sign On to the platform. See Configure SSO for Azure AD.

Configure for ADLS

Enable read-only or read-write access to ADLS. For more information, see Enable ADLS Access.

Configure for WASB

Enable read-only or read-write access to WASB. For more information on integrating with WASB, see Enable WASB Access.

Configure relational connections

If you are integrating the product with relational datastores, please complete the following configuration sections.

Create encryption key file

An encryption key file must be created on the node. This key file is shared across all relational connections. See Create Encryption Key File.

Create Hive connection

You can create a connection to the Hive instance on the HDI cluster with some modifications.

Natively, Azure supports high availability for HiveServer2 via Zookeeper. As a result, host and port information in the JDBC URL must be replaced with a Zookeeper quorum.

In addition to the other Hive connection properties, please specify the following values for the properties listed below:

Property | Description
Host | Use your Zookeeper quorum value. For the final node of the list, omit the port number. Example: zk1.cloudapp.net:2181,zk2.cloudapp.net:2181,zk3.cloudapp.net
Port | Set this value to 2181.
Connect String options | In addition to any options required for your environment, include the following option: /;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Database | Enter your Hive database name.
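
The Host value can be assembled from a list of Zookeeper nodes rather than typed by hand. The sketch below uses a hypothetical helper and the sample hostnames from the example above. The final line shows the JDBC URL that would result if the Host, Port, and Connect String options are concatenated in that order, which is an assumption about how the connection is built.

```shell
# Hypothetical helper: build the Host field from a list of Zookeeper hostnames.
# Every node except the last gets ":<port>" appended; the last node is left bare.
build_zk_host() {
  port="$1"; shift
  host=""
  while [ "$#" -gt 1 ]; do
    host="${host}$1:${port},"
    shift
  done
  printf '%s\n' "${host}$1"
}

zk_port=2181
zk_host=$(build_zk_host "$zk_port" zk1.cloudapp.net zk2.cloudapp.net zk3.cloudapp.net)
connect_opts='/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'

echo "Host: $zk_host"
# Assumed shape of the resulting JDBC URL (Host, then Port, then options).
echo "jdbc:hive2://${zk_host}:${zk_port}${connect_opts}"
```

Leaving the port off the final node matters because the Port field supplies it; adding it in both places would duplicate the port in the assembled URL.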

Connections are created through the Connections page. See Connections Page.

For additional details on creating a connection to Hive, see Create Hive Connections.

A Hive connection can also be created using the above property substitutions via CLI or API.

Create Azure SQL DB connection

For more information, see Create SQL DB Connections.

Create Azure SQL DW connection

For more information, see Create SQL DW Connections.

Workaround for missing Python packages

After installation, the supervisord process may complain about some Python packages that are "missing."

Info

NOTE: This issue applies to Microsoft Azure installs only. It will be addressed in a future release.

These packages are present but lack the appropriate permissions. To enable the packages for use, please run the following on the node:

Code Block
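python_dir="/usr/local/lib/python2.7"
# Collect the package directories (at most two levels deep).
directories=$(find "$python_dir/dist-packages/" -maxdepth 2 -type d)
for d in $directories; do
  # Make the directory traversable, then make its contents readable by all users.
  chmod 775 "${d}"
  chmod ugo+r "${d}"/*
done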
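
To confirm that the workaround took effect, you can list anything under the package directory that is still not world-readable. This is a minimal sketch with a hypothetical helper name. It searches three levels deep on the assumption that the files inside the two directory levels handled above are the ones of interest.

```shell
# Hypothetical helper: list entries under a directory that are still not
# world-readable (prints nothing when everything is readable).
list_unreadable() {
  find "$1" -maxdepth 3 ! -perm -o=r
}

python_dir="/usr/local/lib/python2.7"
if [ -d "$python_dir/dist-packages" ]; then
  list_unreadable "$python_dir/dist-packages"
fi
```

An empty result suggests that the permissions are now in place. Any paths that are printed can be fixed individually with chmod.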

Testing

  1. Load a dataset from the HDI cluster through either ADLS or WASB.
  2. Perform a few simple steps on the dataset.
  3. Click Run Job in the Transformer page. 
  4. When specifying the job: 
    1. Click the Profile Results checkbox.
    2. Select Hadoop.
  5. When the job completes, verify that the results have been written to the appropriate location.