This documentation applies to installation from a supported Marketplace. Please use the installation instructions provided with your deployment.
If you are installing or upgrading a Marketplace deployment, please use the available PDF content. You must use the install and configuration PDF available through the Marketplace listing.
Please complete the following steps in the listed order to configure your installed instance of the Designer Cloud Powered by Trifacta platform to integrate with an HDInsight cluster.
Pre-requisites
- Deploy HDI cluster and Alteryx node.
NOTE: The HDI cluster can be deployed as part of installation from the Marketplace. You can also integrate the platform with a pre-existing cluster. Details are below.
- Install Designer Cloud Powered by Trifacta platform on the node.
For more information, see Install from Azure Marketplace.

Configure Azure

Create registered application
You must create an Azure Active Directory (AAD) application and grant it the desired access permissions, such as read/write access to the ADLS resource and read/write access to the Azure Key Vault secrets.
This service principal is used by the Designer Cloud Powered by Trifacta platform for access to all Azure resources. For more information, see https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-create-service-principal-portal.
After you have registered, acquire the following information:

| Azure Property | Location | Use |
|---|---|---|
| Application ID | Acquire this value from the Registered app blade of the Azure Portal. | Applied to Designer Cloud Powered by Trifacta platform configuration: azure.applicationid. |
| Service User Key | Create a key for the Registered app in the Azure Portal. | Applied to Designer Cloud Powered by Trifacta platform configuration: azure.secret. |
| Directory ID | Copy the Directory ID from the Properties blade of Azure Active Directory. | Applied to Designer Cloud Powered by Trifacta platform configuration: azure.directoryId. |

These properties are applied later in the configuration process.

Configure the Platform

Configure for HDI
If you are integrating the Designer Cloud Powered by Trifacta platform with a pre-existing HDI cluster, additional configuration is required. See Configure for HDInsight.
NOTE: If you created a new HDI cluster as part of the installation, all required configuration is listed below.

Configure base storage layer
For Azure installations, you can set your base storage layer to be HDFS or WASB.
NOTE: The base storage layer must be set after installation. After it has been configured, it cannot be modified.

| Azure storage | webapp.storageProtocol setting | hdfs.protocolOverride setting |
|---|---|---|
| WASB | wasbs (recommended) or wasb | (empty) |
| ADLS | hdfs | adl |

Configure for Key Vault
For authentication purposes, the Designer Cloud Powered by Trifacta platform must be integrated with an Azure Key Vault keystore. For more information, see https://azure.microsoft.com/en-us/services/key-vault/.
Please complete the following sections to create and configure your Azure Key Vault.

Create a Key Vault resource in Azure
Name: Provide a reasonable name for the resource. Example:
<clusterName>-<applicationName>-<group/organizationName>

Enable Key Vault access for the Designer Cloud Powered by Trifacta platform
In the Azure portal, you must assign access policies that allow the application principal of the Alteryx registered application to access the Key Vault.

Create WASB access token
If you are enabling access to WASB, you must create this token within the Azure Portal.
NOTE: Depending on the type of token you create (HTTP & HTTPS or HTTPS only), you must specify the storage protocol (WASB or WASBS) used by the Designer Cloud Powered by Trifacta platform.
For more information, see https://docs.microsoft.com/en-us/rest/api/storageservices/delegating-access-with-a-shared-access-signature.

Configure Key Vault key and secret for WASB
In the Key Vault, you can create key and secret pairs for use.

| Base Storage Layer | Description |
|---|---|
| ADLS | The Designer Cloud Powered by Trifacta platform creates its own key-secret combinations in the Key Vault. No additional configuration is required. Please skip this section and populate the Key Vault URL into the Designer Cloud Powered by Trifacta platform. |
| WASB | For WASB, you must create key and secret values that match other values in your Azure configuration. Instructions are below. |

WASB: To enable access to the Key Vault, you must specify your key and secret values as follows:

| Item | Applicable Configuration |
|---|---|
| key | The value of the key must be specified as the sasTokenId in the Designer Cloud Powered by Trifacta platform. |
| secret | The value of the secret should match the shared access signature for your storage. |

Acquire shared access signature value: Acquire the shared access signature for your storage in the Azure portal.
Create a custom key: When you create a custom key and secret pair for WASB use by the Designer Cloud Powered by Trifacta platform, choose an appropriate name for the key.
NOTE: Please retain the name of the key for later use, when it is applied through the Designer Cloud Powered by Trifacta platform as the sasTokenId value. Instructions are provided later.

Configure Key Vault location
For ADLS or WASB, the location of the Azure Key Vault must be specified for the Designer Cloud Powered by Trifacta platform. The location can be found in the properties section of the Key Vault resource in the Azure portal. This value must be applied in the Designer Cloud Powered by Trifacta platform.
Steps:
- You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
- Specify the URL in the following parameter:
"azure.keyVaultURL": "<your Key Vault URL>",

Apply SAS token identifier for WASB
If you are using WASB as your base storage layer, you must apply the SAS token value to the configuration of the Designer Cloud Powered by Trifacta platform.
Steps:
- You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
- Paste the value of the SAS Token for the key you created in the Key Vault as the following value:
"azure.wasb.defaultStore.sasTokenId": "<your Sas Token Id>",

Configure Secure Token Service
Access to the Key Vault requires use of the secure token service (STS) from the Designer Cloud Powered by Trifacta platform. To use STS with Azure, the following properties must be specified.
NOTE: Except in rare cases, the other properties for secure token service do not need to be modified.
You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

| Property | Description |
|---|---|
| "secure-token-service.autorestart" | Set this value to true to enable auto-restarting of the secure token service. |
| "secure-token-service.port" | Set this value to 8090. |
| "com.trifacta.services.secure_token_service.refresh_token_encryption_key" | Enter a base64 string to serve as your encryption key for the refresh token of the secure token service. NOTE: If a valid base64 string value is not provided here, the platform fails to start. For more information on how to generate an encryption key that is unique to your instance of the platform, see Install from Azure Marketplace. |
| "secure-token-service.userIdHashingPepper" | Enter a base64 string. |

Configure for SSO
If needed, you can integrate the Designer Cloud Powered by Trifacta platform with Azure AD for Single Sign-On to the platform. See Configure SSO for Azure AD.

Configure for ADLS
Enable read-only or read-write access to ADLS. For more information, see Enable ADLS Access.

Configure for WASB
Enable read-only or read-write access to WASB. For more information on integrating with WASB, see Enable WASB Access.

Configure relational connections
If you are integrating Designer Cloud Enterprise Edition with relational datastores, please complete the following configuration sections.

Create encryption key file
An encryption key file must be created on the Alteryx node. This key file is shared across all relational connections. See Create Encryption Key File.

Create Hive connection
You can create a connection to the Hive instance on the HDI cluster with some modifications.
Natively, Azure supports high availability for HiveServer2 via Zookeeper. As a result, host and port information in the JDBC URL must be replaced with a Zookeeper quorum.
In addition to the other Hive connection properties, please specify the following values for the properties listed below:

| Property | Description |
|---|---|
| Host | Use your Zookeeper quorum value. For the final node of the list, omit the port number. Example: zk1.cloudapp.net:2181,zk2.cloudapp.net:2181,zk3.cloudapp.net |
| Port | Set this value to 2181. |
| Connect String options | In addition to any options required for your environment, include the following option: /;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2 |
| Database | Enter your Hive database name. |

Connections are created through the Connections page. See Connections Page.
For additional details on creating a connection to Hive, see Create Hive Connections.
A Hive connection can also be created using the above property substitutions via CLI or API.

Create Azure SQL DB connection
For more information, see Create SQL DB Connections.

Create Azure SQL DW connection
For more information, see Create SQL DW Connections.

Workaround for missing Python packages
After installation, the supervisord process may report that some Python packages are "missing."
NOTE: This issue applies to Microsoft Azure installs only. It will be addressed in a future release.
These packages are present but lack the appropriate permissions. To enable the packages for use, please run the following on the Alteryx node:
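# Location of the system Python 2.7 libraries on the Alteryx node
python_dir="/usr/local/lib/python2.7"
# Collect dist-packages and its subdirectories, up to two levels deep
directories=$(find "$python_dir/dist-packages/" -maxdepth 2 -type d)
for d in $directories; do
  # Open up the directory itself, then make its contents readable by all users
  chmod 775 "${d}"
  chmod ugo+r "${d}"/*
done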
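After running the workaround, it can be useful to confirm that no package directory was missed. The following is a minimal sketch, not part of the product: the function name list_restricted_dirs is hypothetical, and it assumes that "appropriate permissions" means read and execute access for other users, which is what chmod 775 grants on a directory. It prints every directory, up to two levels deep, that still lacks either permission.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the platform).
# Prints each directory under the given root, up to two levels deep,
# that is missing read or execute permission for "other" users.
# Usage: list_restricted_dirs <directory>
list_restricted_dirs() {
  local root="$1"
  # -perm -005 matches when both the other-read and other-execute bits
  # are set; the leading "!" inverts the match.
  find "$root" -maxdepth 2 -type d ! -perm -005
}
```

For example, `list_restricted_dirs "/usr/local/lib/python2.7/dist-packages/"` should print nothing once the permissions have been applied. Any path it does print is a directory that the loop did not reach or could not change.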
Testing
- Load a dataset from the HDI cluster through either ADLS or WASB.
- Perform a few simple steps on the dataset.
- Click Run Job in the Transformer page.
- When specifying the job:
  - Click the Profile Results checkbox.
  - Select Hadoop.
- When the job completes, verify that the results have been written to the appropriate location.