By default, Microsoft Azure deployments integrate with Azure Data Lake Store (ADLS). Optionally, you can configure your deployment to integrate with WASB.
- Windows Azure Storage Blob (WASB) is an abstraction layer on top of HDFS, which enables persistence of storage, access without a Hadoop cluster presence, and access from multiple Hadoop clusters.
Limitations of WASB Integration
- If a directory is created on the HDI cluster through WASB, the directory includes a Size=0 blob. The Designer Cloud Powered by Trifacta platform does not list them and does not support interaction with Size=0 blobs.
Read-only access
If the base storage layer has been set to ADLS Gen1 or ADLS Gen2, you can follow these instructions to set up read-only access to WASB.
NOTE: If you are adding WASB as a secondary integration, your WASB blob container or containers must contain at least one folder. This is a known issue.
NOTE: To enable read-only access to WASB, do not set the base storage layer to wasbs. The base storage layer must remain set for ADLS Gen 1 or ADLS Gen 2.
Pre-requisites
General
- The Designer Cloud Powered by Trifacta platform has already been installed and integrated with an Azure HDI or Azure Databricks cluster.
- WASB must be set as the base storage layer for the Designer Cloud Powered by Trifacta platform instance. See Set Base Storage Layer.
- For each combination of blob host and container, a separate Azure Key Vault Store entry must be created. For more information, please contact your Azure admin.
Create a registered application
Before you integrate with Azure WASB, you must create the Designer Cloud Powered by Trifacta platform as a registered application. See Configure for Azure.
Other Azure properties
The following properties should already be specified in the Admin Settings page. Please verify that the following have been set:
- azure.applicationId
- azure.secret
- azure.directoryId
The above properties are needed for this configuration. For more information, see Configure for Azure.
Key Vault Setup
For new installs, an Azure Key Vault has already been set up and configured for use by the Designer Cloud Powered by Trifacta platform.
NOTE: An Azure Key Vault is required. Upgrading customers who do not have a Key Vault in their environment must create one.
For more information, see Configure for Azure.
Configure WASB Authentication
Authentication to WASB storage is managed by specifying the appropriate host, container, and token ID in the Designer Cloud Powered by Trifacta platform configuration. When access to WASB is requested, the platform passes the information through the Secure Token Service to query the specified Azure Key Vault Store using the provided values. The keystore returns the value for the secret. The combination of the key (token ID) and secret is used to access WASB.
NOTE: Per-user authentication is not supported for WASB.
For more information on creating the Key Vault Store and accessing it through the Secure Token Service, see Configure for Azure.
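The lookup flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: `signed_blob_url` and `read_secret` are hypothetical names, the Key Vault is replaced by a plain dictionary, and the resulting HTTPS URL follows the general Azure Blob Storage convention of appending the SAS token as a query string. The platform itself accesses storage through the wasbs protocol and the Secure Token Service.

```python
from typing import Callable


def signed_blob_url(
    blob_host: str,
    container: str,
    secret_name: str,
    blob_path: str,
    read_secret: Callable[[str], str],
) -> str:
    """Look up the SAS token by its secret name and append it to a blob URL."""
    # Step 1: the configured secret name (token ID) is used to query the vault.
    sas_token = read_secret(secret_name)
    # Step 2: the returned secret (the SAS token) becomes the URL query string.
    query = sas_token.lstrip("?")
    return f"https://{blob_host}/{container}/{blob_path.lstrip('/')}?{query}"


# Stand-in for the Key Vault: a dictionary keyed by secret name (hypothetical values).
fake_vault = {"my-container-sas": "?sv=2019-02-02&sig=example"}

url = signed_blob_url(
    blob_host="storage-account.blob.core.windows.net",
    container="my-container",
    secret_name="my-container-sas",
    blob_path="data/input.csv",
    read_secret=fake_vault.__getitem__,
)
print(url)
```

Passing the vault lookup as a callable keeps the sketch independent of any particular Key Vault client library.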
Configure the Designer Cloud Powered by Trifacta platform
Define location of SAS token
The SAS token required for accessing Azure can be accessed from either of the following locations:
- Key Vault
- Alteryx configuration
SAS token from Key Vault
To store the SAS token in the key vault, specify the following parameters in platform configuration. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json.
For more information, see Platform Configuration Methods.
Parameter | Description |
---|---|
"azure.wasb.fetchSasTokensFromKeyVault": true, | Instructs the Designer Cloud Powered by Trifacta platform to query the Key Vault for SAS tokens NOTE: The Key Vault must already be set up. See "Key Vault Setup" above. |
SAS token from Alteryx configuration
To specify the SAS token in the Designer Cloud Powered by Trifacta platform configuration, set the following flag to false:
Parameter | Description |
---|---|
"azure.wasb.fetchSasTokensFromKeyVault": false, | Instructs the Designer Cloud Powered by Trifacta platform to acquire per-container SAS tokens from the platform configuration. |
Define WASB stores
The WASB stores that users can access are specified as an array of configuration values. Users of the platform can use all of them for reading sources and writing results.
Steps:
- To apply this configuration change, log in as an administrator to the Alteryx node. Then, edit trifacta-conf.json. For more information, see Platform Configuration Methods.
- Locate the azure.wasb.stores configuration block. Apply the appropriate configuration as specified below.
Tip: The default container must be specified as the first set of elements in the array. All containers listed after the first one are treated as extra stores.
"azure.wasb.stores": [
  {
    "sasToken": "<DEFAULT_VALUE1_HERE>",
    "keyVaultSasTokenSecretName": "<DEFAULT_VALUE1_HERE>",
    "container": "<DEFAULT_VALUE1_HERE>",
    "blobHost": "<DEFAULT_VALUE1_HERE>"
  },
  {
    "sasToken": "<VALUE2_HERE>",
    "keyVaultSasTokenSecretName": "<VALUE2_HERE>",
    "container": "<VALUE2_HERE>",
    "blobHost": "<VALUE2_HERE>"
  }
]
Parameter | Description | SAS Token from Azure Key Vault | SAS Token from Platform Configuration |
---|---|---|---|
sasToken | Set this value to the SAS token to use, if applicable. Example value: ?sv=2019-02-02&ss=bfqt&srt=sco&sp=rwdlacup&se=2022-02-13T00:00:00Z&st=2020-02-13T00:00:00Z&spr=https&sig=<redacted> | Set this value to an empty string. NOTE: Do not delete the entire line. Leave the value as empty. | See below for the command to execute to generate a SAS token. |
keyVaultSasTokenSecretName | Set this value to the secret name of the SAS token in the Azure key vault to use for the specified blob host and container. | If needed, you can generate and apply a per-container SAS token for use in this field for this specific store. Details are below. | Set this value to an empty string. NOTE: Do not delete the entire line. Leave the value as empty. |
container | Apply the name of the WASB container. | NOTE: If you are specifying different blob host and container combinations for your extra stores, you must create a new Key Vault store. See above for details. | |
blobHost | Specify the blob host of the container. Example value: storage-account.blob.core.windows.net | NOTE: If you are specifying different blob host and container combinations for your extra stores, you must create a new Key Vault store. See above for details. | |
- Save your changes and restart the platform.
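Before restarting, it can help to sanity-check the store array. The following Python sketch is one possible check, not a tool shipped with the platform: `check_wasb_stores` is a hypothetical helper, and it assumes the rules stated above (all four keys stay present in every entry, the field for the unused token source is left as an empty string, and the first entry is the default container).

```python
REQUIRED_KEYS = ("sasToken", "keyVaultSasTokenSecretName", "container", "blobHost")


def check_wasb_stores(stores, fetch_from_key_vault):
    """Return a list of problems found in azure.wasb.stores; empty means none found."""
    if not stores:
        return ["azure.wasb.stores is empty; the first entry must be the default container"]
    # Which token field is in use depends on azure.wasb.fetchSasTokensFromKeyVault.
    if fetch_from_key_vault:
        used, unused = "keyVaultSasTokenSecretName", "sasToken"
    else:
        used, unused = "sasToken", "keyVaultSasTokenSecretName"
    problems = []
    for index, store in enumerate(stores):
        label = "default store" if index == 0 else f"extra store #{index}"
        for key in REQUIRED_KEYS:
            if key not in store:
                # Lines must not be deleted; unused values stay as empty strings.
                problems.append(f"{label}: key {key!r} is missing")
        if not store.get(used):
            problems.append(f"{label}: {used!r} must be set")
        if store.get(unused):
            problems.append(f"{label}: {unused!r} should be an empty string")
        for key in ("container", "blobHost"):
            if key in store and not store[key]:
                problems.append(f"{label}: {key!r} must not be empty")
    return problems


# Hypothetical configuration: the second entry has lost its secret-name line.
stores = [
    {
        "sasToken": "",
        "keyVaultSasTokenSecretName": "default-sas",
        "container": "default",
        "blobHost": "storage-account.blob.core.windows.net",
    },
    {
        "sasToken": "",
        "container": "extra",
        "blobHost": "storage-account.blob.core.windows.net",
    },
]
problems = check_wasb_stores(stores, fetch_from_key_vault=True)
for problem in problems:
    print(problem)
```

The function collects every problem instead of stopping at the first one, so a single run shows everything that needs fixing.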
Generate per-container SAS token
Execute the appropriate command at the command line to generate a SAS token for a specific container. The following Windows PowerShell command generates a SAS token that is valid for a full year:
Set-AzureRmStorageAccount -Name 'name'
$sasToken = New-AzureStorageContainerSASToken -Permission r -ExpiryTime (Get-Date).AddYears(1) -Name '<container_name>'
Tip: You can also generate a Shared Access Signature token for your Storage Account and Container from the Azure Portal.
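Once a token has been generated, its permissions and expiry can be read directly from the token string, because a SAS token is a URL query string (sp holds the permissions, se the expiry time, spr the allowed protocol). The Python sketch below is an illustration only: `describe_sas_token` and `is_expired` are hypothetical helpers, the token uses a placeholder signature, and the expiry is assumed to use the same timestamp format as the example value shown earlier.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs


def describe_sas_token(token: str) -> dict:
    """Extract permissions, expiry, and protocol restriction from a SAS token."""
    fields = parse_qs(token.lstrip("?"))
    # Each parameter appears once, so keep only the first value per key.
    flat = {key: values[0] for key, values in fields.items()}
    # Assumes the expiry uses the YYYY-MM-DDTHH:MM:SSZ form (UTC).
    expires = datetime.strptime(flat["se"], "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return {
        "permissions": flat.get("sp", ""),
        "expires": expires,
        "https_only": flat.get("spr") == "https",
    }


def is_expired(info: dict, now: datetime = None) -> bool:
    """Report whether the token's expiry time has passed."""
    now = now or datetime.now(timezone.utc)
    return now >= info["expires"]


# Placeholder token modeled on the example value; the signature is not real.
token = (
    "?sv=2019-02-02&ss=bfqt&srt=sco&sp=rwdlacup"
    "&se=2022-02-13T00:00:00Z&st=2020-02-13T00:00:00Z&spr=https&sig=example"
)
info = describe_sas_token(token)
print(info["permissions"], info["expires"].isoformat(), info["https_only"])
```

A check like this is useful for tokens with a one-year lifetime, since an expired token otherwise only shows up as an access failure.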
Configure storage protocol
You must configure the platform to use the WASBS (secure) storage protocol when accessing WASB.
Steps:
- You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
- Locate the following parameter and change its value to wasbs for secure access:
"webapp.storageProtocol": "wasbs",
- Set the following:
"hdfs.enabled": false,
- Save your changes and restart the platform.
Define storage locations
You must define the base blob locations and supported protocol for storing data on WASB.
Steps:
- You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
- Locate the following configuration block. Specify the listed changes:
"fileStorage": {
  "defaultBaseUris": [
    "<baseURIOfYourBlob>"
  ],
  "whitelist": ["wasbs"]
}
Parameter | Description |
---|---|
defaultBaseUris | For each supported protocol, this array must contain a top-level path to the location where Designer Cloud Powered by Trifacta platform files can be stored. These files include uploads, samples, and temporary storage used during job execution. NOTE: The wasbs:// protocol identifier must be included. The WASB protocol is not supported. Example value: wasbs://container@storage-account.blob.core.windows.net/ |
whitelist | A comma-separated list of protocols that are permitted to read and write with WASB storage. NOTE: This array of values must include wasbs. |
- Save your changes and restart the platform.
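The two requirements above (wasbs in the whitelist, and a base URI that uses the wasbs:// identifier) can be checked mechanically. The following Python sketch shows one way to do so; `check_file_storage` is a hypothetical helper, and the expected URI shape wasbs://<container>@<blob host>/ is taken from the example value.

```python
from urllib.parse import urlsplit


def check_file_storage(file_storage: dict) -> list:
    """Return a list of problems found in the fileStorage block; empty means none found."""
    problems = []
    if "wasbs" not in file_storage.get("whitelist", []):
        problems.append("whitelist must include 'wasbs'")
    found_wasbs = False
    for uri in file_storage.get("defaultBaseUris", []):
        parts = urlsplit(uri)
        if parts.scheme == "wasb":
            problems.append(f"{uri}: the wasb:// protocol is not supported, use wasbs://")
        elif parts.scheme == "wasbs":
            found_wasbs = True
            # The container sits before '@', the blob host after it.
            if not parts.username or not parts.hostname:
                problems.append(f"{uri}: expected wasbs://<container>@<blob host>/")
    if not found_wasbs:
        problems.append("defaultBaseUris has no wasbs:// base URI")
    return problems


good = {
    "defaultBaseUris": ["wasbs://container@storage-account.blob.core.windows.net/"],
    "whitelist": ["wasbs"],
}
bad = {
    "defaultBaseUris": ["wasb://container@storage-account.blob.core.windows.net/"],
    "whitelist": ["wasb"],
}
print(check_file_storage(good))
for problem in check_file_storage(bad):
    print(problem)
```

URIs with other schemes are ignored by this check, since a deployment may list base URIs for additional protocols.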
Testing
After the configuration has been specified, a WASB connection appears in the Import Data page. Select it to begin navigating through the WASB Browser for data sources.
Try running a simple job from the Designer Cloud application. For more information, see Verify Operations.
- See WASB Browser.
- See Using WASB.
Troubleshooting
For additional troubleshooting information, see Enable ADLS Gen2 Access.