After installation of the Designer Cloud powered by Trifacta platform software and databases in your Microsoft Azure infrastructure, please complete these steps to perform the basic integration between the Trifacta node and Azure resources like the backend storage layer and running environment cluster.
NOTE: This section includes only basic configuration for required platform functions and integrations with Azure. Please use the links in this section to access additional details on these key features.
Tip: When you save changes from within the Designer Cloud powered by Trifacta platform, your configuration is automatically validated, and the platform is automatically restarted.
Configure in Azure
These steps require admin access to your Azure deployment.
Create registered application
To create an Azure Active Directory (AAD) application, please complete the following steps in the Azure console.

Steps:
- In the Azure console, navigate to Azure Active Directory > App Registrations.
- Create a New App. Name it trifacta.
  NOTE: Retain the Application ID and Directory ID for configuration in the Designer Cloud powered by Trifacta platform.
- Create a new Client secret.
  NOTE: Retain the value of the Client secret for configuration in the Designer Cloud powered by Trifacta platform.
- Assign the user_impersonation permission to the registered application.
For additional details, see Configure for Azure.
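If you prefer to script this setup, the registration can also be created with the Azure CLI. The following is a minimal sketch, not the documented procedure, assuming the az CLI is installed and you are logged in with rights to create AAD applications; the display name trifacta matches the naming suggested above.

# Register the application and capture the IDs needed by the platform.
APP_ID=$(az ad app create --display-name trifacta --query appId -o tsv)
DIRECTORY_ID=$(az account show --query tenantId -o tsv)

# Create (or reset) a client secret for the registered application.
# Retain the returned password value; it cannot be displayed again.
SECRET=$(az ad app credential reset --id "$APP_ID" --query password -o tsv)

echo "Application ID: $APP_ID"
echo "Directory ID:   $DIRECTORY_ID"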
Create Key Vault in Azure

Please complete the following steps in the Azure portal to create a Key Vault and to associate it with the Trifacta registered application.

NOTE: A Key Vault is required for use with the Designer Cloud powered by Trifacta platform.

Steps:
- Name: Provide a reasonable name for the resource. Example:
  <clusterName>-<applicationName>-<group/organizationName>
  Or, you can use trifacta.
- To create the resource, click Create.
  NOTE: Retain the DNS Name value for later use.

Enable Key Vault access for the Designer Cloud powered by Trifacta platform

In the Azure portal, you must assign access policies for the application principal of the Trifacta registered application to access the Key Vault.
For additional details, see Configure Azure Key Vault.
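The same association can be scripted. Below is a minimal sketch, assuming the az CLI, an existing resource group (the hypothetical my-rg), a region of westus2, and the APP_ID captured when you registered the application; the exact secret permissions your deployment needs may differ.

# Create the Key Vault; vault names must be globally unique, and the
# name becomes part of the vault's DNS name.
az keyvault create --name trifacta --resource-group my-rg --location westus2

# Ensure a service principal exists for the registered application.
az ad sp create --id "$APP_ID"

# Grant the application principal access to secrets in the vault.
az keyvault set-policy --name trifacta --spn "$APP_ID" \
  --secret-permissions get list set delete

# Retain the DNS name (vault URI) for later use.
az keyvault show --name trifacta --query properties.vaultUri -o tsv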
Create or modify Azure backend datastore
In the Azure console, you must create or modify the backend datastore for use with the Designer Cloud powered by Trifacta platform. Supported datastores:
NOTE: You should review the limitations for your selected datastore before configuring the platform to use it. After the base storage layer has been defined in the platform, it cannot be modified.
Datastore | Notes |
---|---|
ADLS Gen2 | Supported for use with Azure Databricks clusters only. |
ADLS Gen1 | See Enable ADLS Gen1 Access. |
WASB | Only the WASBS protocol is supported. See Enable WASB Access. |
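For example, an ADLS Gen2-capable account is a StorageV2 storage account with the hierarchical namespace enabled. A minimal Azure CLI sketch, assuming the hypothetical names my-rg and trifactastorage and the westus2 region:

# ADLS Gen2 requires a StorageV2 account with hierarchical namespace enabled.
az storage account create \
  --name trifactastorage \
  --resource-group my-rg \
  --location westus2 \
  --kind StorageV2 \
  --enable-hierarchical-namespace true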
Create or modify running environment cluster
In the Azure console, you must create or modify the running environment where jobs are executed by the Designer Cloud powered by Trifacta platform. Supported running environments:
NOTE: You should review the limitations for your selected running environment before configuring the platform to use it.
Running Environment | Notes |
---|---|
Azure Databricks | |
HDI | See Configure for HDInsight. |
Configure the Platform
Please complete the following sections as soon as you can access the Designer Cloud application.
Change admin password

As part of the install process, an admin user account is created.

NOTE: Some platform functions cannot be executed without an admin account. Your deployment should always have an admin account.

After the Trifacta software has been installed, the administrator of the system should immediately change the password for the admin account through the Designer Cloud application. If you do not know the admin account credentials, please contact Alteryx Support.

Review self-registration

By default, any visitor to the Login page can create an account in the Designer Cloud powered by Trifacta platform. If the platform is available on the public Internet or is otherwise vulnerable to unauthorized access, unauthorized users can register and use the product. If this level of access is unacceptable, you should disable self-registration, which means that a Trifacta administrator must provision all users. For more information, see Configure User Self-Registration.

Configure shared secret

To manage cookie signing, the platform deploys a shared secret, which is used to guarantee secure data transfer between the web client and the platform. At install time, a default 64-character shared secret is inserted; it is the same for all instances of the platform of the same version, so it should not be reused across multiple deployments.

NOTE: If your instance of the platform is available on the public Internet or if you have deployed multiple instances of the same release of the platform, cookies can become insecure when the secret is shared across instances. You should change this value for each installation of the platform.

Please complete the following steps to change the shared secret.

Steps:
- You can apply this change through trifacta-conf.json. For more information, see Platform Configuration Methods.
- Locate the following parameter and replace its value with a new 64-character value:

"sharedSecret": <64_character_value>
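For example, you can generate a suitable 64-character value with openssl and patch it into the configuration with jq. This is a sketch, assuming trifacta-conf.json lives under /opt/trifacta/conf (the TRIFACTA_CONF location used elsewhere in this guide) and that sharedSecret is a top-level key; verify the key's location in your file first.

# 32 random bytes hex-encoded yields exactly 64 characters.
NEW_SECRET=$(openssl rand -hex 32)
CONF=/opt/trifacta/conf/trifacta-conf.json
jq --arg s "$NEW_SECRET" '.sharedSecret = $s' "$CONF" > "$CONF.tmp" \
  && mv "$CONF.tmp" "$CONF"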
Configure the Platform for Azure
Please complete the following steps to configure the Designer Cloud powered by Trifacta platform and to integrate it with Azure resources.
Base platform configuration
Please complete the following configuration steps in the Designer Cloud powered by Trifacta® platform.

NOTE: If you are integrating with Azure Databricks and are using Managed Identities for authentication, please skip this section. That configuration is covered in a later step.

NOTE: Except as noted, these configuration steps are required for all Azure installs. These values must be extracted from the Azure portal.

Steps:
- You can apply these changes through trifacta-conf.json. For more information, see Platform Configuration Methods.
- Specify the Azure registered application values:

"azure.applicationId": "<azure_application_id>",
"azure.directoryId": "<azure_directory_id>",
"azure.secret": "<azure_secret>",

Parameter | Description |
---|---|
azure.applicationId | Application ID for the Trifacta registered application that you created in the Azure console |
azure.directoryId | The directory ID for the Trifacta registered application |
azure.secret | The Secret value for the Trifacta registered application |

- Configure the Key Vault URL:

"azure.keyVaultUrl": "<url_of_key_vault>",

Parameter | Description |
---|---|
azure.keyVaultUrl | URL of the Azure Key Vault that you created in the Azure console |

For additional details, see Configure for Azure and Configure Azure Key Vault.
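To confirm that the values you entered actually grant Key Vault access, you can log in as the application's service principal and list the vault's secrets. A sketch, assuming the az CLI, the vault name trifacta, and the IDs and secret captured earlier:

# Authenticate as the registered application (service principal).
az login --service-principal -u "$APP_ID" -p "$SECRET" --tenant "$DIRECTORY_ID"

# Listing secrets succeeds only if the access policy was applied correctly.
az keyvault secret list --vault-name trifacta -o table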
Set base storage layer
The Designer Cloud powered by Trifacta platform supports integration with the following backend datastores on Azure.
- ADLS Gen2
- ADLS Gen1
- WASB
ADLS Gen2
Please complete the following configuration steps in the Designer Cloud powered by Trifacta® platform.

NOTE: Integration with ADLS Gen2 is supported only on Azure Databricks.
Steps:
- You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
- Enable ADLS Gen2 as the base storage layer:
"webapp.storageProtocol": "abfss", "hdfs.enabled": false, "hdfs.protocolOverride": "",
Parameter Description webapp.storageProtocol Sets the base storage layer for the platform. Set this value to
abfss
.NOTE: After this parameter has been saved, you cannot modify it. You must re-install the platform to change it.
hdfs.enabled For ADLS Gen2 access, set this value to false
.hdfs.protocolOverride For ADLS Gen2 access, this special parameter should be empty. It is ignored when the storage protocol is set to abfss
.Configure ADLS Gen2 access mode. The following parameter must be set to
system
."azure.adlsgen2.mode": "system",
- Set the protocol whitelist and base URIs for ADLS Gen2:

"fileStorage.whitelist": ["abfss"],
"fileStorage.defaultBaseUris": ["abfss://filesystem@storageaccount.dfs.core.windows.net/"],

Parameter | Description |
---|---|
fileStorage.whitelist | A comma-separated list of protocols that are permitted to read and write with ADLS Gen2 storage. NOTE: The protocol identifier "abfss" must be included in this list. |
fileStorage.defaultBaseUris | For each supported protocol, this parameter must contain a top-level path to the location where Designer Cloud powered by Trifacta platform files can be stored. These files include uploads, samples, and temporary storage used during job execution. NOTE: A separate base URI is required for each supported protocol. You may only have one base URI for each protocol. |
- Save your changes.
- The Java VFS service must be enabled for ADLS Gen2 access. For more information, see Configure Java VFS Service in the Configuration Guide.
For additional details, see Enable ADLS Gen2 Access.
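To sanity-check a base URI before saving it, confirm that the filesystem (container) named in the abfss URI actually exists. A sketch, assuming the az CLI and the example names filesystem and storageaccount used above:

# The URI abfss://filesystem@storageaccount.dfs.core.windows.net/
# refers to this filesystem on this storage account.
az storage fs create --name filesystem --account-name storageaccount --auth-mode login
az storage fs list --account-name storageaccount --auth-mode login -o table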
ADLS Gen1
ADLS Gen1 access leverages the HDFS protocol and storage, so additional configuration is required.

Steps:
- You can apply this change through trifacta-conf.json. For more information, see Platform Configuration Methods.
- Enable ADLS Gen1 as the base storage layer:

"webapp.storageProtocol": "adl",
"hdfs.enabled": false,

Parameter | Description |
---|---|
webapp.storageProtocol | Sets the base storage layer for the platform. Set this value to adl. NOTE: After this parameter has been saved, you cannot modify it. You must re-install the platform to change it. |
hdfs.enabled | For ADLS Gen1 storage, set this value to false. |

- Specify the base location and protocol for storage. Only one datastore can be specified:

"fileStorage": {
  "defaultBaseUris": [
    "<baseURIOfYourLocation>"
  ],
  "whitelist": ["adl"]
}

Parameter | Description |
---|---|
fileStorage.defaultBaseUris | Set this value to the base location for your ADLS Gen1 storage area. Example: adl://<YOUR_STORE_NAME>.azuredatalakestore.net |
fileStorage.whitelist | This list must include adl. |

- Save your changes.
For additional details, see Enable ADLS Gen1 Access.
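If you need to confirm the store name for the adl:// base URI, the (now-deprecated) az dls command group can inspect a Gen1 account. A sketch, assuming the hypothetical store name mydatalakestore:

# Verify the ADLS Gen1 store exists and note its name for the adl:// URI.
az dls account show --account mydatalakestore

# List the root of the store to confirm access.
az dls fs list --account mydatalakestore --path /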
WASB
Steps:
- You can apply this change through trifacta-conf.json. For more information, see Platform Configuration Methods.
- Enable WASB as the base storage layer:

"webapp.storageProtocol": "wasbs",
"hdfs.enabled": false,

Parameter | Description |
---|---|
webapp.storageProtocol | Sets the base storage layer for the platform. Set this value to wasbs. The wasb protocol is not supported. NOTE: After this parameter has been saved, you cannot modify it. You must re-install the platform to change it. |
hdfs.enabled | For WASB blob storage, set this value to false. |

Configure SAS token for WASB

When integrating with WASB, the platform must be configured to use a SAS token to gain access to WASB resources. This token can be made available in either of the following ways, each of which requires separate configuration.

Via Designer Cloud powered by Trifacta platform configuration:
- You can apply this change through trifacta-conf.json. For more information, see Platform Configuration Methods.
- Locate and specify the following parameter:

"azure.wasb.fetchSasTokensFromKeyVault": false,

Parameter | Description |
---|---|
azure.wasb.fetchSasTokensFromKeyVault | For acquiring the SAS token from platform configuration, set this value to false. |

Via Azure Key Vault: To require the Designer Cloud powered by Trifacta platform to acquire the SAS token from the Azure key vault, please complete the following configuration steps.
- You can apply this change through trifacta-conf.json. For more information, see Platform Configuration Methods.
- Locate and specify the following parameter:

"azure.wasb.fetchSasTokensFromKeyVault": true,

Parameter | Description |
---|---|
azure.wasb.fetchSasTokensFromKeyVault | For acquiring the SAS token from the key vault, set this value to true. |

Define WASB stores

- You can apply this change through trifacta-conf.json. For more information, see Platform Configuration Methods.
- Locate the azure.wasb.stores configuration block and apply the appropriate configuration as specified below.

Tip: The default container must be specified as the first set of elements in the array. All containers listed after the first one are treated as extra stores.

"azure.wasb.stores":
  [
    {
      "sasToken": "<DEFAULT_VALUE1_HERE>",
      "keyVaultSasTokenSecretName": "<DEFAULT_VALUE1_HERE>",
      "container": "<DEFAULT_VALUE1_HERE>",
      "blobHost": "<DEFAULT_VALUE1_HERE>"
    },
    {
      "sasToken": "<VALUE2_HERE>",
      "keyVaultSasTokenSecretName": "<VALUE2_HERE>",
      "container": "<VALUE2_HERE>",
      "blobHost": "<VALUE2_HERE>"
    }
  ]

Parameter | SAS Token from Azure Key Vault | SAS Token from Platform Configuration |
---|---|---|
sasToken | Set this value to an empty string. NOTE: Do not delete the entire line. Leave the value as empty. | Set this value to the SAS token to use. Example value: ?sv=2019-02-02&ss=bfqt&srt=sco&sp=rwdlacup&se=2022-02-13T00:00:00Z&st=2020-02-13T00:00:00Z&spr=https&sig=<redacted> See below for the command to execute to generate a SAS token. |
keyVaultSasTokenSecretName | Set this value to the secret name of the SAS token in the Azure key vault to use for the specified blob host and container. If needed, you can generate and apply a per-container SAS token for use in this field for this specific store. See below for details. | Set this value to an empty string. NOTE: Do not delete the entire line. Leave the value as empty. |
container | Apply the name of the WASB container. NOTE: If you are specifying different blob host and container combinations for your extra stores, you must create a new Key Vault store. See above for details. | Apply the name of the WASB container. |
blobHost | Specify the blob host of the container. Example value: storage-account.blob.core.windows.net NOTE: If you are specifying different blob host and container combinations for your extra stores, you must create a new Key Vault store. See above for details. | Specify the blob host of the container. Example value: storage-account.blob.core.windows.net |
For additional details, see Enable WASB Access.
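For the per-container SAS token mentioned above, one way to generate a token and store it in the Key Vault is via the Azure CLI. A sketch, assuming hypothetical names (storage-account, my-container, the vault trifacta, a secret name of my-container-sas) and an account key in ACCOUNT_KEY; adjust the permissions and expiry to your own policy:

SAS=$(az storage container generate-sas \
  --account-name storage-account \
  --account-key "$ACCOUNT_KEY" \
  --name my-container \
  --permissions acdlrw \
  --expiry 2022-02-13T00:00:00Z \
  --https-only \
  -o tsv)

# sasToken values in platform configuration begin with "?".
echo "?$SAS"

# For the Key Vault approach, store the token under the secret name
# referenced by keyVaultSasTokenSecretName.
az keyvault secret set --vault-name trifacta --name my-container-sas --value "?$SAS"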
Checkpoint: At this point, you should be able to load data from your backend datastore, if data is available. You can try to run a small job on Photon, which is native to the Trifacta node. You cannot yet run jobs on an integrated cluster.
Integrate with running environment
The Designer Cloud powered by Trifacta platform can run jobs on the following running environments.
NOTE: You may integrate with only one of these environments.
Base configuration for Azure running environments
The following parameters should be configured for all Azure running environments.

Steps:
- You can apply this change through trifacta-conf.json. For more information, see Platform Configuration Methods.
- Set the following parameters:

"webapp.runInTrifactaServer": true,
"webapp.runinEMR": false,
"webapp.runInDataflow": false,

Parameter | Description |
---|---|
webapp.runInTrifactaServer | When set to true, the platform recommends and can run smaller jobs on the Trifacta node, which uses the embedded Photon running environment. Tip: Unless otherwise instructed, the Photon running environment should be enabled. |
webapp.runinEMR | For Azure, set this value to false. |
webapp.runInDataflow | For Azure, set this value to false. |
Azure Databricks
The Designer Cloud powered by Trifacta platform can be configured to integrate with supported versions of Azure Databricks clusters to run jobs in Spark.

NOTE: Before you attempt to integrate, you should review the limitations around this integration. For more information, see Configure for Azure Databricks.

Steps:
- You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
- Configure the following parameters to enable job execution on the specified Azure Databricks cluster:

"webapp.runInDatabricks": true,
"webapp.runWithSparkSubmit": false,

Parameter | Description |
---|---|
webapp.runInDatabricks | Defines if the platform runs jobs in Azure Databricks. Set this value to true. |
webapp.runWithSparkSubmit | For all Azure Databricks deployments, this value should be set to false. |

- Configure the following Azure Databricks-specific parameters:

"databricks.serviceUrl": "<url_to_databricks_service>",

Parameter | Description |
---|---|
databricks.serviceUrl | URL to the Azure Databricks Service where Spark jobs will be run (Example: https://westus2.azuredatabricks.net) |

NOTE: If you are using instance pooling on the cluster, additional configuration is required. See Configure for Azure Databricks.
For additional details, see Configure for Azure Databricks.
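Before restarting the platform, you can confirm that the service URL is reachable and the workspace accepts API calls. A sketch, assuming a Databricks personal access token in DATABRICKS_TOKEN and the example URL above:

# A successful response returns JSON describing the workspace's clusters.
curl -s -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  "https://westus2.azuredatabricks.net/api/2.0/clusters/list"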
HDInsight
The Designer Cloud powered by Trifacta platform can be configured to integrate with supported versions of HDInsight clusters to run jobs in Spark.

NOTE: Before you attempt to integrate, you should review the limitations around this integration. For more information, see Configure for HDInsight.

Steps:
- Specify running environment options. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods. Configure the following parameters to enable job execution on the specified HDI cluster:

"webapp.runInDatabricks": false,
"webapp.runWithSparkSubmit": true,

Parameter | Description |
---|---|
webapp.runInDatabricks | Defines if the platform runs jobs in Azure Databricks. For HDI, set this value to false. |
webapp.runWithSparkSubmit | For HDI deployments, this value should be set to true. |

- Specify the Trifacta user. Set the Hadoop username [hadoop.user (default=trifacta)] for the Designer Cloud powered by Trifacta platform to use for executing jobs:

"hdfs.username": "[hadoop.user]",

- Specify the location of the client distribution bundle JAR. The Designer Cloud powered by Trifacta platform ships with client bundles supporting a number of major Hadoop distributions. You must configure the jarfile for the distribution to use. These distributions are stored in the following directory:

/trifacta/hadoop-deps

Configure the bundle distribution property (hadoopBundleJar):

"hadoopBundleJar": "hadoop-deps/hdp-2.6/build/libs/hdp-2.6-bundle.jar"

- Configure component settings. For each of the following components, please explicitly set the following settings. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

Configure Batch Job Runner:

"batch-job-runner": {
  "autoRestart": true,
  ...
  "classpath": "%(topOfTree)s/services/batch-job-runner/build/install/batch-job-runner/batch-job-runner.jar:%(topOfTree)s/services/batch-job-runner/build/install/batch-job-runner/lib/*:%(topOfTree)s/conf/hadoop-site:%(topOfTree)s/%(hadoopBundleJar)s"
},

Configure the following environment variables:

"env.PATH": "${HOME}/bin:$PATH:/usr/local/bin:/usr/lib/zookeeper/bin",
"env.TRIFACTA_CONF": "/opt/trifacta/conf",
"env.JAVA_HOME": "/usr/lib/jvm/java-1.8.0-openjdk-amd64",

Configure the following properties for various Trifacta components:

"ml-service": {
  "autoRestart": true
},
"monitor": {
  "autoRestart": true,
  ...
  "port": <your_cluster_monitor_port>
},
"proxy": {
  "autoRestart": true
},
"udf-service": {
  "autoRestart": true
},
"webapp": {
  "autoRestart": true
},

Disable S3 access:

"aws.s3.enabled": false,

Configure the following Spark Job Service properties:

"spark-job-service.classpath": "%(topOfTree)s/services/spark-job-server/server/build/install/server/lib/*:%(topOfTree)s/conf/hadoop-site/:%(sparkBundleJar)s:%(topOfTree)s/%(hadoopBundleJar)s",
"spark-job-service.env.SPARK_DIST_CLASSPATH": "/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-mapreduce-client/*",
For additional details, see Configure for HDInsight.
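A quick way to catch classpath mistakes is to confirm that the bundle JAR referenced by hadoopBundleJar actually exists on the Trifacta node. A sketch, assuming the install root is /opt/trifacta (so the hadoop-deps directory above resolves to /opt/trifacta/hadoop-deps) and the HDP 2.6 bundle shown above:

# The hadoopBundleJar value is resolved relative to the install root.
ls -l /opt/trifacta/hadoop-deps/hdp-2.6/build/libs/hdp-2.6-bundle.jar

# Confirm the configured value matches the file on disk.
grep '"hadoopBundleJar"' /opt/trifacta/conf/trifacta-conf.json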
Checkpoint: At this point, you should be able to load data from your backend datastore and run jobs on an integrated cluster.
Configure platform authentication
The Designer Cloud powered by Trifacta platform supports the following methods of authentication when hosted in Azure.
Integrate with Azure AD SSO
The platform can be configured to integrate with your enterprise's Azure Active Directory provider. For more information, see Configure SSO for Azure AD.
Non-SSO authentication
If you are not applying your enterprise SSO authentication to the Designer Cloud powered by Trifacta platform, platform users must be created and managed through the application.
Self-managed:
Users can be permitted to self-register their accounts and manage their password reset requests:
NOTE: Self-created accounts are permitted to import data, generate samples, run jobs, and generate and download results. Admin roles must be assigned manually through the application.
- See Configure User Self-Registration in the Configuration Guide
- See Enable Self-Service Password Reset in the Configuration Guide
Admin-managed:
If users are not permitted to create their accounts, an admin must do so:
- See Create User Account in the Admin Guide
- See Create Admin Account in the Admin Guide
Checkpoint: Users who are authenticated or who have been provisioned user accounts should be able to log in to the Designer Cloud application and begin using the product.
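As a quick check that the login page is being served, you can probe the web application from the Trifacta node. A sketch, assuming the default web application port of 3005; substitute your configured port if it differs:

# An HTTP 200 or 302 response indicates the Designer Cloud application is up.
curl -sI http://localhost:3005/ | head -n 1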
Verify Operations
NOTE: You can verify operations using the Trifacta Photon running environment at this time.

Prepare Your Sample Dataset

To complete this test, you should locate or create a simple dataset. Your dataset should be created in the format that you wish to test.

Tip: The simplest way to test is to create a two-column CSV file with at least 25 non-empty rows of data. This data can be uploaded through the application.

Store Your Dataset

If you are testing an integration, you should store your dataset in the datastore with which the product is integrated.

Tip: Uploading datasets is always available as a means of importing datasets.

Verification Steps

Steps:
- Log in to the application. See Login.
- Click Import and Add to Flow.
- If options are presented, select the defaults. See Run Job Page.

Checkpoint: You have verified importing from the selected datastore and transforming a dataset. If your job was successfully executed, you have verified that the product is connected to the job running environment and can write results to the defined output location. Optionally, you may have tested profiling of job results. If all of the above tasks completed, the product is operational end-to-end.
Documentation
You can access complete product documentation online and in PDF format. From within the product, select Help menu > Documentation.
Next Steps
The following install and configuration topics were not covered in this workflow. If these features apply, please reference the following topics in the Configuration Guide for more information.
Topic | Description | Configuration Guide sections |
---|---|---|
User Access | You can enable self-service user registration or create users through the admin account. | Required Platform Configuration |
Relational Connections | The platform can integrate with a variety of relational datastores. | |
Compressed Clusters | The platform can integrate with some compressed running environments. | Enable Integration with Compressed Clusters |
High Availability | The platform can integrate with a highly available cluster. | Enable Integration with Cluster High Availability |
 | The Trifacta node can be configured to use other nodes in case of a failure. | Configure for High Availability |
Features | Some features must be enabled and can be configured through platform configuration. | Feature flags: Miscellaneous Configuration |
Services | Some platform services support additional configuration options. | Configure Services |