Contents:
To install Designer Cloud Enterprise Edition inside your enterprise infrastructure, please review and complete the following sections in the order listed below.
Scenario Description
- Installation of Designer Cloud Enterprise Edition on a server on-premises
- Installation of Alteryx databases on a server on-premises
- Integration with a supported Hadoop cluster on-premises
- Base storage layer of HDFS
Preparation
- Review Planning Guide: Please review and verify Install Preparation and sub-topics.
- Acquire Assets: Acquire the installation package for your operating system and your license key. For more information, contact Alteryx Support.
- If you are completing the installation without Internet access, you must also acquire the offline versions of the system dependencies. See Install Dependencies without Internet Access.
- Deploy Hadoop cluster: In this scenario, the Designer Cloud Powered by Trifacta platform does not create a Hadoop cluster. See below.
NOTE: Installation and maintenance of a working Hadoop cluster is the responsibility of the Designer Cloud Enterprise Edition customer. Guidance is provided below on the requirements for integrating the platform with the cluster.
- Deploy Alteryx node: Designer Cloud Enterprise Edition must be installed on an edge node of the cluster. Details are below.
Limitations: For more information on limitations of this scenario, see Product Limitations in the Install Preparation area.
Deploy the Cluster
In your enterprise infrastructure, you must deploy a cluster using a supported version of Hadoop to manage the expected data volumes of your Alteryx jobs. For more information on suggested sizing, see Sizing Guidelines in the Install Preparation area.
When you configure the platform to integrate with the cluster, you must acquire information about the cluster configuration. For more information on the set of information to collect, see Pre-Install Checklist in the Install Preparation area.
NOTE: By default, smaller jobs are executed on the Trifacta Photon running environment. Larger jobs are executed using Spark on the integrated Hadoop cluster. Spark must be installed on the cluster. For more information, see System Requirements in the Install Preparation area.
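As a quick check (an illustrative sketch only; command availability and paths vary by distribution), you can confirm from a cluster node that Spark and Hadoop are installed and report supported versions:
# Report the installed Spark version (assumes the spark-submit client is on the PATH).
spark-submit --version
# Report the installed Hadoop version.
hadoop version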
The Designer Cloud Powered by Trifacta platform supports integration with the following cluster types. For more information on the supported versions, please see the listed sections below.
Prepare the cluster
If you are integrating with a Hadoop cluster, please verify or complete the following before installing the software (a command-line sketch of these steps appears after this list):
- Verify that WebHDFS is configured and running on the cluster. For more information, see Prepare Hadoop for Integration with the Platform.
- Create a user for the platform on the cluster [hadoop.user (default=trifacta)] and a group for it [hadoop.group (default=trifactausers)].
- Create the directories /trifacta and /user/trifacta on the cluster.
- Change the ownership of /trifacta and /user/trifacta to trifacta:trifacta or the corresponding values for the Hadoop user in your environment.
NOTE: You must verify that the [hadoop.user] user has complete ownership and full access to Read, Write and Execute on these directories recursively.
- Additional users may be required. For more information, see Required Users and Groups in the Install Preparation area.
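The following commands are an illustrative sketch only, assuming the default trifacta user and trifactausers group, HDFS superuser access through the hdfs account, and the default directory locations; substitute the values for your environment:
# Create the platform user and group on the node (defaults shown; adjust as needed).
sudo groupadd trifactausers
sudo useradd -g trifactausers trifacta
# Create the required HDFS directories and give the platform user ownership.
sudo -u hdfs hdfs dfs -mkdir -p /trifacta /user/trifacta
sudo -u hdfs hdfs dfs -chown -R trifacta:trifacta /trifacta /user/trifacta
# Grant the platform user full read, write, and execute access recursively.
sudo -u hdfs hdfs dfs -chmod -R 700 /trifacta /user/trifacta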
Deploy the Alteryx node
An edge node of the cluster is required to host the Designer Cloud Powered by Trifacta platform software. For more information on the requirements of this node, see System Requirements.
Install Workflow
Please complete the following steps in the order listed:
- Install software: Install the Designer Cloud Powered by Trifacta platform software on the cluster edge node. See Install Software.
- Install databases: The platform requires several databases for storage.
NOTE: The default configuration assumes that you are installing the databases on a PostgreSQL server on the same edge node as the software using the default ports. If you are changing the default configuration, additional configuration is required as part of this installation process.
For more information, see Install Databases.
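As a quick sanity check before installing the databases (an illustrative sketch only; the service name and port assume a standard PostgreSQL installation using the default port 5432), you can confirm that the database server is running on the edge node:
# Confirm the PostgreSQL service is running (service name may vary by OS and version).
sudo systemctl status postgresql
# Confirm PostgreSQL is listening on the default port.
ss -ltn | grep 5432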
- Start the platform: For more information, see Start and Stop the Platform.
- Log in to the application: After software and databases are installed, you can log in to the application to complete configuration. See Login.
As soon as you log in, you should change the password on the admin account. In the left menu bar, select Settings > Admin Settings. Scroll down to Manage Users. For more information, see Change Admin Password.
Tip: At this point, you can access the online documentation through the application. In the left menu bar, select Help menu > Product Docs. All of the following content, plus updates, is available online. See Documentation below.
Configure for Hadoop
After you have performed the base installation of the Designer Cloud Powered by Trifacta® platform, please complete the following steps if you are integrating with a Hadoop cluster.

Apply cluster configuration files - non-edge node
If the Designer Cloud Powered by Trifacta platform is being installed on a non-edge node, you must copy over the Hadoop Client Configuration files from the cluster to the Alteryx deployment.
NOTE: When these files change, you must update the local copies. For this reason, it is best to install on an edge node.
The following configuration files must be moved to the Alteryx deployment:
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
- yarn-site.xml
- hive-site.xml (if you are using Hive)
By default, these files are in /etc/hadoop/conf:
sudo cp <location>/*.xml /opt/trifacta/conf/hadoop-site/
sudo chown trifacta:trifacta /opt/trifacta/conf/hadoop-site/*.xml
For more information, see Configure for Hadoop.

Apply cluster configuration files - edge node
If the Designer Cloud Powered by Trifacta platform is being installed on an edge node of the cluster, you can create a symlink from a local directory to the source cluster files so that they are automatically updated as needed.
Navigate to the following directory on the Alteryx node:
cd /opt/trifacta/conf/hadoop-site
Create a symlink for each of the Hadoop Client Configuration files referenced in the previous steps. Example:
ln -s /etc/hadoop/conf/core-site.xml core-site.xml
For more information, see Configure for Hadoop.

Modify Alteryx configuration changes
To apply this configuration change, log in as an administrator to the Alteryx node. Then, edit trifacta-conf.json. For more information, see Platform Configuration Methods.
HDFS: Change the host and port information for HDFS as needed. Please apply the port numbers for your distribution:
"hdfs.namenode.host": "<namenode>",
"hdfs.namenode.port": <hdfs_port_num>
"hdfs.yarn.resourcemanager": {
"hdfs.yarn.webappPort": 8088,
"hdfs.yarn.adminPort": 8033,
"hdfs.yarn.host": "<resourcemanager_host>",
"hdfs.yarn.port": <resourcemanager_port>,
"hdfs.yarn.schedulerPort": 8030
For more information, see Configure for Hadoop.
Save your changes and restart the platform.

Configure Spark Job Service
The Spark Job Service must be enabled for both execution and profiling jobs to work in Spark.
NOTE: Beginning in Release 4.0, the Spark Job Service and running environment are enabled by default. If you are upgrading from an earlier release, you may be required to enable the service through the following configuration changes.
Below is a sample configuration and description of each property. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
"spark-job-service" : {
"systemProperties" : {
"java.net.preferIPv4Stack": "true",
"SPARK_YARN_MODE": "true"
},
"sparkImpersonationOn": false,
"optimizeLocalization": true,
"mainClass": "com.trifacta.jobserver.SparkJobServer",
"jvmOptions": [
"-Xmx128m"
],
"hiveDependenciesLocation": "%(topOfTree)s/hadoop-deps/cdh-5.4/build/libs",
"env": {
"SPARK_JOB_SERVICE_PORT": "4007",
"SPARK_DIST_CLASSPATH": "",
"MAPR_TICKETFILE_LOCATION": "<MAPR_TICKETFILE_LOCATION>",
"MAPR_IMPERSONATION_ENABLED": "0",
"HADOOP_USER_NAME": "trifacta",
"HADOOP_CONF_DIR": "%(topOfTree)s/conf/hadoop-site/"
},
"enabled": true,
"enableHiveSupport": true,
"enableHistoryServer": false,
"classpath": "%(topOfTree)s/services/spark-job-server/server/build/libs/spark-job-server-bundle.jar:%(topOfTree)s/conf/hadoop-site/:%(topOfTree)s/services/spark-job-server/build/bundle/*:%(topOfTree)s/%(hadoopBundleJar)s",
"autoRestart": false,
},
The following properties can be modified based on your needs:
NOTE: Unless explicitly told to do so, do not modify any of the above properties that are not listed below.

Property | Description
---|---
sparkImpersonationOn | Set this value to true if secure impersonation is enabled on your cluster. See Configure for Secure Impersonation.
jvmOptions | This array of values can be used to pass parameters to the JVM that manages the Spark Job Service.
hiveDependenciesLocation | If Spark is integrated with a Hive instance, set this value to the path to the location where Hive dependencies are installed on the Alteryx node. For more information, see Configure for Hive.
env.SPARK_JOB_SERVICE_PORT | Set this value to the listening port number on the cluster for Spark. Default value is 4007. For more information, see System Ports.
env.HADOOP_USER_NAME | The username of the Hadoop principal used by the platform. By default, this value is trifacta.
env.HADOOP_CONF_DIR | The directory on the Alteryx node where the Hadoop cluster configuration files are stored. Do not modify unless necessary.
enabled | Set this value to true to enable the Spark Job Service.
enableHiveSupport | See "Configure service for Hive" below.

After making any changes, save the file and restart the platform. See Start and Stop the Platform.

Configure service for Hive
Depending on the environment, please apply the following configuration changes to manage Spark interactions with Hive:

Environment | spark.enableHiveSupport
---|---
Hive is not present | false
Hive is present but not enabled | false
Hive is present and enabled | true

If Hive is present on the cluster and either enabled or disabled, the hive-site.xml file must be copied to the correct directory:
cp /etc/hive/conf/hive-site.xml /opt/trifacta/conf/hadoop-site/hive-site.xml
At this point, the platform only expects that a hive-site.xml file has been installed on the Alteryx node. A valid connection is not required. For more information, see Configure for Hive.

Configure Spark
After the Spark Job Service has been enabled, please complete the following sections to configure it for the Designer Cloud Powered by Trifacta platform.

Yarn cluster mode
All jobs submitted to the Spark Job Service are executed in YARN cluster mode. No other cluster mode is supported for the Spark Job Service.

Configure access for secure impersonation
The Spark Job Service can run under secure impersonation. For more information, see Configure for Secure Impersonation.
When running under secure impersonation, the Spark Job Service requires access to the following folders. Read, write, and execute access must be provided to the Alteryx user and the impersonated user.

Folder Name | Default Value | Description
---|---|---
"hdfs.pathsConfig.libraries" | /trifacta/libraries | Maintains JAR files and other libraries required by Spark. No sensitive information is written to this location.
"hdfs.pathsConfig.tempFiles" | /trifacta/tempfiles | Holds temporary progress information files for YARN applications. Each file contains a number indicating the progress percentage. No sensitive information is written to this location.
"hdfs.pathsConfig.dictionaries" | /trifacta/dictionaries | Contains definitions of dictionaries created for the platform.

Identify Hadoop libraries on the cluster
The Spark Job Service does not require additional installation on the Alteryx node or on the Hadoop cluster. Instead, it references the spark-assembly JAR that is provided with the Alteryx distribution. This JAR file does not include the Hadoop client libraries. You must point the Designer Cloud Powered by Trifacta platform to the appropriate libraries.
Steps: Set the following property in the spark-job-service configuration block:
"spark-job-service.env.HADOOP_CONF_DIR": "<path_to_Hadoop_conf_dir_on_Hadoop_cluster>",

Property | Description
---|---
spark-job-service.env.HADOOP_CONF_DIR | Path to the Hadoop configuration directory on the Hadoop cluster.

The SPARK_DIST_CLASSPATH property must be set depending on your Hadoop distribution. For Hortonworks 2.x, this property configuration is covered later in this section.

Locate Hive dependencies location
If the Designer Cloud Powered by Trifacta platform is also connected to a Hive instance, please verify the location of the Hive dependencies on the Alteryx node. The following example is from Cloudera 5.10:
NOTE: This parameter value is distribution-specific. Please update based on your Hadoop distribution.
"spark-job-service.hiveDependenciesLocation":"%(topOfTree)s/hadoop-deps/cdh-5.10/build/libs",
For more information, see Configure for Spark.

Enable High Availability
NOTE: If high availability is enabled on the Hadoop cluster, it must be enabled on the Designer Cloud Powered by Trifacta platform, even if you are not planning to rely on it. See Enable Integration with Cluster High Availability.
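After applying these changes and restarting the platform, a quick spot-check can confirm the integration points described above. This is an illustrative sketch only; the path and port reflect the defaults shown in this section:
# Confirm the Hadoop client configuration files (or symlinks) are present on the Alteryx node.
ls -l /opt/trifacta/conf/hadoop-site/
# Confirm the Spark Job Service is listening on its configured port (default 4007).
ss -ltn | grep 4007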
Configure for Designer Cloud Powered by Trifacta platform
Set base storage layer
The platform requires that one backend datastore be configured as the base storage layer. This base storage layer is used for storing uploaded data and writing results and profiles.
NOTE: By default, the base storage layer for Designer Cloud Enterprise Edition is set to HDFS. You can change it now, if needed. After this base storage layer is defined, it cannot be changed again.
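To confirm the current setting, one option (a sketch only; it assumes the default installation path and that the base storage layer is recorded as a storageProtocol value in trifacta-conf.json) is to inspect the configuration file on the Alteryx node:
# Show the configured base storage layer; for this scenario the expected value is hdfs.
grep -n "storageProtocol" /opt/trifacta/conf/trifacta-conf.json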
Verify Operations
NOTE: You can try to verify operations using the Trifacta Photon running environment at this time. While you can also try to run a job on the Hadoop cluster, additional configuration may be required to complete the integration. These steps are listed under Next Steps below.
Prepare Your Sample Dataset
To complete this test, you should locate or create a simple dataset. Your dataset should be created in the format that you wish to test. An illustrative example appears at the end of this section.
Store Your Dataset
If you are testing an integration, you should store your dataset in the datastore with which the product is integrated.
Tip: Uploading datasets is always available as a means of importing datasets.
Verification Steps
Troubleshooting: At this point, you have read access to your datastore from the platform. If not, please check the logs, permissions, and your Alteryx® configuration.
If options are presented, select the defaults.
Troubleshooting: Later, you can re-run this job on a different running environment. Some formats are not available across all running environments.
Checkpoint: You have verified importing from the selected datastore and transforming a dataset. If your job was successfully executed, you have verified that the product is connected to the job running environment and can write results to the defined output location. Optionally, you may have tested profiling of job results. If all of the above tasks completed, the product is operational end-to-end.
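For example, a minimal test dataset (purely illustrative; any small delimited file with a header row and a handful of rows will do) can be created and staged as follows:
# Create a small CSV to use for the end-to-end test (illustrative sample data).
cat > sample_orders.csv <<'EOF'
order_id,customer,amount,order_date
1001,Acme,250.00,2023-01-15
1002,Globex,99.50,2023-01-16
1003,Initech,410.25,2023-01-17
EOF
# Optionally stage it on HDFS so it can be imported from the integrated datastore
# (assumes the /user/trifacta directory created earlier; alternatively, upload the
# file directly through the application).
hdfs dfs -put sample_orders.csv /user/trifacta/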
Documentation
Tip: You should access online documentation through the product. Online content may receive updates that are not present in PDF content.
You can access complete product documentation online and in PDF format. From within the Designer Cloud application, select Help menu > Product Docs.
Next Steps
After you have accessed the documentation, the following topics are relevant to on-premises deployments. Please review them in order.
NOTE: These materials are located in the Configuration Guide.
Topic | Description
---|---
Required Platform Configuration | This section covers several required configuration topics, some of which should already be completed.
Configure for Hadoop | Additional configuration required to integrate with the Hadoop cluster.
Enable Integration with Compressed Clusters | If the Hadoop cluster uses compression, additional configuration is required.
Enable Integration with Cluster High Availability | If you are integrating with high availability on the Hadoop cluster, please complete these steps.
Configure for Hive | Integration with the Hadoop cluster's instance of Hive.
Configure for KMS | Integration with the Hadoop cluster's key management system (KMS) for encrypted transport. Instructions are provided for distribution-specific versions of Hadoop.
Configure Security | A list of topics on applying additional security measures to the Designer Cloud Powered by Trifacta platform and how it integrates with Hadoop.
Configure SSO for AD-LDAP | Please complete these steps if you are integrating with your enterprise's AD/LDAP Single Sign-On (SSO) system.