
Configure for Cloudera

This section provides additional configuration requirements for integrating the Designer Cloud Powered by Trifacta platform with the Cloudera platform.

Note

Except as noted, the following configuration items apply to the latest supported version of Cloudera platform.

Prerequisites

Before you begin, it is assumed that you have completed the following tasks:

  1. Successfully installed a supported version of Cloudera platform into your enterprise infrastructure.

  2. Installed the Alteryx software in your environment. For more information, see Install Software.

  3. Reviewed the mechanics of platform configuration. See Required Platform Configuration.

  4. Configured access to the Alteryx database. See Configure the Databases.

  5. Performed the basic cluster integration configuration. See Configure for Hadoop.

  6. Obtained access to platform configuration, either through the Admin Settings page (recommended) or by editing trifacta-conf.json on the Trifacta node. For more information, see Platform Configuration Methods.

Configure for Cloudera Data Platform

Cloudera Data Platform (CDP) offers a broad set of cloud services for enterprise data, including data analytics and artificial intelligence. For more information, see https://docs.cloudera.com/cdp/latest/overview/topics/cdp-overview.html.

The configuration required to integrate the Designer Cloud Powered by Trifacta platform with Cloudera Data Platform (CDP) is nearly identical to the base Cloudera configuration, with the following exceptions:

  • You must reference the correct Hadoop dependencies bundle JAR for the platform to use. This JAR should already be specified. For more information, see Configure for Hadoop.

  • For Spark:

    • You must specify the use of native libraries.

    • You must specify the Spark version for the platform to use.

These differences in configuration are described inline with the base Cloudera and Spark configuration.

Configure Designer Cloud Powered by Trifacta platform

Configure Hive Locations

If you are enabling an integration with Hive on the cluster, there are some distribution-specific parameters that must be set. For more information, see Configure for Hive.

Configure SSL for Hive

CDH supports two methods of enabling SSL communications with Hive:

  1. SASL-QOP method: Enable encryption between Hive JDBC and HiveServer2 using SASL-QOP. This method is available by default with the Designer Cloud Powered by Trifacta platform.

  2. TLS/SSL method: Use TLS/SSL encryption for JDBC connections to HiveServer2.

To determine the method in use:

  1. In Cloudera Manager configuration, search for: tls.

  2. If the TLS/SSL options are enabled, complete the steps in Enable TLS/SSL Method.

  3. If these options are not enabled, the cluster can still use the SASL-QOP method. For more information on this method, see Configure for Hive.
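As a rough command-line complement to the Cloudera Manager search, the sketch below inspects a Hive client configuration file for the standard Hive properties `hive.server2.use.SSL` (TLS/SSL) and `hive.server2.thrift.sasl.qop` (SASL-QOP, where `auth-conf` means authentication plus encryption). The default path `/etc/hive/conf/hive-site.xml` and the function name `detect_hive_ssl_method` are assumptions for illustration, and the grep-based parsing assumes each `<value>` sits on the same line as its `<name>` or on the line right after it. Cloudera Manager remains the authoritative source.

```shell
# Hypothetical helper (sketch only): reports which Hive encryption method a
# hive-site.xml appears to configure: "tls", "sasl-qop", or "none".
# Assumes <value> is on the same line as <name> or on the next line.
detect_hive_ssl_method() {
  local conf="${1:-/etc/hive/conf/hive-site.xml}"  # assumed default location
  if [ ! -r "$conf" ]; then
    echo "unreadable: $conf" >&2
    return 2
  fi
  # TLS/SSL method: hive.server2.use.SSL is set to true
  if grep -A1 '<name>hive.server2.use.SSL</name>' "$conf" \
       | grep -qi '<value>[[:space:]]*true[[:space:]]*</value>'; then
    echo "tls"
    return 0
  fi
  # SASL-QOP method: hive.server2.thrift.sasl.qop is set to auth-conf
  if grep -A1 '<name>hive.server2.thrift.sasl.qop</name>' "$conf" \
       | grep -qi '<value>[[:space:]]*auth-conf[[:space:]]*</value>'; then
    echo "sasl-qop"
    return 0
  fi
  echo "none"
}
```

For example, `detect_hive_ssl_method /etc/hive/conf/hive-site.xml` prints `tls` when the client configuration enables TLS/SSL; pass a different path if your client configuration lives elsewhere.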

Enable TLS/SSL Method

Steps:

  1. Replace the default Hive JDBC driver provided with your Alteryx installation with the driver provided by Cloudera. Run the following commands, noting the wildcards (*) in the JAR path:

    Note

    The current driver must be removed or replaced in the working directory. Do not leave it in the directory.

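    # Switch to the data service's dependencies directory
    cd /opt/trifacta/services/data-service/build/dependencies
    # Remove the default Hive JDBC driver JAR(s)
    rm *hive*jdbc*
    # Copy in the Hive JDBC driver shipped with the CDH parcel
    cp /opt/cloudera/parcels/CDH-6.2*/jars/hive-jdbc-1.1.0-cdh6.2.0*jar .

  2. Enable the Hive connection. In the Hive connection string options, specify a value similar to the following: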

        "connectStrOpts": ";ssl=true;sslTrustStore=</path/to/truststore>;trustStorePassword=<storePassword>"

    Note

    The truststore specified above must exist on the Trifacta node and be readable by the Alteryx user with the specified password. It must contain the certificate for the Hive server.

  3. Save the parameters file. For more information on creating the connection, see Configure for Hive.

  4. Restart the platform. See Start and Stop the Platform.

  5. Verify that you can read from a Hive source through the application. See Hive Connections.
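Before restarting, parts of the TLS/SSL procedure can be checked from a shell. The sketch below is illustrative only: the function names are hypothetical, treating `cdh` in a JAR filename as the mark of the Cloudera-provided driver is a heuristic, and `keytool -list` only confirms that the truststore opens with the given password, not that it holds the Hive server's certificate.

```shell
# Hypothetical pre-restart checks for the TLS/SSL steps (sketch only).

# Step 1: no non-Cloudera Hive JDBC JAR may remain, and at least one
# Cloudera build (heuristic: "cdh" in the filename) must be present.
check_hive_jdbc_driver() {
  local dir="${1:-/opt/trifacta/services/data-service/build/dependencies}"
  local found=0 f
  for f in "$dir"/*hive*jdbc*; do
    [ -e "$f" ] || continue  # glob matched nothing
    case "$(basename "$f")" in
      *cdh*) found=$((found + 1)) ;;
      *) echo "non-Cloudera Hive JDBC JAR still present: $f" >&2; return 1 ;;
    esac
  done
  if [ "$found" -eq 0 ]; then
    echo "no Cloudera Hive JDBC JAR found in $dir" >&2
    return 1
  fi
  echo "ok: $found Cloudera Hive JDBC JAR(s) in $dir"
}

# Step 2: compose the connectStrOpts value from a truststore path and password.
build_hive_ssl_opts() {
  printf ';ssl=true;sslTrustStore=%s;trustStorePassword=%s\n' "$1" "$2"
}

# Step 2 note: confirm the truststore opens with the given password.
# This does not prove that it holds the Hive server certificate.
check_truststore() {
  keytool -list -keystore "$1" -storepass "$2" >/dev/null
}
```

For example, `check_hive_jdbc_driver && build_hive_ssl_opts /path/to/truststore.jks "$STORE_PASSWORD"` confirms the driver swap and prints the option string to paste into the Hive connection, where the truststore path and `STORE_PASSWORD` variable are placeholders.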

Restart

To apply your changes, restart the platform. See Start and Stop the Platform.