This section provides additional configuration requirements for integrating the Trifacta® platform with the Cloudera platform.
NOTE: Except as noted, the following configuration items apply to the latest supported version of the Cloudera platform.
Before you begin, it is assumed that you have completed the following tasks:
- Successfully installed a supported version of the Cloudera platform into your enterprise infrastructure.
- Installed the Trifacta software in your environment. For more information, see Install Software.
- Reviewed the mechanics of platform configuration. See Required Platform Configuration.
- Configured access to the Trifacta database. See Configure the Databases.
- Performed the basic cluster integration configuration. See Configure for Hadoop.
You can access platform configuration through the Trifacta node or through the Admin Settings page. You can apply the changes described in this section through the Admin Settings page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
Configure for Cloudera Data Platform
Cloudera Data Platform (CDP) offers a broad set of cloud services for enterprise data, including data analytics and artificial intelligence. For more information, see https://docs.cloudera.com/cdp/latest/overview/topics/cdp-overview.html.
The configuration required to integrate the Trifacta platform with Cloudera Data Platform (CDP) is very similar to the base Cloudera configuration with the following exceptions:
- You must reference the correct Hadoop dependencies bundle JAR for the platform to use, which should already be specified. For more information, see Configure for Hadoop.
- For Spark:
  - You must specify the use of native libraries.
  - You must specify the Spark version for the platform to use.
These differences in configuration are described inline with the base Cloudera and Spark configuration.
Configure Trifacta platform
Configure Hive Locations
If you are enabling an integration with Hive on the cluster, there are some distribution-specific parameters that must be set. For more information, see Configure for Hive.
Configure SSL for Hive
CDH supports two methods of enabling SSL communications with Hive:
- SASL-QOP method: Enable encryption between Hive JDBC and HiveServer2 using SASL-QOP. This method is available by default with the Trifacta platform.
- TLS/SSL method: Use TLS/SSL encryption for JDBC connections to HiveServer2.
To determine the method in use:
- In Cloudera Manager configuration, search for:
- If the options for TLS/SSL are enabled, please complete the following configuration steps.
- If these options are not enabled, the cluster can still use the SASL-QOP method. For more information on this method, see Configure for Hive.
Enable TLS/SSL Method
The default Hive JDBC driver provided with your Trifacta installation must be replaced with the driver provided by Cloudera. Run the following commands, noting the wildcards (*) in the JAR path:
NOTE: The current driver must be removed or replaced in the working directory. Do not leave it in the directory.
Enable the Hive connection. For the Hive connection string options, you must specify something like the following:
NOTE: The truststore specified above must exist on the Trifacta node and be accessible to the Trifacta user through the listed password. This truststore must contain the certificate for the Hive server.
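The truststore requirement in the note above can be checked before the connection is enabled. The following is a minimal sketch, not the platform's own tooling: the helper names, the truststore path, and the password are hypothetical, and the option names (ssl, sslTrustStore, trustStorePassword) follow the Apache Hive JDBC URL convention. The Cloudera-provided driver may expect different property names, so confirm them against the driver documentation before use.

```python
import os
from pathlib import Path


def build_hive_ssl_options(truststore_path: str, truststore_password: str) -> str:
    """Assemble semicolon-delimited SSL options for a Hive JDBC connection string.

    Assumption: option names follow the Apache Hive JDBC convention.
    The Cloudera-provided driver may use different property names.
    """
    for value in (truststore_path, truststore_password):
        # A semicolon inside a value would be read as an option separator.
        if ";" in value:
            raise ValueError("values must not contain ';'")
    return (
        ";ssl=true"
        f";sslTrustStore={truststore_path}"
        f";trustStorePassword={truststore_password}"
    )


def truststore_is_readable(truststore_path: str) -> bool:
    """Return True if the truststore exists as a file and the current user can read it."""
    path = Path(truststore_path)
    return path.is_file() and os.access(path, os.R_OK)


if __name__ == "__main__":
    # Hypothetical location and password, for illustration only.
    store = "/opt/certs/hive-truststore.jks"
    if not truststore_is_readable(store):
        print(f"Truststore not readable on this node: {store}")
    print(build_hive_ssl_options(store, "changeit"))
```

Run the check as the Trifacta user on the Trifacta node, since read access depends on the account that executes it. The check confirms only that the file is present and readable; it does not verify that the truststore contains the certificate for the Hive server.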
Save the parameters file. For more information on creating the connection, see Configure for Hive.
- Restart the platform. See Start and Stop the Platform.
- Verify that you can read from a Hive source through the application. See Hive Connections.
To apply your changes, restart the platform. See Start and Stop the Platform.