
This integration is currently blocked by the following known third-party issue: Cloudera Navigator OPSAPS-39589.

This functionality was last confirmed to behave correctly in the following deployment only:

  • Trifacta platform Release 4.0 and earlier
  • Cloudera 5.8.x
  • Cloudera Navigator 2.7.1

For more information, please contact Cloudera Support.

The Trifacta® platform can optionally publish metadata about recipes and jobs to Cloudera Navigator, which provides data governance over the Cloudera cluster. This section describes how to enable and configure this integration.

When this integration is enabled, recipe and job information can be published for all jobs executed through Pig. 

NOTE: This method of publishing works only for jobs executed on Pig. It does not work for jobs executed on the Spark running environment.

Publishing Behavior

When this feature is enabled, the following behaviors are applied to publishing:

  • When a job completes on the Hadoop Pig running environment, the Trifacta platform automatically attempts to publish the link to the corresponding Trifacta job to Navigator.
  • If the attempt is successful, there is no need to execute any additional publishing to Navigator. 
  • If the publishing fails or if you are trying to publish to Navigator a Trifacta job that predates enabling this feature, you can execute publication manually. See Export Results Window.
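The publishing behavior above can be summarized in a small decision sketch. The function and field names below are illustrative only and are not part of the Trifacta API:

```python
# Illustrative sketch of when a manual publish to Navigator is needed.
# "running_environment" and "navigator_publish_succeeded" are
# hypothetical field names, not actual Trifacta job attributes.

def needs_manual_publish(job):
    """Return True when a job must be published to Navigator manually."""
    ran_on_pig = job.get("running_environment") == "hadoop-pig"
    auto_published = job.get("navigator_publish_succeeded", False)
    # Automatic publishing applies only to Pig jobs. A Pig job whose
    # automatic publish failed, or that predates enabling the feature,
    # needs the manual path in the Export Results window. Spark jobs
    # cannot be published at all.
    return ran_on_pig and not auto_published

print(needs_manual_publish({"running_environment": "hadoop-pig",
                            "navigator_publish_succeeded": False}))
```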

Prerequisites

NOTE: The integration does not support publishing of Spark jobs to Navigator. This is a known issue.

 

  1. The Trifacta platform must be installed, configured, and integrated with an existing instance of the Cloudera platform. Please see the Cloudera Navigator documentation for additional details.
    1. Cloudera 5.8.x supported only.
    2. Cloudera Navigator 2.7.1 or later supported.
  2. The Trifacta node must have the Cloudera Manager port opened. The default port is 7187.
  3. You must have a Navigator user account with write permissions into the appropriate Navigator project.
  4. To enable SSL:

    NOTE: CDH 5.8 is required for use with SSL with Cloudera Navigator.

    1. A Java keystore and a sample CA certificate must be created on the node hosting Cloudera Manager.
    2. A valid, self-signed certificate must be created on the node hosting Cloudera Manager.
    3. In the order listed, the above certificates must be imported into the Java keystore.
    4. Retain the server path and the passwords for the keystore and certificates.
    5. For more information, see the documentation that was provided for your Cloudera Manager release.
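Prerequisite 2 above (an open Cloudera Manager port, 7187 by default) can be verified from the Trifacta node with a quick TCP check. This is a minimal sketch; the host name is a placeholder for your Cloudera Manager host:

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# "cm.example.com" is a placeholder for your Cloudera Manager host;
# 7187 is the default Navigator Metadata Server port.
if port_open("cm.example.com", 7187):
    print("Navigator port reachable")
else:
    print("Navigator port not reachable")
```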

Enable Navigator Publish

Please complete the following steps to enable publication to Cloudera Navigator.

  1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
  2. Locate the clouderaNavigator properties.

  3. Edit the following properties:

    Property: "clouderaNavigator.baseURL"
    Description: Base URL of the Navigator instance where you are publishing.

    NOTE: The port number must be specified as part of the baseURL. The default value is 7187.

    Property: "clouderaNavigator.namespace"
    Description: Namespace in Navigator where metadata is published.

    Property: "clouderaNavigator.enabled"
    Description: When set to true, publication to Navigator is enabled.

    Property: "clouderaNavigator.username"
    Description: Username of the Navigator account to use to connect.

    Property: "clouderaNavigator.password"
    Description: Password of the Navigator account.
  4. If you are using HTTPS to connect to Cloudera Navigator, additional configuration is required. See Additional Configuration for SSL below. Otherwise, set clouderaNavigator.https.enabled to false.
  5. Save the file.
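After the steps above, the edited block in trifacta-conf.json might look like the following. The host, namespace, and account values are examples for your deployment, and the snippet simply validates the shape of the fragment with Python's json module:

```python
import json

# Example clouderaNavigator block; the host, namespace, and
# credentials are placeholders for your deployment.
fragment = json.loads("""
{
  "clouderaNavigator": {
    "enabled": true,
    "baseURL": "http://cm.example.com:7187",
    "namespace": "trifacta",
    "username": "navigator-publisher",
    "password": "<password>"
  }
}
""")

nav = fragment["clouderaNavigator"]
# The port (default 7187) must be included in the baseURL.
assert ":" in nav["baseURL"].split("//", 1)[1], "baseURL must include a port"
print(nav["baseURL"])
```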

Additional Configuration for SSL

To enable communication over SSL with Cloudera Navigator, please complete the following steps in Cloudera Manager and on the Trifacta node.

NOTE: Before you begin, you must create valid certificates and import them into the Java keystore in the node hosting Cloudera Manager.

Steps:

  1. Launch Cloudera Manager.
  2. Select MGMT.
  3. Click Configuration.
  4. Click Scope > Navigator Metadata Server Category > Security.
  5. Set Enable TLS/SSL for Navigator Metadata Server to true.
  6. Set TLS/SSL for Navigator Metadata Server to the path where the Java keystore was created. The following is the default path:

    /opt/cm_keystore.jks

     

  7. Set TLS/SSL Keystore File Password to the password to the Java keystore.
  8. Set TLS/SSL Keystore Key Password to the password to the certificate.
  9. Restart the MGMT service.
  10. The JKS file that you created must be transferred to an accessible location on the Trifacta node.
  11. On the Trifacta node, edit trifacta-conf.json.
  12. Edit the following properties:

      "clouderaNavigator": {
        "https": {
          "trustStore": {
            "type": "jks",
            "location": "/path/to/jks/file"
          },
          "enabled": false
        },
    Property: type
    Description: Set this value to jks.

    Property: location
    Description: Specify the path on the Trifacta node to the JKS file.

    Property: enabled
    Description: To enable SSL access, set this value to true.
  13. Change the baseURL value to use HTTPS.
  14. Save the file and restart the platform. See Start and Stop the Platform.
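Steps 11 through 13 can be sanity-checked with a short script. The values below are placeholders (the JKS path comes from wherever you copied the file in step 10):

```python
import json

# Placeholder values mirroring the SSL-related edits in
# trifacta-conf.json; adjust the host and JKS path for your deployment.
config = {
    "clouderaNavigator": {
        "baseURL": "http://cm.example.com:7187",
        "https": {
            "trustStore": {"type": "jks", "location": "/path/to/jks/file"},
            "enabled": False,
        },
    }
}

nav = config["clouderaNavigator"]
# Step 12: enable SSL access to Navigator.
nav["https"]["enabled"] = True
# Step 13: switch the baseURL scheme to HTTPS.
nav["baseURL"] = nav["baseURL"].replace("http://", "https://", 1)

print(json.dumps(config, indent=2))
```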

Validate

Steps:

  1. If you haven't done so already, restart the platform to apply the configuration changes. See Start and Stop the Platform.
  2. Open a dataset in the Transformer Page. 
  3. Click Run Job.
  4. Select Run with Hadoop.
  5. When the job completes, click Export Results in the Job Results window.
  6. In the Export Results window, click Publish under Publish to Cloudera Navigator. See Export Results Window.
  7. A success message is displayed. 
  8. Click the displayed links to verify the results inside Cloudera Navigator. 
    1. You may need to provide a username and password.
    2. There may be a short delay in the results appearing in Cloudera Navigator.
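Because results may take a moment to appear in Cloudera Navigator (step 8b above), any automated verification should poll with a timeout rather than check once. This is a generic sketch; check_fn stands in for whatever lookup you use against Navigator:

```python
import time

def wait_for(check_fn, timeout=60.0, interval=5.0):
    """Poll check_fn until it returns True or timeout seconds elapse.

    check_fn is a zero-argument callable; for example, a function that
    queries Navigator for the published job metadata.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check_fn():
            return True
        time.sleep(interval)
    return False
```

For example, wait_for(lambda: job_visible_in_navigator(job_id), timeout=120) would retry a (hypothetical) lookup every five seconds for up to two minutes.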

Troubleshooting

Error - Requested data was not found: Pig job info for job '20' not found

This error indicates that the job you attempted to publish through the Export Results window was executed on a running environment other than Pig. Re-execute the job on the Hadoop Pig running environment and publish again.
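The constraint behind this error can be expressed as a simple guard. The environment identifier below is hypothetical, used only for illustration:

```python
# Navigator publishing is only available for jobs run on the Hadoop
# Pig environment. "hadoop-pig" and "spark" are hypothetical
# identifiers used here for illustration.
SUPPORTED_ENVIRONMENTS = {"hadoop-pig"}

def can_publish_to_navigator(running_environment):
    """Return True if jobs from this environment can be published."""
    return running_environment in SUPPORTED_ENVIRONMENTS

print(can_publish_to_navigator("spark"))       # False: re-run on Pig
print(can_publish_to_navigator("hadoop-pig"))  # True
```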
