The Trifacta® platform can be configured to publish metadata about recipes and jobs to Cloudera Navigator, which provides data governance over the Cloudera cluster. This section describes how to enable and configure this integration.

NOTE: This integration is supported in Release 6.0.1 and later.

Publishing Behavior

When this integration is enabled, recipe and job information is automatically published for all jobs executed on Photon or Spark. The following behaviors apply to publishing:

  • When a job completes, the Trifacta platform automatically attempts to publish a link to the job to Navigator.
  • Job results are submitted to a queue for Cloudera Navigator to process. Publishing may take some time to complete.
  • If the publication is successful, there is no need to execute any additional publishing to Navigator. 

    NOTE: If the publication fails, you must re-run the job in the Trifacta platform. Ad-hoc publishing to Navigator of completed jobs is not supported.

  • Success or failure of the publication to Cloudera Navigator can be found in the job.log file for the Trifacta job.

Supported Versions

Component             Supported Version(s)
Cloudera              5.16
Cloudera Navigator    2.15.1
Navigator API         9 or later (13 is recommended)

Limitations

Jobs published to the following targets cannot also be published to Cloudera Navigator:

  • Tableau Server
  • S3
  • Redshift

The following types of Trifacta sub-jobs are not published to Cloudera Navigator. These jobs do not appear in the Navigator console:

  • Profiling

NOTE: In Cloudera Navigator, platform jobs display only a single source of data, even if the job references multiple data sources.


Pre-requisites

  1. The Trifacta platform must be installed, configured, and integrated with an existing instance of the Cloudera platform.  Please see the Cloudera Navigator documentation for additional details.
  2. The Cloudera Navigator port must be open to the Trifacta node. The default port is 7187.
  3. You must have a Navigator user account with write permissions into the appropriate Navigator project.
  4. To enable use of SSL:

    1. A Java keystore and a sample CA certificate must be created on the node hosting Cloudera Manager.
    2. A valid, self-signed certificate must be created on the node hosting Cloudera Manager.
    3. In the order listed, the above certificates must be imported into the Java keystore.
    4. Retain the server path and the passwords for the keystore and certificates.
    5. For more information, see the documentation that was provided for your Cloudera Manager release.

Enable Navigator Publish

Please complete the following steps to enable publication to Cloudera Navigator.

  1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
  2. Locate the clouderaNavigator properties.

  3. Edit the following properties:

    Property                          Description
    "clouderaNavigator.enabled"       When set to true, publication to Navigator is enabled.
    "clouderaNavigator.baseURL"       Base URL of the Navigator instance where you are publishing. NOTE: The port number must be specified as part of the baseURL. Default value is 7187.
    "clouderaNavigator.username"      Username of the Navigator account to use to connect.
    "clouderaNavigator.password"      Password of the Navigator account.
    "clouderaNavigator.namespace"     Namespace in Navigator where metadata is published.
    "clouderaNavigator.apiVersion"    The version of the Cloudera Navigator API to use. Tip: This API version appears as part of the base URL. It should not be modified unless directed to do so.

  4. If you are using HTTPS to connect to Cloudera Navigator, additional configuration is required. See Additional Configuration for SSL. Otherwise, set clouderaNavigator.https.enabled to false.
  5. Save your changes.
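
Because the port number must be part of clouderaNavigator.baseURL, it can be useful to check the value before saving it. The following Python sketch is illustrative only and is not part of the Trifacta platform. The function name check_base_url and the hostname navigator.example.com are hypothetical.

```python
from urllib.parse import urlsplit


def check_base_url(base_url):
    """Return a list of problems found in a clouderaNavigator.baseURL value."""
    problems = []
    parts = urlsplit(base_url)
    if parts.scheme not in ("http", "https"):
        problems.append("baseURL must start with http:// or https://")
    try:
        port = parts.port  # raises ValueError if the port is not a valid number
    except ValueError:
        port = None
    if port is None:
        problems.append("baseURL must include an explicit port (default is 7187)")
    return problems


# Hypothetical hostname, used only for illustration.
print(check_base_url("http://navigator.example.com:7187"))  # prints []
print(check_base_url("http://navigator.example.com"))
```

An empty list means that the value has a scheme and an explicit port. It does not confirm that Navigator is reachable at that address.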

Additional Configuration for SSL

To enable communication over SSL with Cloudera Navigator, please complete the following steps in Cloudera Manager and on the Trifacta node.

NOTE: Before you begin, you must create valid certificates and import them into the Java keystore in the node hosting Cloudera Manager.

Steps:

  1. Launch Cloudera Manager.
  2. Select MGMT.
  3. Click Configuration.
  4. Click Scope > Navigator Metadata Server Category > Security.
  5. Set Enable TLS/SSL for Navigator Metadata Server to true.
  6. Set TLS/SSL for Navigator Metadata Server to the path where the Java keystore was created. The following is the default path:

    /opt/cm_keystore.jks

     

  7. Set TLS/SSL Keystore File Password to the password for the Java keystore.
  8. Set TLS/SSL Keystore Key Password to the password for the certificate.
  9. Restart the MGMT service.
  10. Transfer the JKS file that you created to an accessible location on the Trifacta node.
  11. Log in to the application.
  12. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
  13. Configure the following properties:

     

    Property                                         Description
    "clouderaNavigator.https.enabled"                Set this value to true to enable HTTPS communications with Navigator.
    "clouderaNavigator.https.trustStore.type"        The format of the Java keystore file. Set this value to jks.
    "clouderaNavigator.https.trustStore.location"    The absolute path on the Trifacta node to the Java keystore file. In the previous example, this value was the following: /opt/cm_keystore.jks
    "clouderaNavigator.https.trustStore.password"    The password to the Java keystore file.
  14. Change the clouderaNavigator.baseURL value to use HTTPS.
  15. Save your changes and restart the platform. See Start and Stop the Platform.
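
The HTTPS properties depend on each other: when HTTPS is enabled, the base URL must use HTTPS and the truststore properties must be set. The following Python sketch checks these settings for consistency. It assumes the settings are held in a flat dictionary keyed by the dotted property names, which is a hypothetical representation chosen for illustration. The function name, hostname, and password value are also hypothetical.

```python
import posixpath


def check_https_settings(settings):
    """Return a list of inconsistencies in the Navigator HTTPS settings."""
    problems = []
    prefix = "clouderaNavigator."
    base_url = settings.get(prefix + "baseURL", "")

    if not settings.get(prefix + "https.enabled", False):
        # HTTPS is off: the base URL should not use the https scheme.
        if base_url.startswith("https://"):
            problems.append("baseURL uses HTTPS but https.enabled is not true")
        return problems

    if not base_url.startswith("https://"):
        problems.append("baseURL must use HTTPS when https.enabled is true")
    if settings.get(prefix + "https.trustStore.type") != "jks":
        problems.append("https.trustStore.type must be jks")
    if not posixpath.isabs(settings.get(prefix + "https.trustStore.location", "")):
        problems.append("https.trustStore.location must be an absolute path")
    if not settings.get(prefix + "https.trustStore.password"):
        problems.append("https.trustStore.password is not set")
    return problems


# Placeholder values for illustration only.
example = {
    "clouderaNavigator.baseURL": "https://navigator.example.com:7187",
    "clouderaNavigator.https.enabled": True,
    "clouderaNavigator.https.trustStore.type": "jks",
    "clouderaNavigator.https.trustStore.location": "/opt/cm_keystore.jks",
    "clouderaNavigator.https.trustStore.password": "changeit",
}
print(check_https_settings(example))  # prints []
```

The check covers only the relationships described in the steps above. It does not open the keystore or verify the certificates.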

Configure Custom Source Types

In Cloudera Navigator, every listed entity is associated with a source. A source is a specified resource that is part of each job listing. Example sources include HDFS namenodes and Hive metadata servers. Each source has a specified source type. For more information on source types, see https://github.com/cloudera/navigator-sdk/blob/master/model/src/main/java/com/cloudera/nav/sdk/model/SourceType.java.

For each source type:

  • If you have only one source for a type, no further configuration is required.
  • If you have multiple sources for a type, you must specify your custom sources. For example, if you have multiple HDFS clusters, you must specify which one to use in your custom sources. See below.

    NOTE: If you have multiple sources for a single type and have not completed the following configuration, publication from the Trifacta platform to Cloudera Navigator fails. In the job log, you can review the source and the source type identifiers that caused the failure.

Steps:

  1. To list all of your Cloudera Navigator sources and their source types, visit the following URL:

    http://<navigator_instance_url>:<port_number>/api/v<apiVersion>/entities?query=type%3Asource

    where:

    Parameter                   Description
    <navigator_instance_url>    The URL of your instance of Cloudera Navigator
    <port_number>               The port number of your instance of Cloudera Navigator. Default value is 7187.
    <apiVersion>                The version number of the API in use.
  2. For example, you can query this URL with cURL and filter the response with jq:

    curl "http://username:password@cloudera-navigator-host:7187/api/v13/entities?query=type%3Asource" | jq '[.[] | {type: .sourceType, identity: .identity}]'
  3. From the returned list, you can determine the source IDs to use for each source type.
  4. To apply this configuration change, log in as an administrator to the Trifacta node. Then, edit trifacta-conf.json. Some of these settings may not be available through the Admin Settings Page. For more information, see Platform Configuration Methods.
  5. Locate the following area. Suppose you have two HDFS clusters, named hdfsCl01 and hdfsCl02 in Cloudera Navigator. You can specify the cluster to use by listing source type and mapping pairs under customSources. In the following example, the HDFS source type is mapped to the second cluster:

    {
        ...,
        "clouderaNavigator": {
          ...,
          "customSources": {
            "HDFS": "hdfsCl02"
          }
        },
        ...
    }
  6. Save the file and restart the platform.
  7. Verify your mappings by running a job on the named source. See below.
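
To see which source types need an entry under customSources, you can group the returned sources by type and look for types with more than one source. The following Python sketch assumes that the response has already been parsed into a list of dictionaries and that each entry carries the sourceType and identity fields used in the jq filter above. The function name and the sample values are hypothetical.

```python
from collections import defaultdict


def types_needing_custom_sources(entities):
    """Map each source type that has more than one source to its source identities."""
    by_type = defaultdict(list)
    for entity in entities:
        by_type[entity["sourceType"]].append(entity["identity"])
    # Only types with multiple sources require a customSources mapping.
    return {stype: ids for stype, ids in by_type.items() if len(ids) > 1}


# Made-up sample data for illustration; real identities will differ.
sample = [
    {"sourceType": "HDFS", "identity": "1"},
    {"sourceType": "HDFS", "identity": "2"},
    {"sourceType": "HIVE", "identity": "3"},
]
print(types_needing_custom_sources(sample))  # prints {'HDFS': ['1', '2']}
```

In this sample, only HDFS has multiple sources, so only HDFS would need a mapping.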

Validate

Steps:

  1. If you haven't done so already, restart the platform to apply the configuration changes. See Start and Stop the Platform.
  2. Run a job.
  3. When the job completes, open the job through the Jobs page. See Jobs Page.
  4. Acquire the jobGroup Id for the job. It is the final value in the URL. In the following example, the jobGroup Id is 3:

    http://example.com:3005/jobs/3
  5. Log in to Navigator. Search for the following string:

    trifacta.<jobGroupId>

    NOTE: It may take up to 30 minutes for results to be published to Navigator.

  6. When you see one or more entries such as the following, the job has been successfully published. In this example, the jobGroup Id is 14:

    trifacta.14.wrangle.29
    trifacta.14.filewriter.30
    trifacta.14.filewriter.31
  7. These entries identify the individual jobs within the job group that have completed.
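
The lookup in the steps above can be sketched in Python: take the jobGroup Id from the end of the job URL, then build the string to search for in Navigator. The second function splits a published entry name into its parts. It assumes the pattern trifacta.<jobGroupId>.<jobType>.<jobId>, which is inferred from the example entries and is not a documented format. Both function names are hypothetical.

```python
from urllib.parse import urlsplit


def navigator_search_string(job_url):
    """Build the Navigator search string from a Jobs page URL."""
    # The jobGroup Id is the final value in the URL path.
    last = urlsplit(job_url).path.rstrip("/").rsplit("/", 1)[-1]
    if not last.isdigit():
        raise ValueError("URL does not end in a numeric jobGroup Id: " + job_url)
    return "trifacta." + last


def parse_entry(name):
    """Split an entry name, assuming the pattern trifacta.<jobGroupId>.<jobType>.<jobId>."""
    prefix, job_group_id, job_type, job_id = name.split(".")
    if prefix != "trifacta":
        raise ValueError("Not a Trifacta entry: " + name)
    return {"jobGroupId": int(job_group_id), "jobType": job_type, "jobId": int(job_id)}


print(navigator_search_string("http://example.com:3005/jobs/3"))  # prints trifacta.3
print(parse_entry("trifacta.14.wrangle.29"))
```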
