Release 6.4.2


The Designer Cloud Powered by Trifacta® platform can be configured to publish metadata about recipes and jobs to Cloudera Navigator, which provides data governance over the Cloudera cluster. This section describes how to enable and configure this integration.

Publishing Behavior

When this integration is enabled, recipe and job information is automatically published for all jobs executed on Photon or Spark.  The following behaviors are applied to publishing:

  • When a job completes, the Designer Cloud Powered by Trifacta platform automatically attempts to publish a link to the job to Navigator.
  • Job results are submitted to a queue for Cloudera Navigator to process. Publishing may take some time to complete.
  • If the publication is successful, there is no need to execute any additional publishing to Navigator. 

    NOTE: If the publication fails, you must re-run the job in the Designer Cloud Powered by Trifacta platform. Ad-hoc publishing to Navigator of completed jobs is not supported.

  • Success or failure of the publication to Cloudera Navigator can be found in the job.log file for the Alteryx job.

Supported Versions

Component | Supported Version(s)
Cloudera | 5.16
Cloudera Navigator | 2.15.1
Navigator API | 9 or later (13 is recommended)

Limitations

  • The Designer Cloud Powered by Trifacta platform produces its own entities to reference S3 objects. In Navigator, this feature is undocumented, and ID generation for S3 endpoint proxies is broken in the current release of the Navigator SDK (https://github.com/cloudera/navigator-sdk/issues/91).
  • Jobs that read or write from a Hive table using the Hive-remote JDBC connector cannot link with Navigator's entity for the Hive table. Instead, Navigator links to a JDBC table entity created by the Designer Cloud Powered by Trifacta platform.

Jobs published to the following targets cannot also be published to Cloudera Navigator:

  • Tableau Server

NOTE: In Cloudera Navigator, platform jobs display only a single source of data, even if the job references multiple data sources.


Pre-requisites

  1. The Designer Cloud Powered by Trifacta platform must be installed, configured, and integrated with an existing instance of the Cloudera platform.  Please see the Cloudera Navigator documentation for additional details.
  2. The Cloudera Navigator port must be open to the Alteryx node. The default port is 7187.
  3. You must have a Navigator user account with write permissions into the appropriate Navigator project.
  4. To enable use of SSL:

    1. A Java keystore and a sample CA certificate must be created on the node hosting Cloudera Manager.
    2. A valid, self-signed certificate must be created on the node hosting Cloudera Manager.
    3. In the order listed, the above certificates must be imported into the Java keystore.
    4. Retain the server path and the passwords for the keystore and certificates.
    5. For more information, see the documentation that was provided for your Cloudera Manager release.

Enable Navigator Publish

Please complete the following steps to enable publication to Cloudera Navigator.

  1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
  2. Locate the clouderaNavigator properties.

  3. Edit the following properties:

    Property | Description
    "clouderaNavigator.enabled" | When set to true, publication to Navigator is enabled.
    "clouderaNavigator.baseURL" | Base URL of the Navigator instance where you are publishing. NOTE: The port number must be specified as part of the baseURL. Default value is 7187.
    "clouderaNavigator.username" | Username of the Navigator account to use to connect.
    "clouderaNavigator.password" | Password of the Navigator account.
    "clouderaNavigator.namespace" | Namespace in Navigator where metadata is published.
    "clouderaNavigator.apiVersion" | The version of the Cloudera Navigator API to use. Tip: This API version appears as part of the base URL. It should not be modified unless directed to do so.

  4. If you are using HTTPS to connect to Cloudera Navigator, additional configuration is required. See Additional Configuration for SSL below. Otherwise, set clouderaNavigator.https.enabled to false.
  5. Save your changes.
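
The properties above can be sanity-checked before saving. The following is a minimal sketch, not part of the platform: the function name check_navigator_settings, the sample host, and the flat dictionary layout (keys without the clouderaNavigator. prefix) are assumptions made for illustration. It only checks what this section states: publication must be enabled, the baseURL must carry a port, and the account and namespace values must be filled in.

```python
from urllib.parse import urlsplit

# Keys under "clouderaNavigator" that this sketch expects to be non-empty.
REQUIRED_KEYS = ("username", "password", "namespace")


def check_navigator_settings(settings):
    """Return a list of problems found in a dict of clouderaNavigator settings.

    Hypothetical helper: the dict layout is an assumption, not the
    platform's own validation logic.
    """
    problems = []
    if settings.get("enabled") is not True:
        problems.append("enabled must be true to publish to Navigator")

    parts = urlsplit(settings.get("baseURL", ""))
    if parts.scheme not in ("http", "https"):
        problems.append("baseURL must start with http:// or https://")
    try:
        port = parts.port  # raises ValueError if the port is not a valid number
    except ValueError:
        port = None
    if port is None:
        problems.append("baseURL must include the port number (default 7187)")

    for key in REQUIRED_KEYS:
        if not settings.get(key):
            problems.append(f"{key} must not be empty")
    return problems
```

An empty list means none of the checked conditions failed; it does not confirm that the Navigator account actually has write permissions.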

Additional Configuration for SSL

To enable communication over SSL with Cloudera Navigator, please complete the following steps in Cloudera Manager and on the Alteryx node.

NOTE: Before you begin, you must create valid certificates and import them into the Java keystore in the node hosting Cloudera Manager.

Steps:

  1. Launch Cloudera Manager.
  2. Select MGMT.
  3. Click Configuration.
  4. Click Scope > Navigator Metadata Server Category > Security.
  5. Set Enable TLS/SSL for Navigator Metadata Server to true.
  6. Set TLS/SSL for Navigator Metadata Server to the path where the Java keystore was created. The following is the default path:

    /opt/cm_keystore.jks

     

  7. Set TLS/SSL Keystore File Password to the password for the Java keystore.
  8. Set TLS/SSL Keystore Key Password to the password for the certificate.
  9. Restart the MGMT service.
  10. Transfer the JKS file that you created to an accessible location on the Alteryx node.
  11. Log in to the application.
  12. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
  13. Configure the following properties:

    Property | Description
    "clouderaNavigator.https.enabled" | Set this value to true to enable HTTPS communications with Navigator.
    "clouderaNavigator.https.trustStore.type" | The format of the Java keystore file. Set this value to jks.
    "clouderaNavigator.https.trustStore.location" | The absolute path on the Alteryx node to the Java keystore file. In the previous example, this value was the following: /opt/cm_keystore.jks
    "clouderaNavigator.https.trustStore.password" | The password to the Java keystore file.
  14. Change the clouderaNavigator.baseURL value to use HTTPS.
  15. Save your changes and restart the platform. See Start and Stop the Platform.
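
The HTTPS properties and the baseURL need to agree with each other, which is easy to overlook. Below is a small sketch of such a consistency check. The function name check_https_settings and the nested dictionary shape (enabled, then trustStore with type, location, and password) are assumptions derived from the dotted property names above, and the sample host is a placeholder.

```python
import posixpath
from urllib.parse import urlsplit


def check_https_settings(base_url, https):
    """Return a list of mismatches between baseURL and the https settings.

    Hypothetical helper; `https` mirrors the clouderaNavigator.https.*
    properties as a nested dict (an assumed layout).
    """
    problems = []
    scheme = urlsplit(base_url).scheme
    if https.get("enabled"):
        if scheme != "https":
            problems.append("baseURL should use https when https.enabled is true")
        store = https.get("trustStore", {})
        if store.get("type") != "jks":
            problems.append("trustStore.type should be jks")
        # The keystore location must be an absolute path on the Alteryx node.
        if not posixpath.isabs(store.get("location", "")):
            problems.append("trustStore.location must be an absolute path")
        if not store.get("password"):
            problems.append("trustStore.password must not be empty")
    elif scheme == "https":
        problems.append("baseURL uses https but https.enabled is false")
    return problems
```

The check is purely textual. It does not open the keystore or contact Navigator, so a passing result still needs to be confirmed by running a job.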

Configure Custom Source Types

In Cloudera Navigator, every listed entity is associated with a source. A source is a specified resource that is part of each job listing. Example sources include HDFS namenodes and Hive metadata servers. Each source has a specified source type. For more information on source types, see https://github.com/cloudera/navigator-sdk/blob/master/model/src/main/java/com/cloudera/nav/sdk/model/SourceType.java.

For each source type:

  • If you have only one source for a type, no further configuration is required. 
  • If you have multiple sources for a single type, you must specify your custom sources. For example, if you have multiple HDFS clusters, you must specify which one to use in your custom sources. See below.

    NOTE: If you have multiple sources for a single type and have not completed the following configuration, publication from the Designer Cloud Powered by Trifacta platform to Cloudera Navigator fails. In the job log, you can review the source and the source type identifiers that caused the failure.
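
To see which source types need a custom mapping, the rule above can be expressed as a short function. This is an illustrative sketch only: the name sources_needing_mapping is hypothetical, and it assumes each source has already been reduced to a "type" and an "identity" field, as in the example response shown later in this section.

```python
from collections import defaultdict


def sources_needing_mapping(sources):
    """Group source identities by type and keep types with more than one source.

    `sources` is a list of dicts with "type" and "identity" keys
    (an assumed shape, matching the example response in this section).
    """
    by_type = defaultdict(list)
    for source in sources:
        by_type[source["type"]].append(source["identity"])
    # Only types with several sources are ambiguous and need a custom source.
    return {t: ids for t, ids in by_type.items() if len(ids) > 1}
```

For a list with one S3 source and three HDFS sources, only HDFS is returned, together with the identities to choose from.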

Steps:

  1. To list all of your Cloudera Navigator sources and their source types, visit the following URL:

    http://<navigator_instance_url>:<port_number>/api/v<apiVersion>/entities?query=type%3Asource

    where:

    Property | Description
    <navigator_instance_url> | The URL of your instance of Cloudera Navigator
    <port_number> | The port number of your instance of Cloudera Navigator. Default value is 7187.
    <apiVersion> | The version number of the API in use.
  2. Example: cURL with jq:

    curl "http://username:password@cloudera-navigator-host:7187/api/v13/entities?query=type%3Asource" | jq '[.[] | {type: .sourceType, identity: .identity}]'
  3. From the returned list, you can determine the source IDs to use for each source type.
  4. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
  5. Locate the following settings. 

    NOTE: Leave these settings empty if you intend to use the default settings from Navigator.


    Setting | Description
    clouderaNavigator.customsources.YARN | Navigator identifier for the YARN resource manager to use
    clouderaNavigator.customsources.SPARK | Navigator identifier for the Spark instance to use
    clouderaNavigator.customsources.S3 | Navigator identifier for the S3 bucket to use
    clouderaNavigator.customsources.HIVE | Navigator identifier for the Hive metadata server to use
    clouderaNavigator.customsources.HDFS | Navigator identifier for the HDFS cluster to use
  6. You can specify the cluster to use by listing source type and mapping pairs under customSources. If you have multiple sources of a single type, this step disambiguates between them for the Designer Cloud Powered by Trifacta platform.

    1. First you must acquire the identifiers to use from Cloudera Navigator via API. 
      1. Navigator endpoint: /v13/entities?query=type%3Asource
      2. Request method: GET
      3. Example response:

        [
          {
            "type": "S3",
            "identity": "9"
          },
          {
            "type": "YARN",
            "identity": "6"
          },
          {
            "type": "SPARK",
            "identity": "5"
          },
          {
            "type": "CLUSTER",
            "identity": "1"
          },
          {
            "type": "HDFS",
            "identity": "10"
          },
          {
            "type": "HDFS",
            "identity": "11"
          },
          {
            "type": "HDFS",
            "identity": "12"
          }
        ]
    2. The above example response contains three HDFS clusters. Below, one of these clusters has been specified by Id (11) as the one to use.

      {
          ...,
          "clouderaNavigator": {
            ...,
            "customSources": {
              "HDFS": "11"
            }
          },
          ...
      }
  7. Save your changes and restart the platform.
  8. Verify your mappings by running a job on the named source. See below.
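
The URL pattern and the customSources fragment from the steps above can be assembled programmatically. The sketch below is a convenience under stated assumptions, not a platform utility: the function names sources_url and custom_sources_fragment are hypothetical, the defaults (port 7187, API v13) are taken from this page, and the host name is the placeholder used in the cURL example.

```python
import json
from urllib.parse import quote


def sources_url(host, port=7187, api_version=13, scheme="http"):
    """Build the Navigator URL that lists sources, following the pattern above."""
    query = quote("type:source", safe="")  # percent-encodes ":" as %3A
    return f"{scheme}://{host}:{port}/api/v{api_version}/entities?query={query}"


def custom_sources_fragment(mapping):
    """Render a source-type-to-identity mapping as a trifacta-conf.json fragment.

    Only the clouderaNavigator.customSources part is produced; merge it
    into the existing configuration by hand.
    """
    return json.dumps({"clouderaNavigator": {"customSources": mapping}}, indent=2)
```

For example, sources_url("cloudera-navigator-host") yields the same URL as the cURL example without the credentials, and custom_sources_fragment({"HDFS": "11"}) reproduces the mapping shown in the JSON example.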

Validate

Steps:

  1. If you haven't done so already, restart the platform to apply the configuration changes. See Start and Stop the Platform.
  2. Run a job.
  3. When the job completes, open the job through the Jobs page. See Jobs Page.
  4. Acquire the jobGroup Id for the job. It is the final value in the URL. In the following example, the jobGroup Id is 3:

    http://example.com:3005/jobs/3
  5. Log in to Navigator. Search for the following string:

    trifacta.<jobGroupId>

    NOTE: It may take up to 30 minutes for results to be published to Navigator.

  6. When you see one or more entries, such as the following entries for jobGroup Id 14, the job has been successfully published:

    trifacta.14.wrangle.29
    trifacta.14.filewriter.30
    trifacta.14.filewriter.31
  7. The above entries indicate the individual jobs within the job group that have been completed.
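
The validation steps can be partly scripted. The following sketch derives the search string from a job URL and splits a Navigator entry name into its parts. The function names are hypothetical, and the reading of an entry as trifacta.<jobGroupId>.<jobType>.<jobId> is an assumption inferred from the examples above, not a documented format.

```python
from urllib.parse import urlsplit


def job_group_id_from_url(job_url):
    """Return the final path segment of a job URL, which is the jobGroup Id."""
    return urlsplit(job_url).path.rstrip("/").rsplit("/", 1)[-1]


def navigator_search_string(job_group_id):
    """Build the string to search for in Navigator."""
    return f"trifacta.{job_group_id}"


def parse_entry(entry):
    """Split an entry such as trifacta.14.wrangle.29 into its parts.

    Assumes four dot-separated fields; raises ValueError otherwise.
    """
    prefix, group_id, job_type, job_id = entry.split(".")
    if prefix != "trifacta":
        raise ValueError(f"unexpected prefix in entry: {entry!r}")
    return {"jobGroupId": int(group_id), "jobType": job_type, "jobId": int(job_id)}
```

With the URL from step 4, job_group_id_from_url returns "3", so the string to search for is trifacta.3.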
