
This section describes how to enable the platform to use the HttpFS service for communicating with Hadoop HDFS. HttpFS is commonly used in the following scenarios:

  1. High Availability. WebHDFS does not support High Availability failover. You must use HttpFS instead. 

  2. HDFS user is not available for secure impersonation. If you have enabled secure impersonation in an environment where the HDFS superuser is restricted from use, you can enable HttpFS and use the HttpFS superuser for secure impersonation. 

     


Pre-requisites

Before you begin, verify that you have done the following in your environment:

  • Enabled HDFS in your Hadoop cluster.
  • Installed hadoop-httpfs in your Hadoop cluster.
  • Enabled HttpFS on a known port on the cluster.

    NOTE: If you are enabling HttpFS for use with High Availability, avoid enabling the HttpFS service on the primary namenode of the cluster. For more information, see Enable Integration with Cluster High Availability.

    NOTE: By default, HttpFS is available on port 14000. Verify the port number in use for your cluster.

  • Started the HttpFS service on the cluster.
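One way to confirm the last two prerequisites is to send a simple request to the HttpFS endpoint. The following is a minimal Python sketch, assuming that HttpFS exposes the standard WebHDFS REST path (`/webhdfs/v1`) and accepts simple `user.name` authentication. The host `httpfs.example.com`, the user `hdfs`, and the function names are placeholders chosen for illustration; a Kerberos-secured cluster requires a different authentication method.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def httpfs_url(host, port=14000, path="/", op="LISTSTATUS", user="hdfs", use_ssl=False):
    """Build a WebHDFS-style REST URL for an HttpFS endpoint."""
    scheme = "https" if use_ssl else "http"
    query = urlencode({"op": op, "user.name": user})
    return f"{scheme}://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def check_httpfs(host, port=14000, user="hdfs", timeout=10):
    """Request a listing of the HDFS root directory and return the parsed JSON."""
    with urlopen(httpfs_url(host, port, user=user), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder host: replace with the node hosting HttpFS in your cluster.
    print(httpfs_url("httpfs.example.com"))
```

If `check_httpfs` returns a JSON document instead of raising a connection error, the service is running and reachable on the port you supplied.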

Configuration

Apply the following changes through the platform configuration.

Steps:

  1. Locate the HttpFS settings, which are within the HDFS configuration area:

    Code Block
    "hdfs.webhdfs.host": "",
    "hdfs.webhdfs.port": 14000,
    "hdfs.webhdfs.httpfs": true,
  2. Set hdfs.webhdfs.httpfs to true.
  3. Specify the host and port for the HttpFS service. You can use one of the following methods:

    1. Specify hdfs.webhdfs.host and hdfs.webhdfs.port values to point to the node hosting HttpFS.
    2. Leave the hdfs.webhdfs.host value empty, in which case the platform falls back to using the namenode host as the WebHDFS host. Modify that value if required.

      NOTE: By default, the platform expects this service to be available on port 14000. Please apply the value that matches your cluster environment.

  4. Save your changes and restart the platform.
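The host fallback described in step 3 can be illustrated with a short Python sketch. This is not the platform's actual resolution logic, which is not shown in this document; the function name `resolve_httpfs_endpoint`, the `namenode_host` parameter, and the representation of the settings as a flat dictionary are assumptions made for illustration.

```python
def resolve_httpfs_endpoint(config, namenode_host):
    """Return the (host, port) pair implied by the HttpFS settings.

    config is a flat dict keyed by the setting names used in this section.
    namenode_host is a hypothetical stand-in for the configured namenode host.
    """
    if not config.get("hdfs.webhdfs.httpfs", False):
        raise ValueError("hdfs.webhdfs.httpfs must be true to use HttpFS")
    # An empty host falls back to the namenode host.
    host = config.get("hdfs.webhdfs.host") or namenode_host
    port = config.get("hdfs.webhdfs.port", 14000)
    return host, port


settings = {
    "hdfs.webhdfs.host": "",
    "hdfs.webhdfs.port": 14000,
    "hdfs.webhdfs.httpfs": True,
}
print(resolve_httpfs_endpoint(settings, "namenode.example.com"))
```

Because the fallback points at the namenode, setting `hdfs.webhdfs.host` explicitly is the safer choice when HttpFS runs on a different node, as recommended for High Availability.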

Enable SSL

Optionally, you can enable secure (SSL) communications between the platform and HttpFS.

NOTE: The most secure method requires the creation and deployment of an SSL certificate for the HDFS instance. The steps below describe how to do so.

If this certificate is not available, you can still enable SSL communication with WebHDFS or HttpFS. Skip steps 1 and 2 and complete the configuration for the "Secure without certificate export" option.

 

Steps:

  1. Deploy a PEM file certificate that can be read by the platform's operating system user account on the platform node.

    NOTE: The following security configuration requires export of and access to an SSL certificate in PEM file format for the HDFS instance. Creation and deployment of this certificate is beyond the scope of this document. See the documentation provided with your Hadoop distribution.


    Certificates are commonly stored in Java keystores. They can be exported to PEM file format using the following command:

    Code Block
    keytool -exportcert -rfc -alias <node_alias> -storepass <pwd> -keystore cacerts -file <filename.pem>

    where:
    <pwd> is the keystore password.
    <filename.pem> is the output filename for the certificate. 
    <node_alias> is the alias for the certificate in the keystore. 

  2. Place the generated certificate on the platform node in a location where it is readable by the platform's operating system user. The following location is suitable:

    Code Block
    /opt/trifacta
  3. Open the platform configuration for editing.
  4. Locate the following setting and enable it:

    Setting: "hdfs.webhdfs.ssl.enabled": true,
    Description: Set to true to enable SSL communications with WebHDFS or (if enabled) HttpFS.
  5. There is no need to update the port number. Port 14000 applies to HTTP and HTTPS.

  6. Security Level: The level of security is determined by the following configuration options:

    1. Secure without certificate export:

      Setting: "hdfs.webhdfs.ssl.certificateValidationRequired": false,
      Description: Set to false to disable use of trusted certificate validation.

      Setting: "hdfs.webhdfs.ssl.certificatePath": "",
      Description: Leave this value empty.

    2. Secure with certificate:

      Setting: "hdfs.webhdfs.ssl.certificateValidationRequired": true,
      Description: Set to true to require use of trusted certificate validation.

      Setting: "hdfs.webhdfs.ssl.certificatePath": "",
      Description: Set this value to the path on the platform node where you stored the certificate.

  7. Save your changes and restart the platform.
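The two security levels in step 6 can be summarized as a consistency check over the SSL settings. The following Python sketch is illustrative only and is not the platform's own validation logic; the function name `validate_ssl_settings` and the flat dictionary of settings are assumptions. To test file permissions meaningfully, run it as the operating system user that the platform uses.

```python
import os
import ssl


def validate_ssl_settings(config):
    """Return the security level implied by the SSL settings, or raise ValueError."""
    if not config.get("hdfs.webhdfs.ssl.enabled", False):
        return "disabled"

    if not config.get("hdfs.webhdfs.ssl.certificateValidationRequired", False):
        # Traffic is encrypted, but the server certificate is not validated.
        return "secure without certificate export"

    path = config.get("hdfs.webhdfs.ssl.certificatePath", "")
    if not path or not os.access(path, os.R_OK):
        raise ValueError(f"certificate is missing or unreadable: {path!r}")

    with open(path, encoding="ascii") as pem_file:
        # Raises ValueError if the PEM certificate header or footer is missing.
        ssl.PEM_cert_to_DER_cert(pem_file.read().strip())
    return "secure with certificate"


print(validate_ssl_settings({
    "hdfs.webhdfs.ssl.enabled": True,
    "hdfs.webhdfs.ssl.certificateValidationRequired": False,
    "hdfs.webhdfs.ssl.certificatePath": "",
}))
```

The check treats certificate validation as the stricter level: when it is required, a missing, unreadable, or non-PEM file is reported before the platform is restarted rather than after.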