
After you have performed the base installation of the Trifacta platform, please complete the following steps if you are integrating with a Hadoop cluster.

Apply cluster configuration files - non-edge node

If the Trifacta platform is being installed on a non-edge node, you must copy the Hadoop Client Configuration files over from the cluster.

Info

NOTE: When these files change, you must update the local copies. For this reason, it is best to install on an edge node.

  1. Download the Hadoop Client Configuration files from the Hadoop cluster. The required files are the following:
    1. core-site.xml
    2. hdfs-site.xml
    3. mapred-site.xml
    4. yarn-site.xml
    5. hive-site.xml (if you are using Hive)
  2. Move these configuration files to the Trifacta deployment. By default, these files are located in /etc/hadoop/conf on the cluster:

    Code Block
    sudo cp <location>/*.xml /opt/trifacta/conf/hadoop-site/
    sudo chown trifacta:trifacta /opt/trifacta/conf/hadoop-site/*.xml 
    

For more information, see Configure for Hadoop.
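As a quick sanity check after copying, a short script can confirm that each required client file reached the hadoop-site directory. This is a sketch under the assumptions above (the /opt/trifacta/conf/hadoop-site path and the file list from step 1); it runs against a temporary stand-in directory here so it is safe to execute anywhere:

```shell
# Stand-in for /opt/trifacta/conf/hadoop-site; point CONF_DIR at the real
# directory on the Trifacta node instead.
CONF_DIR=$(mktemp -d)
touch "$CONF_DIR/core-site.xml" "$CONF_DIR/hdfs-site.xml" \
      "$CONF_DIR/mapred-site.xml"   # yarn-site.xml intentionally missing

# Report any required Hadoop Client Configuration file that is absent.
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
  if [ -e "$CONF_DIR/$f" ]; then
    echo "ok: $f"
  else
    echo "MISSING: $f"
  fi
done
# prints "MISSING: yarn-site.xml" for the absent file in this demo
```

If you use Hive, add hive-site.xml to the list of files checked.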

Apply cluster configuration files - edge node

If the Trifacta platform is being installed on an edge node of the cluster, you can create symlinks from a local directory to the source cluster files so that they are automatically updated as needed.

  1. Navigate to the following directory on the Trifacta node:

    Code Block
    cd /opt/trifacta/conf/hadoop-site
  2. Create a symlink for each of the Hadoop Client Configuration files listed in the previous section. Example:

    Code Block
    ln -s /etc/hadoop/conf/core-site.xml core-site.xml
  3. Repeat the previous step for each of the remaining Hadoop Client Configuration files.

For more information, see Configure for Hadoop.
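The per-file ln -s commands above can also be collapsed into a single loop. The following sketch uses the same locations as the steps above (/etc/hadoop/conf as the cluster source, the hadoop-site directory as the target); temporary directories stand in for both so the example can be run as-is:

```shell
# Stand-ins for /etc/hadoop/conf (SRC) and /opt/trifacta/conf/hadoop-site
# (DEST); substitute the real paths when running on the edge node.
SRC=$(mktemp -d)
DEST=$(mktemp -d)
touch "$SRC/core-site.xml" "$SRC/hdfs-site.xml" \
      "$SRC/mapred-site.xml" "$SRC/yarn-site.xml"

# One symlink per Hadoop Client Configuration file.
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
  ln -s "$SRC/$f" "$DEST/$f"
done

ls -l "$DEST"   # each entry should point back into the cluster directory
```

Because the links point at the cluster's own files, changes made on the cluster are picked up without re-copying.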

Modify Trifacta configuration

  1. To apply these configuration changes, edit the platform configuration file (trifacta-conf.json) on the Trifacta node.

  2. HDFS: Change the host and port information for HDFS as needed. Please apply the port numbers for your distribution:

    Code Block
    "hdfs.namenode.host": "<namenode>",
    "hdfs.namenode.port": <hdfs_port_num>,
    "hdfs.yarn.resourcemanager": {
      "hdfs.yarn.webappPort": 8088,
      "hdfs.yarn.adminPort": 8033,
      "hdfs.yarn.host": "<resourcemanager_host>",
      "hdfs.yarn.port": <resourcemanager_port>,
      "hdfs.yarn.schedulerPort": 8030
    },

    For more information, see Configure for Hadoop.
     

  3. Save your changes and restart the platform.
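Before restarting, a quick grep can confirm that the HDFS and YARN host and port entries were written as intended. This sketch checks an inline sample rather than the live configuration file (whose name and location depend on your installation):

```shell
# Write a small sample containing settings like those edited above, then
# grep for the edited keys. Point CONF at your real platform configuration
# file instead of the sample.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
"hdfs.namenode.host": "mynamenode.example.com",
"hdfs.namenode.port": 8020,
EOF

# Lists every hdfs.namenode.* / hdfs.yarn.* setting found in the file.
grep -E '"hdfs\.(namenode|yarn)\.' "$CONF"
```

Each edited setting should appear in the output with the value you entered.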

Configure for Spark

For more information, see Configure for Spark.

Enable High Availability

Info

NOTE: If high availability is enabled on the Hadoop cluster, it must be enabled on the Trifacta platform, even if you are not planning to rely on it. See Enable Integration with Cluster High Availability.