
...

  • This release supports integration with HDI 3.5 and 3.6 only.
  • The 
    D s item
    itemnode
     must be installed on Ubuntu 14.04.
  • HDI does not support the client-server web sockets configuration used by the platform. This limitation has the following impacts:

    • Diminished suggestions prompted by platform activities
    • User-defined functions (UDFs) are not supported.

    Info

    NOTE: Some error messages may appear in the browser related to these services. These errors are harmless.

Pre-requisites

This section makes the following assumptions:

...

Azure Storage:

  • Description: Azure Storage leverages WASB, an abstraction layer on top of HDFS.
  • 
    D s platform
     base storage layer value: wasbs or wasb

    Info

    NOTE: wasbs is recommended.

Data Lake Store:

  • Description: Data Lake Store maps to ADLS in the 
    D s platform
    . This storage is an implementation of Hortonworks Data Platform and utilizes HDFS.
  • Required for: Azure AD SSO; domain-joined clusters (Kerberos, secure impersonation)
  • 
    D s platform
     base storage layer value: hdfs

Specify Protocol

In the Ambari console, you must specify the communication protocol to use in the cluster. 

Info

NOTE: The cluster protocol must match the protocol in use by the

D s platform
.

Steps:

  1. In the Ambari console, navigate to the following location: HDFS > Configs > Advanced > Advanced Core Site > fs.defaultFS.
  2. Set the value according to the following table:

    Azure Storage:
      • Protocol (fs.defaultFS) value: wasbs://<containername>@<accountname>.blob.core.windows.net
      • D s platform config value: "webapp.storageProtocol": "wasbs",
      • Link: See Set Base Storage Layer.

    Data Lake Store:
      • Protocol (fs.defaultFS) value: adl://home
      • D s platform config value: "webapp.storageProtocol": "hdfs",
      • Link: See Set Base Storage Layer.

  3. Save your changes.
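For example, assuming Azure Storage (wasbs) as the base storage layer and hypothetical container and account names, you can confirm the cluster-side value from the edge node and compare it against the platform setting. This is a minimal sketch only; substitute the names for your cluster.

Code Block
# Minimal sketch, assuming wasbs as the base storage layer.
# Hypothetical names: examplecontainer / exampleaccount.
# Confirm the protocol and location reported by the cluster:
hdfs getconf -confKey fs.defaultFS
# Expected output for Azure Storage:
#   wasbs://examplecontainer@exampleaccount.blob.core.windows.net
# The matching platform setting in trifacta-conf.json would then be:
#   "webapp.storageProtocol": "wasbs",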


Define Script Action for domain-joined clusters

If you are integrating with a domain-joined cluster, you must specify a script action to set some permissions on cluster directories. 

For more information, see https://docs.microsoft.com/en-us/azure/hdinsight/domain-joined/apache-domain-joined-configure-using-azure-adds.

Steps:

  1. In the Advanced Settings tab of the cluster specification, click Script actions.
  2. In the textbox, insert the following URL:

    Code Block
    https://raw.githubusercontent.com/trifacta/azure-deploy/master/bin/set-key-permissions.sh


  3. Save your changes.
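If you wish to review the permissions that the script applies before attaching it to the cluster, you can download and inspect it first. A minimal sketch:

Code Block
# Optional: download and review the script action before applying it.
curl -sL https://raw.githubusercontent.com/trifacta/azure-deploy/master/bin/set-key-permissions.sh -o set-key-permissions.sh
less set-key-permissions.sh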

Install Software

If you haven't done so already, you can install the 

D s item
itemsoftware
 into an edge node of the HDI cluster. For more information, see Installation Steps.

Configure the Platform

These changes must be applied after the  

D s platform
has been installed.

D s config

Specify 
D s item
itemuser

Set the Hadoop username for the 

D s platform
 to use for executing jobs
D s defaultuser
Typehadoop
Fulltrue
:  

Code Block
"hdfs.username": "[hadoop.user]",

Specify location of client distribution bundle JAR

The 

D s platform
 ships with client bundles supporting a number of major Hadoop distributions. You must configure the JAR file for the distribution in use. These distributions are stored in the following directory:

/trifacta/hadoop-deps

Configure the bundle distribution property (hadoopBundleJar):

Code Block
  "hadoopBundleJar": "hadoop-deps/hdp-2.6/build/libs/hdp-2.6-bundle.jar"

Configure component settings

For each of the following components, explicitly set the following properties.

  1. D s config

  2. Configure Batch Job Runner:

    Code Block
      "batch-job-runner": {
       "autoRestart": true,
        ...
        "classpath": "%(topOfTree)s/apps/diagnostichadoop-serverdata/build/install/libshadoop-data/diagnostichadoop-serverdata.jar:%(topOfTree)s/apps/diagnostic-server/build/dependencies/hadoop-data/build/install/hadoop-data/lib/*:%(topOfTree)s/conf/hadoop-site:/usr/hdp/current/hadoop-client/hadoop-azure.jar:/usr/hdp/current/hadoop-client/lib/azure-storage-2.2.0.jar"
      },


  3. Configure the following environment variables:

    Code Block
    "env.PATH": "${HOME}/bin:$PATH:/usr/local/bin:/usr/lib/zookeeper/bin",
    "env.TRIFACTA_CONF": "/opt/trifacta/conf"
    "env.JAVA_HOME": "/usr/lib/jvm/java-1.8.0-openjdk-amd64",


  4. Configure the following properties for various

    D s item
    components
    components
    :

    Code Block
      "ml-service": {
       "autoRestart": true
      },
      "monitor": {
       "autoRestart": true,
        ...
       "port": <your_cluster_monitor_port>
      },
      "proxy": {
       "autoRestart": true
      },
      "udf-service": {
       "autoRestart": true
      },
      "webapp": {
        "autoRestart": true
      },


  5. Disable S3 access:

    Code Block
    "aws.s3.enabled": false,


  6. Configure the following Spark Job Service properties:

    Code Block
    "spark-job-service.classpath": "%(topOfTree)s/services/spark-job-server/server/build/libs/spark-job-server-bundle.jar:%(topOfTree)s/conf/hadoop-site/:%(topOfTree)s/services/spark-job-server/build/bundle/*:/usr/hdp/current/hadoop-client/hadoop-azure.jar:/usr/hdp/current/hadoop-client/lib/azure-storage-2.2.0.jar",
    "spark-job-service.env.SPARK_DIST_CLASSPATH": "/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-mapreduce-client/*",


  7. Save your changes.
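The Batch Job Runner and Spark Job Service classpaths above reference Azure client JARs under /usr/hdp/current. Before restarting the platform, you may want to confirm on the edge node that these JARs exist at the expected locations; if your HDI cluster ships a different azure-storage version, adjust the classpath entries accordingly. A minimal check:

Code Block
# Confirm that the Azure client JARs referenced in the classpaths are present.
ls -l /usr/hdp/current/hadoop-client/hadoop-azure.jar
ls -l /usr/hdp/current/hadoop-client/lib/azure-storage-*.jar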

...

Hive integration requires additional configuration.

Info

NOTE: Natively, HDI supports high availability for Hive via a Zookeeper quorum.

For more information, see Configure for Hive.
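For reference, HiveServer2 high availability on HDI is exposed through the ZooKeeper quorum, so a Hive JDBC URL for such a cluster typically uses ZooKeeper service discovery rather than a single host. The hostnames below are hypothetical; see Configure for Hive for the settings the platform actually requires.

Code Block
# Hypothetical example of connecting to HiveServer2 through the ZooKeeper quorum.
# Replace zk0, zk1, and zk2 with the ZooKeeper hosts of your HDI cluster.
beeline -u "jdbc:hive2://zk0:2181,zk1:2181,zk2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"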

Configure for Spark Profiling

If you are using Spark for profiling, you must add environment properties to your cluster configuration. See Configure for Spark.

Configure for UDFs

If you are using user-defined functions (UDFs) on your HDInsight cluster, additional configuration is required. See Java UDFs.

Configure Storage

Before you begin running jobs, you must specify your base storage layer, which can be WASB or ADLS. For more information, see Set Base Storage Layer.

...