
...

The supported Azure storage layers, and the required
D s item
base storage layer setting for each, are listed below.

Azure Storage

  Azure storage leverages WASB, an abstraction layer on top of HDFS.

  Required base storage layer: wasbs or wasb

  Info

  NOTE: wasbs is recommended.

Data Lake Store

  Data Lake Store maps to ADLS in the
  D s platform
  . This storage is an implementation of the Hortonworks Data Platform and utilizes HDFS. It requires:

    • Azure AD SSO
    • Domain-joined clusters
      • Kerberos
      • Secure impersonation

  Required base storage layer: hdfs
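
For reference, locations on these storage layers are addressed with standard Hadoop-compatible URIs. The general forms are shown below; container, account, and path names are placeholders.

Code Block
  wasbs://<container>@<storage_account>.blob.core.windows.net/<path>
  wasb://<container>@<storage_account>.blob.core.windows.net/<path>
  adl://<data_lake_store_account>.azuredatalakestore.net/<path>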

...

  1. D s config

  2. Configure Batch Job Runner:

    Code Block
      "batch-job-runner": {
       "autoRestart": true,
        ...
        "classpath": "%(topOfTree)s/hadoop-data/build/install/hadoop-data/hadoop-data.jar:%(topOfTree)s/hadoop-data/build/install/hadoop-data/lib/*:%(topOfTree)s/conf/hadoop-site:/usr/hdp/current/hadoop-client/hadoop-azure.jar:/usr/hdp/current/hadoop-client/lib/azure-storage-2.2.0.jar"
      },

    Configure Diagnostic Server: 

    Code Block
       "diagnostic-serverjob-runner": {
       "autoRestart": true,
        ...
        "classpath": "%(topOfTree)s/apps/diagnostichadoop-serverdata/build/libs/diagnostic-serverinstall/hadoop-data/hadoop-data.jar:%(topOfTree)s/apps/diagnostichadoop-serverdata/build/dependenciesinstall/hadoop-data/lib/*:%(topOfTree)s/conf/hadoop-site:/usr/hdp/current/hadoop-client/hadoop-azure.jar:/usr/hdp/current/hadoop-client/lib/azure-storage-2.2.0.jar"
      },
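
    If the Azure storage JAR installed on your cluster node carries a different version number than 2.2.0, the classpath entries above must point to the file that is actually present. A quick way to check (assuming shell access to the node):

    Code Block
      ls /usr/hdp/current/hadoop-client/hadoop-azure.jar
      ls /usr/hdp/current/hadoop-client/lib/azure-storage-*.jar
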
  3. Configure the following environment variables:

    Code Block
    "env.PATH": "${HOME}/bin:$PATH:/usr/local/bin:/usr/lib/zookeeper/bin",
    "env.TRIFACTA_CONF": "/opt/trifacta/conf"
    "env.JAVA_HOME": "/usr/lib/jvm/java-1.8.0-openjdk-amd64",
  4. Configure the following properties for various

    D s item
    components
    :

    Code Block
      "ml-service": {
       "autoRestart": true
      },
      "monitor": {
       "autoRestart": true,
        ...
       "port": <your_cluster_monitor_port>
      },
      "proxy": {
       "autoRestart": true
      },
      "udf-service": {
       "autoRestart": true
      },
      "webapp": {
        "autoRestart": true
      },
  5. Disable S3 access:

    Code Block
    "aws.s3.enabled": false,
  6. Configure the following Spark Job Service properties:

    Code Block
    "spark-job-service.classpath": "%(topOfTree)s/services/spark-job-server/server/build/libs/spark-job-server-bundle.jar:%(topOfTree)s/conf/hadoop-site/:%(topOfTree)s/services/spark-job-server/build/bundle/*:/usr/hdp/current/hadoop-client/hadoop-azure.jar:/usr/hdp/current/hadoop-client/lib/azure-storage-2.2.0.jar",
    "spark-job-service.env.SPARK_DIST_CLASSPATH": "/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-mapreduce-client/*",
  7. Save your changes.

...

Hive integration requires additional configuration.

Info

NOTE: Natively, HDI supports high availability for Hive via a Zookeeper quorum.

For more information, see Configure for Hive.
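
As a rough illustration of the ZooKeeper-based high availability noted above, a HiveServer2 JDBC URL that uses ZooKeeper service discovery typically takes the following form; the hostnames, port, and namespace are placeholders, and the actual values come from the cluster's Hive configuration (the required connection settings are covered in Configure for Hive).

Code Block
  jdbc:hive2://<zk_host1>:2181,<zk_host2>:2181,<zk_host3>:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2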

Configure for Spark Profiling

...