...
- This release supports integration with HDI 3.5 and 3.6 only.
- The D s item node must be installed on Ubuntu 14.04.
- HDI does not support the client-server web sockets configuration used by the platform. This limitation has the following impacts:
  - Diminished suggestions prompted by platform activities
User-defined functions (UDFs) are not supported.
Info
NOTE: Some error messages related to these services may appear in the browser. These errors are harmless.
Pre-requisites
This section makes the following assumptions:
...
Azure Storage Layer | Description | Required for
---|---|---
Azure Storage | Azure Storage leverages WASB, an abstraction layer on top of HDFS. |
Data Lake Store | Data Lake Store maps to ADLS in the D s platform. | hdfs
Specify Protocol
In the Ambari console, you must specify the communication protocol to use in the cluster.
Info
NOTE: The cluster protocol must match the protocol in use by the D s platform.
Steps:
- In the Advanced Settings tab of the cluster specification, click Script actions.
In the textbox, insert the following URL:
Code Block https://raw.githubusercontent.com/trifacta/azure-deploy/master/bin/set-key-permissions.sh
- Save your changes.
- Set the value according to the following table:

Azure Storage Layer | Protocol (fs.defaultFS) value | D s platform config value
---|---|---
Azure Storage | wasbs://<containername>@<accountname>.blob.core.windows.net | "webapp.storageProtocol": "wasbs". See Set Base Storage Layer.
Data Lake Store | adl://home | "webapp.storageProtocol": "hdfs". See Set Base Storage Layer.
- Save your changes.
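If you want to confirm which default filesystem the cluster is actually using, one option is to query it from any cluster node with the standard HDFS client. This is an optional sanity check, not a required step:
Code Block
# Optional sanity check: print the default filesystem URI currently configured on the cluster.
# The value should match the fs.defaultFS setting chosen in the table above.
hdfs getconf -confKey fs.defaultFS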
Define Script Action for domain-joined clusters
If you are integrating with a domain-joined cluster, you must specify a script action to set some permissions on cluster directories.
For more information, see https://docs.microsoft.com/en-us/azure/hdinsight/domain-joined/apache-domain-joined-configure-using-azure-adds.
Steps:
- In the Advanced Settings tab of the cluster specification, click Script actions.
- In the textbox, insert the following URL:
Code Block
https://raw.githubusercontent.com/trifacta/azure-deploy/master/bin/set-key-permissions.sh
- Save your changes.
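As an optional check before attaching the script action, you can verify that the script URL is reachable from your network. This is a convenience step, not part of the required configuration:
Code Block
# Optional: confirm the script action URL is reachable (expects an HTTP 200 status line)
curl -sSI https://raw.githubusercontent.com/trifacta/azure-deploy/master/bin/set-key-permissions.sh | head -n 1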
Install Software
If you haven't done so already, you can install the
D s item
software.
Configure the Platform
These changes must be applied after the
D s platform
has been installed.
D s config
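The settings described in the rest of this section are applied in the platform configuration on the D s item node. Assuming the conf directory from the env.TRIFACTA_CONF value shown later in this section, you can locate the configuration files as follows; the exact layout may differ in your install:
Code Block
# List the platform configuration directory (path per env.TRIFACTA_CONF below)
ls /opt/trifacta/conf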
Specify D s item user
Set the Hadoop username for the
D s platform
to use for executing jobs:
D s defaultuser
Code Block
"hdfs.username": "[hadoop.user]",
Specify location of client distribution bundle JAR
The
D s platform
ships with client bundle JARs for supported Hadoop distributions. These bundles are stored in the following directory:
/trifacta/hadoop-deps
Configure the bundle distribution property (hadoopBundleJar):
Code Block
"hadoopBundleJar": "hadoop-deps/hdp-2.6/build/libs/hdp-2.6-bundle.jar"
Configure component settings
For each of the following components, please explicitly set the following settings.
D s config
Configure Batch Job Runner:
Code Block
"batch-job-runner": {
  "autoRestart": true,
  ...
  "classpath": "%(topOfTree)s/hadoop-data/build/install/hadoop-data/hadoop-data.jar:%(topOfTree)s/hadoop-data/build/install/hadoop-data/lib/*:%(topOfTree)s/conf/hadoop-site:/usr/hdp/current/hadoop-client/hadoop-azure.jar:/usr/hdp/current/hadoop-client/lib/azure-storage-2.2.0.jar"
},
Configure Diagnostic Server:
Code Block
"diagnostic-server": {
  "autoRestart": true,
  ...
  "classpath": "%(topOfTree)s/apps/diagnostic-server/build/install/libs/diagnostic-server.jar:%(topOfTree)s/apps/diagnostic-server/build/dependencies/*:%(topOfTree)s/conf/hadoop-site:/usr/hdp/current/hadoop-client/hadoop-azure.jar:/usr/hdp/current/hadoop-client/lib/azure-storage-2.2.0.jar"
},
Configure the following environment variables:
Code Block "env.PATH": "${HOME}/bin:$PATH:/usr/local/bin:/usr/lib/zookeeper/bin", "env.TRIFACTA_CONF": "/opt/trifacta/conf" "env.JAVA_HOME": "/usr/lib/jvm/java-1.8.0-openjdk-amd64",
Configure the following properties for various D s item components:
Code Block
"ml-service": {
  "autoRestart": true
},
"monitor": {
  "autoRestart": true,
  ...
  "port": <your_cluster_monitor_port>
},
"proxy": {
  "autoRestart": true
},
"udf-service": {
  "autoRestart": true
},
"webapp": {
  "autoRestart": true
},
Disable S3 access:
Code Block "aws.s3.enabled": false,
Configure the following Spark Job Service properties:
Code Block "spark-job-service.classpath": "%(topOfTree)s/services/spark-job-server/server/build/libs/spark-job-server-bundle.jar:%(topOfTree)s/conf/hadoop-site/:%(topOfTree)s/services/spark-job-server/build/bundle/*:/usr/hdp/current/hadoop-client/hadoop-azure.jar:/usr/hdp/current/hadoop-client/lib/azure-storage-2.2.0.jar", "spark-job-service.env.SPARK_DIST_CLASSPATH": "/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-mapreduce-client/*",
- Save your changes.
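The classpath settings above reference Azure JARs provided by the HDI distribution. Optionally, you can confirm that those JARs are present at the expected locations on the node before restarting services:
Code Block
# Optional: verify the Azure storage JARs referenced in the classpath settings above
ls -l /usr/hdp/current/hadoop-client/hadoop-azure.jar \
      /usr/hdp/current/hadoop-client/lib/azure-storage-2.2.0.jar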
...
Hive integration requires additional configuration.
Info
NOTE: Natively, HDI supports high availability for Hive via a Zookeeper quorum.
For more information, see Configure for Hive.
Configure for Spark Profiling
If you are using Spark for profiling, you must add environment properties to your cluster configuration. See Configure for Spark.
Configure for UDFs
If you are using user-defined functions (UDFs) on your HDInsight cluster, additional configuration is required. See Java UDFs.
Configure Storage
Before you begin running jobs, you must specify your base storage layer, which can be WASB or ADLS. For more information, see Set Base Storage Layer.
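Once the base storage layer is set, an optional smoke test is to list the storage root from a cluster node, using the same URI form shown in the protocol table earlier in this page. The container and account names below are placeholders:
Code Block
# Optional smoke test: list the root of the cluster's base storage (WASB example shown).
# Replace the placeholder values with your storage container and account names.
CONTAINER="your-container-name"
ACCOUNT="your-storage-account"
hdfs dfs -ls "wasbs://${CONTAINER}@${ACCOUNT}.blob.core.windows.net/"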
...