Configure ResourceManager settings

Configure the following:

Code Block
"yarn.resourcemanager.host": "hadoop",
"yarn.resourcemanager.port": 8032,
Info

NOTE: Do not modify the other host and port settings unless you have specific information that requires changing them.

For more information, see System Ports in the Planning Guide.
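As a rough illustration (not part of the product), the two settings above are a host/port pair that must point at your cluster's ResourceManager. A minimal sketch of validating such values before writing them into the platform configuration JSON might look like this; the helper function and its checks are hypothetical, but the key names come from the snippet above:

```python
import json

def set_resourcemanager(config: dict, host: str, port: int) -> dict:
    """Hypothetical helper: sets the ResourceManager host/port keys.

    `config` is assumed to be the parsed platform configuration JSON;
    the key names match the Code Block above.
    """
    if not host:
        raise ValueError("ResourceManager host must be non-empty")
    if not (0 < port < 65536):
        raise ValueError(f"Invalid port: {port}")
    config["yarn.resourcemanager.host"] = host
    config["yarn.resourcemanager.port"] = port
    return config

# Example usage with the values from the snippet above:
settings = set_resourcemanager({}, "hadoop", 8032)
print(json.dumps(settings, indent=2))
```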

Specify distribution client bundle

The D s platform ships with client bundles supporting a number of major Hadoop distributions. You must configure the JAR file for the distribution in use. These bundles are stored in the following directory:

/opt/trifacta/hadoop-deps

Configure the bundle distribution property (hadoopBundleJar) in platform configuration. Examples:

Hadoop Distribution | hadoopBundleJar property value
Cloudera | "hadoop-deps/cdh-x.y/build/libs/cdh-x.y-bundle.jar"
Cloudera Data Platform | "hadoop-deps/cdp-x.y.z/build/libs/cdp-x.y.z-bundle.jar"
Hortonworks | "hadoop-deps/hdp-x.y/build/libs/hdp-x.y-bundle.jar"

where:

x.y (or x.y.z for Cloudera Data Platform) is the major-minor build number (e.g., 5.4)

Info

NOTE: The path must be specified relative to the install directory.
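As a sketch of what "relative to the install directory" means in practice (the helper below is hypothetical, and the install root is assumed to be the /opt/trifacta directory shown above), the effective JAR location is the install directory joined with the hadoopBundleJar value:

```python
import os.path

# Assumed install root, taken from the directory listed earlier.
INSTALL_DIR = "/opt/trifacta"

def resolve_bundle_jar(hadoop_bundle_jar: str, install_dir: str = INSTALL_DIR) -> str:
    """Hypothetical helper: resolves hadoopBundleJar against the install dir."""
    if os.path.isabs(hadoop_bundle_jar):
        raise ValueError("hadoopBundleJar must be relative to the install directory")
    return os.path.join(install_dir, hadoop_bundle_jar)

# x.y is the same placeholder used in the table above:
print(resolve_bundle_jar("hadoop-deps/cdh-x.y/build/libs/cdh-x.y-bundle.jar"))
# -> /opt/trifacta/hadoop-deps/cdh-x.y/build/libs/cdh-x.y-bundle.jar
```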

Tip

Tip: If there is no bundle for your distribution, you can try the one that is the closest match in terms of its Apache Hadoop baseline. For example, CDH 5 is based on Apache Hadoop 2.3.0, so that client bundle is likely to work against a vanilla Apache Hadoop 2.3.0 installation. For more information, see D s support.

Cloudera distribution

Some additional configuration is required. See Configure for Cloudera in the Configuration Guide.

Hortonworks distribution

After install, integration with the Hortonworks Data Platform requires additional configuration. See Configure for Hortonworks in the Configuration Guide.

Default Hadoop job results format

For smaller datasets, the platform recommends using the D s photon running environment.

For larger datasets, or if the size information is unavailable, the platform recommends by default that you run the job on the Hadoop cluster. For these jobs, the default publishing action is specified to run on the Hadoop cluster, generating the output format defined by this parameter. Publishing actions, including output format, can always be changed as part of the job specification.

As needed, you can change this default format. 

D s config

Code Block
"webapp.defaultHadoopFileFormat": "csv",

Accepted values: csv, json, avro, pqt
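A small validation sketch (hypothetical, not part of the product) can guard against typos when setting this property; the accepted values are the ones listed above:

```python
# Accepted values for webapp.defaultHadoopFileFormat, per the list above.
ACCEPTED_FORMATS = {"csv", "json", "avro", "pqt"}

def validate_default_format(fmt: str) -> str:
    """Hypothetical check: rejects any value outside the accepted set."""
    if fmt not in ACCEPTED_FORMATS:
        raise ValueError(
            f"Unsupported defaultHadoopFileFormat: {fmt!r}; "
            f"accepted values are {sorted(ACCEPTED_FORMATS)}"
        )
    return fmt

print(validate_default_format("csv"))  # -> csv
```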

For more information, see Run Job Page in the User Guide.