
Release 9.2



Configure ResourceManager settings

Configure the following:

"yarn.resourcemanager.host": "hadoop",
"yarn.resourcemanager.port": 8032,

NOTE: Do not modify the other host/port settings unless you have specific information requiring the modifications.

For more information, see System Ports in the Planning Guide.
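Before applying the two settings above, it can help to sanity-check their types and ranges. The following is a minimal sketch, not part of the platform: the function name `check_resourcemanager` is hypothetical, and it assumes the settings are available as a flat dictionary keyed by the dotted property names shown above.

```python
def check_resourcemanager(settings):
    """Return a list of problems found in the ResourceManager settings.

    Hypothetical helper: assumes a flat dict keyed by dotted property names.
    """
    problems = []
    host = settings.get("yarn.resourcemanager.host")
    port = settings.get("yarn.resourcemanager.port")

    if not isinstance(host, str) or not host.strip():
        problems.append("yarn.resourcemanager.host must be a non-empty string")

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append("yarn.resourcemanager.port must be an integer from 1 to 65535")

    return problems


settings = {
    "yarn.resourcemanager.host": "hadoop",
    "yarn.resourcemanager.port": 8032,
}
print(check_resourcemanager(settings))  # prints []
```

An empty list means both values look plausible; it does not confirm that the ResourceManager is actually reachable at that address.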

Specify distribution client bundle

The Designer Cloud powered by Trifacta platform ships with client bundles supporting a number of major Hadoop distributions. You must configure the JAR file for the distribution that you use. These bundles are stored in the following directory:

/opt/trifacta/hadoop-deps

Configure the bundle distribution property (hadoopBundleJar) in platform configuration. Examples:

Hadoop Distribution       hadoopBundleJar property value
Cloudera                  "hadoop-deps/cdh-x.y/build/libs/cdh-x.y-bundle.jar"
Cloudera Data Platform    "hadoop-deps/cdp-x.y.z/build/libs/cdp-x.y.z-bundle.jar"

where:

x.y is the major-minor build number (e.g. 5.4)

NOTE: The path must be specified relative to the install directory.
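The path pattern and the relative-path rule above can be illustrated with a short sketch. The helper names `bundle_jar_value` and `absolute_bundle_path` are hypothetical, and the install directory `/opt/trifacta` is an assumption inferred from the bundle directory listed earlier. Whether a bundle for a given version exists depends on your installation.

```python
from pathlib import PurePosixPath

# Assumption: install directory inferred from /opt/trifacta/hadoop-deps.
INSTALL_DIR = PurePosixPath("/opt/trifacta")


def bundle_jar_value(prefix, version):
    """Build a relative hadoopBundleJar value, e.g. prefix 'cdh', version '5.4'."""
    name = f"{prefix}-{version}"
    return f"hadoop-deps/{name}/build/libs/{name}-bundle.jar"


def absolute_bundle_path(value):
    """Resolve a hadoopBundleJar value against the install directory."""
    relative = PurePosixPath(value)
    if relative.is_absolute():
        raise ValueError("hadoopBundleJar must be relative to the install directory")
    return INSTALL_DIR / relative


value = bundle_jar_value("cdh", "5.4")
print(value)                        # hadoop-deps/cdh-5.4/build/libs/cdh-5.4-bundle.jar
print(absolute_bundle_path(value))  # /opt/trifacta/hadoop-deps/cdh-5.4/build/libs/cdh-5.4-bundle.jar
```

The relative string is what goes into the hadoopBundleJar property. The absolute form is only useful for checking that the file is present on disk.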

Tip: If there is no bundle for the distribution you need, try the bundle whose Apache Hadoop baseline is the closest match. For example, CDH5 is based on Apache Hadoop 2.3.0, so that client bundle will probably run correctly against a vanilla Apache Hadoop 2.3.0 installation. For more information, see Alteryx Support.

Cloudera distribution

Some additional configuration is required. See Configure for Cloudera in the Configuration Guide.

Default Hadoop job results format

For smaller datasets, the platform recommends using the Trifacta Photon running environment.

For larger datasets, if the size information is unavailable, the platform recommends by default that you run the job on the Hadoop cluster. For these jobs, the default publishing action runs on the Hadoop cluster and generates output in the format defined by the following parameter. Publishing actions, including the output format, can always be changed as part of the job specification.

As needed, you can change this default format. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

"webapp.defaultHadoopFileFormat": "csv",

Accepted values: csv, json, avro, pqt
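If you edit trifacta-conf.json by script rather than through the Admin Settings Page, a small guard can catch an unsupported value before it is written. This is a minimal sketch under the assumption that the accepted values are csv, json, avro, and pqt. The helper name `default_format_setting` is hypothetical and not a platform API.

```python
# Assumption: the accepted values for webapp.defaultHadoopFileFormat.
ACCEPTED_FORMATS = ("csv", "json", "avro", "pqt")


def default_format_setting(fmt):
    """Return the setting as a one-entry dict, or raise if fmt is not accepted."""
    if fmt not in ACCEPTED_FORMATS:
        raise ValueError(
            f"unsupported format {fmt!r}; expected one of {ACCEPTED_FORMATS}"
        )
    return {"webapp.defaultHadoopFileFormat": fmt}


print(default_format_setting("pqt"))  # {'webapp.defaultHadoopFileFormat': 'pqt'}
```

The check is case-sensitive, so a value such as "CSV" is rejected rather than silently normalized.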

For more information, see Run Job Page in the User Guide.
