Configure for Spark Profiling


Disable intermediate caching for Hive profiling jobs

For Hortonworks 3.0 and later, the intermediate dataset files generated during Spark profiling can cause a job to hang when the source is a Hive table. As a precaution, if you are profiling jobs from Hive sources on Hortonworks 3.0 and later, disable the following property.


  1. Open the platform configuration file, trifacta-conf.json.
  2. Locate the spark.props setting.
  3. Insert the following setting:

    Code Block
    "transformer.dataframe.cache.reused": "false"
  4. Save your changes and restart the platform.
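The steps above amount to merging one key into the spark.props object. A minimal sketch of that edit, assuming Spark properties live under a top-level "spark" > "props" object in trifacta-conf.json (the key layout is assumed here for illustration):

```python
import json

def disable_intermediate_caching(conf_path):
    """Set transformer.dataframe.cache.reused to "false" in spark.props.

    Illustrative sketch only; the nesting of spark.props inside
    trifacta-conf.json is an assumption, not confirmed by this page.
    """
    with open(conf_path) as f:
        conf = json.load(f)
    # Create the spark.props object if it is missing, then set the flag.
    conf.setdefault("spark", {}).setdefault("props", {})[
        "transformer.dataframe.cache.reused"] = "false"
    with open(conf_path, "w") as f:
        json.dump(conf, f, indent=2)
```

Remember that the platform must still be restarted for the change to take effect.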

Additional configuration for Spark profiling on S3


  1. In Ambari, navigate to Spark2 > Configs.
  2. Add a new parameter to Custom Spark2-defaults.
  3. Set the parameter as follows. The value is specific to HDP build 37:

    Code Block
  4. Restart Spark from Ambari.
  5. Restart the Trifacta platform.

Additional configuration for Spark profiling

If you are using Spark for profiling, you must add environment properties to your cluster configuration. See Configure for Spark.


Set up directory permissions

On all Hortonworks cluster nodes, verify that the YARN user has access to the YARN working directories: 

Code Block
chown yarn:hadoop /mnt/hadoop/yarn
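To verify the result on each node, you can check the owner and group of the directory programmatically. A small sketch (the path is the example above; run it on the cluster node itself):

```python
import grp
import os
import pwd

def owner_group(path):
    """Return (owner, group) names for a path, e.g. to confirm that
    /mnt/hadoop/yarn is owned by yarn:hadoop after the chown above."""
    st = os.stat(path)
    return pwd.getpwuid(st.st_uid).pw_name, grp.getgrgid(st.st_gid).gr_name
```

On a correctly configured node, owner_group("/mnt/hadoop/yarn") should return ("yarn", "hadoop").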

If you are upgrading from a previous version of Hortonworks, you may need to clear the YARN user cache for the default platform user (trifacta in the example below):

Code Block
rm -rf /mnt/hadoop/yarn/local/usercache/trifacta

Configure the Trifacta platform

The following changes need to be applied on the Trifacta node. Except as noted, these changes are made in trifacta-conf.json.

Configure WebHDFS port

  1. Open the platform configuration file, trifacta-conf.json.

  2. Verify that the port number for WebHDFS is correct:

    Code Block
    "webhdfs.port": <webhdfs_port_num>,
  3. Save your changes.
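To sanity-check the configured value, you can query the WebHDFS REST API directly on the NameNode. A minimal helper to build the request URL (the host name and port below are placeholders, not values from this page):

```python
def webhdfs_url(host, port, path="/", op="LISTSTATUS"):
    """Build a WebHDFS REST URL for a quick reachability check of the
    configured port, e.g. with curl or urllib. Host/port are placeholders."""
    return "http://{}:{}/webhdfs/v1{}?op={}".format(host, port, path, op)
```

For example, fetching webhdfs_url("namenode.example.com", 50070) with curl should return a JSON directory listing if the port is correct.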

Configure Resource Manager port

Hortonworks uses a custom port number for Resource Manager. You must update the setting for the port number used by Resource Manager.

Open the platform configuration file, trifacta-conf.json.


NOTE: By default, Hortonworks uses 8050 for Resource Manager. Please verify that you have the correct port number.

Code Block
"yarn.resourcemanager.port": 8032,

Save your changes.
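The authoritative port value can be read from the cluster's yarn-site.xml, where yarn.resourcemanager.address has the form host:port. A sketch of that lookup (the XML layout is the standard Hadoop property format):

```python
import xml.etree.ElementTree as ET

def resource_manager_port(yarn_site_xml):
    """Extract the Resource Manager port from yarn-site.xml content.

    Looks up yarn.resourcemanager.address (host:port) and returns the
    port as an int, or None if the property is absent.
    """
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "yarn.resourcemanager.address":
            return int(prop.findtext("value").rsplit(":", 1)[1])
    return None
```

On an HDP cluster node, yarn-site.xml is typically found under /etc/hadoop/conf (location assumed; verify on your cluster).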

Configure location of Hadoop bundle JAR

  1. Set the value for the Hadoop bundle JAR to the appropriate distribution. The following is for Hortonworks 2.6:

    Code Block
    "hadoopBundleJar": "hadoop-deps/hdp-2.6/build/libs/hdp-2.6-bundle.jar"
  2. Save your changes.
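The bundle JAR path follows a predictable per-version pattern, which a small helper makes explicit. The pattern is inferred from the Hortonworks 2.6 example above; verify the resulting path against the hadoop-deps directory of your installation:

```python
def hadoop_bundle_jar(hdp_version):
    """Build the hadoopBundleJar value for a given HDP version,
    following the pattern shown in the example above."""
    return "hadoop-deps/hdp-{0}/build/libs/hdp-{0}-bundle.jar".format(hdp_version)
```

For example, hadoop_bundle_jar("2.6") reproduces the value shown in the step above.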

Configure Hive Locations

If you are enabling an integration with Hive on the Hadoop cluster, there are some distribution-specific parameters that must be set. For more information, see Configure for Hive.


To apply your changes, restart the platform. See Start and Stop the Platform.