D s platform
 supports access to the following Hadoop storage layers:

  • HDFS

  • S3

Set the base storage layer

At this point, you should define the base storage layer for the platform. 

Excerpt Include
Install Set Base Storage Layer

Required configuration for each type of storage is described below.



D s platform
 can integrate with an S3 bucket:

  • If you are using HDFS as the base storage layer, you can integrate with S3 for read-only access.
  • If you are using S3 as the base storage layer, read-write access is required.


    NOTE: If you are integrating with S3, additional configuration is required. Instead of completing the HDFS configuration below, please enable read-write access to S3. See S3 Access in the Configuration Guide.


If output files are to be written to an HDFS environment, you must configure the 

D s platform
 to interact with HDFS. 


If your deployment is using HDFS, do not use the trifacta/uploads directory. This directory is used for storing uploads and metadata, which may be used by multiple users. Manipulating files outside of the

D s webapp
can destroy other users' data. Please use the tools provided through the interface for managing uploads from HDFS.


NOTE: Use of HDFS in safe mode is not supported.

Below, replace the value for 

D s defaultuser
 with the value appropriate for your environment. 
D s config

Code Block
"hdfs": {
  "username": "[hadoop.user]",
  "namenode": {
    "host": "",
    "port": 8080
  }
}

username: Username in the Hadoop cluster to be used by the 
D s platform
 for executing jobs.

namenode.host: Host name of the namenode in the Hadoop cluster. You may reference multiple namenodes.

namenode.port: Port to use to access the namenode. You may reference multiple namenodes.

NOTE: Default values for the port number depend on your Hadoop distribution. See System Ports in the Planning Guide.
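For illustration only, a completed hdfs block might look like the following. The username, hostname, and port below are placeholder values; 8020 is a common namenode port on many distributions, but you should verify the correct value for yours (see System Ports in the Planning Guide):

Code Block
"hdfs": {
  "username": "trifacta",
  "namenode": {
    "host": "namenode.example.com",
    "port": 8020
  }
}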

Individual users can configure the HDFS directory where exported results are stored.


NOTE: Multiple users cannot share the same home directory.

See Storage Config Page in the User Guide.

Access to HDFS is supported over one of the following protocols:

  • WebHDFS
  • HttpFS


If you are using HDFS, it is assumed that WebHDFS has been enabled on the cluster. Apache WebHDFS enables access to an HDFS instance over HTTP REST APIs. For more information, see the Apache Hadoop WebHDFS documentation.

The following properties can be modified:

Code Block
"webhdfs": {
  "version": "/webhdfs/v1",
  "host": "",
  "port": 50070,
  "httpfs": false
}

version: Path to the locally installed version of WebHDFS.


NOTE: For version, please leave the default value unless instructed to do otherwise.


host: Hostname for the WebHDFS service.


NOTE: If this value is not specified, then the expected host must be defined in


port: Port number for WebHDFS. The default value is 50070.


NOTE: The default port number for SSL to WebHDFS is 50470.

httpfs: To use HttpFS instead of WebHDFS, set this value to true. The port number must be changed. See HttpFS below.


  1. Set webhdfs.host to be the hostname of the node that hosts WebHDFS. 
  2. Set webhdfs.port to be the port number over which WebHDFS communicates. The default value is 50070. For SSL, the default value is 50470.
  3. Set webhdfs.httpfs to false.
  4. For hdfs.namenodes, you must set the host and port values to point to the active namenode for WebHDFS.
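Applied to the configuration above, the steps yield settings like the following sketch. The hostname is a placeholder, the ports shown are the non-SSL defaults, and whether the namenode entry uses the WebHDFS port or your cluster's namenode RPC port depends on your environment:

Code Block
"webhdfs": {
  "version": "/webhdfs/v1",
  "host": "webhdfs.example.com",
  "port": 50070,
  "httpfs": false
},
"hdfs": {
  "namenode": {
    "host": "webhdfs.example.com",
    "port": 50070
  }
}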


You can configure the 

D s platform
 to use the HttpFS service to communicate with HDFS, in addition to WebHDFS. 


NOTE: HttpFS serves as a proxy to WebHDFS. When HttpFS is enabled, both services are required.

In some cases, HttpFS is required:

  • Your deployment uses high availability for HDFS.
  • Your secured HDFS user account has access restrictions.

If your environment meets any of the above requirements, you must enable HttpFS. For more information, see Enable HttpFS in the Configuration Guide.
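As a sketch, enabling HttpFS might look like the following. The hostname is a placeholder, and 14000 is the standard default port for the Hadoop HttpFS service; verify the port for your distribution:

Code Block
"webhdfs": {
  "version": "/webhdfs/v1",
  "host": "httpfs.example.com",
  "port": 14000,
  "httpfs": true
}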