D s platform
 supports access to the following Hadoop storage layers:

  • HDFS

  • S3

Set the base storage layer

At this point, you should define the base storage layer for the platform. 

Excerpt Include
Install Set Base Storage Layer

Required configuration for each type of storage is described below.



D s platform
 can integrate with an S3 bucket:

  • If you are using HDFS as the base storage layer, you can integrate with S3 for read-only access.
  • If you are using S3 as the base storage layer, read-write access is required.


    NOTE: If you are integrating with S3, additional configuration is required. Instead of completing the HDFS configuration below, please enable read-write access to S3. See S3 Access in the Configuration Guide.


If output files are to be written to an HDFS environment, you must configure the 

D s platform
 to interact with HDFS. 


If your deployment is using HDFS, do not use the trifacta/uploads directory. This directory is used for storing uploads and metadata, which may be used by multiple users. Manipulating files outside of the

D s webapp
can destroy other users' data. Please use the tools provided through the interface for managing uploads from HDFS.


NOTE: Use of HDFS in safe mode is not supported.

Below, replace the value for 

D s defaultuser
 with the value appropriate for your environment. 
D s config

Code Block
"hdfs": {
  "username": "[hadoop.user]",
  "namenode": {
    "host": "",
    "port": 8080
  }
}

username: Username in the Hadoop cluster to be used by the 
D s platform
 for executing jobs.

namenode.host: Host name of the namenode in the Hadoop cluster. You may reference multiple namenodes.

namenode.port: Port to use to access the namenode. You may reference multiple namenodes.

NOTE: Default values for the port number depend on your Hadoop distribution. See System Ports in the Planning Guide.
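For illustration only, a completed hdfs block might look like the following. The username, hostname, and port below are placeholder values; 8020 is a common namenode port on many distributions, but you should verify the correct value for yours (see System Ports in the Planning Guide):

Code Block
"hdfs": {
  "username": "trifacta",
  "namenode": {
    "host": "namenode.example.com",
    "port": 8020
  }
}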

Individual users can configure the HDFS directory where exported results are stored.


NOTE: Multiple users cannot share the same home directory.

See Storage Config Page in the User Guide.

Access to HDFS is supported over one of the following protocols:

  • WebHDFS
  • HttpFS


If you are using HDFS, it is assumed that WebHDFS has been enabled on the cluster. Apache WebHDFS enables access to an HDFS instance over HTTP REST APIs. For more information, see the Apache Hadoop WebHDFS documentation.

The following properties can be modified:

Code Block
"webhdfs": {
  "version": "/webhdfs/v1",
  "host": "",
  "port": 50070,
  "httpfs": false
}

version: Path to the locally installed version of WebHDFS.


NOTE: For version, please leave the default value unless instructed to do otherwise.


host: Hostname for the WebHDFS service.


NOTE: If this value is not specified, then the expected host must be defined in


port: Port number for WebHDFS. The default value is 50070.


NOTE: The default port number for SSL to WebHDFS is 50470.

httpfs: To use HttpFS instead of WebHDFS, set this value to true. The port number must be changed. See HttpFS below.


  1. Set webhdfs.host to be the hostname of the node that hosts WebHDFS. 
  2. Set webhdfs.port to be the port number over which WebHDFS communicates. The default value is 50070. For SSL, the default value is 50470.
  3. Set webhdfs.httpfs to false.
  4. For hdfs.namenodes, you must set the host and port values to point to the active namenode for WebHDFS.
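Applied to the configuration above, the steps yield settings like the following sketch. The hostname is a placeholder, the ports shown are the non-SSL defaults, and whether the namenode entry uses the WebHDFS port or your cluster's namenode RPC port depends on your environment:

Code Block
"webhdfs": {
  "version": "/webhdfs/v1",
  "host": "webhdfs.example.com",
  "port": 50070,
  "httpfs": false
},
"hdfs": {
  "namenode": {
    "host": "webhdfs.example.com",
    "port": 50070
  }
}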


You can configure the 

D s platform
 to use the HttpFS service to communicate with HDFS, in addition to WebHDFS. 


NOTE: HttpFS serves as a proxy to WebHDFS. When HttpFS is enabled, both services are required.

In some cases, HttpFS is required:

  • Your deployment uses high availability for HDFS.
  • Your secured HDFS user account has access restrictions.

If your environment meets any of the above requirements, you must enable HttpFS. For more information, see Enable HttpFS in the Configuration Guide.
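As a sketch, enabling HttpFS might look like the following. The hostname is a placeholder, and 14000 is the standard default port for the Hadoop HttpFS service; verify the port for your distribution:

Code Block
"webhdfs": {
  "version": "/webhdfs/v1",
  "host": "httpfs.example.com",
  "port": 14000,
  "httpfs": true
}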