By default, Microsoft Azure deployments integrate with Azure Data Lake Store (ADLS). Optionally, you can configure your deployment to integrate with WASB.
If the base storage layer has been set to WASB, you can follow these instructions to set up read-only access to ADLS.
NOTE: To enable read-only access to ADLS, do not set the base storage layer to |
Before you integrate with Azure ADLS, you must set up the platform as a registered application. See Configure for Azure.
The following properties should already be specified in the Admin Settings page. Please verify that the following have been set:
azure.applicationId
azure.secret
azure.directoryId
The above properties are needed for this configuration. For more information, see Configure for Azure.
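The check above can be sketched in Python. This is a minimal sketch, assuming the platform configuration is available as nested JSON in which a dotted name such as azure.secret maps to the path azure → secret. The helper names get_dotted and missing_azure_properties are hypothetical and are not part of the product.

```python
# Required properties, as listed in the text above.
REQUIRED_AZURE_PROPERTIES = (
    "azure.applicationId",
    "azure.secret",
    "azure.directoryId",
)


def get_dotted(config, dotted_key):
    """Look up a dotted key such as 'azure.secret' in a nested dict.

    Returns None if any segment of the path is missing.
    (Assumption: dotted names correspond to nested JSON objects.)
    """
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def missing_azure_properties(config):
    """Return the required properties that are absent or empty."""
    return [
        key
        for key in REQUIRED_AZURE_PROPERTIES
        if get_dotted(config, key) in (None, "")
    ]
```

Calling missing_azure_properties on the loaded configuration returns an empty list when all three properties are set, and otherwise names the ones that still need values.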
An Azure Key Vault has already been set up and configured for use by the platform. For more information, see Configure for Azure.
Authentication to ADLS storage is supported in the following modes, which are described in the table below.
Mode | Description
---|---
System | All users authenticate to ADLS using a single system key/secret combination. This combination is specified in the following parameters, which you should have already defined: azure.applicationId, azure.secret, and azure.directoryId. These properties define the registered application in Azure Active Directory. System authentication mode uses the registered application identifier as the service principal for authentication to ADLS. All users have the same permissions in ADLS. For more information on these settings, see Configure for Azure.
User | Per-user mode allows individual users to authenticate to ADLS through their Azure Active Directory login.
Steps:
Please complete the following steps to specify the ADLS access mode.
Set the following parameter to the preferred mode (system or user):
"azure.adl.mode": "<your_preferred_mode>",
When access to ADLS is requested, the platform uses the combination of Azure directory ID, Azure application ID, and Azure secret to complete access.
After defining the properties in the platform, system mode access requires no additional configuration.
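To illustrate how the three values combine in system mode, the following Python sketch builds (but does not send) an OAuth 2.0 client-credentials token request. The source does not state which endpoint the platform calls. The Azure AD token URL and the ADLS resource URI used here are assumptions based on Microsoft's documented service-to-service authentication for ADLS, and build_token_request is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: Azure AD (v1) token endpoint and ADLS resource URI as documented
# by Microsoft for service-to-service authentication. The platform's actual
# request is not described in this document.
TOKEN_URL_TEMPLATE = "https://login.microsoftonline.com/{directory_id}/oauth2/token"
ADLS_RESOURCE = "https://datalake.azure.net/"


def build_token_request(directory_id, application_id, secret):
    """Return (url, form_body) for a client-credentials token request.

    directory_id   -- value of azure.directoryId
    application_id -- value of azure.applicationId
    secret         -- value of azure.secret
    """
    url = TOKEN_URL_TEMPLATE.format(directory_id=directory_id)
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": application_id,
            "client_secret": secret,
            "resource": ADLS_RESOURCE,
        }
    )
    return url, body
```

Because every user shares this one service principal, any token obtained this way carries the same ADLS permissions regardless of who is logged in.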
In user mode, a user ID hash is generated from the Key Vault key/secret and the user's AD login. This hash is used to generate the access token, which is stored in the Key Vault.
NOTE: User mode access to ADLS requires Single Sign On (SSO) to be enabled for integration with Azure Active Directory. For more information, see Configure SSO for Azure AD.
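The mode rules above can be expressed as a small pre-flight check. This is a minimal sketch: check_adl_mode is a hypothetical helper, and the sso_enabled flag stands in for however your deployment records that SSO with Azure AD is configured (the source does not name that setting).

```python
# The two values accepted for "azure.adl.mode", per the steps above.
VALID_ADL_MODES = ("system", "user")


def check_adl_mode(mode, sso_enabled):
    """Return a list of problems with the chosen ADLS access mode.

    mode        -- value intended for "azure.adl.mode"
    sso_enabled -- hypothetical flag: True if SSO with Azure AD is configured
    """
    problems = []
    if mode not in VALID_ADL_MODES:
        problems.append(
            f'"azure.adl.mode" must be one of {VALID_ADL_MODES}, got {mode!r}'
        )
    elif mode == "user" and not sso_enabled:
        problems.append("user mode requires SSO with Azure AD to be enabled")
    return problems
```

An empty list means the mode is acceptable. The comparison is deliberately case-sensitive, since the source lists the values only in lowercase.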
In platform configuration, you must define the following property:
"azure.adl.store": "<your_value_here>",
This property defines the ADLS storage to which all output data is delivered. Example:
adl://<YOUR_STORE_NAME>.azuredatalakestore.net
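The expected shape of the store value can be checked before it is saved. This is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the check covers only the scheme and host suffix shown in the example above.

```python
from urllib.parse import urlparse

# Host suffix taken from the example value above.
ADLS_HOST_SUFFIX = ".azuredatalakestore.net"


def adls_store_uri(store_name):
    """Build the azure.adl.store value for a given store name."""
    return f"adl://{store_name}{ADLS_HOST_SUFFIX}"


def is_valid_adls_store_uri(uri):
    """Return True if uri uses the adl scheme and an ADLS host name."""
    parsed = urlparse(uri)
    host = parsed.hostname or ""
    return (
        parsed.scheme == "adl"
        and host.endswith(ADLS_HOST_SUFFIX)
        and len(host) > len(ADLS_HOST_SUFFIX)  # store name must be non-empty
    )
```

For example, adls_store_uri("mystore") produces adl://mystore.azuredatalakestore.net, where mystore is a made-up store name.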
Per earlier configuration:
webapp.storageProtocol must be set to hdfs.
hdfs.protocolOverride must be set to adl.
In platform configuration, you must configure the following properties for effective communication with HDFS.
"hdfs": {
  "username": "[hadoop.user]",
  "enabled": true,
  "webhdfs": {
    "httpfs": false,
    "maprCompatibilityMode": false,
    "ssl": {
      "enabled": true,
      "certificateValidationRequired": false,
      "certificatePath": "<YOUR_PATH_HERE>"
    },
    "host": "[ADLS].azuredatalakestore.net",
    "version": "/webhdfs/v1",
    "proxy": {
      "host": "proxy",
      "enabled": false,
      "port": 8080
    },
    "credentials": {
      "username": "[hadoop.user]",
      "password": ""
    },
    "port": 443
  },
  "protocolOverride": "adl",
  "highAvailability": {
    "serviceName": "[ADLS].azuredatalakestore.net",
    "namenodes": {}
  },
  "namenode": {
    "host": "[ADLS].azuredatalakestore.net",
    "port": 443
  }
}
Property | Description
---|---
hdfs.username | Set this value to the name of the user that the platform uses ([hadoop.user] in the example).
hdfs.enabled | Set to true.
hdfs.webhdfs.httpfs | Use of HttpFS in this integration is not supported. Set this value to false.
hdfs.webhdfs.maprCompatibilityMode | This setting does not apply to ADLS. Set this value to false.
hdfs.webhdfs.ssl.enabled | SSL is always used for ADLS. Set this value to true.
hdfs.webhdfs.ssl.certificateValidationRequired | Set this value to false.
hdfs.webhdfs.ssl.certificatePath | This value is not used for ADLS.
hdfs.webhdfs.host | Set this value to the address of your ADLS datastore.
hdfs.webhdfs.version | Set this value to /webhdfs/v1.
hdfs.webhdfs.proxy.host | This value is not used for ADLS.
hdfs.webhdfs.proxy.enabled | A proxy is not used for ADLS. Set this value to false.
hdfs.webhdfs.proxy.port | This value is not used for ADLS.
hdfs.webhdfs.credentials.username | Set this value to the name of the user that the platform uses ([hadoop.user] in the example).
hdfs.webhdfs.credentials.password | Leave this value empty for ADLS.
hdfs.webhdfs.port | Set this value to 443.
hdfs.protocolOverride | Set this value to adl.
hdfs.highAvailability.serviceName | Set this value to the address of your ADLS datastore.
hdfs.highAvailability.namenodes | Set this value to an empty value.
hdfs.namenode.host | Set this value to the address of your ADLS datastore.
hdfs.namenode.port | Set this value to 443.
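The fixed values in the table above lend themselves to an automated check. The following Python sketch compares a loaded configuration against those values. It assumes the configuration is nested JSON addressed by dotted names, and check_hdfs_settings is a hypothetical helper rather than a product feature.

```python
_MISSING = object()  # sentinel for keys that are not present

# Fixed values taken from the property table above.
EXPECTED_HDFS_VALUES = {
    "hdfs.enabled": True,
    "hdfs.webhdfs.httpfs": False,
    "hdfs.webhdfs.maprCompatibilityMode": False,
    "hdfs.webhdfs.ssl.enabled": True,
    "hdfs.webhdfs.ssl.certificateValidationRequired": False,
    "hdfs.webhdfs.version": "/webhdfs/v1",
    "hdfs.webhdfs.proxy.enabled": False,
    "hdfs.webhdfs.credentials.password": "",
    "hdfs.webhdfs.port": 443,
    "hdfs.protocolOverride": "adl",
    "hdfs.highAvailability.namenodes": {},
    "hdfs.namenode.port": 443,
}

# Properties that must all hold the address of the ADLS datastore.
ADLS_HOST_KEYS = (
    "hdfs.webhdfs.host",
    "hdfs.highAvailability.serviceName",
    "hdfs.namenode.host",
)


def get_dotted(config, dotted_key):
    """Look up a dotted key in a nested dict; return _MISSING if absent."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def check_hdfs_settings(config, adls_host):
    """Return a list of human-readable problems; empty if all checks pass."""
    problems = []
    for key, expected in EXPECTED_HDFS_VALUES.items():
        actual = get_dotted(config, key)
        # Compare types as well, so that 0 is not accepted in place of False.
        if (
            actual is _MISSING
            or type(actual) is not type(expected)
            or actual != expected
        ):
            shown = "<missing>" if actual is _MISSING else repr(actual)
            problems.append(f"{key}: expected {expected!r}, found {shown}")
    for key in ADLS_HOST_KEYS:
        actual = get_dotted(config, key)
        if actual is _MISSING or actual != adls_host:
            shown = "<missing>" if actual is _MISSING else repr(actual)
            problems.append(f"{key}: expected {adls_host!r}, found {shown}")
    return problems
```

The user name and unused properties (certificate path, proxy host and port) are intentionally not checked, since the table gives no fixed value for them.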
Steps:
Locate the following parameter and change its value to true:
"azure.adl.enabled": true,
Configure use of the appropriate Hadoop bundle JAR:
"hadoopBundleJar": "hadoop-deps/hdp-2.6/build/libs/hdp-2.6-bundle.jar",
Restart services. See Start and Stop the Platform.
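If you script configuration changes, the two settings from the steps above can be applied to a loaded configuration as follows. This is a minimal sketch, assuming that azure.adl.enabled is stored as nested JSON (azure → adl → enabled) and that hadoopBundleJar is a top-level key, as the fragments above suggest. enable_adls is a hypothetical helper, and restarting services remains a separate manual step.

```python
import copy

# Bundle JAR path as given in the steps above.
HADOOP_BUNDLE_JAR = "hadoop-deps/hdp-2.6/build/libs/hdp-2.6-bundle.jar"


def enable_adls(config, bundle_jar=HADOOP_BUNDLE_JAR):
    """Return a copy of config with ADLS enabled and the bundle JAR set.

    The input dict is left unmodified so the change can be reviewed
    before it is written back.
    """
    updated = copy.deepcopy(config)
    # Create the nested "azure" -> "adl" objects if they do not exist yet.
    adl = updated.setdefault("azure", {}).setdefault("adl", {})
    adl["enabled"] = True
    updated["hadoopBundleJar"] = bundle_jar
    return updated
```

Other keys under azure.adl, such as the access mode, are preserved in the returned copy.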
After the configuration has been specified, an ADLS connection appears in the Import Data page. Select it to begin navigating for data sources.
Try running a simple job in the platform. For more information, see Verify Operations.