Microsoft Azure deployments can integrate with the next generation of Azure Data Lake Storage (ADLS Gen2).

Limitations of ADLS Gen2 Integration

Read-only access

If the base storage layer has been set to WASB, you can follow these instructions to set up read-only access to ADLS Gen2. 

NOTE: To enable read-only access to ADLS Gen2, do not set the base storage layer to abfss.

Pre-requisites

General

Create a registered application

Before you integrate with Azure ADLS Gen2, you must create the  as a registered application. See Configure for Azure.

Azure properties

The following properties should already be specified in the Admin Settings page. Verify that the following have been set:

  • azure.applicationId
  • azure.secret
  • azure.directoryId

These properties are required for this configuration.

Tip: ADLS Gen2 also works if you are using Azure Managed Identity.

Registered application role

NOTE: The Storage Blob Data Contributor role or its equivalent roles must be assigned in the ADLS Gen2 storage account.

For more information, see Configure for Azure.

Key Vault Setup

An Azure Key Vault must already be set up and configured for use by the . If its properties have not been configured yet, specify them in the platform.

For more information on configuration for Azure key vault, see Configure for Azure.

Configure the 

Define base storage layer

Per earlier configuration:

See Set Base Storage Layer.

Review Java VFS Service

Use of ADLS Gen2 requires the Java VFS service in the .

NOTE: This service is enabled by default.

For more information on configuring this service, see Configure Java VFS Service.

Configure file storage protocols and locations

The  must be given the list of protocols and locations for accessing ADLS Gen2 blob storage.

Steps:

  1. Locate the following parameters and set their values according to the table below:

    "fileStorage.whitelist": ["abfss"],
    "fileStorage.defaultBaseUris": ["abfss://filesystem@storageaccount.dfs.core.windows.net/"],


    Parameter    Description
    fileStorage.whitelist

    A comma-separated list of protocols that are permitted to read from and write to ADLS Gen2 storage.

    NOTE: The protocol identifier "abfss" must be included in this list.


    fileStorage.defaultBaseUris

    For each supported protocol, this parameter must contain a top-level path to the location where files can be stored. These files include uploads, samples, and temporary storage used during job execution.

    NOTE: Each supported protocol requires exactly one base URI.



  2. Save your changes and restart the platform.
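The two rules in the table above (the "abfss" protocol must be whitelisted, and each supported protocol needs exactly one base URI) can be checked with a short script before you restart. The following is a minimal sketch, assuming the settings are available as a flat JSON object keyed by the dotted parameter names shown in step 1; the real settings file may nest these keys differently, and `check_file_storage` is a hypothetical helper, not part of the platform.

```python
import json
from urllib.parse import urlparse

# Hypothetical settings fragment, keyed by the dotted parameter names from step 1.
# The real configuration may nest these keys instead of using flat names.
SAMPLE_CONFIG = """
{
  "fileStorage.whitelist": ["abfss"],
  "fileStorage.defaultBaseUris": ["abfss://filesystem@storageaccount.dfs.core.windows.net/"]
}
"""


def check_file_storage(config: dict) -> list:
    """Return a list of problems in the fileStorage settings (empty if none found)."""
    problems = []
    whitelist = config.get("fileStorage.whitelist", [])
    base_uris = config.get("fileStorage.defaultBaseUris", [])

    # Rule 1: "abfss" must be one of the permitted protocols.
    if "abfss" not in whitelist:
        problems.append('"abfss" is missing from fileStorage.whitelist')

    # Count how many base URIs exist for each protocol (the URI scheme).
    uris_per_protocol = {}
    for uri in base_uris:
        scheme = urlparse(uri).scheme
        uris_per_protocol[scheme] = uris_per_protocol.get(scheme, 0) + 1

    # Rule 2: every supported protocol needs exactly one base URI.
    for protocol in whitelist:
        count = uris_per_protocol.get(protocol, 0)
        if count != 1:
            problems.append(f"expected exactly one base URI for {protocol!r}, found {count}")
    return problems


if __name__ == "__main__":
    issues = check_file_storage(json.loads(SAMPLE_CONFIG))
    print(issues or "fileStorage settings look consistent")
```

An empty result only means the two documented rules hold; it does not confirm that the storage account or file system actually exists.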

Configure access mode

Mode    Description
System

All users authenticate to ADLS using a single system key/secret combination. This combination is specified in the following parameters, which you should have already defined:

  • azure.applicationId
  • azure.secret
  • azure.directoryId

These properties define the registered application in Azure Active Directory. System authentication mode uses the registered application identifier as the service principal for authentication to ADLS. All users have the same permissions in ADLS.

For more information on these settings, see Configure for Azure.

User

In user mode, per-user access is governed by Azure AD SSO. A set of tokens is acquired during SSO login for the user and is stored in the Azure Key Vault against the user's masked identifier.

Additional configuration is required. See below.

System mode access

When access to ADLS Gen2 is requested, the platform uses the combination of Azure directory ID, Azure application ID, and Azure secret to authenticate.

Steps:

Complete the following steps to specify the ADLS access mode.

  1. Verify that the following parameter is set to system:

    "azure.adlsgen2.mode": "system",


  2. Save your changes.
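For reference, a service principal defined by a directory ID, application ID, and secret typically obtains an Azure Storage token through the OAuth 2.0 client credentials grant against the Microsoft identity platform. The sketch below only builds that request and sends nothing. It illustrates the standard Azure AD flow and is an assumption about what system mode does internally, not a description of the platform's implementation; `build_client_credentials_request` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Microsoft identity platform (v2.0) token endpoint; the tenant is the Azure directory ID.
TOKEN_URL_TEMPLATE = "https://login.microsoftonline.com/{directory_id}/oauth2/v2.0/token"
# ".default" scope for the Azure Storage resource, which ADLS Gen2 uses.
STORAGE_SCOPE = "https://storage.azure.com/.default"


def build_client_credentials_request(directory_id: str, application_id: str, secret: str) -> tuple:
    """Return (url, form_body) for an OAuth 2.0 client credentials token request.

    The arguments correspond to azure.directoryId, azure.applicationId, and azure.secret.
    Nothing is sent over the network.
    """
    url = TOKEN_URL_TEMPLATE.format(directory_id=directory_id)
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": application_id,
        "client_secret": secret,
        "scope": STORAGE_SCOPE,
    })
    return url, body


if __name__ == "__main__":
    # Placeholder values only; never hard-code a real secret.
    url, body = build_client_credentials_request("<directory-id>", "<application-id>", "<secret>")
    print(url)
    print(body)
```

To actually obtain a token, the body would be sent in an HTTP POST with `Content-Type: application/x-www-form-urlencoded`. Whatever the token grants is still limited by the role assigned to the registered application, such as Storage Blob Data Contributor.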

User mode access

In user mode, a set of tokens is acquired during SSO login for the user and is stored in the Azure Key Vault against the user's masked identifier.

Pre-requisites:

Steps:

Complete the following steps to specify the ADLS access mode.

  1. Set the following parameter to user:

    "azure.adlsgen2.mode": "user",


  2. Save your changes.
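Because the two modes need different inputs, a pre-restart check can catch a misspelled mode value or a missing system credential. The following is a minimal sketch, assuming the configuration is loaded as a Python dictionary in which each dotted name (for example `azure.adlsgen2.mode`) appears either as a flat key or as nested objects; `get_dotted` and `check_access_mode` are hypothetical helpers. User mode is only checked for a valid mode value, because its additional SSO and Key Vault prerequisites are not listed here.

```python
from typing import Any

VALID_MODES = ("system", "user")
# Properties that system mode relies on (see the System row of the access mode table).
SYSTEM_MODE_KEYS = ("azure.applicationId", "azure.secret", "azure.directoryId")


def get_dotted(config: dict, dotted_key: str) -> Any:
    """Look up 'a.b.c' as a flat key first, then as nested dictionaries; None if absent."""
    if dotted_key in config:
        return config[dotted_key]
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def check_access_mode(config: dict) -> list:
    """Return a list of problems with the ADLS Gen2 access mode settings."""
    problems = []
    mode = get_dotted(config, "azure.adlsgen2.mode")
    if mode not in VALID_MODES:
        problems.append(f"azure.adlsgen2.mode must be 'system' or 'user', got {mode!r}")
    elif mode == "system":
        # System mode authenticates every user with the registered application's credentials.
        for key in SYSTEM_MODE_KEYS:
            if not get_dotted(config, key):
                problems.append(f"system mode requires {key}")
    return problems


if __name__ == "__main__":
    print(check_access_mode({"azure": {"adlsgen2": {"mode": "user"}}}))
```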

Testing

Restart services. See Start and Stop the Platform.

After the configuration has been specified, an ADLS Gen2 connection appears in the Import Data page. Select it to begin navigating for data sources.

NOTE: If you have multiple ADLS Gen2 file systems or storage accounts, you can access the secondary ones through the ADLS Gen2 browser. Edit the URL path in the browser and paste in the URI for other locations.
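ADLS Gen2 locations follow the pattern shown in the `fileStorage.defaultBaseUris` example: `abfss://<file system>@<storage account>.dfs.core.windows.net/<path>`. When you paste a URI for a secondary file system or storage account into the browser, a helper like the following can assemble or sanity-check it. This is a minimal sketch; the function names are hypothetical, and it assumes the public-cloud `dfs.core.windows.net` endpoint suffix.

```python
from urllib.parse import urlparse

# Endpoint suffix used in the example base URI (public Azure cloud assumed).
DFS_SUFFIX = ".dfs.core.windows.net"


def build_abfss_uri(file_system: str, account: str, path: str = "") -> str:
    """Assemble abfss://<file system>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{file_system}@{account}{DFS_SUFFIX}/{path.lstrip('/')}"


def parse_abfss_uri(uri: str) -> tuple:
    """Split an abfss URI into (file_system, storage_account, path)."""
    parts = urlparse(uri)
    if parts.scheme != "abfss":
        raise ValueError(f"not an abfss URI: {uri}")
    # The network location has the form <file system>@<account><suffix>.
    file_system, sep, host = parts.netloc.partition("@")
    if not sep or not host.endswith(DFS_SUFFIX):
        raise ValueError(f"expected <filesystem>@<account>{DFS_SUFFIX}: {uri}")
    account = host[: -len(DFS_SUFFIX)]
    return file_system, account, parts.path.lstrip("/")


if __name__ == "__main__":
    uri = build_abfss_uri("secondfs", "otheraccount", "data/sales.csv")
    print(uri)
    print(parse_abfss_uri(uri))
```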

Try running a simple job from the . For more information, see Verify Operations.

Troubleshooting

Problem: SSLHandshakeException: Unsupported curveId: 29 error when retrieving Databricks token

This issue occurs because the  sends a known set of elliptic curve algorithms to Microsoft during the SSL handshake, but the Microsoft server negotiates and uses an unsupported curve algorithm.

A similar issue is described here: https://bugs.openjdk.java.net/browse/JDK-8171279

Solution:

Microsoft should fix the problem.

Within the , you can apply the following workaround:

NOTE: This workaround disables the listed algorithms for all Java services installed on the . This is acceptable for all Java services of the .

  1. Log in to the  as an administrator.
  2. Edit the following file:

    $JAVA_HOME/jre/lib/security/java.security
    


  3. Locate the following parameter: jdk.tls.disabledAlgorithms.
  4. Append the following cipher suites to the value of this parameter to disable them:

    TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
    


  5. Save your changes and restart the platform.
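Steps 2 through 4 can also be scripted, which helps when several nodes need the same change. The following is a minimal sketch, assuming the standard `java.security` layout in which `jdk.tls.disabledAlgorithms=` starts an uncommented line and its value may continue over lines ending in a backslash; `disable_suites` is a hypothetical helper. It appends only the suites that are not already listed, so running it twice is harmless.

```python
import sys

PROPERTY = "jdk.tls.disabledAlgorithms"
SUITES_TO_DISABLE = [
    "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA",
    "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA",
]


def disable_suites(text: str, suites=SUITES_TO_DISABLE) -> str:
    """Append any missing suites to jdk.tls.disabledAlgorithms and return the new file text."""
    lines = text.splitlines()
    # Find the uncommented line that starts the property (comment lines begin with '#').
    start = next((i for i, line in enumerate(lines)
                  if line.lstrip().startswith(PROPERTY + "=")), None)
    if start is None:
        raise ValueError(f"{PROPERTY} not found")

    # Follow backslash continuations to the last physical line of the value.
    end = start
    while lines[end].rstrip().endswith("\\") and end + 1 < len(lines):
        end += 1

    # Rebuild the logical value to see which entries are already present.
    logical = " ".join(line.rstrip().rstrip("\\") for line in lines[start:end + 1])
    current = [item.strip() for item in logical.split("=", 1)[1].split(",") if item.strip()]
    missing = [suite for suite in suites if suite not in current]

    if missing:
        separator = ", " if current else ""
        lines[end] = lines[end].rstrip().rstrip("\\").rstrip() + separator + ", ".join(missing)
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python disable_suites.py $JAVA_HOME/jre/lib/security/java.security
    # Back up the file first; properties files are read as Latin-1 here.
    path = sys.argv[1]
    with open(path, encoding="latin-1") as f:
        updated = disable_suites(f.read())
    with open(path, "w", encoding="latin-1") as f:
        f.write(updated)
```

As with the manual steps, restart the platform afterward so that the Java services pick up the change.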