Pre-requisites for Kerberos integration

Before you begin, please verify the following:

  1. The platform's Hadoop user is created and enabled on each node in the Hadoop cluster.

    Info

    NOTE: If LDAP is enabled, the platform's Hadoop user should be created in the same realm as the cluster.

  2. On the platform host, the directory /opt/trifacta is owned by the platform's Hadoop user.

  3. The platform's Hadoop user exists on each node in the Hadoop cluster.

    Info

    NOTE: The platform's Hadoop user must have the same user ID and group ID on each node in the cluster. Depending on your cluster's configuration, meeting this requirement may require an LDAP command. Configuring LDAP is beyond the scope of this document.

  4. The platform's Hadoop user must be a member of any special group that is permitted to access HDFS or to run Hadoop jobs.
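
The checks above can be partly scripted. The following is a minimal sketch, not an official procedure: the function names are hypothetical, and it assumes a POSIX shell with the standard id, ls, sort, and awk utilities on each node.

```shell
# Print "uid:gid" for a user on the local node; fail if the user does not exist.
user_ids() {
    uid=$(id -u "$1" 2>/dev/null) || return 1
    gid=$(id -g "$1" 2>/dev/null) || return 1
    printf '%s:%s\n' "$uid" "$gid"
}

# Read one "uid:gid" value per line on stdin (one line per node) and
# succeed only if every line is identical.
ids_consistent() {
    [ "$(sort -u | wc -l | tr -d ' ')" -eq 1 ]
}

# Succeed if the given path is owned by the given user.
owned_by() {
    [ "$(ls -ld "$1" | awk '{print $3}')" = "$2" ]
}
```

For example, you might run user_ids on every node (over ssh or your cluster management tool), pipe the collected lines into ids_consistent, and call owned_by /opt/trifacta with the name of your Hadoop user on the platform host.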

Configure the KDC

Steps:

  1. On your KDC node, configure a Kerberos principal for the platform:
    1. The principal's identifier has two parts: its name and its realm. For example, the principal trifacta@HADOOPVAL.MSSVC.LOCAL has the name trifacta and the realm HADOOPVAL.MSSVC.LOCAL.
    2. Retain the name and realm for later configuration.
  2. Create a keytab file for the platform principal. Command:

    Code Block
    kadmin xst -k trifacta.keytab <full_principal_identifier>

    where:
    <full_principal_identifier> is the full principal identifier in Kerberos, in the form name@REALM.

    Warning

    On the KDC, you may have to run kadmin.local instead of kadmin. The rest of the arguments should remain the same.

    Info

    NOTE: If you're creating a keytab file in an AD environment, alternative instructions may need to be applied. See below.

  3. Verify that the keytab is working. Command:

    Code Block
    klist -e -k -t trifacta.keytab
  4. Copy the keytab to the platform node at the following path:
    /opt/trifacta/conf/trifacta.keytab
  5. Configure the keytab file so that it is owned by the platform's Hadoop user. It should be readable only by that user.

    Info

    NOTE: Verify that all user principals that use the platform are also members of the group of the keytab user.
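
The permission requirement in the last step can be sketched as follows. This is an illustrative example only: the function names are hypothetical, and changing the file's owner (chown) is omitted because it typically requires root and depends on the name of your Hadoop user.

```shell
# Restrict a keytab so that only its owner can read it (mode 0400).
lock_keytab() {
    chmod 400 "$1"
}

# Succeed only if the permission string shows owner read-only and nothing else.
# Only the first ten characters are compared, so a trailing ACL or SELinux
# marker in the ls output is ignored.
keytab_is_private() {
    [ "$(ls -l "$1" | cut -c1-10)" = "-r--------" ]
}
```

A possible use is to run lock_keytab /opt/trifacta/conf/trifacta.keytab after copying the file, then confirm the result with keytab_is_private on the same path.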

Create keytab in Active Directory environments

Some additional instructions are provided for the following environments.

For MIT Kerberos

See https://kb.iu.edu/d/aumh:

Code Block
> ktutil
  ktutil:  addent -password -p username@EXAMPLE.COM -k 1 -e rc4-hmac
  Password for username@EXAMPLE.COM: [enter your password]
  ktutil:  addent -password -p username@EXAMPLE.COM -k 1 -e aes256-cts
  Password for username@EXAMPLE.COM: [enter your password]
  ktutil:  wkt username.keytab
  ktutil:  quit 

For Heimdal Kerberos

Code Block
  > ktutil -k username.keytab add -p username@EXAMPLE.COM -e arcfour-hmac-md5 -V 1

If the keytab created in Heimdal does not work, you may need an aes256-cts entry. In this case, locate a machine with MIT Kerberos, and use the MIT Kerberos method instead.
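
To see whether a keytab already contains an aes256 entry, you can filter the listing produced by the klist command shown earlier. The sketch below uses a hypothetical helper name and a deliberately loose pattern, because the label printed for an encryption type differs between Kerberos versions. Treat a match as a hint rather than proof.

```shell
# Read keytab listing text on stdin and succeed if any entry mentions AES-256.
# The pattern accepts both "aes256" and "AES-256" spellings, case-insensitively.
has_aes256() {
    grep -Eiq 'aes-?256'
}
```

Example use: klist -e -k -t username.keytab | has_aes256 || echo "no aes256 entry found"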

Enable use of Kerberos keytab by Command Line Interface

The platform Command Line Interface can reference a Kerberos keytab file to enable access to the platform without supplying passwords. After you have created the keytab, complete the following steps.

Info

NOTE: You must create a separate keytab file for the CLI. This keytab file must be created for the platform user that is connecting to the platform through the CLI. It is different from the keytab of the platform's Hadoop user, which the platform uses to connect to the cluster.

Steps:

  1. Export the following environment variables, where username corresponds to the user ID that is to be used to connect to the platform:

    Code Block
    export KRB5_CLIENT_KTNAME=/path/to/the/username.keytab
    export GSS_KRB5_NT_PRINCIPAL_NAME=username
  2. Invoke the command line interface commands without supplying a password parameter. For more information, see Command Line Interface.
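
The export step can be wrapped in a small helper that refuses to set the variables when the keytab cannot be read. This is a sketch under assumptions: the function name is hypothetical, and only the two environment variables named in step 1 are set.

```shell
# Export the CLI keytab settings for one user.
# $1 = path to the user's keytab, $2 = user ID used to connect to the platform.
use_cli_keytab() {
    if [ ! -r "$1" ]; then
        echo "keytab not readable: $1" >&2
        return 1
    fi
    export KRB5_CLIENT_KTNAME="$1"
    export GSS_KRB5_NT_PRINCIPAL_NAME="$2"
}
```

Example use: use_cli_keytab /path/to/the/username.keytab username, followed by the CLI command without a password parameter.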

Configure the platform for Kerberos

Locate the kerberos section of the platform configuration, which controls Kerberos authentication.

Example configuration:

Substitute your own values in place of the example values as appropriate.

Code Block
"kerberos.enabled": true,
"kerberos.principal": "trifacta",
"kerberos.kdc": "kdc.mssvc.local",
"kerberos.realm": "HADOOPVAL.MSSVC.LOCAL",
"kerberos.keytab": "/opt/trifacta/conf/trifacta.keytab",
"kerberos.principals.hive": "<UNUSED>",
"kerberos.principals.namenode": "nn/_HOST@EXAMPLE.COM",
"kerberos.principals.resourcemanager": "<YOUR_VALUE_HERE>",
Parameter    Description
enabled      To enable Kerberos authentication, set this value to true.
principal    The name part of the principal that you created in the KDC
kdc          The host of the KDC
realm        The realm of the KDC
keytab       Path in the platform deployment where the Kerberos keytab file is stored
principals   List of jobtrackers and namenodes that are governed by Kerberos

Info

NOTE: kerberos.principals.hive is unused. This value must be inserted into the Hive connection definition. See Create Hive Connections.

Info

NOTE: If you don't know the values to use here, see Set principal values for YARN below.

Info

NOTE: If you don't specify principal names in the principals definition section, the default names are used: mapred/<jobtracker host>@<realm>. You should specify the principals explicitly.

At this point, you should be able to load files from HDFS and run jobs against the kerberized Hadoop cluster.
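
Because kerberos.principal takes only the name part and kerberos.realm takes the realm, it can help to split the full principal identifier retained from the KDC step. The helpers below are a minimal sketch with hypothetical names, using only POSIX parameter expansion.

```shell
# Print the name part of a principal (everything before the last "@").
principal_name() {
    printf '%s\n' "${1%@*}"
}

# Print the realm part of a principal (everything after the last "@").
# Fails if the identifier contains no "@", since no realm can be derived.
principal_realm() {
    case $1 in
        *@*) printf '%s\n' "${1##*@}" ;;
        *)   return 1 ;;
    esac
}
```

For the example principal trifacta@HADOOPVAL.MSSVC.LOCAL, principal_name yields the value for kerberos.principal and principal_realm yields the value for kerberos.realm.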

Set principal values for YARN

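Check the following Hadoop configuration properties. yarn.resourcemanager.principal is set in yarn-site.xml; dfs.namenode.kerberos.principal is typically set in hdfs-site.xml: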

Code Block
principals.jobtracker = yarn.resourcemanager.principal
principals.namenode = dfs.namenode.kerberos.principal
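
One way to look up these values is to read them from the Hadoop XML files directly. The following is a rough sketch, not a full XML parser: the function name is hypothetical, and it assumes each property's value element follows its name element, either on the same line or on a later line, which is the usual layout of Hadoop configuration files.

```shell
# Print the value of a Hadoop property from a *-site.xml file.
# $1 = property name, $2 = path to the XML file.
# Prints nothing if the property is not found.
hadoop_prop() {
    awk -v want="$1" '
        index($0, "<name>" want "</name>") { found = 1 }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$2"
}
```

Example use, with paths that depend on your distribution: hadoop_prop yarn.resourcemanager.principal /path/to/yarn-site.xml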