Configure for Hive with Sentry

This section describes how to ensure that the Designer Cloud Powered by Trifacta platform is configured correctly to connect to Hive when Sentry is enabled for Hive. Sentry provides role-based authorization for Hive and other Hadoop components on the Cloudera platform.

Prerequisites

Warning

Before you begin, please verify that your enterprise has deployed both Hive and Sentry according to recommended configuration practices. For more information, please consult the documentation that was provided with your Hadoop distribution.

Note

Before you begin, you must integrate the Designer Cloud Powered by Trifacta platform with Hive. See Configure for Hive.

  1. Enable the Sentry Service. Then, configure Hive to use the Sentry Service. See http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/sg_sentry_service_config.html#concept_amg_2l2_xq_unique_2.

  2. (recommended) Enable secure impersonation. See below.

Secure Impersonation with Designer Cloud Powered by Trifacta platform and Hive with Sentry

Secure impersonation ensures consistent and easily traceable security access to the data stored within your Hadoop cluster.

Note

Although not required, secure impersonation is highly recommended for connecting the platform with Hive.

The Designer Cloud Powered by Trifacta platform requires the following additional configuration changes to maintain secure impersonation and work with Hive data:

  1. Enable the platform with secure impersonation. See Configure for Secure Impersonation for details.

  2. Give the local Hive user access to the Unix or LDAP group [ldap.group (default=trifactausers)].

  3. Set the following umask. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

    "hdfs.permissions.userUmask" = 027,
  4. Verify that the Unix or LDAP group [ldap.group] has read access to the Hive warehouse directory as specified in the following section. For more information, see http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/sg_sentry_service_config.html#concept_mlr_qxm_vq_unique_1.

  5. (Optional) Configure Sentry to sync HDFS permissions, which maintains user-level access control for the underlying data files. For more information, see http://www.cloudera.com/documentation/enterprise/5-4-x/topics/sg_hdfs_sentry_sync.html

Users and Groups for Sentry

For Sentry, the following definitions and relationships apply.

Definition

Description

User

Individual account, as identified by the underlying authentication system

Group

A set of users maintained by the authentication system

Role

A set of privileges stored as a template to combine multiple access rules

Privilege

An instruction or rule allowing access to an object. Examples of Privileges include access to databases, tables, and the operations that can be executed.

In Sentry:

  • Privileges can only be granted to Roles.

  • A Group can be assigned to one or more Roles.

    • Users are assigned to a Group through the underlying authentication mechanism (e.g. operating system or LDAP).
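
As a minimal sketch of these relationships, the HiveQL below grants a privilege to a role and then assigns that role to a group. The names analyst_role, sales_db, and analysts are hypothetical placeholders, not values defined by the platform; substitute the roles, objects, and groups from your own environment.

```sql
-- Hypothetical names: analyst_role, sales_db, analysts.
-- A privilege is granted to a role, never directly to a user or group.
CREATE ROLE analyst_role;
GRANT SELECT ON DATABASE sales_db TO ROLE analyst_role;

-- The role is then assigned to a group; users gain it through group membership.
GRANT ROLE analyst_role TO GROUP analysts;
```

Users are added to or removed from the group in the operating system or LDAP, not in Sentry.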

Configuration

Note

Before you begin, you should determine the privileges that must be granted to Alteryx users based on your environment and needs.

Note

If you are publishing back to Hive, verify that one of the following conditions is met:

  1. The hive:hive user has rwx permissions on the publish directory.

  2. The Hive users are members of the group that owns the publish directory.

Steps:

  1. Start Beeline as an administrative user.

  2. Create a role for users of the Designer Cloud Powered by Trifacta platform:

    CREATE ROLE trifactaUserRole;
  3. Grant that role to the [ldap.group (default=trifactausers)] group associated with the platform:

    GRANT ROLE trifactaUserRole TO GROUP trifactausers;
  4. Grant all privileges to this role for the filesystem area under which platform output is generated. The full URI is required. Example:

    Note

    Modify the grants as needed for your environment.

    GRANT ALL ON URI 'hdfs://domain_example:8020/trifacta/queryResults/user1@example.com/' to ROLE trifactaUserRole;

    Note

    If the above URI changes, the above grant must be reapplied to the new URI.
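
To confirm the result of the steps above, the grants can be listed from the same Beeline session. This is a sketch that assumes the role name used above and that the platform group is the [ldap.group] default, trifactausers; adjust both for your environment.

```sql
-- List the roles assigned to the platform group (assumed here to be trifactausers).
SHOW ROLE GRANT GROUP trifactausers;

-- List the privileges held by the role; the URI grant should appear in the output.
SHOW GRANT ROLE trifactaUserRole;
```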

Basic Authentication

When the Designer Cloud Powered by Trifacta platform is enabled with secure impersonation and submits requests to Hive, the following steps occur:

  1. The platform authenticates as the [hadoop.user] user through Kerberos.

  2. The Hive server authorizes access to the underlying table through Sentry as the Hadoop principal user assigned to the Alteryx user.

    Note

    This Hadoop principal is the user that should be configured with appropriate privileges and roles in Sentry.

  3. The Hive server accesses the physical data files on HDFS as the Unix or LDAP user hive, which should be a member of the designated group [hadoop.group (default=trifactausers)].

Note

Since Sentry assigns roles, and through them privileges, to Unix groups, a common practice is to assign the Hadoop principal users (used by Alteryx users) to dedicated Unix groups for use within Sentry. These groups should be separate from the Unix group [os.group (default=trifacta)]. Sentry should not grant any privileges or roles to the Unix group trifacta.

Note

In Unix environments, usernames and group names are case-sensitive. Please verify that user and group names use exactly the same case in your Hadoop configuration and in the Alteryx configuration file.

Verify Operations

After you have completed your configuration changes, you should restart the platform. See Start and Stop the Platform.

To verify platform operations, run a simple job. For more information, see Verify Operations.

Troubleshooting

Cannot publish to Hive in a Kerberized environment with secure impersonation using Sentry

If you have deployed Sentry to manage access to a Kerberized environment using secure impersonation, you may encounter the following error when trying to write your results back to the Hadoop cluster:

Note

This issue is known to appear in Cloudera 5.7. It may not appear in later releases.

2015-09-02 20:49:54.111Z - WARN : com.trifacta.dataservice.Controller      : Bad Request: org.springframework.jdbc.BadSqlGrammarException: StatementCallback; bad SQL grammar [CREATE TABLE `test_trifacta`  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'hdfs://domain_example:8020/trifacta/queryResults/user1@example.com/test/143/original_98.avro' TBLPROPERTIES ('avro.schema.literal'='{"type":"record","name":"GenericTrifactaRecord","fields":[{"name":"name","type":["null","string"]},{"name":"id","type":["null","string"]},{"name":"id2","type":["null","long"]},{"name":"randomname","type":["null","string"]},{"name":"description","type":["null","string"]},{"name":"dob","type":["null","string"]},{"name":"title","type":["null","string"]},{"name":"corp","type":["null","string"]},{"name":"fixedone","type":["null","long"]},{"name":"fixedtwo","type":["null","long"]}]}')]; nested exception is org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException No valid privileges

Required privileges for this query: Server=server1->URI=hdfs://domain_example:8020/trifacta/queryResults/user1@example.com/test/143/original_98.avro->action=*;

In this case, Sentry is failing to validate the URI permissions to allow the user (user1@example.com) to access the HDFS path, as the permissions have not been specifically granted to the required role. Sentry queries for authorization, fails, and throws the above exception.

The solution is to grant all access privileges on the target user's Alteryx results directory to the user's Sentry role. In the following example, access is granted to the role2 role:

GRANT ALL ON URI 'hdfs://domain_example:8020/trifacta/queryResults/user1@example.com/' to ROLE role2;

Because Sentry permissions on a URI apply recursively to the directories beneath it, the target directory for the specific job is covered. For more information on Sentry permissions, see the Terminologies section in http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_sg_sentry.html.
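
When diagnosing this error, it can help to check which roles are in effect for the session and what the affected role is allowed to do. The statements below are a sketch: role2 is the example role from above, the first statement is assumed to be run in a Beeline session connected as the affected user, and the second may require administrative privileges or membership in the role.

```sql
-- Roles in effect for the user connected to this session.
SHOW CURRENT ROLES;

-- Privileges held by the example role; the results directory URI should be listed.
SHOW GRANT ROLE role2;
```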