This section describes how to ensure that the Trifacta® platform is configured correctly to connect to Hive when Sentry is enabled for Hive. Sentry provides role-based authorization for Hive and other Hadoop components on the Cloudera platform.
- For more information, see http://www.cloudera.com.
Before you begin, please verify that your enterprise has deployed both Hive and Sentry according to recommended configuration practices. For more information, please consult the documentation that was provided with your Hadoop distribution.
NOTE: Before you begin, you must integrate the Trifacta platform with Hive. See Configure for Hive.
- Enable the Sentry Service. Then, configure Hive to use the Sentry Service. See http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/sg_sentry_service_config.html#concept_amg_2l2_xq_unique_2.
- (Recommended) Enable secure impersonation. See below.
Secure Impersonation with Trifacta platform and Hive with Sentry
Secure impersonation ensures consistent and easily traceable security access to the data stored within your Hadoop cluster.
NOTE: Although not required, secure impersonation is highly recommended for connecting the platform with Hive.
The Trifacta platform requires the following additional configuration changes to maintain secure impersonation and work with Hive data:
- Enable the platform with secure impersonation. See Configure for Secure Impersonation for details.
Give the local Hive user access to the Unix or LDAP group
Set the following umask in
- Verify that the Unix or LDAP group [ldap.group] has read access to the Hive warehouse directory as specified in the following section. For more information, see http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/sg_sentry_service_config.html#concept_mlr_qxm_vq_unique_1.
- (Optional) Configure Sentry to sync HDFS permissions, which maintains user-level access control for the underlying data files. For more information, see http://www.cloudera.com/documentation/enterprise/5-4-x/topics/sg_hdfs_sentry_sync.html.
Users and Groups for Sentry
For Sentry, the following definitions and relationships apply.
|User|Individual account, as identified by the underlying authentication system|
|Group|A set of users maintained by the authentication system|
|Role|A set of privileges stored as a template to combine multiple access rules|
|Privilege|An instruction or rule allowing access to an object. Examples of privileges include access to databases, tables, and the operations that can be executed.|
- Privileges can only be granted to Roles.
- A Group can be assigned to one or more Roles.
- Users are assigned to a Group through the underlying authentication mechanism (e.g. operating system or LDAP).
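The relationships above can be sketched in Sentry SQL as run from Beeline. This is a minimal illustration, not a required configuration: the role `analyst_role`, the group `analysts`, and the database `sales` are hypothetical names chosen for the example.

```sql
-- Hypothetical names: analyst_role, analysts, sales.

-- A privilege is granted to a role, never directly to a user or group.
CREATE ROLE analyst_role;
GRANT SELECT ON DATABASE sales TO ROLE analyst_role;

-- The role is then granted to a group. Users receive the privilege
-- through their group membership in the operating system or LDAP.
GRANT ROLE analyst_role TO GROUP analysts;

-- Inspect the result: roles held by the group, privileges held by the role.
SHOW ROLE GRANT GROUP analysts;
SHOW GRANT ROLE analyst_role;
```

These statements must be run by a user who belongs to a Sentry administrative group.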
NOTE: Before you begin, you should determine the privileges that must be granted to Trifacta users based on your environment and needs.
NOTE: If you are publishing back to Hive, please verify that one of the following is enabled:
- rwx permissions on the publish directory.
- The Hive users must be members of the group that owns the directory.
- Start Beeline as an administrative user.
Create a role for users of the Trifacta platform:
Grant that role to the group associated with the platform:
Grant all privileges to this role for the filesystem area under which platform output is generated. The full URI is required. Example:
NOTE: Modify the grants as needed for your environment.
NOTE: If the above URI changes, the above grant must be reapplied to the new URI.
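The role setup described in the steps above can be sketched as follows. The role name `trifacta_role`, the group name `trifacta`, and the HDFS URI are all hypothetical placeholders; substitute the group associated with your platform users and the full URI of the area where your deployment writes its output.

```sql
-- Hypothetical names: trifacta_role, trifacta, and the URI below.

-- Create a role for users of the platform.
CREATE ROLE trifacta_role;

-- Grant the role to the group associated with the platform.
GRANT ROLE trifacta_role TO GROUP trifacta;

-- Grant all privileges on the output area. Sentry requires the full URI,
-- including the scheme and the NameNode or nameservice.
GRANT ALL ON URI 'hdfs://namenode.example.com:8020/path/to/platform/output' TO ROLE trifacta_role;

-- Confirm what the role now holds.
SHOW GRANT ROLE trifacta_role;
```

Because the URI grant is tied to the literal URI string, a change to the NameNode address or output path requires a new grant against the new URI.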
When the Trifacta platform is enabled with secure impersonation and submits requests to Hive, the following steps occur:
- The platform authenticates as the [hadoop.user] user through Kerberos.
- The Hive server authorizes access to the underlying table through Sentry as the Hadoop principal user assigned to the Trifacta user.
NOTE: This Hadoop principal is the user that should be configured with appropriate privileges and roles in Sentry.
- The Hive server accesses the physical data file on HDFS as the Unix or LDAP user hive, which should be part of the designated group.
NOTE: Since Sentry assigns privileges and roles to Unix groups, a common practice is to assign the Hadoop principal users (used by Trifacta users) to dedicated Unix groups for use within Sentry, separate from the designated Unix group of the hive user. Sentry should not grant any privileges or roles to that designated Unix group.
NOTE: In UNIX environments, usernames and group names are case-sensitive. Please verify that you are using the case-sensitive names for users and groups in your Hadoop configuration and Trifacta configuration file.
After you have completed your configuration changes, you should restart the platform. See Start and Stop the Platform.
To verify platform operations, run a simple job. For more information, see Verify Operations.
Cannot publish to Hive in a Kerberized environment with secure impersonation using Sentry
If you have deployed Sentry to manage access to a Kerberized environment using secure impersonation, you may encounter the following error when trying to write your results back to the Hadoop cluster:
NOTE: This issue is known to appear in Cloudera 5.7. It may not appear in later releases.
In this case, Sentry is failing to validate the URI permissions to allow the user (firstname.lastname@example.org) to access the HDFS path, as the permissions have not been specifically granted to the required role. Sentry queries for authorization, fails, and throws the above exception.
The solution is to grant all access privileges on the Trifacta results directory to the Sentry role of the target user. In the following example, access is granted to the
Since permissions in Sentry are recursive through the directories, the target directory for the specific job is covered. For more information on Sentry permissions, see the Terminologies section in http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_sg_sentry.html.
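A grant of this kind might look like the following when run from Beeline as a Sentry administrator. This is a sketch under assumptions: the role name `trifacta_role` and the URI are hypothetical and must be replaced with the affected user's Sentry role and the full URI of your results directory.

```sql
-- Hypothetical names: trifacta_role and the URI below.

-- Grant all privileges on the parent results directory. Job-specific
-- subdirectories are covered because URI privileges apply recursively.
GRANT ALL ON URI 'hdfs://namenode.example.com:8020/path/to/results' TO ROLE trifacta_role;

-- Verify that the URI privilege now appears for the role.
SHOW GRANT ROLE trifacta_role;
```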