This section describes how to ensure that the Trifacta® platform is configured correctly to connect to Hive when Sentry is enabled for Hive. Sentry provides role-based authorization for Hive and other Hadoop components on the Cloudera platform.  

Prerequisites

Before you begin, please verify that your enterprise has deployed both Hive and Sentry according to recommended configuration practices. For more information, please consult the documentation that was provided with your Hadoop distribution.

NOTE: Before you begin, you must integrate the Trifacta platform with Hive. See Configure for Hive.

  1. Enable the Sentry Service. Then, configure Hive to use the Sentry Service. See http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/sg_sentry_service_config.html#concept_amg_2l2_xq_unique_2.
  2. (Recommended) Enable secure impersonation. See below.

Secure Impersonation with Trifacta platform and Hive with Sentry

Secure impersonation ensures consistent and easily traceable security access to the data stored within your Hadoop cluster.  

NOTE: Although not required, secure impersonation is highly recommended for connecting the platform with Hive.

The Trifacta platform requires the following additional configuration changes to maintain secure impersonation and work with Hive data:  

  1. Enable the platform with secure impersonation.  See Configure for secure impersonation for details.
  2. Give the local Hive user access to the Unix or LDAP group [ldap.group (default=trifactausers)].
  3. Set the following umask. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

    "hdfs.permissions.userUmask" = 027,
  4. Verify that the Unix or LDAP group [ldap.group] has read access to the Hive warehouse directory as specified in the following section. For more information, see http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/sg_sentry_service_config.html#concept_mlr_qxm_vq_unique_1.
  5. (Optional) Configure Sentry to sync HDFS permissions, which maintains user-level access control for the underlying data files. For more information, see http://www.cloudera.com/documentation/enterprise/5-4-x/topics/sg_hdfs_sentry_sync.html.
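To make the effect of the 027 umask concrete, the following Python sketch works through the arithmetic. It assumes the usual HDFS base modes of 666 for new files and 777 for new directories; the helper name apply_umask is hypothetical and is not part of the platform.

```python
def apply_umask(base_mode: int, umask: int) -> str:
    """Return the octal permission string left after clearing the umask bits."""
    return format(base_mode & ~umask, "03o")


UMASK = 0o027  # value set for hdfs.permissions.userUmask

# Assumed base modes: 666 for new files, 777 for new directories.
file_mode = apply_umask(0o666, UMASK)
dir_mode = apply_umask(0o777, UMASK)

print("files:", file_mode)        # owner read/write, group read, others none
print("directories:", dir_mode)   # owner full, group read/execute, others none
```

Under these assumptions, the group keeps read access to platform output while all other users are locked out, which is why the Hive user must be a member of the group.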

Users and Groups for Sentry

For Sentry, the following definitions and relationships apply.

Definition   Description
User         Individual account, as identified by the underlying authentication system
Group        A set of users maintained by the authentication system
Role         A set of privileges stored as a template to combine multiple access rules
Privilege    An instruction or rule allowing access to an object. Examples of privileges include access to databases, tables, and the operations that can be executed.

In Sentry:

  • Privileges can only be granted to Roles. 
  • A Group can be assigned to one or more Roles.
    • Users are assigned to a Group through the underlying authentication mechanism (e.g. operating system or LDAP). 
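The relationships above can be sketched as a small lookup chain. This is only an illustrative model in Python, not how Sentry stores or evaluates anything; the dictionaries and the effective_privileges helper are hypothetical, and the sample names are borrowed from the examples elsewhere on this page.

```python
# Hypothetical sample data. In a real deployment, user-to-group membership
# comes from the operating system or LDAP, and the rest lives in Sentry.
user_groups = {"user1@example.com": {"trifactausers"}}
group_roles = {"trifactausers": {"trifactaUserRole"}}
role_privileges = {
    "trifactaUserRole": {
        ("ALL", "hdfs://domain_example:8020/trifacta/queryResults/user1@example.com/"),
    },
}


def effective_privileges(user: str) -> set:
    """Follow user -> groups -> roles -> privileges and collect the result."""
    privileges = set()
    for group in user_groups.get(user, set()):
        for role in group_roles.get(group, set()):
            privileges |= role_privileges.get(role, set())
    return privileges


print(effective_privileges("user1@example.com"))
```

Note that a user never holds a privilege directly: removing the user from the group, or the group from the role, removes the access.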

Configuration

NOTE: Before you begin, you should determine the privileges that must be granted to Trifacta users based on your environment and needs.

 

  1. Start Beeline as an administrative user.
  2. Create a role for users of the Trifacta platform:

    CREATE ROLE trifactaUserRole;
  3. Grant that role to the [ldap.group (default=trifactausers)] group associated with the platform:

    GRANT ROLE trifactaUserRole TO GROUP trifactausers;
  4. Grant all privileges to this role for the filesystem area under which platform output is generated. The full URI is required. Example:

    NOTE: Modify the grants as needed for your environment.

    GRANT ALL ON URI 'hdfs://domain_example:8020/trifacta/queryResults/user1@example.com/' to ROLE trifactaUserRole;

    NOTE: If the above URI changes, the above grant must be reapplied to the new URI.
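Because the grant must be reissued whenever the URI changes, it can help to generate the statement from its parts rather than editing it by hand. The following Python sketch is one way to do that. The helper names are hypothetical, and the /trifacta/queryResults/ path layout is taken from the example above and may differ in your environment.

```python
def results_uri(namenode: str, user: str) -> str:
    """Build the full results URI for one user (path layout is an assumption)."""
    return f"hdfs://{namenode}/trifacta/queryResults/{user}/"


def grant_all_on_uri(uri: str, role: str) -> str:
    """Return the HiveQL statement that grants ALL on a URI to a role."""
    return f"GRANT ALL ON URI '{uri}' TO ROLE {role};"


statement = grant_all_on_uri(
    results_uri("domain_example:8020", "user1@example.com"),
    "trifactaUserRole",
)
print(statement)  # paste the printed statement into Beeline
```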

Basic Authentication

When the Trifacta platform is enabled with secure impersonation and submits requests to Hive, the following steps occur:

  1. The platform authenticates as the [hadoop.user] user through Kerberos.
  2. The Hive server authorizes access to the underlying table through Sentry as the Hadoop principal user assigned to the Trifacta user.  

    NOTE: This Hadoop principal is the user that should be configured with appropriate privileges and roles in Sentry.

       

  3. The Hive server accesses the physical data file on HDFS as the Unix or LDAP user hive, which should be part of the designated group [hadoop.group (default=trifactausers)].

NOTE: Because Sentry assigns privileges and roles to Unix groups, a common practice is to place the Hadoop principal users (those used by Trifacta users) in dedicated Unix groups for use within Sentry. These groups should be separate from the Unix group [os.group (default=trifacta)]. Sentry should not grant any privileges or roles to the Unix group trifacta.

NOTE: In UNIX environments, usernames and group names are case-sensitive. Please verify that you are using the case-sensitive names for users and groups in your Hadoop configuration and Trifacta configuration file.

Verify Operations

After you have completed your configuration changes, you should restart the platform. See Start and Stop the Platform.

To verify platform operations, run a simple job. For more information, see Verify Operations.

Troubleshooting

Cannot publish to Hive in a Kerberized environment with secure impersonation using Sentry

If you have deployed Sentry to manage access to a Kerberized environment using secure impersonation, you may encounter the following error when trying to write your results back to the Hadoop cluster:

NOTE: This issue is known to appear in Cloudera 5.7. It may not appear in later releases.

2015-09-02 20:49:54.111Z - WARN : com.trifacta.dataservice.Controller      : Bad Request: org.springframework.jdbc.BadSqlGrammarException: StatementCallback; bad SQL grammar [CREATE TABLE `test_trifacta`  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'hdfs://domain_example:8020/trifacta/queryResults/user1@example.com/test/143/original_98.avro' TBLPROPERTIES ('avro.schema.literal'='{"type":"record","name":"GenericTrifactaRecord","fields":[{"name":"name","type":["null","string"]},{"name":"id","type":["null","string"]},{"name":"id2","type":["null","long"]},{"name":"randomname","type":["null","string"]},{"name":"description","type":["null","string"]},{"name":"dob","type":["null","string"]},{"name":"title","type":["null","string"]},{"name":"corp","type":["null","string"]},{"name":"fixedone","type":["null","long"]},{"name":"fixedtwo","type":["null","long"]}]}')]; nested exception is org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException No valid privileges

Required privileges for this query: Server=server1->URI=hdfs://domain_example:8020/trifacta/queryResults/user1@example.com/test/143/original_98.avro->action=*;

In this case, Sentry is failing to validate the URI permissions to allow the user (user1@example.com) to access the HDFS path, as the permissions have not been specifically granted to the required role. Sentry queries for authorization, fails, and throws the above exception. 

The solution is to grant all privileges on the target user's Trifacta results directory to that user's Sentry role. In the following example, access is granted to the role2 role:

GRANT ALL ON URI 'hdfs://domain_example:8020/trifacta/queryResults/user1@example.com/' to ROLE role2;

Because Sentry permissions on a URI apply recursively to the directories beneath it, the target directory for the specific job is covered. For more information on Sentry permissions, see the Terminologies section in http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_sg_sentry.html.
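As a rough illustration of why the directory-level grant covers the job's output file, the following Python sketch models recursive coverage as a path-prefix check. It is a simplified model and not Sentry's actual implementation; the uri_covers helper is hypothetical.

```python
from urllib.parse import urlparse


def uri_covers(granted: str, requested: str) -> bool:
    """Return True if the requested URI is at or beneath the granted URI."""
    g, r = urlparse(granted), urlparse(requested)
    # Scheme and host:port must match exactly.
    if (g.scheme, g.netloc) != (r.scheme, r.netloc):
        return False
    # Compare whole path segments so /a/b does not match /a/bc.
    g_parts = [p for p in g.path.split("/") if p]
    r_parts = [p for p in r.path.split("/") if p]
    return r_parts[: len(g_parts)] == g_parts


granted = "hdfs://domain_example:8020/trifacta/queryResults/user1@example.com/"
requested = granted + "test/143/original_98.avro"
print(uri_covers(granted, requested))
```

In this model, a grant on one user's results directory does not extend to another user's directory, so each user's role needs its own grant.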
