This section describes how to ensure that the Designer Cloud Powered by Trifacta® platform is configured correctly to connect to Hive when Sentry is enabled for Hive. Sentry provides role-based authorization for Hive and other Hadoop components on the Cloudera platform.
- For more information, see http://www.cloudera.com.
Pre-requisites
Before you begin, please verify that your enterprise has deployed both Hive and Sentry according to recommended configuration practices. For more information, please consult the documentation that was provided with your Hadoop distribution.
NOTE: Before you begin, you must integrate the Designer Cloud Powered by Trifacta platform with Hive. See Configure for Hive.
Secure Impersonation with Designer Cloud Powered by Trifacta platform and Hive with Sentry
Secure impersonation ensures consistent and easily traceable security access to the data stored within your Hadoop cluster.
NOTE: Although not required, secure impersonation is highly recommended for connecting the platform with Hive.
The Designer Cloud Powered by Trifacta platform requires the following additional configuration changes to maintain secure impersonation and work with Hive data:
[ldap.group (default=trifactausers)].
- You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
- Set the following umask in trifacta-conf.json:
"hdfs.permissions.userUmask" = 027,
- [ldap.group] has read access to the hive warehouse directory as specified in the following section. For more information, see http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/sg_sentry_service_config.html#concept_mlr_qxm_vq_unique_1.
Users and Groups for Sentry
For Sentry, the following definitions and relationships apply.
Definition | Description
User | Individual account, as identified by the underlying authentication system
Group | A set of users maintained by the authentication system
Role | A set of privileges stored as a template to combine multiple access rules
Privilege | An instruction or rule allowing access to an object. Examples of Privileges include access to databases, tables, and the operations that can be executed.
In Sentry:
Configuration
NOTE: Before you begin, you should determine the privileges that must be granted to Alteryx users based on your environment and needs.
Create a role for users of the Designer Cloud Powered by Trifacta platform:
CREATE ROLE trifactaUserRole;
Grant that role to the [ldap.group (default=trifactausers)] group associated with the platform:
GRANT ROLE trifactaUserRole TO GROUP trifacta;
Grant all privileges to this role for the filesystem area under which platform output is generated. The full URI is required. Example:
GRANT ALL ON URI 'hdfs://domain_example:8020/trifacta/queryResults/user1@example.com/' to ROLE trifactaUserRole;
NOTE: Modify the grants as needed for your environment.
NOTE: If the above URI changes, the above grant must be reapplied to the new URI.
Basic Authentication
When the Designer Cloud Powered by Trifacta platform is enabled with secure impersonation and submits requests to Hive, the following steps occur:
[hadoop.user] user through Kerberos.
hive, which should be part of the designated group [hadoop.group (default=trifactausers)].
The Hive server authorizes access to the underlying table through Sentry as the Hadoop principal user assigned to the Alteryx user.
NOTE: This Hadoop principal is the user that should be configured with appropriate privileges and roles in Sentry.
NOTE: Since Sentry assigns privileges and roles to Unix groups, a common practice is to assign the Hadoop principal users (used by Alteryx users) to dedicated Unix groups that are separate from the Unix group [os.group (default=trifacta)] to use within Sentry. Sentry should not grant any privileges and roles to the Unix group trifacta.
NOTE: In UNIX environments, usernames and group names are case-sensitive. Please verify that you are using the case-sensitive names for users and groups in your Hadoop configuration and Alteryx configuration file.
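Because usernames and group names are case-sensitive in UNIX environments, a quick comparison of the configured names against the names that actually exist can catch mismatches before they surface as authorization failures. The following is a minimal sketch, not a platform utility: the function name find_case_mismatches and the sample names are hypothetical, and it assumes you have already collected the existing names yourself (for example, from your directory service or the operating system).

```python
def find_case_mismatches(configured, existing):
    """Return (configured_name, existing_name) pairs that differ only by case.

    configured: names taken from your Hadoop and platform configuration.
    existing:   names that actually exist on the system (collected separately).
    Both inputs are illustrative assumptions, not values read from the platform.
    """
    existing_set = set(existing)

    # Index the existing names by their lowercase form for case-insensitive lookup.
    by_lower = {}
    for name in existing:
        by_lower.setdefault(name.lower(), []).append(name)

    mismatches = []
    for name in configured:
        if name in existing_set:
            continue  # exact, case-sensitive match: nothing to report
        for candidate in by_lower.get(name.lower(), []):
            mismatches.append((name, candidate))
    return mismatches


if __name__ == "__main__":
    # Hypothetical sample data: "TrifactaUsers" is configured, "trifactausers" exists.
    for configured_name, existing_name in find_case_mismatches(
        ["TrifactaUsers", "trifacta"], ["trifactausers", "trifacta", "hive"]
    ):
        print(f"Configured '{configured_name}' differs in case from '{existing_name}'")
```

Names that do not match even case-insensitively are not reported by this sketch; they would need a separate existence check.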
Verify Operations
After you have completed your configuration changes, you should restart the platform. See Start and Stop the Platform.
To verify platform operations, run a simple job. For more information, see Verify Operations.
Troubleshooting
Cannot publish to Hive in a Kerberized environment with secure impersonation using Sentry
If you have deployed Sentry to manage access to a Kerberized environment using secure impersonation, you may encounter the following error when trying to write your results back to the Hadoop cluster:
NOTE: This issue is known to appear in Cloudera 5.7. It may not appear in later releases.
2015-09-02 20:49:54.111Z - WARN : com.trifacta.dataservice.Controller : Bad Request: org.springframework.jdbc.BadSqlGrammarException: StatementCallback; bad SQL grammar [CREATE TABLE `test_trifacta` ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'hdfs://domain_example:8020/trifacta/queryResults/user1@example.com/test/143/original_98.avro' TBLPROPERTIES ('avro.schema.literal'='{"type":"record","name":"GenericTrifactaRecord","fields":[{"name":"name","type":["null","string"]},{"name":"id","type":["null","string"]},{"name":"id2","type":["null","long"]},{"name":"randomname","type":["null","string"]},{"name":"description","type":["null","string"]},{"name":"dob","type":["null","string"]},{"name":"title","type":["null","string"]},{"name":"corp","type":["null","string"]},{"name":"fixedone","type":["null","long"]},{"name":"fixedtwo","type":["null","long"]}]}')]; nested exception is org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException No valid privileges Required privileges for this query: Server=server1->URI=hdfs://domain_example:8020/trifacta/queryResults/user1@example.com/test/143/original_98.avro->action=*;
In this case, Sentry is failing to validate the URI permissions to allow the user (user1@example.com) to access the HDFS path, as the permissions have not been specifically granted to the required role. Sentry queries for authorization, fails, and throws the above exception.
The solution is to grant all access privileges on the target user's Alteryx results directory to that user's Sentry role. In the following example, access is granted to the role2 role:
GRANT ALL ON URI 'hdfs://domain_example:8020/trifacta/queryResults/user1@example.com/' to ROLE role2;
Since permissions in Sentry are recursive through the directories, the target directory for the specific job is covered. For more information on Sentry permissions, see the Terminologies section in http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_sg_sentry.html.
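If you need to issue this grant for several users, the statement can be assembled from the results base URI, the username, and the role. The following is a minimal sketch under stated assumptions: the helper names (results_uri, grant_statement, is_covered) are hypothetical, the URI layout is taken from the example above, and is_covered is only a simplified prefix check that illustrates the recursive behavior. It is not how Sentry itself evaluates privileges.

```python
def results_uri(base_uri: str, username: str) -> str:
    """Build the per-user results directory URI, ending with a trailing slash."""
    return f"{base_uri.rstrip('/')}/{username}/"


def grant_statement(uri: str, role: str) -> str:
    """Format the GRANT statement shown in this section.

    Values are interpolated without escaping, so pass only trusted input.
    """
    return f"GRANT ALL ON URI '{uri}' to ROLE {role};"


def is_covered(granted_uri: str, target_uri: str) -> bool:
    """Simplified illustration: a target is covered if it lies under the granted directory."""
    return target_uri.startswith(granted_uri.rstrip("/") + "/")


if __name__ == "__main__":
    # Values copied from the example in this section.
    uri = results_uri(
        "hdfs://domain_example:8020/trifacta/queryResults", "user1@example.com"
    )
    print(grant_statement(uri, "role2"))
    print(is_covered(uri, uri + "test/143/original_98.avro"))
```

Run the generated statement through your usual Hive client as a user with Sentry administrative privileges, and reapply it if the results URI changes.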