This section describes the basics of creating a Hive connection and modifying it for specific aspects of your Hadoop environment.
For more information on supported versions of Hive, see Configure for Hive.
NOTE: The platform supports a single, global connection to Hive. All users must use this connection.
For more information on how the platform works with Hive, see Using Hive.
Before you create your Hive connection, you must create and reference an encryption key file. If you have not created one already, see Create Encryption Key File.
You can create the connection through the application, the command line interface, or the APIs.
NOTE: The following configuration methods apply to creation of an insecure connection to a Hive 1.x instance. If you are applying security, using HTTP, or connecting to Hive 2.x, additional configuration is required. Before creating these connections, please review the Additional Configuration Options section below.
An administrator can create the Hive connection through the application.
Steps:
Specify the properties for your Hive database.
Before you create the connection, you should review the rest of this section to verify that you are using the proper connection string options for the Hive connection to work in your environment.
NOTE: If secure impersonation is enabled on your cluster, you must include the Hive principal as part of your connection string. For more information, see Configure for Hive.
For more information, see Create Connection Window.
Click Save.
This connection can also be created through the command line interface (CLI). Notes on creating the connection:
The CLI tools are stored in the following directory:
/opt/trifacta/bin/
Example command (all one command):
./trifacta_cli.py create_connection --user_name <trifacta_admin_username> --password <trifacta_admin_password> --conn_type hadoop_hive --conn_name aHiveConnection --conn_description "This is my Hive connection." --conn_host example.com --conn_port 10000 --conn_credential_type trifacta_service --conn_params_location ~/.trifacta/p.json --conn_is_global --cli_output_path ./conn_create.out
Parameter | Description
---|---
create_connection | CLI action type. Please leave this value as create_connection.
--user_name | Username of the account to use. For this connection type, it must be an admin user account.
--password | Password of the account.
--conn_type | The type of connection. Set this value to hadoop_hive.
--conn_name | The internal name of this connection.
--conn_description | A user-friendly description for this connection, which appears in the application.
--conn_host | Host of the Hive instance.
--conn_port | Port number of the Hive instance. TCP connections typically use port 10000; HTTP connections use port 10001.
--conn_credential_type | The type of credentials to use. Set this value to trifacta_service for Hive.
--conn_credential_location | For
--conn_params_location | Path to the file containing the parameters to pass to Hive during interactions. See below.
--conn_is_global | This flag is required. It makes the connection available to all users.
--conn_skip_test | By default, any connection is tested as part of a create or edit action. Include this flag to skip testing the connection. This flag requires no value.
--cli_output_path | The path to the file where the results of this command are written.
Example params file:
A parameters file containing the following information must be accessible to the CLI. Please add a reference to this file in your command. Some deployment scenarios require that specific values be passed to Hive through this file.
Example:
{ "defaultDatabase": "default", "jdbc": "hive2" }
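Before running create_connection, you can write the parameters file and confirm that it parses as valid JSON. This is a minimal sketch; the ~/.trifacta/p.json path matches the example command above, and the python3 check is simply a convenient local validator, not part of the product CLI.

```shell
# Write the Hive parameters file referenced by --conn_params_location.
mkdir -p "$HOME/.trifacta"
cat > "$HOME/.trifacta/p.json" <<'EOF'
{
  "defaultDatabase": "default",
  "jdbc": "hive2"
}
EOF

# Confirm the file is valid JSON before passing it to the CLI.
python3 -m json.tool "$HOME/.trifacta/p.json" > /dev/null && echo "params OK"
```

A malformed file fails the JSON check here rather than at connection-creation time, which makes typos in connectStrOpts entries easier to catch.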
For more information on the CLI, see CLI for Connections.
For more information, see API Connections Create v3.
By default, the platform utilizes TCP to connect to Hive. As needed, the platform can be configured to interact with HiveServer2 over HTTP.
NOTE: HTTP connections are enabled by inserting additional parameters as part of the connection string. In some environments, the JDBC driver may insert these parameters automatically, based on its configuration.
NOTE: If you are planning to use SSL, additional configuration is required. See below.
Steps:
To enable the platform to use HTTP when connecting to Hive, please do the following.
Create a params file for your Hive connection. This file must contain at least the following entry:
"connectStrOpts": ";transportMode=http;httpPath=cliservice"
Set --conn_port to 10001.
Execute the command.
Test the Hive connection. If it is not working, delete the connection and try again.
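Putting the HTTP settings together, a complete params file for an HTTP connection might look like the following. This is a sketch combining the entries shown in this section; your environment may require additional connection string options.

```json
{
  "connectStrOpts": ";transportMode=http;httpPath=cliservice",
  "defaultDatabase": "default",
  "jdbc": "hive2"
}
```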
NOTE: Cloudera supports an additional method for communicating over SSL with Hive. For more information on how to identify the method used by your Cloudera cluster, see Configure for Cloudera.
The steps below describe how to enable the SASL-QOP method of SASL (Simple Authentication and Security Layer) communications. To enable it, you must add an additional connection string parameter.
Steps:
Create or edit the params file for your connection to Hive. This file must include the following setting for connectStrOpts
:
{ "connectStrOpts": ";sasl.qop=auth-conf", "defaultDatabase": "default", "jdbc": "hive2" }
Set --conn_port to 10001.
Execute the command.
Test the Hive connection. If it is not working, delete the connection and try again.
NOTE: Hive 2.x connections using Zookeeper Quorum are supported on HDP 2.6 only.
Steps:
Copy the HiveServer2 JDBC URL for your cluster and paste it into a text editor. It should look something like the following:
jdbc:hive2://hdp26-w-1.c.example:2181,hdp26-w-3.c.example:2181,hdp26-w-2.c.example:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2
Host: Specify the host portion of the URL, which is the full quorum without the trailing port number. In the above, it is this value:
hdp26-w-1.c.example:2181,hdp26-w-3.c.example:2181,hdp26-w-2.c.example
Port Number: In the above, it is this value:
2181
Connect String Options: In the above, it is this value:
;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2
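The three field values above can be extracted from the URL mechanically. Below is a minimal POSIX shell sketch, assuming the URL follows the format shown; the hostnames are the example values from this section, not real ones.

```shell
# Split a HiveServer2 Zookeeper JDBC URL into the three connection fields.
url='jdbc:hive2://hdp26-w-1.c.example:2181,hdp26-w-3.c.example:2181,hdp26-w-2.c.example:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2'

rest=${url#jdbc:hive2://}   # drop the scheme prefix
hosts_ports=${rest%%/*}     # quorum with ports: everything before the first '/'
opts=";${rest#*;}"          # connect string options: from the first ';' onward
port=${hosts_ports##*:}     # port: the value after the last ':'
host=${hosts_ports%:*}      # host field: quorum minus the trailing ':port'

echo "Host: $host"
echo "Port: $port"
echo "Options: $opts"
```

Running this prints the same Host, Port Number, and Connect String Options values called out above.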
For a Kerberos-enabled cluster, you must include the Kerberos principal value as part of the Hive connection string options.
Please complete the following steps to add the Kerberos principal value for Hive to your Hive connection.
Steps:
Retrieve the Kerberos principal value for your deployment.
Create connection through CLI: Create or edit the params file for your connection to Hive. This file must include the following setting for connectStrOpts
:
{ "connectStrOpts": ";principal=hive/<principal_value>", "defaultDatabase": "default", "jdbc": "hive2" }
where <principal_value> is the Kerberos principal (<host>@<realm>) for Hive on your cluster.
For more information, see CLI for Connections.
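For illustration only, here is the setting with the placeholder filled in; the host and realm below are hypothetical and must be replaced with the values retrieved from your deployment.

```json
{
  "connectStrOpts": ";principal=hive/hive-server.example.com@EXAMPLE.COM",
  "defaultDatabase": "default",
  "jdbc": "hive2"
}
```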
If needed, you can route any Hadoop jobs sourced from Hive to a specific YARN job queue.
Steps:
In your Hive connection, add the following option to your Connection String Options (connectStrOpts):
"connectStrOpts": ";sess_var_list?mapred.job.queue.name=<your_queue>"
where <your_queue> is the name of the YARN job queue.
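Combined into a complete params file, the setting might look like the following sketch, where the queue name analytics is hypothetical and stands in for a queue defined on your cluster.

```json
{
  "connectStrOpts": ";sess_var_list?mapred.job.queue.name=analytics",
  "defaultDatabase": "default",
  "jdbc": "hive2"
}
```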