This section describes the basics of creating a connection to Hive from Trifacta® Wrangler Enterprise and modifying it based on your cluster environment.

For more information on supported versions of Hive, see Configure for Hive.

Limitations

NOTE: The platform supports a single, global connection to Hive. All users must use this connection.

For more information on how the platform works with Hive, see Using Hive.

Create Encryption Key File

Before you create your Hive connection, you must create and reference an encryption key file. If you have not created one already, see Create Encryption Key File.

Create Hive Connection

You can create the connection through the application or the APIs.

NOTE: The following configuration methods apply to creation of an insecure connection to a Hive 1.x instance. If you are applying security, using HTTP, or connecting to Hive 2.x, additional configuration is required. Before creating these connections, please review the Additional Configuration Options section below.

Create through application

A Trifacta administrator can create the Hive connection through the application.

Steps:

  1. Log in to the application.
  2. In the menu, select Settings menu > Settings > Connections.
  3. In the Create Connection page, click the Hive connection card.
  4. Specify the properties for your Hive database. 

    1. Before you create the connection, you should review the rest of this section to verify that you are using the proper connection string options for the Hive connection to work in your environment. 

      NOTE: If secure impersonation is enabled on your cluster, you must include the Hive principal as part of your connection string. For more information, see Configure for Hive.


    2. For more information, see Create Connection Window.

  5. Click Save.

Create via API

For more information, see API Connections Create v4.
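For orientation, a create-connection request body might look like the following. This is a sketch only; the field names and values shown are illustrative assumptions, and the authoritative request schema is documented in API Connections Create v4.

```json
{
  "name": "hive",
  "vendor": "hive",
  "type": "jdbc",
  "isGlobal": true,
  "host": "<hive_server_host>",
  "port": 10000,
  "params": {
    "jdbc": "hive2",
    "connectStrOpts": "",
    "defaultDatabase": "default"
  }
}
```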

Additional Configuration Options

Configure for HTTP

By default, the Trifacta platform utilizes TCP to connect to Hive. As needed, the platform can be configured to interact with HiveServer2 over HTTP. 

NOTE: HTTP connections are enabled by inserting additional parameters as part of the connection string. In some environments, the JDBC driver may insert these parameters automatically, based on configuration in hive-site.xml. If the driver inserts them for you, the Trifacta platform requires no additional configuration; otherwise, follow the steps below. For more information, see https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-ConnectionURLWhenHiveServer2IsRunninginHTTPMode.

NOTE: If you are planning to use SSL, additional configuration is required. See below.

 

Steps:

To enable the Trifacta platform to use HTTP when connecting to Hive, please do the following.

  1. Create a params file for your Hive connection. This file must contain at least the following entry:

        "connectStrOpts": ";transportMode=http;httpPath=cliservice"
  2. Save the file. 
  3. In your command, set the connection port value to 10001.
  4. Execute the command.

  5. Test the Hive connection.

  6. If the connection fails, delete it, verify your settings, and recreate it.
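For reference, the params entry above corresponds to a JDBC connection URL of the following form. The host value is a placeholder; the port and connect string options are the ones configured in the steps above.

```
jdbc:hive2://<hive_host>:10001/default;transportMode=http;httpPath=cliservice
```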

Configure for SASL

NOTE: Cloudera supports an additional method for communicating over SSL with Hive. For more information on how to identify the method used by your Cloudera cluster, see Configure for Cloudera.

The steps below describe how to enable the SASL-QOP method of SASL (Simple Authentication and Security Layer) communications. To enable, you must add an additional connection string parameter.

Steps:

  1. Create or edit the params file for your connection to Hive. This file must include the following setting for connectStrOpts:

    {
      "connectStrOpts": ";sasl.qop=auth-conf",
      "defaultDatabase": "default",
      "jdbc": "hive2"
    }
  2. Save the file.
  3. In your command, set the connection port value to 10001.
  4. Execute the command.
  5. Test the Hive connection.

  6. If the connection fails, delete it, verify your settings, and recreate it.
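For reference, the params file above corresponds to a JDBC connection URL of the following form. The host value is a placeholder; the port and connect string option are the ones configured in the steps above.

```
jdbc:hive2://<hive_host>:10001/default;sasl.qop=auth-conf
```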

Configure for Hive over Zookeeper Quorum

NOTE: Hive 2.x connections using Zookeeper Quorum are supported on HDP 2.6 only.


Steps:

  1. Log in to Ambari.
  2. In the left nav bar, select Hive.
  3. Copy the HiveServer2 JDBC URL to the clipboard.
  4. Paste it into a text editor. It should look something like the following:

    jdbc:hive2://hdp26-w-1.c.example:2181,hdp26-w-3.c.example:2181,hdp26-w-2.c.example:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
  5. When you create the connection in Trifacta Wrangler Enterprise:
    1. Host: Specify the host values listed in the JDBC URL, removing the protocol and the trailing port identifier. In the above, it is this value:

      hdp26-w-1.c.example:2181,hdp26-w-3.c.example:2181,hdp26-w-2.c.example
    2. Port Number: In the above, it is this value:

      2181
    3. Connect String Options: In the above, it is this value:

      ;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
  6. Specify the other properties accordingly. 
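To illustrate how the three fields are extracted, splitting the example JDBC URL above can be sketched in Python. The helper function below is ours, not part of the product; it simply separates the host list, the trailing port, and the connect string options.

```python
# Sketch: split an Ambari HiveServer2 JDBC URL into the three values the
# Create Connection window expects (Host, Port Number, Connect String Options).
# The helper name is ours; the URL is the example from this section.

def split_hive_jdbc_url(url):
    """Return (host, port, connect_string_options) from a hive2 JDBC URL."""
    # Strip the protocol prefix.
    body = url[len("jdbc:hive2://"):]
    # Everything after the first "/" is the database plus options; the
    # options portion begins at the first ";".
    hosts_part, _, rest = body.partition("/")
    options = rest[rest.index(";"):] if ";" in rest else ""
    # The port is the trailing ":<port>" on the last host entry.
    hosts, _, port = hosts_part.rpartition(":")
    return hosts, port, options

url = ("jdbc:hive2://hdp26-w-1.c.example:2181,hdp26-w-3.c.example:2181,"
       "hdp26-w-2.c.example:2181/;serviceDiscoveryMode=zooKeeper;"
       "zooKeeperNamespace=hiveserver2")
host, port, opts = split_hive_jdbc_url(url)
print(host)  # hdp26-w-1.c.example:2181,hdp26-w-3.c.example:2181,hdp26-w-2.c.example
print(port)  # 2181
print(opts)  # ;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
```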

Configure Hive connection for high availability 

Minimum requirements:

NOTE: High availability for Hive is supported for HDP 2.6 or HDP 3.0 using Hive 2.x only. Other Hive integrations are not supported.


  • Multiple Hive servers and metastores are required.
  • At least three Zookeeper servers are required.

Steps:

  1. Log in to Ambari.
  2. In the left nav bar, select Hive.
  3. Copy the HiveServer2 JDBC URL to the clipboard.
  4. Paste it into a text editor. Example:

    jdbc:hive2://thdp-m-1-20190226193250.c.t-dev:2181,thdp-w-2-20190226193418.c.t-dev:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
  5. When you create the connection in Trifacta Wrangler Enterprise:
    1. Host: Specify the hosts listed in the JDBC URL, removing the protocol and port identifiers. From the above example, the value is the following:

      thdp-m-1-20190226193250.c.t-dev:2181,thdp-w-2-20190226193418.c.t-dev
    2. Port Number: From the example, it is this value:

      2181
    3. Connect String Options: From the example, you must prepend the remainder with the identifier of the principal, as in the following:

      ;principal=hive/_HOST@HORTONWORKS;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
  6. Specify the other properties accordingly. 
  7. To test:
    1. Run a job sourced from the first Hive server.
    2. If possible, disable the first listed Hive server to test failover access.
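If you maintain a params file for this connection, the high-availability connect string options from the example above would appear as follows. This sketch mirrors the params-file structure shown in the SASL section; the principal and namespace values are the ones from the example and must match your cluster.

```json
{
  "connectStrOpts": ";principal=hive/_HOST@HORTONWORKS;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2",
  "defaultDatabase": "default",
  "jdbc": "hive2"
}
```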

Configure for Kerberos

For a Kerberos-enabled cluster, you must include the Kerberos principal value as part of the Hive connection string options.

Complete the following steps to add the Kerberos principal value for Hive to your Hive connection.

Steps:

  1. Retrieve the Kerberos principal value for your deployment.

  2. Apply the Kerberos principal to your Hive connection by adding it to the Connect String Options in your Hive connection. See Create Connection Window.

  3. Save the connection.
  4. Test the connection.
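As an illustration, a principal entry in the Connect String Options takes the following form. The realm below is a placeholder; the service name and realm must match your cluster's Hive principal (see the high-availability section above for an example with the HORTONWORKS realm).

```
;principal=hive/_HOST@EXAMPLE.COM
```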

Run Hive jobs on a specific YARN queue

If needed, you can route any Spark jobs sourced from Hive to a specific YARN job queue. 

Steps:

  1. In your Hive connection, add the following option to your Connection String Options (connectStrOpts) in your Hive connection:

      "connectStrOpts": ";sess_var_list?mapred.job.queue.name=<your_queue>",
    
    

    where:
    <your_queue> is the name of the YARN job queue.

  2. Save the change.
  3. Test the connection. 
  4. Run a job, and verify that it is routed to the appropriate YARN job queue.
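For example, routing jobs to a hypothetical queue named etl_queue would make the connectStrOpts entry look like the following. The queue name is illustrative only; substitute the name of a YARN queue that exists on your cluster.

```json
{
  "connectStrOpts": ";sess_var_list?mapred.job.queue.name=etl_queue"
}
```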

 
