This section describes how to enable the platform to read sources in Hive and write results back to Hive.
Configure for Hive
The user with which Hive connects to read from HDFS should be a member of the user group, or whatever group is used to access HDFS from the platform.
Verify that the Unix or LDAP group has read access to the Hive warehouse directory.
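As a quick check, you can list the warehouse directory and confirm its group and permissions. The warehouse path below is the common default; your cluster may use a different location:

```
# Default warehouse path shown; substitute your cluster's configured location.
hdfs dfs -ls /user/hive/warehouse
```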
Hive user for Spark:
Enable Data Service
In platform configuration, the data service must be enabled.
Please verify the following:
Locate the Hive JDBC Jar
In platform configuration, you must verify that the following parameter is pointing to the proper location for the Hive JDBC JAR file. The example below identifies the location for Cloudera 5.10:
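A configuration entry of this general shape is expected; the parameter name and the JAR path below are illustrative and should be checked against your installation:

```
"data-service.hiveJdbcJar": "hadoop-deps/cdh-5.10/build/libs/cdh-5.10-hive-jdbc.jar"
```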
Enable Hive Support for Spark Job Service
If you are using the Spark running environment for execution and profiling jobs, you must enable Hive support within the Spark Job Service configuration block.
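A minimal sketch of the relevant setting, assuming a flag of this form exists in the Spark Job Service configuration block (verify the exact property name in your platform configuration):

```
"spark-job-service.enableHiveSupport": true
```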
Enable Hive Database Access for Spark Job Service
The Spark Job Service requires read access to the Hive databases. Please verify that the Spark user can access the required Hive databases and tables.
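One way to verify access is to run a few queries as the Spark user, for example through beeline. The database and table names below are illustrative:

```
-- Run as the Spark user to confirm visibility and read access.
SHOW DATABASES;
SHOW TABLES IN default;
SELECT * FROM default.test LIMIT 5;
```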
For more information, please contact your Hive administrator.
Configure managed table format
The platform publishes to Hive using managed tables. When writing to Hive, the platform first pushes the results to an external staging table. From this staging table, the platform then selects the rows and inserts them into a managed table.
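The two-step write described above can be sketched in HiveQL as follows; the table names, columns, and staging location are illustrative only:

```
-- Step 1: an external staging table over the files written by the platform.
CREATE EXTERNAL TABLE staging_results (name STRING, id BIGINT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION '/tmp/staging_results';

-- Step 2: select from the staging table and insert into the managed table.
INSERT INTO TABLE managed_results SELECT * FROM staging_results;
```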
By default, the platform publishes to managed tables in Parquet format. As needed, you can apply the following values in platform configuration to change the format in which the platform writes when publishing a managed table:
To change the format, please modify the following parameter.
Create Hive Connection
For more information, see Create Hive Connections.
Depending on your Hadoop environment, you may need to perform additional configuration to enable connectivity with your Hadoop cluster.
Additional Configuration for Secure Environments
NOTE: You should have already configured the platform to use secure impersonation. For more information on basic configuration, see Configure for secure impersonation.
You must add the Hive principal value to your Hive connection. Add it to the connectStrOpts entry for your params file:
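For example, the Hive principal can be passed through the JDBC connect string options; the principal value below is illustrative and should match your cluster's Kerberos realm:

```
"connectStrOpts": ";principal=hive/_HOST@EXAMPLE.COM"
```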
NOTE: You should have already enabled basic Kerberos integration. For more information, see Set up for a Kerberos-enabled Hadoop cluster.
NOTE: If you are enabling Hive in a Kerberized environment, you must also enable secure impersonation. When connecting to Hive, Kerberos without secure impersonation is not supported. You should have already configured the platform to use secure impersonation. For more information on basic configuration, see Configure for secure impersonation.
The platform can be configured to use Sentry to authorize access to Hive. See Configure for Hive with Sentry.
NOTE: The platform cannot publish to a default database in Hive that is empty. Please create at least one table in your default database.
Store the dataset in the following example directory:
Use the following command to create your table:
create table test (name string, id bigint, id2 bigint, randomName string, description string, dob string, title string, corp string, fixedOne bigint, fixedTwo int) row format delimited fields terminated by ',' STORED AS TEXTFILE;
Add the example dataset to the above test table:
load data local inpath '/tmp/hiveTest_5mb' into table test;
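To confirm that the load succeeded, you can query the table. The row count you see depends on your dataset:

```
SELECT * FROM test LIMIT 10;
SELECT COUNT(*) FROM test;
```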