This section describes how you interact with your Hive data warehouse.
You can use Hive for the following tasks:
Read Access: Your Hadoop administrator must configure read permissions to Hive databases.
Your Hadoop administrator should provide a database table or tables for data upload to your Hive datastore.
Write Access: You can write job results directly to Hive, or publish them to Hive ad hoc at a later time. See Writing to Hive below.
Depending on the security features you've enabled, the technical methods used to access Hive may vary. For more information, see Configure Hadoop Authentication.
Partitioned tables can be read. However, individual partitions of partitioned tables cannot be read.
Tip: If you are reading data from a partitioned table, one of your early recipe steps in the Transformer page should filter out the unneeded table data so that you are reading only the records of the individual partition.
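The filtering idea in the tip above can be illustrated with a minimal Python sketch. It is not recipe syntax and not the product's implementation. The sample rows, the `ds` partition column, and the `keep_partition` helper are all hypothetical names introduced only for illustration.

```python
# Hypothetical rows as they might arrive from a full read of a partitioned table.
# "ds" stands in for the partition column (an assumption, not from the source).
rows = [
    {"ds": "2020-01-01", "order_id": 1, "amount": 10.0},
    {"ds": "2020-01-02", "order_id": 2, "amount": 25.5},
    {"ds": "2020-01-02", "order_id": 3, "amount": 7.25},
]


def keep_partition(records, column, value):
    """Return only the records whose partition column equals the given value."""
    return [r for r in records if r[column] == value]


# Keep only the records that belong to one partition, discarding the rest early.
selected = keep_partition(rows, "ds", "2020-01-02")
print([r["order_id"] for r in selected])
```

Applying the filter as early as possible means that later steps operate only on the records of the partition of interest.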
Your Hadoop administrator should provide datasets or locations and access for storing datasets within Hive.
You can create a dataset from a table or view stored in Hive. For more information, see Hive Browser.
For more information on how data types are imported from Hive, see Hive Data Type Conversions.
If you have enabled custom SQL and are reading data from a Hive view, the results of nested functions are written to temporary column names, unless they are explicitly aliased.
Tip: If your custom SQL uses nested functions, you should create an explicit alias for the results. Otherwise, the job is likely to fail.
Problematic Example:
SELECT UPPER(`t1`.`column1`), TRIM(`t1`.`column2`),...
When these are read from a Hive view, the temporary column names are: _c0, _c1, etc. During job execution, Spark ignores the column1 and column2 references.
Improved Example:
SELECT UPPER(`t1`.`column1`) as col1, TRIM(`t1`.`column2`) as col2,...
In this improved example, the two Hive view columns are aliased to the explicit column names, which are correctly interpreted and used by the Spark running environment during job execution.
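The effect of explicit aliasing on result column names can be checked with a small Python sketch. It uses an in-memory SQLite database purely as a stand-in, so it does not reproduce Hive's `_c0`, `_c1` naming or Spark's behavior. The table contents and the `result_columns` helper are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a SQL source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (column1 TEXT, column2 TEXT)")
conn.execute("INSERT INTO t1 VALUES ('abc', '  xyz  ')")


def result_columns(sql):
    """Return the column names that the engine reports for a query."""
    cursor = conn.execute(sql)
    return [description[0] for description in cursor.description]


# Without aliases, the engine chooses the names of the function results.
unaliased = result_columns(
    "SELECT UPPER(t1.column1), TRIM(t1.column2) FROM t1"
)

# With aliases, the names are exactly the ones written in the query.
aliased_sql = (
    "SELECT UPPER(t1.column1) AS col1, TRIM(t1.column2) AS col2 FROM t1"
)
aliased = result_columns(aliased_sql)
rows = conn.execute(aliased_sql).fetchall()

print(unaliased)  # engine-generated names, which vary by engine
print(aliased)
print(rows)
```

The point of the sketch is only that unaliased function results receive engine-generated names, while aliased results carry the names that downstream steps expect.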
You can write data back to Hive using one of the following methods:
NOTE: You cannot publish to a Hive database that is empty. The database must contain at least one table.