To install the platform inside your enterprise infrastructure, review and complete the following sections in the order listed below.
Deploy Hadoop cluster: In this scenario, the platform does not create a Hadoop cluster. See below.
NOTE: Installation and maintenance of a working Hadoop cluster is the responsibility of the customer. Guidance is provided below on the requirements for integrating the platform with the cluster.
Limitations: For more information on limitations of this scenario, see Product Limitations in the Install Preparation area.
In your enterprise infrastructure, you must deploy a cluster using a supported version of Hadoop to manage your expected data volumes. For more information on suggested sizing, see Sizing Guidelines in the Install Preparation area.
When you configure the platform to integrate with the cluster, you must acquire information about the cluster configuration. For more information on the set of information to collect, see Pre-Install Checklist in the Install Preparation area.
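As an illustration of gathering cluster configuration details, the following minimal sketch reads properties from Hadoop-style configuration XML. It assumes the values of interest live in a file such as core-site.xml and uses the standard Hadoop property fs.defaultFS as an example. The sample content, host name, and the helper name read_hadoop_properties are hypothetical, and the actual items to collect are defined by the Pre-Install Checklist, not by this sketch.

```python
import xml.etree.ElementTree as ET

# Sample core-site.xml content. The host name and port are placeholders,
# not values taken from any real cluster.
SAMPLE_CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
"""


def read_hadoop_properties(xml_text):
    """Return a dict mapping property names to values from Hadoop-style XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name] = value
    return props


props = read_hadoop_properties(SAMPLE_CORE_SITE)
print(props.get("fs.defaultFS"))
```

In practice, you would likely read the XML text from the configuration files on the cluster and record the values alongside the other checklist items.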
NOTE: By default, smaller jobs are executed on the running environment. Larger jobs are executed using Spark on the integrated Hadoop cluster. Spark must be installed on the cluster. For more information, see System Requirements in the Install Preparation area.
The platform supports integration with the following cluster types. For more information on the supported versions, please see the listed sections below.
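The size-based split described in the note can be pictured with a small sketch. This is only a conceptual illustration: the threshold value, the function name choose_running_environment, and the labels "default" and "spark" are all assumptions made for this example, and the criteria the platform actually uses are not stated in this section.

```python
# Hypothetical cutoff for a "small" job (1 GiB). The platform's real
# criteria and default values are not described here.
SMALL_JOB_MAX_BYTES = 1024 ** 3


def choose_running_environment(input_bytes, max_small=SMALL_JOB_MAX_BYTES):
    """Pick a running environment label from the job's input size.

    Returns "default" for small jobs and "spark" for larger jobs that
    would run on the integrated Hadoop cluster.
    """
    if input_bytes < 0:
        raise ValueError("input_bytes must be non-negative")
    return "default" if input_bytes <= max_small else "spark"


small_choice = choose_running_environment(10 * 1024 ** 2)   # 10 MiB input
large_choice = choose_running_environment(50 * 1024 ** 3)   # 50 GiB input
print(small_choice, large_choice)
```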
Additional users may be required. For more information, see Required Users and Groups in the Install Preparation area.
An edge node of the cluster is required to host the software. For more information on the requirements of this node, see System Requirements.
Complete the following steps in order:
The platform requires that one backend datastore be configured as the base storage layer. This base storage layer is used for storing uploaded data and writing results and profiles.
NOTE: By default, the base storage layer for the platform is set to HDFS. You can change it now, if needed. After this base storage layer is defined, it cannot be changed again.
See Set Base Storage Layer.
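The define-once behavior of the base storage layer can be modeled as a write-once setting. The sketch below is a conceptual illustration only, not the platform's configuration mechanism. The class name WriteOnceSetting and the value "other-datastore" are hypothetical, and "hdfs" stands in for the HDFS default mentioned above.

```python
class WriteOnceSetting:
    """A setting that starts at a default and can be defined exactly once."""

    def __init__(self, default):
        self._value = default
        self._locked = False

    @property
    def value(self):
        return self._value

    def define(self, value):
        # After the first definition, any further change is rejected.
        if self._locked:
            raise RuntimeError("setting is already defined and cannot be changed")
        self._value = value
        self._locked = True


base_storage = WriteOnceSetting(default="hdfs")
base_storage.define("hdfs")          # confirm the default during install

try:
    base_storage.define("other-datastore")   # hypothetical later change
    changed = True
except RuntimeError:
    changed = False

print(base_storage.value, changed)
```

The point of the model is that the choice is best made deliberately before the first definition, since there is no path to revise it afterward.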
NOTE: You can try to verify operations using the running environment at this time. While you can also try to run a job on the Hadoop cluster, additional configuration may be required to complete the integration. These steps are listed under Next Steps below.
Tip: You should access online documentation through the product. Online content may receive updates that are not present in PDF content.
You can access complete product documentation online and in PDF format. From within the product, select Help menu > Product Docs.
After you have accessed the documentation, the following topics are relevant to on-premises deployments. Please review them in order.
NOTE: These materials are located in the Configuration Guide.
|Required Platform Configuration|
This section covers the following topics, some of which should already be completed:
|Configure for Hadoop|
|Enable Integration with Compressed Clusters||If the Hadoop cluster uses compression, additional configuration is required.|
|Enable Integration with Cluster High Availability||If you are integrating with high availability on the Hadoop cluster, please complete these steps.|
|Configure for Hive||Integration with the Hadoop cluster's instance of Hive.|
|Configure for KMS||Integration with the Hadoop cluster's key management system (KMS) for encrypted transport. Instructions are provided for distribution-specific versions of Hadoop.|
A list of topics on applying additional security measures to the platform and how it integrates with Hadoop.
|Configure SSO for AD-LDAP||Please complete these steps if you are integrating with your enterprise's AD/LDAP Single Sign-On (SSO) system.|