To install the platform inside your enterprise infrastructure, please review and complete the following sections in the order listed below.

Scenario Description

Preparation

  1. Review Planning Guide: Please review and verify Install Preparation and its sub-topics.
  2. Acquire Assets: Acquire the installation package for your operating system and your license key. For more information, contact .
    1. If you are completing the installation without Internet access, you must also acquire the offline versions of the system dependencies. See Install Dependencies without Internet Access.
  3. Deploy Hadoop cluster: In this scenario, the platform does not create a Hadoop cluster for you. See below.

    NOTE: Installation and maintenance of a working Hadoop cluster is the responsibility of the customer. Guidance is provided below on the requirements for integrating the platform with the cluster.

  4. Deploy the platform software: The software must be installed on an edge node of the cluster. Details are below.

Limitations: For more information on limitations of this scenario, see Product Limitations in the Install Preparation area.

Deploy the Cluster

In your enterprise infrastructure, you must deploy a cluster using a supported version of Hadoop to manage the expected data volumes of your deployment. For more information on suggested sizing, see Sizing Guidelines in the Install Preparation area.

When you configure the platform to integrate with the cluster, you must acquire information about the cluster configuration. For more information on the set of information to collect, see Pre-Install Checklist in the Install Preparation area.
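For reference, much of the information requested by the Pre-Install Checklist can be read directly from the cluster's client configuration files. The following is a minimal sketch assuming the standard Hadoop file location (/etc/hadoop/conf) and property names; the get_prop helper and the sample file it falls back to are illustrative only:

```shell
# Sketch: read a property out of the cluster's client configuration.
# /etc/hadoop/conf and fs.defaultFS are standard Hadoop conventions;
# adjust for your distribution. get_prop is a hypothetical helper.
CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"

# For illustration only: fabricate a minimal core-site.xml when no
# real client configuration is present on this machine.
if [ ! -f "$CONF_DIR/core-site.xml" ]; then
  CONF_DIR="$(mktemp -d)"
  cat > "$CONF_DIR/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
EOF
fi

# Print the <value> that follows a given <name> in core-site.xml.
get_prop() {
  grep -A1 "<name>$1</name>" "$CONF_DIR/core-site.xml" \
    | sed -n 's|.*<value>\(.*\)</value>.*|\1|p'
}

echo "fs.defaultFS = $(get_prop fs.defaultFS)"
```

The same pattern can be extended to other files such as hdfs-site.xml or yarn-site.xml by parameterizing the file name.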

NOTE: By default, smaller jobs are executed in the default running environment on the edge node. Larger jobs are executed using Spark on the integrated Hadoop cluster. Spark must be installed on the cluster. For more information, see System Requirements in the Install Preparation area.
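To check the Spark half of this requirement ahead of time, you can run a small sketch like the following on a cluster node (spark-submit is Spark's standard launcher; the SPARK_STATUS variable is a local convention, not a platform setting):

```shell
# Sketch: verify that the standard Spark launcher is available on this
# node. SPARK_STATUS is a local variable used only by this check.
if command -v spark-submit >/dev/null 2>&1; then
  SPARK_STATUS="found: $(command -v spark-submit)"
else
  SPARK_STATUS="missing"
fi
echo "spark-submit: $SPARK_STATUS"
```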

The platform supports integration with the following cluster types. For more information on the supported versions, see the sections listed below.

Prepare the Cluster

Additional users may be required. For more information, see Required Users and Groups in the Install Preparation area.

Deploy the Edge Node

An edge node of the cluster is required to host the platform software. For more information on the requirements of this node, see System Requirements.

Install Workflow

Please complete these steps in the order listed:

  1. Install software: Install the  software on the cluster edge node. See Install Software.

  2. Install databases: The platform requires several databases for storage.

    NOTE: The default configuration assumes that you are installing the databases on a PostgreSQL server on the same edge node as the software using the default ports. If you are changing the default configuration, additional configuration is required as part of this installation process.

    For more information, see Install Databases.

  3. Start the platform: For more information, see Start and Stop the Platform.
  4. Log in to the application: After the software and databases are installed, you can log in to the application to complete configuration:
    1. See Login.
    2. As soon as you log in, you should change the password on the admin account. In the left menu bar, select Settings > Admin Settings. Scroll down to Manage Users. For more information, see Change Admin Password.
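Before running the database installation in step 2 above, you can sanity-check the default assumption with a quick reachability probe. This is a sketch that assumes the defaults from the note (PostgreSQL on the local edge node, port 5432) and requires bash for its /dev/tcp feature:

```shell
# Sketch: check that a PostgreSQL server is listening on the defaults the
# installer assumes (localhost:5432). The host and port are assumptions;
# change them if you moved the databases.
PG_HOST="localhost"
PG_PORT="5432"

# bash can open TCP connections via /dev/tcp without extra tooling.
if (exec 3<>"/dev/tcp/${PG_HOST}/${PG_PORT}") 2>/dev/null; then
  PG_REACHABLE="yes"
else
  PG_REACHABLE="no"
fi
echo "PostgreSQL at ${PG_HOST}:${PG_PORT} reachable: ${PG_REACHABLE}"
```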

Tip: At this point, you can access the online documentation through the application. In the left menu bar, select Help menu > Product Docs. All of the following content, plus updates, is available online. See Documentation below.

Configure for Hadoop


Set base storage layer

The platform requires that one backend datastore be configured as the base storage layer. This base storage layer is used for storing uploaded data and writing results and profiles. 

NOTE: By default, the base storage layer is set to HDFS. You can change it now, if needed. After the base storage layer has been defined, it cannot be changed.

See Set Base Storage Layer.
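If you keep HDFS as the base storage layer, one hedged way to confirm that it is usable from the edge node is a small write/read/cleanup round trip. The directory below is an arbitrary example; use a path the platform's service user can write to. All hdfs dfs subcommands used here are standard Hadoop shell commands:

```shell
# Sketch: verify the HDFS client can write to and read from the base
# storage layer. TEST_DIR is an arbitrary example path.
TEST_DIR="/tmp/base-storage-check-$$"
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p "$TEST_DIR" \
    && echo "base storage probe" | hdfs dfs -put - "$TEST_DIR/probe.txt" \
    && hdfs dfs -cat "$TEST_DIR/probe.txt" \
    && hdfs dfs -rm -r -skipTrash "$TEST_DIR"
  HDFS_STATUS="checked"
else
  HDFS_STATUS="hdfs client not found on this node"
fi
echo "HDFS base storage: $HDFS_STATUS"
```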
 

Verify Operations

NOTE: You can try to verify operations using the running environment at this time. While you can also try to run a job on the Hadoop cluster, additional configuration may be required to complete the integration. These steps are listed under Next Steps below.

 

Documentation

Tip: You should access online documentation through the product. Online content may receive updates that are not present in PDF content.

You can access complete product documentation online and in PDF format. From within the application, select Help menu > Product Docs.

Next Steps

After you have accessed the documentation, the following topics are relevant to on-premises deployments. Please review them in order.

NOTE: These materials are located in the Configuration Guide.

Required Platform Configuration

This section covers the following topics, some of which should already be completed:

  • Set Base Storage Layer - The base storage layer must be set once and never changed.
  • Create Encryption Key File - If you plan to integrate the platform with any relational sources, including Hive or Redshift, you must create an encryption key file and store it on the edge node.
  • Running Environment Options - Depending on your scenario, you may need to perform additional configuration for your available running environment(s) for executing jobs.
  • Profiling Options - In some environments, tweaks to the settings for visual profiling may be required. You can disable visual profiling if needed.
  • Configure for Spark - If you are enabling the Spark running environment, please review and verify the configuration for integrating the platform with the Hadoop cluster instance of Spark.
Configure for Hadoop
Enable Integration with Compressed Clusters

If the Hadoop cluster uses compression, additional configuration is required.
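One hedged way to see which compression codecs the client side can load is Hadoop's own checknative utility; this sketch assumes the hadoop CLI is present on the node:

```shell
# Sketch: report which native compression codecs the local Hadoop client
# can load. 'hadoop checknative -a' is a standard Hadoop utility; it exits
# nonzero when any library is missing, so that is not treated as fatal here.
if command -v hadoop >/dev/null 2>&1; then
  hadoop checknative -a || true
  CODEC_STATUS="checked"
else
  CODEC_STATUS="hadoop client not found on this node"
fi
echo "Native codec check: $CODEC_STATUS"
```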
Enable Integration with Cluster High Availability

If you are integrating with high availability on the Hadoop cluster, HttpFS must be enabled in the platform. HttpFS is also required in other, less common cases. See Enable HttpFS.
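You can probe the HttpFS service before enabling it in the platform. This sketch assumes the stock HttpFS port (14000) and uses the WebHDFS REST API; the host and user names below are placeholders for your environment:

```shell
# Sketch: probe an HttpFS endpoint. HttpFS listens on port 14000 by
# default and speaks the WebHDFS REST API. Host and user are placeholders.
HTTPFS_HOST="httpfs.example.com"
HTTPFS_USER="hdfs"
RESP="$(curl -s --max-time 5 \
  "http://${HTTPFS_HOST}:14000/webhdfs/v1/?op=GETHOMEDIRECTORY&user.name=${HTTPFS_USER}" \
  || true)"
HTTPFS_RESULT="${RESP:-unreachable}"
echo "HttpFS at ${HTTPFS_HOST}:14000: ${HTTPFS_RESULT}"
```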
Configure for Hive

Integration with the Hadoop cluster's instance of Hive.

 

Configure for KMS

Integration with the Hadoop cluster's key management system (KMS) for encrypted transport. Instructions are provided for distribution-specific versions of Hadoop.
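As a quick pre-check for this integration, the standard Hadoop CLI can attempt to list keys from the configured KMS. This sketch assumes the hadoop client is installed on the edge node and that a kms:// provider is configured:

```shell
# Sketch: if the cluster uses Hadoop KMS, the client should be able to
# list encryption keys. 'hadoop key list' is a standard Hadoop command and
# relies on hadoop.security.key.provider.path pointing at a kms:// URI.
if command -v hadoop >/dev/null 2>&1; then
  hadoop key list || echo "KMS not reachable; check hadoop.security.key.provider.path"
  KMS_STATUS="checked"
else
  KMS_STATUS="hadoop client not found on this node"
fi
echo "KMS check: $KMS_STATUS"
```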
Configure Security

A list of topics on applying additional security measures to the platform and how it integrates with Hadoop.

Configure SSO for AD-LDAP

Please complete these steps if you are integrating with your enterprise's AD/LDAP Single Sign-On (SSO) system.