Comment: Published by Scroll Versions from space DEV and version next


  1. Review Planning Guide: Please review and verify Install Preparation and sub-topics.
    1. Limitations: For more information on limitations of this scenario, see Product Limitations in the Install Preparation area.
  2. Read: Please read this entire document before you create the EMR cluster or install the platform.

  3. Acquire Assets: Acquire the installation package for your operating system and your license key. For more information, contact support.
    1. If you are completing the installation without Internet access, you must also acquire the offline versions of the system dependencies. See Install Dependencies without Internet Access.
  4. VPC: Enable and deploy a working AWS VPC.
  5. S3: Enable and deploy an AWS S3 bucket to use as the base storage layer for the platform. In the bucket, the platform stores metadata in the following location:

    Code Block


  6. IAM Policies: Create IAM policies for access to the S3 bucket. Required permissions are the following: 
    • The system account or individual user accounts must have full permissions for the S3 bucket:

      Delete*, Get*, List*, Put*, Replicate*, Restore*
    • These policies must apply to the bucket and its contents. Example:

      Code Block
    • See
  7. EC2 instance: Deploy an AWS EC2 instance with SELinux where the platform can be installed.
    1. The required set of ports must be enabled for listening. See System Ports.

    2. This node should be dedicated to the platform.

      NOTE: The EC2 node must meet the system requirements. For more information, see System Requirements.

  8. EC2 instance role: Create an EC2 instance role for your S3 bucket policy. See
  9. EMR cluster: An existing EMR cluster is required. 
    1. Cluster sizing: Before you begin, you should allocate sufficient resources for sizing the cluster. For guidance, please contact your 

      D s item

    2. See Deploy the Cluster below.
  10. Databases:
    1. The platform uses a set of databases that must be accessible from the node where the platform is installed. Databases are installed as part of the workflow described later.
    2. For more information on the supported databases and versions, see System Requirements.
    3. For more information on database installation requirements, see Install Databases.
    4. If you are installing databases on Amazon RDS, an admin account for RDS is required. For more information, see Install Databases on Amazon RDS.
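
The IAM Policies step above lists the permission families required on the S3 bucket and notes that they must apply to both the bucket and its contents. As a rough sketch only, the following Python builds a policy document of that shape. The bucket name is a hypothetical placeholder, the `s3:` action prefix and ARN forms are standard AWS conventions rather than anything stated in this guide, and your organization's actual policy may need additional statements or conditions.

```python
import json

# Action families listed in the IAM Policies step, expressed as S3 wildcards.
S3_ACTION_PREFIXES = ["Delete", "Get", "List", "Put", "Replicate", "Restore"]


def build_bucket_policy(bucket_name: str) -> dict:
    """Return an IAM policy document that allows the listed S3 actions
    on the bucket itself and on every object inside it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [f"s3:{prefix}*" for prefix in S3_ACTION_PREFIXES],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # the bucket
                    f"arn:aws:s3:::{bucket_name}/*",  # its contents
                ],
            }
        ],
    }


if __name__ == "__main__":
    # "example-platform-bucket" is a hypothetical bucket name.
    print(json.dumps(build_bucket_policy("example-platform-bucket"), indent=2))
```

Listing both resource ARNs matters because bucket-level actions such as listing are evaluated against the bucket ARN, while object-level actions are evaluated against the `/*` form.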


  • EMR:
    • AWS region for the EMR cluster, if it exists.
    • ID for EMR cluster, if it exists
      • If you are creating an EMR cluster as part of this process, please retain the ID.
      • The EMR cluster must allow access from the node where the platform is installed. This configuration is described later.
  • Subnet: Subnet within your virtual private cloud (VPC) where you want to launch the platform.
    • This subnet should be in the same VPC as the EMR cluster.
    • Subnet can be private or public.
    • If it is private and it cannot access the Internet, additional configuration is required. See below.
  • S3:
    • Name of the S3 bucket that the platform can use
    • Path to resources on the S3 bucket

  • EC2: 
    • Instance type for the 

Internet access




NOTE: You can try to verify operations using the

running environment at this time. While you can also try to run a job on the Hadoop cluster in the Spark running environment, additional configuration may be required to complete the integration. These steps are listed under Next Steps below.


Required Platform Configuration

This section covers the following topics, some of which should already be completed:

  • Set Base Storage Layer - The base storage layer must be set once and never changed. Set this value to s3.

  • Create Encryption Key File - If you plan to integrate the platform with any relational sources, including Redshift, you must create an encryption key file and store it on the
  • Running Environment Options - Depending on your scenario, you may need to perform additional configuration for your available running environment(s) for executing jobs.
  • Profiling Options - In some environments, tweaks to the settings for visual profiling may be required. You can disable visual profiling if needed.
  • Configure for Spark - If you are enabling the Spark running environment, please review and verify the configuration for integrating the platform with the instance of Spark on the Hadoop cluster.

Configure for EMR

Set up for a new EMR cluster. Some content may apply to existing EMR clusters.

Enable Integration with Compressed Clusters

If the Hadoop cluster uses compression, additional configuration is required.

Enable Integration with Cluster High Availability

If you are integrating with high availability on the Hadoop cluster, please complete these steps.

  • If you are integrating with high availability on the Hadoop cluster, HttpFS must be enabled in the platform. HttpFS is required in other, less-common cases. See Enable HttpFS.
Enable Relational Connections

Enable integration with relational databases, including Redshift.

Configure for KMS

Integration with the Hadoop cluster's key management system (KMS) for encrypted transport. Instructions are provided for distribution-specific versions of Hadoop.

Configure Security

A list of topics on applying additional security measures to the platform and how it integrates with Hadoop.

Configure SSO for AD-LDAP

Please complete these steps if you are integrating with your enterprise's AD/LDAP Single Sign-On (SSO) system.