The platform can be hosted within Amazon and supports integrations with multiple services from Amazon Web Services, including combinations of services for hybrid deployments. This section provides an overview of the integration options, as well as links to related configuration topics.

For an overview of AWS deployment scenarios, see Supported Deployment Scenarios for AWS.

Internet Access

From AWS, the platform requires Internet access for the following services:

NOTE: Depending on your AWS deployment, some of these services may not be required.

 

  • Amazon S3
  • Key Management Service (KMS) (if SSE-KMS server-side encryption is enabled)
  • Security Token Service (STS) (if the temporary credential provider is used)
  • EMR (if integration with an EMR cluster is enabled)

NOTE: If the platform is hosted in a VPC where Internet access is restricted, access to the S3, KMS, and STS services must be provided by creating VPC endpoints. If the platform is accessing an EMR cluster, a proxy server can be configured to provide access to the AWS ElasticMapReduce regional endpoint.


Database Installation

The following database scenarios are supported.

Cluster node

By default, the platform databases are installed on PostgreSQL instances on the platform node or another accessible node in the enterprise environment. For more information, see Install Databases.

Amazon RDS

For Amazon-based installations, you can install the platform databases on PostgreSQL instances on Amazon RDS. For more information, see Install Databases on Amazon RDS.

Base AWS Configuration

The following configuration topics apply to AWS in general.

Base Storage Layer

NOTE: The base storage layer must be set during initial configuration and cannot be modified after it is set.

S3: Most of these integrations require the use of S3 as the base storage layer, which means that data uploads, the default location for writing results, and sample generation all occur on S3.

HDFS: In on-premises installations, it is possible to use S3 as a read-only option for a Hadoop-based cluster when the base storage layer is HDFS. You can configure the platform to read from and write to S3 buckets during job execution and sampling. For more information, see Enable S3 Access.

For more information on setting the base storage layer, see Set Base Storage Layer.

For more information, see Storage Deployment Options.

Configure AWS Region

For Amazon integrations, you can configure the platform to connect to Amazon datastores located in different regions.

NOTE: This configuration is required under any of the following deployment conditions:

  • The platform is installed on-premises, and you are integrating with Amazon resources.
  • The EC2 instance hosting the platform is located in a different AWS region than your Amazon datastores.
  • The platform node or the EC2 instance does not have access to s3.amazonaws.com.

 

  1. In the AWS console, identify the regions in which your datastores are located. For more information, see the Amazon documentation.
  2. Log in to the platform.
  3. Set the value of the following property to the region where your S3 datastores are located, as shown in the example after these steps:

    aws.s3.region

    If this value is not set, the platform attempts to infer the region based on the default S3 bucket location.

  4. Save your changes.
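
For example, the setting might look like the following in platform configuration; the region shown is only an assumed placeholder, so substitute the region in which your S3 datastores are located:

"aws.s3.region": "us-west-2",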

AWS Authentication

The following matrix illustrates the available methods of managing authentication between the platform and AWS. The options are determined by the settings of two key parameters: the AWS mode (aws.mode) and the credential provider (aws.credentialProvider).

Default

System mode: One system-wide key/secret combination is inserted in the platform for use.

Config:

"aws.credentialProvider": "default",
"aws.mode": "system",
"aws.s3.key": <key>,
"aws.s3.secret": <secret>,

User mode: Each user provides a key/secret combination. For per-user configuration, see Configure Your Access to S3.

Config:

"aws.credentialProvider": "default",
"aws.mode": "user",

Instance

System mode: The platform uses EC2 instance roles.

Config:

"aws.credentialProvider": "instance",
"aws.mode": "system",

User mode: Users provide EC2 instance roles.

Config:

"aws.credentialProvider": "instance",
"aws.mode": "user",

Temporary

System mode: Temporary credentials are issued based on the specified system-wide IAM role.

Config:

"aws.credentialProvider": "temporary",
"aws.mode": "system",
"aws.systemIAMRole": "<IAMRole>",

User mode: Per-user authentication using per-user IAM roles.

Config:

"aws.credentialProvider": "temporary",
"aws.mode": "user",

AWS Auth Mode

When connecting to AWS, the platform supports the following basic authentication modes:

 

system

"aws.mode": "system",

Access to AWS resources is managed through a single system account. The account that you specify is based on the credential provider selected below.

  • The instance credential provider ignores this setting.

See below.

user

"aws.mode": "user",

Authentication must be specified for individual users.

NOTE: Creation and use of custom dictionaries is not supported in user mode.

Tip: In AWS user mode, administrators can manage S3 access for users through the Admin Settings page. See Manage Users.


AWS Credential Provider

The platform supports the following methods of providing credentialed access to AWS and S3 resources.

default

"aws.credentialProvider": "default",

This method uses the provided AWS key and secret values to access resources. See below.

instance

"aws.credentialProvider": "instance",

When you are running the platform on an EC2 instance, you can leverage your enterprise IAM roles to manage permissions on the instance for the platform. See below.

temporary

"aws.credentialProvider": "temporary",

Details are below.

Default credential provider

Whether the AWS access mode is set to system or user, the default credential provider for AWS and S3 resources is the platform.

System mode

"aws.mode": "system",

A single AWS key and secret is inserted into platform configuration. This account is used to access all resources and must have the appropriate permissions to do so.

"aws.s3.key": "<your_key_value>",
"aws.s3.secret": "<your_secret_value>",

User mode

"aws.mode": "user",

Each user must specify an AWS key and secret in their account to access resources. For more information on configuring individual user accounts, see Configure Your Access to S3.

Default credential provider with EMR:

If you are using this method and integrating with an EMR cluster, additional configuration is required. See Configure for EMR.

Instance credential provider

When the platform is running on an EC2 instance, you can manage permissions through pre-defined IAM roles. 

NOTE: If the platform is connected to an EMR cluster, you can force authentication to the EMR cluster to use the specified IAM instance role. See Configure for EMR.

For more information, see Configure for EC2 Role-Based Authentication.

Temporary credential provider

For enhanced security, you can enable the use of temporary credentials, which are issued by AWS based on an IAM role specified per user.

Tip: This method is recommended by AWS.

Set the following property.

"aws.credentialProvider"

  • If aws.mode = system, set this value to temporary.
  • If aws.mode = user and you are using per-user authentication, this setting is ignored and should remain set to default.
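
For example, a minimal sketch of the system-mode settings described above, combining the temporary credential provider with a system-wide IAM role; the role value is a placeholder to replace with your own IAM role:

"aws.credentialProvider": "temporary",
"aws.mode": "system",
"aws.systemIAMRole": "<IAMRole>",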

Per-user authentication

Individual users can be configured to provide temporary credentials for access to AWS resources, which is a more secure authentication solution. For more information, see Configure AWS Per-User Authentication.

AWS Storage

S3 Sources

To integrate with S3, additional configuration is required. See Enable S3 Access.

Redshift Connections

You can create connections to one or more Redshift databases, from which you can read database sources and to which you can write job results. Samples are still generated on S3.

NOTE: Relational connections require installation of an encryption key file on the platform node. For more information, see Create Encryption Key File.

For more information, see Create Redshift Connections.

AWS Clusters

The platform can integrate with one instance of either of the following.

NOTE: If the platform is installed through the Amazon Marketplace, only the EMR integration is supported.


EMR

When the platform is installed through AWS, you can integrate with an EMR cluster for Spark-based job execution. For more information, see Configure for EMR.

Hadoop

If you have installed the platform on-premises or directly on an EC2 instance, you can integrate with a Hadoop cluster for Spark-based job execution. See Configure for Hadoop.