The following are the basic AWS deployment scenarios.

| Deployment Scenario | Base Storage Layer | Storage - S3 | Storage - Redshift | Cluster | Notes |
|---|---|---|---|---|---|
| On-premises | HDFS | read only | Not supported | Hadoop | When HDFS is the base storage layer, the only accessible AWS resource is read-only access to S3. |
| EC2 | HDFS | read only | Not supported | Hadoop | When HDFS is the base storage layer, the only accessible AWS resource is read-only access to S3. |
| EC2 | S3 | read/write | read/write | Hadoop or EMR | |
| EC2 | S3 | read/write | read/write | None | |
| EC2 | S3 | read/write | read/write | EMR | This deployment scenario integrates by default with an EMR cluster. It does not support integration with a Hadoop cluster. |
| Microsoft Azure | | | | | Integration with AWS-based resources is not supported. See Install from Azure Marketplace. |
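The access rules in the scenario table above can be summarized as a small lookup. This is an illustrative sketch, not part of the product; the dictionary and function names are assumptions.

```python
# Sketch of the scenario table: the S3 and Redshift access that each
# base storage layer permits. Names are illustrative, not product APIs.
ACCESS_BY_BASE_LAYER = {
    # base storage layer: (S3 access, Redshift access)
    "HDFS": ("read only", "not supported"),
    "S3": ("read/write", "read/write"),
}

def storage_access(base_storage_layer):
    """Return (s3_access, redshift_access) for a given base storage layer."""
    try:
        return ACCESS_BY_BASE_LAYER[base_storage_layer]
    except KeyError:
        raise ValueError(f"unknown base storage layer: {base_storage_layer}")

print(storage_access("HDFS"))  # ('read only', 'not supported')
print(storage_access("S3"))    # ('read/write', 'read/write')
```

The key point the table encodes: choosing HDFS as the base storage layer restricts every AWS storage integration, while choosing S3 unlocks full read/write access to both S3 and Redshift.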
Legend and Notes:

| Column | Notes |
|---|---|
| Deployment Scenario | Description of the AWS-connected deployment; the location where the product is installed. All AWS installations are installed on EC2 instances. |
| Base Storage Layer | The base storage layer can be S3 or HDFS. When set to HDFS, only read access to S3 is permitted. |
| Storage - S3 | For read/write access to S3, the base storage layer must be set to S3. |
| Storage - Redshift | For access to Redshift, the base storage layer must be set to S3. |
| Cluster | List of cluster types that are supported for integration and job execution at scale. |
| Notes | Any additional notes |
When the product is installed on AWS, it is deployed on an EC2 instance. Through the EC2 console, there are a few key parameters that must be specified.
NOTE: After you have created the instance, you should retain the instanceId from the console, which must be applied to the product configuration.
For more information, see Install from Amazon Marketplace.
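One way to read the instanceId programmatically, rather than copying it from the console, is to query the EC2 instance metadata service (IMDSv2). This is a hedged sketch, not a documented product step: the helper names are illustrative, and the calls only succeed when run on the EC2 instance itself.

```python
import urllib.request

# The instance metadata service is reachable only from inside an EC2
# instance at this well-known link-local address.
METADATA_BASE = "http://169.254.169.254/latest"

def build_token_request():
    # IMDSv2 requires a session token, obtained with a PUT request.
    return urllib.request.Request(
        f"{METADATA_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    )

def build_instance_id_request(token):
    # The token is then presented when reading metadata paths.
    return urllib.request.Request(
        f"{METADATA_BASE}/meta-data/instance-id",
        headers={"X-aws-ec2-metadata-token": token},
    )

def fetch_instance_id():
    """Fetch this instance's instanceId. Works only on an EC2 instance."""
    token = urllib.request.urlopen(build_token_request()).read().decode()
    return urllib.request.urlopen(build_instance_id_request(token)).read().decode()
```

The returned value (for example, a string of the form `i-0123456789abcdef0`) is what the configuration expects as the instanceId.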
Through the Amazon Marketplace, you can license and deploy an AMI of the product, a self-contained version that does not require integration with a clustered running environment. All job execution happens within the AMI on the EC2 instance that you deploy.
You can also deploy an AMI of the product onto an EC2 instance and then integrate it with your pre-configured EMR cluster for Spark-based job execution.
The following table describes the different AWS components that can host or integrate with the product. Combinations of one or more of these items constitute one of the deployment scenarios listed above.
| AWS Service | Description | Base Storage Layer | Other Required AWS Services |
|---|---|---|---|
| EC2 | Amazon Elastic Compute Cloud (EC2) can be used to host the product. | Base storage layer can be S3 or HDFS. If set to HDFS, only read access to S3 is permitted. | |
| S3 | Amazon Simple Storage Service (S3) can be used for reading data sources, writing job results, and hosting the base storage layer. | Base storage layer can be S3 or HDFS. If set to HDFS, only read access to S3 is permitted. | |
| Redshift | Amazon Redshift provides a scalable data warehouse platform, designed for big data analytics applications. The product can read from and write to Redshift. | Base Storage Layer = S3 | S3 |
| AMI | Through the Amazon Marketplace, you can license and install an Amazon Machine Image (AMI) instance of the product. | Base Storage Layer = S3 | EC2 instance |
| EMR | Through the Amazon Marketplace, you can license and install an AMI specifically configured to work with Amazon Elastic MapReduce (EMR), a Hadoop-based data processing platform. | Base Storage Layer = S3 | EC2 instance, AMI |
| Amazon RDS | Optionally, the product can integrate with Amazon RDS. | Base Storage Layer = S3 | |
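The prerequisites in the components table can be expressed as a simple validation step. This is an illustrative sketch of the rules above, assuming hypothetical names; it is not a product API.

```python
# Sketch of the component prerequisites above. For each AWS service:
# the required base storage layer (None means S3 or HDFS both work)
# and any other required AWS services. Names are illustrative.
REQUIREMENTS = {
    "EC2":        {"base_storage_layer": None, "other_services": []},
    "S3":         {"base_storage_layer": None, "other_services": []},
    "Redshift":   {"base_storage_layer": "S3", "other_services": ["S3"]},
    "AMI":        {"base_storage_layer": "S3", "other_services": ["EC2 instance"]},
    "EMR":        {"base_storage_layer": "S3", "other_services": ["EC2 instance", "AMI"]},
    "Amazon RDS": {"base_storage_layer": "S3", "other_services": []},
}

def missing_prerequisites(service, base_storage_layer, available_services):
    """Return a list of unmet prerequisites for integrating the service."""
    req = REQUIREMENTS[service]
    missing = []
    if req["base_storage_layer"] and base_storage_layer != req["base_storage_layer"]:
        missing.append(f"base storage layer must be {req['base_storage_layer']}")
    for svc in req["other_services"]:
        if svc not in available_services:
            missing.append(f"requires {svc}")
    return missing
```

For example, attempting to use Redshift with an HDFS base storage layer reports both unmet prerequisites, while EMR with an S3 base layer, an EC2 instance, and the AMI reports none.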