...

  • If base storage layer is S3: you can enable read/write access to S3.
  • If base storage layer is not S3: you can enable read-only access to S3.

Limitations

...

  • The platform supports S3 connectivity for the following distributions:

    Info

    NOTE: You must use a Hadoop version that is supported for your release of the product.

  • The platform only supports running S3-enabled instances over AWS.
  • Access to AWS S3 Regional Endpoints through internet protocol is required. If the machine hosting the platform is in a VPC with no internet access, a VPC endpoint enabled for S3 services is required. The platform does not support access to S3 through a proxy server.
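
The endpoint requirement above can be checked from the machine that hosts the platform. The following is a minimal sketch, not part of the product: the function names are hypothetical, and the hostname pattern assumes a standard commercial AWS region. It opens a direct TCP connection (no proxy) to the regional S3 endpoint on port 443.

```python
import socket


def s3_regional_endpoint(region: str) -> str:
    """Return the regional S3 endpoint hostname for a standard AWS region."""
    return f"s3.{region}.amazonaws.com"


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a direct TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example usage (the region is an assumption; use the region of your buckets):
#   host = s3_regional_endpoint("us-west-2")
#   print(host, "reachable" if can_reach(host) else "NOT reachable")
```

A failed check from inside a VPC without internet access suggests that a VPC endpoint for S3 is missing or not routed for that subnet.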

...

Info

NOTE: Spark 2.3.0 jobs may fail on S3-based datasets due to a known incompatibility. For details, see https://github.com/apache/incubator-druid/issues/4456.

If you encounter this issue, please set spark.version to 2.1.0 in platform configuration. For more information, see Admin Settings Page.

Pre-requisites

  • If IAM instance role is used for S3 access, it must have access to resources at the bucket level.

...

Required AWS Account Permissions

All access to S3 sources occurs through a single AWS account (system mode) or through an individual user's account (user mode). For either mode, the AWS access key and secret combination must provide read and write access to the default bucket associated with the account. 

Info

NOTE: These permissions should be set up by your AWS administrator.

Read-only access policies

Info

NOTE: To enable viewing and browsing of all folders within a bucket, the following permissions are required:

  • The system account or individual user accounts must have the ListAllMyBuckets access permission for the bucket.
  • All objects to be browsed within the bucket must have Get access enabled.

The policy statement to enable read-only access to your default S3 bucket should look similar to the following. Replace 3c-my-s3-bucket with the name of your bucket:

Code Block
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::3c-my-s3-bucket",
                "arn:aws:s3:::3c-my-s3-bucket/*"
            ]
        }
    ]
}
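
If you maintain policies for several buckets, the read-only policy can be generated instead of edited by hand. The sketch below is one possible approach, not a product utility: the function name read_only_policy is hypothetical, and the actions and resource patterns are copied from the policy above. Serializing with json.dumps yields strictly valid JSON.

```python
import json


def read_only_policy(bucket: str) -> dict:
    """Build a read-only S3 policy document for a single bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                ],
                # The bucket itself (for listing) and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }


policy_json = json.dumps(read_only_policy("3c-my-s3-bucket"), indent=4)
```

The resulting policy_json string can be handed to your AWS administrator for review.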


Write access policies

Write access is enabled by adding the PutObject and DeleteObject actions to the above. Replace 3c-my-s3-bucket with the name of your bucket:

Code Block
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::3c-my-s3-bucket",
                "arn:aws:s3:::3c-my-s3-bucket/*"
            ]
        }
    ]
}
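
Before deploying a policy, it can help to confirm that it lists every action the platform needs. The following is a rough sketch under stated assumptions: the helper missing_actions and the two constant names are hypothetical, and the check only compares literal action names in Allow statements. It does not expand wildcards such as s3:* and does not evaluate Deny statements or conditions.

```python
import json

READ_ACTIONS = {"s3:GetObject", "s3:ListBucket", "s3:GetBucketLocation"}
WRITE_ACTIONS = READ_ACTIONS | {"s3:PutObject", "s3:DeleteObject"}


def missing_actions(policy_text: str, required: set) -> set:
    """Return the required actions that no Allow statement lists literally."""
    policy = json.loads(policy_text)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        # IAM permits a single statement object instead of a list.
        statements = [statements]
    granted = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        granted.update(actions)
    return set(required) - granted
```

An empty result for WRITE_ACTIONS means the policy names all five actions from the write policy above. A non-empty result names the actions still to be added.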

Other AWS policies for S3

Policy for access to public buckets

Info

NOTE: This feature must be enabled. For more information, see Enable Onboarding Tour.

To access S3 assets that are managed by the platform vendor, you must apply the following policy definition to any IAM role that is used to access the product. This bucket contains demo assets for the On-Boarding tour:

Code Block
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::trifacta-public-datasets/*",
                "arn:aws:s3:::trifacta-public-datasets"
            ]
        }
    ]
}

For more information on creating policies, see https://console.aws.amazon.com/iam/home#/policies.

KMS policy

If any accessible bucket is encrypted with SSE-KMS, another policy must be deployed. For more information, see https://docs.aws.amazon.com/kms/latest/developerguide/iam-policies.html.

Configuration

Depending on your S3 environment, you can define:

  • Read access to S3
  • Access to additional S3 buckets
  • S3 as base storage layer
  • Write access to S3
    • S3 bucket that is the default write destination

Define base storage layer

The base storage layer is the default platform for storing results.

Required for:

  • Write access to S3
  • Connectivity to Redshift

Warning

The base storage layer for your instance is defined during initial installation and cannot be changed afterward.

If S3 is the base storage layer, you must also define the default storage bucket to use during initial installation, which cannot be changed at a later time. See Define default S3 write bucket below.

For more information on the various options for storage, see Storage Deployment Options.

For more information on setting the base storage layer, see Set Base Storage Layer.

Enable read access to S3

...

When read access is enabled, users can explore S3 buckets for creating datasets.

Info

NOTE: When read access is enabled, users have automatic access to all buckets to which the specified S3 user has access. You may want to create a specific user account for S3 access.


Info

NOTE: Data that is mirrored from one S3 bucket to another might inherit the permissions from the bucket where it is owned.

Steps:

  1. Edit the platform configuration.
  2. Set the following property to true:

    Code Block
    "aws.s3.enabled": true,
  3. In the S3 configuration section, set enabled=true, which allows users to browse S3 buckets through the web application.
  4. Specify the AWS key and secret values for the user to access S3 storage.
  5. Save your changes.

S3 access modes

The platform supports the following modes for accessing S3. You must choose one access mode and then complete the related configuration steps.

Info

NOTE: Avoid switching between user mode and system mode, which can disable user access to data. At install time, you should choose your preferred mode.

System mode

(default) Access to S3 buckets is enabled and defined for all users of the platform. All users use the same AWS access key, secret, and default bucket.

System mode - read-only access

For read-only access, the key, secret, and default bucket must be specified in configuration.

Info

NOTE: Please verify that the AWS account has all required permissions to access the S3 buckets in use. The account must have the ListAllMyBuckets ACL among its permissions.

Steps:

  1. Edit the platform configuration.
  2. Locate the following parameters:

    Parameter             Description
    aws.s3.key            Set this value to the AWS key to use to access S3.
    aws.s3.secret         Set this value to the secret corresponding to the AWS key provided.
    aws.s3.bucket.name    Set this value to the name of the S3 bucket from which users may read data.

    Info

    NOTE: Additional buckets may be specified. See below.

  3. Save your changes.

User mode

Optionally, access to S3 can be defined on a per-user basis. This mode allows administrators to define access to specific buckets using various key/secret combinations as a means of controlling permissions.

Info

NOTE: When this mode is enabled, individual users must have AWS configuration settings applied to their account, either by an administrator or by themselves. The global settings in this section do not apply in this mode.

...

  1. Edit the platform configuration.
  2. Please verify that the settings below have been configured:

    Code Block
    "aws.s3.enabled": true,
    "aws.mode": "user",
  3. Additional configuration is required for per-user authentication. For more information, see Configure AWS Per-User Authentication.

User mode - Create encryption key file

...

Info

NOTE: If you have enabled user access mode, you can skip the following sections, which pertain to the system access mode, and jump to the Enable Redshift Connection section below.

System mode - additional configuration

The following sections apply only to system access mode.

...

When S3 is defined as the base storage layer, write access to S3 is enabled. The platform attempts to store outputs in the designated default S3 bucket.

...

  1. Define S3 to be the base storage layer. See Set Base Storage Layer.
  2. Enable read access. See Enable read access.
  3. Specify a value for aws.s3.bucket.name, which defines the S3 bucket where data is written. Do not include a protocol identifier. For example, if your bucket address is s3://MyOutputBucket, the value to specify is the following:

    Code Block
    MyOutputBucket

    Info

    NOTE: Specify the top-level bucket name only. There should not be any backslashes in your entry.
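
The naming rule in step 3 can be applied mechanically. The sketch below is illustrative only: normalize_bucket_name is a hypothetical helper, not a platform function. It strips an s3:// protocol identifier and rejects any value that still contains a slash or backslash, since only the top-level bucket name is expected.

```python
def normalize_bucket_name(value: str) -> str:
    """Return the top-level bucket name without protocol identifier or slashes."""
    name = value.strip()
    if name.lower().startswith("s3://"):
        name = name[len("s3://"):]
    # Tolerate a trailing slash such as "s3://MyOutputBucket/".
    name = name.rstrip("/")
    if not name or "/" in name or "\\" in name:
        raise ValueError(f"expected a top-level bucket name, got {value!r}")
    return name
```

For example, normalize_bucket_name("s3://MyOutputBucket") returns "MyOutputBucket", while a value such as "MyOutputBucket/results" raises ValueError.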

...

  1. Edit the platform configuration.
  2. Locate the following parameter: aws.s3.extraBuckets

    1. In the Admin Settings page, specify the extra buckets as a comma-separated string of additional S3 buckets that are available for storage. Do not put any quotes around the string. Whitespace between string values is ignored.

    2. In the platform configuration file, specify the extraBuckets array as a comma-separated list of buckets as in the following:

      Code Block
      "extraBuckets": ["MyExtraBucket01","MyExtraBucket02","MyExtraBucket03"]

      Info

      NOTE: Specify the top-level bucket name only. There should not be any backslashes in your entry.

  3. These values are mapped to the following bucket addresses:

    Code Block
    s3://MyExtraBucket01
    s3://MyExtraBucket02
    s3://MyExtraBucket03
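
The two input forms and the resulting addresses can be illustrated with a short sketch. Both function names are hypothetical, and the parsing shown is an assumption about how a comma-separated Admin Settings value relates to the array form. It is not the platform's own parser.

```python
def parse_extra_buckets(value: str) -> list:
    """Split an unquoted, comma-separated string into bucket names.

    Whitespace around each name is ignored and empty entries are dropped.
    """
    return [part.strip() for part in value.split(",") if part.strip()]


def bucket_addresses(buckets: list) -> list:
    """Map top-level bucket names to their s3:// addresses."""
    return [f"s3://{name}" for name in buckets]


buckets = parse_extra_buckets("MyExtraBucket01, MyExtraBucket02,MyExtraBucket03")
addresses = bucket_addresses(buckets)
```

Here buckets holds the three names in the same order as the extraBuckets array, and addresses holds the three s3:// addresses listed above.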

S3 Configuration

...

Configuration reference

Apply the following settings in platform configuration:

Code Block
"aws.s3.enabled": true,
"aws.s3.bucket.name": "<BUCKET_FOR_OUTPUT_IF_WRITING_TO_S3>",
"aws.s3.key": "<AWS_KEY>",
"aws.s3.secret": "<AWS_SECRET>",
"aws.s3.extraBuckets": ["<ADDITIONAL_BUCKETS_TO_SHOW_IN_FILE_BROWSER>"]
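
A quick consistency check on these settings can catch placeholders that were never filled in. This is a minimal sketch with several assumptions: check_s3_settings is a hypothetical helper, the settings are assumed to be available as a flat dictionary keyed by the dotted names shown above, key/secret authentication is assumed rather than an IAM instance role, and any aws.mode value other than "user" is treated as system mode.

```python
def check_s3_settings(settings: dict) -> list:
    """Return a list of problems found in a flat dict of dotted S3 settings."""
    problems = []
    if settings.get("aws.s3.enabled") is not True:
        problems.append("aws.s3.enabled must be true")
    if settings.get("aws.mode") == "user":
        # In user mode, credentials are applied per user account,
        # so the global key, secret, and bucket are not checked.
        return problems
    for name in ("aws.s3.key", "aws.s3.secret", "aws.s3.bucket.name"):
        value = settings.get(name)
        if not isinstance(value, str) or not value or value.startswith("<"):
            problems.append(f"{name} is missing or still a placeholder")
    return problems
```

An empty list means that nothing obvious is missing. The check does not verify that the key and secret are valid or that the bucket exists.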

...

You can configure the platform to publish data on S3 when a server-side encryption policy is enabled. SSE-S3 and SSE-KMS methods are supported. For more information, see http://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html.

Notes:

  • When encryption is enabled, all buckets to which you are writing must share the same encryption policy. Read operations are unaffected.
  • This feature is supported for the following Hadoop distributions:
    • SSE-S3: CDH 5.10 or later, HDP 2.6 or later
    • SSE-KMS: CDH 5.11 or later, HDP 2.6.1 or later
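
The support matrix above can be expressed as a small lookup. The sketch below is illustrative: supports_sse and its helpers are hypothetical names, the minimum versions are copied from the list above, and version strings are assumed to be purely numeric and dot-separated (for example "5.11.0").

```python
MINIMUM_VERSIONS = {
    "SSE-S3": {"CDH": (5, 10), "HDP": (2, 6)},
    "SSE-KMS": {"CDH": (5, 11), "HDP": (2, 6, 1)},
}


def _parse(version: str) -> tuple:
    """Turn '2.6.1' into (2, 6, 1). Raises ValueError on non-numeric parts."""
    return tuple(int(part) for part in version.split("."))


def _pad(version: tuple, length: int = 3) -> tuple:
    """Pad with zeros so that (2, 6) compares as (2, 6, 0)."""
    return tuple(version) + (0,) * (length - len(version))


def supports_sse(method: str, distribution: str, version: str) -> bool:
    """Return True if the distribution version meets the listed minimum."""
    minimum = MINIMUM_VERSIONS.get(method, {}).get(distribution)
    if minimum is None:
        return False
    return _pad(_parse(version)) >= _pad(minimum)
```

For instance, supports_sse("SSE-KMS", "HDP", "2.6") is False because SSE-KMS requires HDP 2.6.1 or later, while the same version satisfies the SSE-S3 minimum.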

...

Code Block
"aws.s3.serverSideKmsKeyId": "",

Notes:

  • Access to the key:
    • Access must be provided to the authenticating user.
    • The AWS IAM role must be assigned to this key.

  • The authenticating user or the AWS IAM role must be given Encrypt/Decrypt permissions for the specified KMS key ID:
    • Permissions must be assigned to the authenticating user.
    • The AWS IAM role must be given these permissions.

    • For more information,
    see

The format for referencing this key is the following:

...

Code Block
"alias/<FSR>"

where:

<FSR> is the name of the alias for the entire key.
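
As a small illustration of the alias format, the following sketch builds the reference string from an alias name. The helper kms_alias_reference and the alias name my-output-key are hypothetical. Whether the resulting string is the value to place in aws.s3.serverSideKmsKeyId is an assumption based on the surrounding text, so confirm it against your release's documentation.

```python
def kms_alias_reference(alias_name: str) -> str:
    """Return the alias reference in the form 'alias/<FSR>'."""
    name = alias_name.strip()
    if name.startswith("alias/"):
        # Accept input that already carries the prefix.
        name = name[len("alias/"):]
    if not name:
        raise ValueError("alias name must not be empty")
    return f"alias/{name}"


reference = kms_alias_reference("my-output-key")  # hypothetical alias name
```

Here reference is the string "alias/my-output-key".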

...