You can create a single, global connection to your default S3 bucket through the application.
Simple Storage Service (S3) is an online data storage service provided by Amazon, which provides low-latency access through web services. For more information, see https://aws.amazon.com/s3/.
NOTE: A single, global connection to S3 is supported for workspace mode only. In per-user mode, individual users must configure their own access to S3.
Tip: After you have specified a default Amazon S3 connection, you can connect to additional S3 buckets through a different connection type. For more information, see External S3 Connections.
Before you begin, please verify that the following requirements are met:
Integration: Your workspace is connected to a running environment supported by your product edition.
Verify that Enable S3 Connectivity has been enabled in the Workspace Settings Page.
Before you specify this connection, you should acquire the following information. For more information on the required permissions, see Required AWS Account Permissions.
Tip: Credentials may be available through your S3 administrator.
You must choose one of the following authentication methods and acquire the listed information below.
IAM role: Use a cross-account (IAM) role to define the AWS resources, including S3, to which the application has access. For more information, see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html.
Tip: When you choose to create this connection type, instructions are provided in the connection window for how to create and apply the IAM policies and roles for the connection.
Publishing the output to multi-part files is not supported.
NOTE: For some file formats, like Parquet, multi-part files are the default output.
Publishing the output using a compression option is not supported for jobs.
Workaround: If you need to generate compressed output to this S3 bucket, you can run the job on another running environment.
You can create this S3 connection through the application.
NOTE: You can create a single, global connection of this type. This connection is available to all workspace users.
Steps:
In the Create Connection page, click the Amazon S3 card.
Default bucket: When the connection is first accessed for browsing, the contents of this bucket are displayed. If this value is not provided, then the list of available buckets based on the key/secret combination is displayed when browsing through the connection.
NOTE: To see the list of available buckets, the connecting user must have the getBucketList permission. If that permission is not present and no default bucket is listed, then the user cannot browse S3.
Additional S3 buckets: If these credentials enable access to additional S3 buckets, you can specify them as a comma-separated list of bucket names:
myBucket1,myBucket2,myBucket3
Encryption type: If server-side encryption has been enabled on your bucket, you can select the server-side encryption policy to use when writing to the bucket. SSE-S3 and SSE-KMS methods are supported. For more information, see http://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html.
Server Side Kms key Id: When KMS encryption is enabled, you must specify the AWS KMS key ID to use for the server-side encryption. For more information, see "Server Side KMS Key Identifier" below.
Click Save.
NOTE: After you have created this connection, it does not appear in the Connections page. To modify this connection, select User menu > Admin console > AWS Settings. See AWS Settings Page.
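The Additional S3 buckets field in the steps above expects a comma-separated list. The following is a minimal Python sketch of how such a value could be cleaned up before it is pasted into the field. The helper names `parse_bucket_list` and `format_bucket_list` are hypothetical and are not part of the product; the cleanup rules (ignoring empty entries, rejecting embedded whitespace, dropping duplicates) are assumptions made for illustration.

```python
from typing import List


def parse_bucket_list(value: str) -> List[str]:
    """Split a comma-separated bucket list into clean, unique names.

    Hypothetical helper: the rules below are illustrative assumptions,
    not the validation performed by the application.
    """
    names: List[str] = []
    for raw in value.split(","):
        name = raw.strip()
        if not name:
            continue  # tolerate trailing or doubled commas
        if any(ch.isspace() for ch in name):
            raise ValueError(f"bucket name contains whitespace: {name!r}")
        if name not in names:
            names.append(name)  # keep first occurrence, preserve order
    return names


def format_bucket_list(names: List[str]) -> str:
    """Join bucket names into the comma-separated form shown above."""
    return ",".join(names)


if __name__ == "__main__":
    buckets = parse_bucket_list("myBucket1, myBucket2,myBucket3,")
    print(format_bucket_list(buckets))
```

Running the sketch on a value with stray spaces and a trailing comma yields a list in the same form as the example in the steps.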
When KMS encryption is enabled, you must specify the AWS KMS key ID to use for the server-side encryption.
The AWS IAM role must be assigned to this key.
The AWS IAM role must be given these permissions.
The format for referencing this key is the following:
"arn:aws:kms:<regionId>:<acctId>:key/<keyId>"
You can use an AWS alias in the following formats. The format of the AWS-managed alias is the following:
"alias/aws/s3"
The format for a custom alias is the following:
"alias/<FSR>"
where <FSR> is the name of the alias for the entire key.
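As an illustration of the three key identifier formats described above, the following Python sketch builds a key ARN and classifies a value as a key ARN, the AWS-managed alias, or a custom alias. The function names are hypothetical, and the character sets accepted for each placeholder (for example, a 12-digit account ID) are assumptions for this example rather than the checks that the application performs.

```python
import re

# Patterns mirror the formats described above. The allowed characters for
# each placeholder are assumptions made for illustration.
KEY_ARN = re.compile(r"^arn:aws:kms:[a-z0-9-]+:\d{12}:key/[A-Za-z0-9-]+$")
AWS_MANAGED_ALIAS = "alias/aws/s3"
CUSTOM_ALIAS = re.compile(r"^alias/[A-Za-z0-9/_-]+$")


def build_key_arn(region_id: str, acct_id: str, key_id: str) -> str:
    """Fill in the arn:aws:kms:<regionId>:<acctId>:key/<keyId> template."""
    return f"arn:aws:kms:{region_id}:{acct_id}:key/{key_id}"


def classify_kms_key_id(value: str) -> str:
    """Return which of the three documented formats the value matches."""
    if KEY_ARN.match(value):
        return "key-arn"
    if value == AWS_MANAGED_ALIAS:
        return "aws-managed-alias"
    # The alias/aws/ prefix is reserved for AWS-managed aliases.
    if CUSTOM_ALIAS.match(value) and not value.startswith("alias/aws/"):
        return "custom-alias"
    raise ValueError(f"unrecognized KMS key identifier: {value!r}")


if __name__ == "__main__":
    # The account ID and key ID below are made-up sample values.
    arn = build_key_arn("us-west-2", "123456789012",
                        "1234abcd-12ab-34cd-56ef-1234567890ab")
    for candidate in (arn, "alias/aws/s3", "alias/my-key"):
        print(candidate, "->", classify_kms_key_id(candidate))
```

A check of this kind can catch a malformed identifier before it is saved in the Server Side Kms key Id field.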
For more information, see operation/createConnection.
The Java VFS Service has been modified to handle an optional connection ID, enabling S3 URLs with connection ID and credentials. The other connection details are fetched to create the required URL and configuration.
// sample URI
s3://bucket-name/path/to/object?connectionId=136
// sample java-vfs-service CURL request with s3
curl -H 'x-trifacta-person-workspace-id: 1' -X GET 'http://localhost:41917/vfsList?uri=s3://bucket-name/path/to/object?connectionId=136'
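The URI shape in the sample above can be sketched in Python as follows. This is an illustrative sketch only: the helper names are hypothetical, and the request URL is assembled exactly as in the sample curl command (the S3 URI is passed unencoded), which is an assumption about what the service accepts. The code only builds strings and does not contact the service.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def build_s3_uri(bucket: str, key: str,
                 connection_id: Optional[int] = None) -> str:
    """Build an S3 URI, optionally carrying a connectionId parameter."""
    uri = f"s3://{bucket}/{key.lstrip('/')}"
    if connection_id is not None:
        uri += f"?connectionId={connection_id}"
    return uri


def parse_s3_uri(uri: str) -> Tuple[str, str, Optional[int]]:
    """Split an S3 URI into (bucket, key, connection ID or None)."""
    parts = urlsplit(uri)
    if parts.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri!r}")
    ids = parse_qs(parts.query).get("connectionId", [])
    connection_id = int(ids[0]) if ids else None
    return parts.netloc, parts.path.lstrip("/"), connection_id


def build_vfs_list_url(base_url: str, uri: str) -> str:
    """Assemble the vfsList request URL in the same form as the sample."""
    return f"{base_url}/vfsList?uri={uri}"


if __name__ == "__main__":
    uri = build_s3_uri("bucket-name", "path/to/object", 136)
    print(uri)
    print(parse_s3_uri(uri))
    print(build_vfs_list_url("http://localhost:41917", uri))
```

Keeping the connection ID optional matches the description above: a URI without the parameter is still a valid S3 URI.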
For more information, see Verify Operations.
S3 can be used for the following tasks:
Writing Results: After a job has been executed, you can write the results back to S3.
In the application, S3 is accessed through the S3 browser. See S3 Browser.
Your administrator should provide a writeable home output directory for you. This directory location is available through your user profile. See Storage Config Page.
Your administrator can grant access on a per-user basis or for the entire workspace.
An S3 key and secret are used to access your S3 instance. These keys must enable read/write access to the appropriate directories in the S3 instance.
NOTE: If you disable or revoke your S3 access key, you must update the S3 keys for each user or for the entire system.
Your administrator should provide raw data or locations and access for storing raw data within S3. All users should have a clear understanding of the folder structure within S3 where each individual can read data and write results.
You can create an imported dataset from one or more files stored in S3.
NOTE: Import of glaciered objects is not supported.
Wildcards:
You can parameterize your input paths to import source files as part of the same imported dataset. For more information, see Overview of Parameterization.
Folder selection:
When you select a folder in S3 to create your dataset, you select all files in the folder to be included.
Notes:
When a folder is selected from S3, the following file types are ignored: *_SUCCESS and *_FAILED files, which may be present if the folder has been populated by the running environment.
NOTE: If you have a folder and file with the same name in S3, search only retrieves the file. You can still navigate to locate the folder.
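The folder-selection rule above can be approximated with a short Python sketch that filters a listing of object keys. The function name `importable_files` and the sample keys are hypothetical, and matching the patterns against the final path segment only is an assumption for illustration, not a description of how the application implements the rule.

```python
from fnmatch import fnmatchcase
from posixpath import basename
from typing import Iterable, List

# Marker files that are ignored when a folder is selected.
IGNORED_PATTERNS = ("*_SUCCESS", "*_FAILED")


def importable_files(keys: Iterable[str]) -> List[str]:
    """Return the keys that would be kept when a folder is selected.

    Assumption: patterns are matched, case-sensitively, against the file
    name (the last path segment) of each key.
    """
    kept = []
    for key in keys:
        name = basename(key)
        if any(fnmatchcase(name, pattern) for pattern in IGNORED_PATTERNS):
            continue  # skip job marker files
        kept.append(key)
    return kept


if __name__ == "__main__":
    listing = [
        "out/part-0000.csv",
        "out/_SUCCESS",
        "out/job_FAILED",
        "out/part-0001.csv",
    ]
    print(importable_files(listing))
```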
When creating a dataset, you can choose to read data from a source stored in S3 or from a local file.
Local files are uploaded to /trifacta/uploads, where they remain and are not changed.
Data may be individual files or all of the files in a folder. In the Import Data page, click the S3 tab. See Import Data Page.
When you run a job, you can specify the S3 bucket and file path where the generated results are written. By default, the output is generated in your default bucket and default output home directory.