You can create connections to specific S3 buckets through the application.
Amazon Simple Storage Service (S3) is an online data storage service that provides low-latency access through web services. For more information, see https://aws.amazon.com/s3/ .
- Read: Supported
- Write: Not supported
Prerequisites
Before you begin, please verify that your environment meets the following requirements:
- Integration: Your instance is connected to a running environment supported by your product edition.
- Multiple regions: Multiple S3 connections can be configured in different regions.
- Verify that Enable S3 Connectivity has been enabled in the Workspace Settings page.
- Acquire the Access Key ID and Secret Key for the S3 bucket or buckets to which you are connecting. For more information on acquiring your key/secret combination, contact your S3 administrator.
Permissions
Access to S3 requires the following:
- Each user must have appropriate permissions to access S3.
- To browse multiple buckets through a single S3 connection, additional permissions are required. See below.
Limitations
- Authentication using IAM roles is not supported.
- Automatic region detection is not supported when creating or editing a connection.
Create Connection
You can create additional S3 connections using the following method:
Create through application
You can create an S3 connection through the application.
Steps:
- Log in to the application.
- In the left navigation bar, click the Connections icon.
- In the Create Connection page, click the S3 card.
- Specify the connection properties:
Property | Description
---|---
Default Bucket | (Optional) The default S3 bucket to which to connect. When the connection is first accessed for browsing, the contents of this bucket are displayed. If this value is not provided, then the list of available buckets based on the key/secret combination is displayed when browsing through the connection. NOTE: To see the list of available buckets, the connecting user must have the getBucketList permission. If that permission is not present and no default bucket is listed, then the user cannot browse S3.
Access Key ID | Access Key ID for the S3 connection.
Secret Key | Secret Key for the S3 connection.
Server Side Encryption | If server-side encryption has been enabled on your bucket, you can select the server-side encryption policy to use when writing to the bucket. SSE-S3 and SSE-KMS methods are supported. For more information, see http://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html .
Server Side KMS Key ID | When KMS encryption is enabled, you must specify the AWS KMS key ID to use for the server-side encryption. For more information, see "Server Side KMS Key Identifier" below.
For more information on the other options, see Create Connection Window.
- Click Save.
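The browsing behavior that the Default Bucket property controls can be sketched as follows. This is an illustrative sketch only; the function and parameter names are not part of the product API.

```python
def browse_root(default_bucket, can_list_buckets, list_buckets):
    """Return what the user first sees when browsing an S3 connection."""
    if default_bucket:
        # A default bucket is configured: its contents are displayed.
        return [default_bucket]
    if can_list_buckets:
        # No default bucket: show the buckets visible to this key/secret.
        return list_buckets()
    # No default bucket and no getBucketList permission: browsing fails.
    raise PermissionError("no default bucket and no permission to list buckets")

print(browse_root("my-bucket", False, lambda: []))         # ['my-bucket']
print(browse_root(None, True, lambda: ["logs", "sales"]))  # ['logs', 'sales']
```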
Server Side KMS Key Identifier
When KMS encryption is enabled, you must specify the AWS KMS key ID to use for the server-side encryption.
- Access to the key:
  - Access must be provided to the authenticating user.
  - The AWS IAM role must be assigned to this key.
- Encrypt/Decrypt permissions for the specified KMS key ID:
  - Permissions must be assigned to the authenticating user.
  - The AWS IAM role must be given these permissions.
For more information, see https://docs.aws.amazon.com/kms/latest/developerguide/key-policy-modifying.html .
The format for referencing this key is the following:

```
"arn:aws:kms:<regionId>:<acctId>:key/<keyId>"
```
You can use an AWS alias in the following formats. The format of the AWS-managed alias is the following:

```
"alias/aws/s3"
```
The format for a custom alias is the following:

```
"alias/<FSR>"
```
where <FSR> is the name of the alias for the key.
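The reference styles above can be illustrated with a short helper. The patterns below are a loose sketch for checking the documented formats, not AWS's authoritative validation rules.

```python
import re

def kms_key_arn(region_id, acct_id, key_id):
    """Compose a full KMS key ARN in the documented format."""
    return f"arn:aws:kms:{region_id}:{acct_id}:key/{key_id}"

# Loose patterns for the three documented reference styles.
KMS_REFERENCE_PATTERNS = [
    re.compile(r"^arn:aws:kms:[a-z0-9-]+:\d{12}:key/[\w-]+$"),  # full ARN
    re.compile(r"^alias/aws/s3$"),                              # AWS-managed alias
    re.compile(r"^alias/[\w/-]+$"),                             # custom alias
]

def is_valid_kms_reference(ref):
    return any(p.match(ref) for p in KMS_REFERENCE_PATTERNS)

arn = kms_key_arn("us-east-1", "123456789012",
                  "1234abcd-12ab-34cd-56ef-1234567890ab")
print(is_valid_kms_reference(arn))             # True
print(is_valid_kms_reference("alias/aws/s3"))  # True
```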
Create via API
For more information on the vendor and type information to use, see Connection Types.
For more information, see the API reference documentation for the createConnection operation.
Java VFS Service
The Java VFS Service has been modified to handle an optional connection ID, enabling S3 URLs with connection ID and credentials. The other connection details are fetched through the application to create the required URL and configuration.
```
// sample URI
s3://bucket-name/path/to/object?connectionId=136

// sample java-vfs-service cURL request with S3
curl -H 'x-trifacta-person-workspace-id: 1' -X GET 'http://localhost:41917/vfsList?uri=s3://bucket-name/path/to/object?connectionId=136'
```
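A request URL like the sample can be assembled programmatically. This sketch percent-encodes the S3 URI (including its connectionId parameter) so that it survives as a single `uri` query value; the host and header shown in the sample are assumed from the example above.

```python
from urllib.parse import quote

def vfs_list_url(host, s3_uri, connection_id):
    """Build a java-vfs-service /vfsList request URL.

    The connectionId is appended to the S3 URI, and the whole URI is
    percent-encoded so it is passed as one `uri` query value.
    """
    full_uri = f"{s3_uri}?connectionId={connection_id}"
    return f"{host}/vfsList?uri={quote(full_uri, safe='')}"

print(vfs_list_url("http://localhost:41917",
                   "s3://bucket-name/path/to/object", 136))
```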
Using S3 Connections
Uses of S3
The platform supports the following uses of S3:
- Enabled S3 Integration: The platform has been configured to integrate with your S3 instance.
- Creating Datasets from S3 Files: You can read in source data stored in S3. An imported dataset may be a single S3 file or a folder of identically structured files. See Reading from Sources in S3 below.
- Reading Datasets: When creating a dataset, you can pull your data from a source in S3. See Creating Datasets below.
Before you begin using S3
Access: If you are using system-wide permissions, your administrator must configure access parameters for S3 locations. If you are using per-user permissions, this requirement does not apply.
Warning: Avoid using /trifacta/uploads for reading and writing data. This directory is used by the application.
Secure access
Your administrator can grant access on a per-user basis or for the entire workspace.
NOTE: If you disable or revoke your S3 access key, you must update the S3 keys for each user or for the entire system.
Storing data in S3
Your administrator should provide raw data or locations and access for storing raw data within S3.
- Users should know where shared data is located and where personal data is stored without interfering with or confusing other users.
- The application stores the results of each job in a separate folder in S3.
Reading from sources in S3
You can create an imported dataset from one or more files stored in S3.
NOTE: Import of glaciered objects is not supported.
Wildcards:
You can parameterize your input paths to import source files as part of the same imported dataset. For more information, see Overview of Parameterization.
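The effect of a wildcard input path can be sketched with Python's fnmatch. The object keys and pattern below are hypothetical examples, not product syntax.

```python
from fnmatch import fnmatch

# Hypothetical object keys under one S3 folder.
keys = [
    "sales/2023-01.csv",
    "sales/2023-02.csv",
    "sales/readme.txt",
]

# A parameterized (wildcard) input path pulls every matching file into the
# same imported dataset.
pattern = "sales/*.csv"
matched = [k for k in keys if fnmatch(k, pattern)]
print(matched)  # ['sales/2023-01.csv', 'sales/2023-02.csv']
```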
Folder selection:
When you select a folder in S3 to create your dataset, you select all files in the folder to be included.
Notes:
- This option selects all files in all sub-folders and bundles them into a single dataset. If your sub-folders contain separate datasets, you should be more specific in your folder selection.
- All files used in a single imported dataset must be of the same format and have the same structure. For example, you cannot mix and match CSV and JSON files if you are reading from a single directory.
When a folder is selected from S3, the following file types are ignored:
- *_SUCCESS and *_FAILED files, which may be present if the folder has been populated by the running environment.
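The filtering of these marker files on folder import can be sketched as follows; the file listing is a hypothetical example.

```python
def is_ignored(filename):
    """Marker files matching *_SUCCESS or *_FAILED are skipped on folder import."""
    return filename.endswith("_SUCCESS") or filename.endswith("_FAILED")

listing = ["part-0000.csv", "part-0001.csv", "_SUCCESS"]
kept = [f for f in listing if not is_ignored(f)]
print(kept)  # ['part-0000.csv', 'part-0001.csv']
```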
NOTE: If you have a folder and file with the same name in S3, search only retrieves the file. You can still navigate to locate the folder.
Creating datasets
When creating a dataset, you can choose to read data from a source stored in S3 or from a local file.
- S3 sources are not moved or changed.
- Local file sources are uploaded to /trifacta/uploads, where they remain and are not changed.
Data may be individual files or all of the files in a folder. In the Import Data page, click the S3 tab. See Import Data Page.