This section describes how you interact with your S3 environment through the platform.

Uses of S3

The platform can use S3 for the following tasks when reading and writing data:

  1. Enabled S3 Integration: The platform has been configured to integrate with your S3 instance. For more information, see Enable S3 Access.
  2. Creating Datasets from S3 Files: You can read in source data stored in S3. An imported dataset may be a single S3 file or a folder of identically structured files. See Reading from Sources in S3 below.
  3. Reading Datasets: When creating a dataset, you can pull your data from a source in S3. See Creating Datasets below.
  4. Writing Job Results: After a job has been executed, you can write the results back to S3. See Writing Job Results below.

In the application, S3 is accessed through the S3 browser. See S3 Browser.

NOTE: When the platform executes a job on a dataset, the source data is untouched. Results are written to a new location, so no source data is disturbed by the process.

Before You Begin Using S3

Secure Access

Your administrator can grant access on a per-user basis or for the entire platform.

The platform uses an S3 key and secret to access your S3 instance. These keys must enable read/write access to the appropriate directories in the S3 instance.

NOTE: If you disable or revoke your S3 access key, you must update the S3 keys for each user or for the entire system.

For more information, see Enable S3 Access.
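
The read/write requirement above can be verified before the keys are handed to the platform. The following is a minimal sketch, not part of the product: `check_read_write` is a hypothetical helper that assumes a client object exposing `put_object`, `get_object`, and `delete_object` in the style of the boto3 S3 client. It writes a small temporary object under the directory, reads it back, and removes it.

```python
import uuid


def check_read_write(client, bucket, prefix):
    """Return True if the credentials behind `client` can write, read,
    and delete an object under `prefix` in `bucket`.

    `client` is assumed to behave like a boto3 S3 client.
    """
    prefix = prefix.strip("/")
    name = f"access-check-{uuid.uuid4().hex}.txt"  # unique, throwaway object name
    key = f"{prefix}/{name}" if prefix else name
    payload = b"access check"
    try:
        client.put_object(Bucket=bucket, Key=key, Body=payload)          # write
        body = client.get_object(Bucket=bucket, Key=key)["Body"].read()  # read
        client.delete_object(Bucket=bucket, Key=key)                     # clean up
    except Exception as exc:  # a denied or revoked key surfaces here
        print(f"access check failed for s3://{bucket}/{key}: {exc}")
        return False
    return body == payload
```

With boto3 installed, a client for a specific key pair can be created with `boto3.client("s3", aws_access_key_id=KEY, aws_secret_access_key=SECRET)`, where `KEY` and `SECRET` are placeholders for your own credentials. Running the check again after rotating or revoking a key is a quick way to confirm that the updated keys still grant the access each user needs.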

Storing Data in S3

Your administrator should provide raw data, or the locations and access needed for storing raw data, within S3. All users should have a clear understanding of the folder structure within S3, including where each individual can read source data and write job results.

NOTE: The platform does not modify source data in S3. Source data stored in S3 is read without modification from source locations, and source data uploaded to the platform is stored in /trifacta/uploads.

Reading from Sources in S3

You can create an imported dataset from one or more files stored in S3.

When you select a folder in S3 to create your dataset, you select all files in the folder to be included. Notes:

When a folder is selected from S3, the following file types are ignored:

Creating Datasets

When creating a dataset, you can choose to read data from a source stored in S3 or from a local file.

Data may be individual files or all of the files in a folder.

Writing Job Results

When you run a job, you can specify the S3 bucket and file path where the generated results are written. By default, the output is generated in your default bucket and default output home directory.

If the platform is using S3, do not use the trifacta/uploads directory. This directory is used for storing uploads and metadata, which may be used by multiple users. Manipulating files outside of the application can destroy other users' data. Please use the tools provided through the interface for managing uploads from S3.
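
The output rules above (fall back to the default bucket and default output home directory, and stay out of the uploads directory) can be sketched in a few lines. This is an illustration under stated assumptions, not the platform's implementation: `UserDefaults`, `resolve_output_location`, and the example bucket and paths are hypothetical, and only the `trifacta/uploads` directory name comes from this section.

```python
from dataclasses import dataclass
from typing import Optional

# Reserved directory named in this section; everything else is hypothetical.
RESERVED_PREFIX = "trifacta/uploads"


@dataclass(frozen=True)
class UserDefaults:
    bucket: str       # assumed default bucket for the user
    output_home: str  # assumed default output home directory


def resolve_output_location(defaults: UserDefaults,
                            bucket: Optional[str] = None,
                            path: Optional[str] = None) -> str:
    """Return an s3:// URI for job results, using defaults for missing values."""
    chosen_bucket = bucket or defaults.bucket
    chosen_path = (path or defaults.output_home).strip("/")
    # Refuse the reserved uploads directory and anything beneath it.
    if chosen_path == RESERVED_PREFIX or chosen_path.startswith(RESERVED_PREFIX + "/"):
        raise ValueError(
            f"{chosen_path!r} is inside the reserved {RESERVED_PREFIX} directory"
        )
    return f"s3://{chosen_bucket}/{chosen_path}"


if __name__ == "__main__":
    defaults = UserDefaults(bucket="example-bucket", output_home="/users/alice/jobrun")
    print(resolve_output_location(defaults))
    print(resolve_output_location(defaults, bucket="other", path="results/q1/"))
```

The check compares whole path segments, so a sibling directory such as `trifacta/uploads-archive` is still accepted. It does not normalize relative segments like `..`, which a production check would need to handle.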

Creating a new dataset from results

As part of writing job results, you can choose to create a new dataset, so that you can chain together data wrangling tasks.

NOTE: When you create a new dataset as part of your job results, the file or files are written to the designated output location for your user account. Depending on how your permissions are configured, this location may not be accessible to other users.