This section describes how you interact with your S3 environment through the platform.
The platform can use S3 for the following tasks:
Enabled S3 Integration: The platform has been configured to integrate with your S3 instance. For more information, see Enable S3 Access.
Writing Results: After a job has been executed, you can write the results back to S3. See Writing Results below.
In the application, S3 is accessed through the S3 browser. See S3 Browser.
NOTE: When the platform executes a job on a dataset, the source data is untouched. Results are written to a new location, so that no existing data is disturbed by the process.
Access: If you are using system-wide permissions, your administrator must configure access parameters for S3 locations. If you are using per-user permissions, this requirement does not apply. See Enable S3 Access.
Your administrator should provide a writeable home output directory for you. This directory location is available through your user profile. See Storage Config Page.
Your administrator can grant access on a per-user basis or for the entire platform.
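As a rough model of the two permission modes, the following Python sketch chooses which key pair serves a user's request. `S3Credentials` and `resolve_credentials` are hypothetical names invented for illustration; this is not the platform's implementation, and real credential handling is configured as described in Enable S3 Access.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class S3Credentials:
    """An S3 access key ID and secret access key pair (hypothetical type)."""
    access_key_id: str
    secret_access_key: str


def resolve_credentials(
    per_user_mode: bool,
    user: str,
    system_credentials: Optional[S3Credentials],
    user_credentials: Mapping[str, S3Credentials],
) -> S3Credentials:
    """Hypothetical model: pick the credentials used for a user's S3 requests.

    Per-user mode: each user must have their own key pair.
    System-wide mode: every user shares the administrator-configured pair.
    """
    if per_user_mode:
        try:
            return user_credentials[user]
        except KeyError:
            raise LookupError(f"no per-user S3 credentials configured for {user!r}") from None
    if system_credentials is None:
        raise LookupError("system-wide S3 credentials have not been configured")
    return system_credentials


if __name__ == "__main__":
    shared = S3Credentials("EXAMPLE_KEY_ID", "EXAMPLE_SECRET")
    print(resolve_credentials(False, "alice", shared, {}).access_key_id)
```

In system-wide mode every user resolves to the same shared pair, which is why the administrator must configure access for S3 locations up front; in per-user mode a missing key pair is a per-user configuration gap.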
The platform uses an S3 access key ID and secret access key to access your S3 instance. These keys must grant read/write access to the appropriate directories in the S3 instance.
NOTE: If you disable or revoke your S3 access key, you must update the S3 keys for each user or for the entire system.
For more information, see Enable S3 Access.
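For illustration, the following sketch builds an AWS IAM policy document that grants a key read/write access to selected directories (prefixes) in one bucket. The helper `read_write_policy`, the bucket name, and the prefixes are hypothetical placeholders. The exact set of S3 actions the platform needs is not stated in this section, so check Enable S3 Access before applying a policy like this; for example, actions such as `s3:DeleteObject` are not included here.

```python
import json


def read_write_policy(bucket: str, prefixes: list[str]) -> dict:
    """Build an IAM policy granting read/write access under the given prefixes.

    Hypothetical helper; bucket and prefix names are placeholders.
    """
    clean = [p.strip("/") for p in prefixes]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it targets the bucket ARN,
                # narrowed to the allowed prefixes with the s3:prefix condition key.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{p}/*" for p in clean]}},
            },
            {
                # Reading and writing objects are object-level actions.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{p}/*" for p in clean],
            },
        ],
    }


if __name__ == "__main__":
    policy = read_write_policy("example-bucket", ["raw/", "users/alice/output/"])
    print(json.dumps(policy, indent=2))
```

Listing is scoped with the `s3:prefix` condition key because `s3:ListBucket` applies to the bucket, while `s3:GetObject` and `s3:PutObject` apply to object ARNs under each prefix.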
Your administrator should provide raw data, or locations and access for storing raw data, within S3. All users should have a clear understanding of the folder structure within S3, including where each individual can read data from and write results to.
NOTE: The platform does not modify source data in S3. Source data stored in S3 is read without modification from source locations, and source data uploaded to the platform is stored in /trifacta/uploads, where it remains unchanged.
You can create an imported dataset from one or more files stored in S3.
NOTE: Import of glaciered objects (objects archived to the S3 Glacier storage classes) is not supported.
You can parameterize your input paths to import multiple source files as part of the same imported dataset. For more information, see Overview of Parameterization.
When you select a folder in S3 to create your dataset, all files in the folder are included.
When a folder is selected from S3, the following file types are ignored:
*_FAILED files, which may be present if the folder has been populated by the running environment.
NOTE: If you have a folder and a file with the same name in S3, search retrieves only the file. You can still navigate to the folder to locate it.
When creating a dataset, you can choose to read data from a source stored in S3 or from a local file.
Data may be individual files or all of the files in a folder.
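To make the folder rules concrete, here is a minimal Python sketch that applies them to an S3 listing. It assumes the input mimics the `Contents` entries of an S3 ListObjectsV2 response (each with `Key` and `StorageClass`). The function name, the choice to exclude nested subfolders, and the decision to silently skip archived objects are assumptions for illustration, not documented platform behavior.

```python
import fnmatch
from typing import Iterable, Mapping

# Storage classes whose objects must be restored before they can be read.
# GLACIER_IR (Instant Retrieval) is readable directly, so it is not listed.
ARCHIVED_STORAGE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}

# Marker files ignored when a folder is selected, per this section.
IGNORED_PATTERNS = ["*_FAILED"]


def files_in_folder(prefix: str, objects: Iterable[Mapping[str, str]]) -> list[str]:
    """Return the keys a folder selection would include (hypothetical model)."""
    prefix = prefix.rstrip("/") + "/"
    selected = []
    for obj in objects:
        key = obj["Key"]
        if not key.startswith(prefix) or key == prefix:
            continue  # outside the folder, or the folder marker object itself
        name = key[len(prefix):]
        if "/" in name:
            continue  # assumption: objects in nested subfolders are not included
        if any(fnmatch.fnmatchcase(name, pat) for pat in IGNORED_PATTERNS):
            continue  # *_FAILED marker files are ignored
        if obj.get("StorageClass") in ARCHIVED_STORAGE_CLASSES:
            continue  # glaciered objects cannot be imported; this sketch skips them
        selected.append(key)
    return selected
```

Given a folder `data/` that contains a standard object, a Glacier object, and a `job_FAILED` marker, only the standard object is returned.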
When you run a job, you can specify the S3 bucket and file path where the generated results are written. By default, the output is generated in your default bucket and default output home directory.
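The following sketch models how an output location might be resolved from an optional bucket and path. The function `output_uri`, its parameters, and the per-job subfolder under the home directory are hypothetical; the subfolder simply illustrates writing results to a new location rather than over existing data.

```python
from typing import Optional


def output_uri(
    default_bucket: str,
    home_dir: str,
    job_id: str,
    bucket: Optional[str] = None,
    path: Optional[str] = None,
) -> str:
    """Resolve where a job's results are written (hypothetical model).

    An explicitly specified bucket or path wins; otherwise results go to a
    per-job folder under the default bucket and output home directory.
    """
    target_bucket = bucket or default_bucket
    if path:
        target_path = path.lstrip("/")
    else:
        target_path = f"{home_dir.strip('/')}/{job_id}/"
    return f"s3://{target_bucket}/{target_path}"
```

For example, with no overrides, `output_uri("my-bucket", "/users/alice/output/", "1234")` resolves to a folder for job `1234` under the home directory in `my-bucket`.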
If the platform is using S3, do not use the
NOTE: When writing files to S3, you may encounter an issue where the UI indicates that the job failed, but the output file or files have been written to S3. This issue may be caused when S3 does not report the files back to the application before the S3 consistency timeout has expired. For more information on raising this timeout setting, see Enable S3 Access.
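The consistency timeout behaves like a bounded poll: the application waits for the written files to become visible and gives up when time runs out. The sketch below illustrates that pattern; `wait_for_objects` is hypothetical, and the `exists` callback is an assumption you would supply yourself, for example a wrapper around boto3's `head_object`. A `False` result means only that the files were not seen in time, which is why raising the timeout can resolve false failures.

```python
import time
from typing import Callable, Iterable


def wait_for_objects(
    keys: Iterable[str],
    exists: Callable[[str], bool],
    timeout_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every key is visible or the timeout expires.

    Returns True if all keys became visible in time, False otherwise. A False
    result does not prove the writes failed; the objects may appear later.
    """
    pending = set(keys)
    deadline = clock() + timeout_s
    while True:
        # Re-check only the keys that have not been seen yet.
        pending = {k for k in pending if not exists(k)}
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The `clock` and `sleep` parameters are injectable so the polling logic can be exercised without real delays.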
As part of writing results, you can choose to create a new dataset, so that you can chain together data wrangling tasks.
NOTE: When you create a new dataset as part of your results, the file or files are written to the designated output location for your user account. Depending on how your permissions are configured, this location may not be accessible to other users.
Other than temporary files, the platform does not remove any files that it generated or used, including:
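If you are concerned about data accumulation, configure S3 Lifecycle rules on the bucket to periodically archive or expire objects in the directories in use; a bucket policy controls access to a bucket and does not remove objects. For more information, see the Amazon S3 documentation on managing the storage lifecycle.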
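As one example of periodic purging, the sketch below builds an S3 Lifecycle configuration that expires objects under a prefix after a fixed number of days. The helper `expiration_rule`, the rule ID, the prefix, and the 90-day retention are hypothetical placeholders.

```python
def expiration_rule(rule_id: str, prefix: str, days: int) -> dict:
    """Build one S3 Lifecycle rule that expires objects under `prefix` after `days` days."""
    if days < 1:
        raise ValueError("expiration days must be a positive integer")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }


lifecycle_configuration = {
    "Rules": [
        # Hypothetical prefix and retention period; adjust to your own layout.
        expiration_rule("purge-job-results", "users/alice/output/", 90),
    ]
}
```

The resulting dictionary has the shape accepted by boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`. That call replaces any lifecycle configuration already on the bucket, so merge in existing rules first. If you archive objects to a Glacier storage class instead of expiring them, remember that glaciered objects cannot be imported.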