This section describes how you interact with S3 through the Trifacta platform.

Simple Storage Service (S3) is an online data storage service provided by Amazon, which provides low-latency access through web services. For more information, see https://aws.amazon.com/s3/.
Uses of S3
The Trifacta platform can use S3 for the following tasks:

- Enabled S3 Integration: The Trifacta platform has been configured to integrate with your S3 instance. For more information, see S3 Access.
- Creating Datasets from S3 Files: You can read in source data stored in S3. An imported dataset may be a single S3 file or a folder of identically structured files. See Reading from Sources in S3 below.
- Reading Datasets: When creating a dataset, you can pull your data from a source in S3. See Creating Datasets below.
- Writing Results: After a job has been executed, you can write the results back to S3. See Writing Results below.
In the Trifacta application, you can browse your S3 buckets to locate and select source files.

NOTE: When the Trifacta platform executes a job, your source data in S3 is read but not modified. Results are written to a separate output location.
Before You Begin Using S3
Access: If you are using system-wide permissions, your administrator must configure access parameters for S3 locations. If you are using per-user permissions, this requirement does not apply. See S3 Access.
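If you want to verify that your credentials can actually reach the S3 locations your administrator configured, a short boto3 check like the following can help. The bucket and prefix names are hypothetical placeholders; within the product itself, access is governed by the settings described in S3 Access.

```python
# A minimal sketch for checking S3 read/write access, assuming a
# hypothetical bucket and prefix supplied by your administrator.
import boto3
from botocore.exceptions import ClientError

BUCKET = "example-company-data"  # hypothetical bucket name
PREFIX = "shared/raw/"           # hypothetical read/write location

s3 = boto3.client("s3")

def can_list(bucket: str, prefix: str) -> bool:
    """Return True if the configured credentials can list the prefix."""
    try:
        s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
        return True
    except ClientError as err:
        print(f"List failed: {err.response['Error']['Code']}")
        return False

def can_write(bucket: str, prefix: str) -> bool:
    """Return True if a small test object can be written and removed."""
    key = prefix + "_access_check.tmp"
    try:
        s3.put_object(Bucket=bucket, Key=key, Body=b"")
        s3.delete_object(Bucket=bucket, Key=key)
        return True
    except ClientError as err:
        print(f"Write failed: {err.response['Error']['Code']}")
        return False

if __name__ == "__main__":
    print("read ok: ", can_list(BUCKET, PREFIX))
    print("write ok:", can_write(BUCKET, PREFIX))
```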
Warning: Avoid using /trifacta/uploads for reading and writing data. This directory is used by the Trifacta application. Your administrator should provide a writeable home output directory for you. This directory location is available through your user profile. See Storage Config Page.
Secure Access
Your administrator can grant access on a per-user basis or for the entire Trifacta platform.
The Trifacta platform accesses S3 using AWS access keys (key and secret combinations), which can likewise be managed per user or for the entire system.
NOTE: If you disable or revoke your S3 access key, you must update the S3 keys for each user or for the entire system.
Storing Data in S3
Your administrator should provide raw data or locations and access for storing raw data within S3.

- Users should know where shared data is located and where personal data can be saved without interfering with or confusing other users.
- The Trifacta application stores the results of each job in a separate folder in S3.
NOTE: The Trifacta application stores uploaded files in /trifacta/uploads.
Reading from Sources in S3
You can create an imported dataset from one or more files stored in S3.
NOTE: Import of glaciered objects is not supported.
Wildcards:
You can parameterize your input paths to import source files as part of the same imported dataset. For more information, see Overview of Parameterization.
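Conceptually, a parameterized path behaves like a wildcard match over S3 keys. S3 itself only lists objects by literal prefix, so the sketch below (hypothetical bucket and pattern, not the platform's actual implementation) lists by the longest literal prefix and applies the wildcard client-side.

```python
# A conceptual sketch of wildcard matching over S3 keys, assuming a
# hypothetical bucket and pattern.
import fnmatch
import boto3

BUCKET = "example-company-data"      # hypothetical bucket name
PATTERN = "shared/raw/orders-*.csv"  # hypothetical wildcard path

s3 = boto3.client("s3")

# List using the longest literal prefix before the first wildcard,
# then filter the results against the full pattern.
prefix = PATTERN.split("*", 1)[0]
paginator = s3.get_paginator("list_objects_v2")

matches = []
for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
    for obj in page.get("Contents", []):
        if fnmatch.fnmatch(obj["Key"], PATTERN):
            matches.append(obj["Key"])

print(f"{len(matches)} file(s) match {PATTERN}")
```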
Folder selection:
When you select a folder in S3 to create your dataset, you select all files in the folder to be included.
Notes:
- This option selects all files in all sub-folders and bundles them into a single dataset. If your sub-folders contain separate datasets, you should be more specific in your folder selection.
- All files used in a single imported dataset must be of the same format and have the same structure. For example, you cannot mix and match CSV and JSON files if you are reading from a single directory.
When a folder is selected from S3, the following file types are ignored:

- *_SUCCESS and *_FAILED files, which may be present if the folder has been populated by the running environment.
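The following boto3 sketch mimics this folder-selection behavior: it lists every object under a hypothetical folder, skips the *_SUCCESS and *_FAILED markers, and warns when file extensions are mixed, which is a hint that the folder may contain more than one dataset.

```python
# A sketch of folder-selection semantics: list every object under a
# folder (including sub-folders), skip status marker files, and warn
# on mixed formats. Bucket and folder names are hypothetical.
import os
import boto3

BUCKET = "example-company-data"  # hypothetical bucket name
FOLDER = "shared/raw/orders/"    # hypothetical folder

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

files = []
for page in paginator.paginate(Bucket=BUCKET, Prefix=FOLDER):
    for obj in page.get("Contents", []):
        name = obj["Key"].rsplit("/", 1)[-1]
        # Skip folder placeholders and job-status marker files.
        if not name or name.endswith("_SUCCESS") or name.endswith("_FAILED"):
            continue
        files.append(obj["Key"])

extensions = {os.path.splitext(key)[1] for key in files}
if len(extensions) > 1:
    print(f"Warning: mixed formats in folder: {sorted(extensions)}")
print(f"{len(files)} file(s) would be included in the dataset")
```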
NOTE: If you have a folder and file with the same name in S3, search only retrieves the file. You can still navigate to locate the folder.
Creating Datasets
When creating a dataset, you can choose to read data from a source stored in S3 or from a local file.

- S3 sources are not moved or changed.
- Local file sources are uploaded to /trifacta/uploads, where they remain and are not changed.
Data may be individual files or all of the files in a folder. In the Import Data page, click the S3 tab. See Import Data Page.
Tip: Users can create secondary connections to specific S3 buckets. For more information, see External S3 Connections.
Writing Results
When you run a job, you can specify the S3 bucket and file path where the generated results are written. By default, the output is generated in your default bucket and default output home directory.
- Each set of results must be stored in a separate folder within your S3 output home directory.
- For more information on your output home directory, see Storage Config Page.
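As a concrete illustration of the one-folder-per-job convention, the following boto3 sketch writes a set of result files under a fresh, timestamped sub-folder of a hypothetical output home directory. This is only an analogy for what the platform does on your behalf when it runs a job; the bucket, directory, and file names are placeholders.

```python
# A sketch of the one-folder-per-job output convention, assuming a
# hypothetical bucket and output home directory.
from datetime import datetime, timezone
import boto3

BUCKET = "example-company-data"          # hypothetical bucket name
OUTPUT_HOME = "users/jdoe/job_results/"  # hypothetical output home directory

s3 = boto3.client("s3")

# Use a timestamp so each job's results land in a separate folder.
job_folder = OUTPUT_HOME + datetime.now(timezone.utc).strftime("job_%Y%m%dT%H%M%SZ/")

for local_path in ["results/part-00000.csv", "results/part-00001.csv"]:
    key = job_folder + local_path.rsplit("/", 1)[-1]
    s3.upload_file(local_path, BUCKET, key)
    print(f"wrote s3://{BUCKET}/{key}")
```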
NOTE: When writing files to S3, you may encounter an issue where the UI indicates that the job failed, but the output file or files have been written to S3. This issue may be caused when S3 does not report the files back to the application before the S3 consistency timeout has expired. For more information on raising this timeout setting, see S3 Access.
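If you hit this situation and want to confirm whether the outputs actually landed, a quick check with a boto3 waiter can help. The bucket and key below are hypothetical placeholders for your own output location.

```python
# A sketch for confirming that an output object is visible in S3,
# assuming a hypothetical bucket and key.
import boto3
from botocore.exceptions import WaiterError

BUCKET = "example-company-data"                                  # hypothetical
KEY = "users/jdoe/job_results/job_20240101T000000Z/part-00000.csv"  # hypothetical

s3 = boto3.client("s3")

# Poll until the object is visible or the waiter times out (~1 minute).
waiter = s3.get_waiter("object_exists")
try:
    waiter.wait(Bucket=BUCKET, Key=KEY,
                WaiterConfig={"Delay": 5, "MaxAttempts": 12})
    print("Output object is present in S3.")
except WaiterError:
    print("Object still not visible; the job may truly have failed.")
```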
Creating a new dataset from results
As part of writing results, you can choose to create a new dataset, so that you can chain together data wrangling tasks.
NOTE: When you create a new dataset as part of your results, the file or files are written to the designated output location for your user account. Depending on how your permissions are configured, this location may not be accessible to other users.
Purging Files
Other than temporary files, the Trifacta platform does not purge any of the following assets from S3:
- Uploaded datasets
- Generated samples
- Generated results
If you are concerned about data accumulation, you should create a bucket policy to periodically backup or purge directories in use. For more information, please see the S3 documentation.
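If you want to automate purging, S3 lifecycle rules are one common approach. Below is a minimal boto3 sketch, assuming a hypothetical bucket name, prefix, and 90-day retention period; an equivalent rule can also be configured in the AWS console.

```python
# A sketch of a lifecycle rule that automatically expires generated
# files after 90 days. Bucket, prefix, and retention are hypothetical;
# adjust them to match your own retention policy.
import boto3

BUCKET = "example-company-data"  # hypothetical bucket name

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-job-results",
                "Filter": {"Prefix": "users/jdoe/job_results/"},  # hypothetical prefix
                "Status": "Enabled",
                "Expiration": {"Days": 90},
            }
        ]
    },
)
print("Lifecycle rule applied.")
```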