The product supports different options for reading data from and writing data to your storage systems.
The base storage layer is the datastore where the product stores uploaded data and writes the profiles, results, and samples that it generates. By default, job results are written to the base storage layer. You can configure the base storage layer and other required settings.
Tip: The base storage layer must be a file-based system.
In general, all base storage layers provide similar capabilities for storing, creating, reading, and writing datasets.
The base storage layer enables you to perform the following functions:
Cached data: You can enable a cache on the base storage layer, which keeps ingested data on the base storage layer for a period of time. If you need to use the data again later, reading it from the cache is faster.
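The cache behaves like a time-to-live (TTL) cache: ingested data is kept for a fixed period, reads within that period are served from the cache, and later reads fall back to ingesting the data again. The following minimal Python sketch illustrates that idea. The `TTLCache` class, its parameters, and the injectable clock are hypothetical and do not describe how the product implements or configures its cache.

```python
import time
from typing import Callable, Dict, Hashable, Tuple

_MISSING = object()  # sentinel so that cached None values still count as hits


class TTLCache:
    """Hypothetical cache that keeps ingested data for `ttl_seconds`."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        # key -> (time the entry was stored, cached data)
        self._entries: Dict[Hashable, Tuple[float, object]] = {}

    def put(self, key: Hashable, value: object) -> None:
        self._entries[key] = (self._clock(), value)

    def get(self, key: Hashable, default: object = None) -> object:
        entry = self._entries.get(key)
        if entry is None:
            return default
        stored_at, value = entry
        if self._clock() - stored_at > self.ttl_seconds:
            # The retention period has passed: evict the entry and report a miss.
            del self._entries[key]
            return default
        return value

    def get_or_load(self, key: Hashable, load: Callable[[], object]) -> object:
        value = self.get(key, _MISSING)
        if value is _MISSING:
            value = load()  # slow path: ingest the data again
            self.put(key, value)
        return value
```

Passing a fake clock (for example, `clock=lambda: now[0]`) makes the expiry behavior easy to test without waiting.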
The product creates and maintains the following directories and their sub-directories on the base storage layer:
Storage of datasets uploaded through the product. Each sub-directory beneath this one is named for the internal identifier of a user who has uploaded at least one file.
Default storage of results generated by job executions. Each sub-directory beneath this one is named for the internal identifier of a user who has run at least one job.
For each user, these sub-directories are the default storage location for job results. You can change these locations; see Preferences Page.
Storage of custom dictionary files uploaded by users.
Temporary storage location for files required for use of the product.
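As a rough illustration of the per-user layout, the sketch below derives a user's upload directory and default results directory from the user's internal identifier. The directory names `uploads` and `results`, the root path, and the function names are hypothetical placeholders, because the actual directory names are not given here. The optional `override` argument stands in for a results location set on the Preferences Page.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical placeholder names; the product's real directory names differ.
UPLOADS_DIR = "uploads"
RESULTS_DIR = "results"


def user_upload_dir(root: str, user_id: str) -> PurePosixPath:
    """Per-user directory for datasets uploaded by that user."""
    return PurePosixPath(root) / UPLOADS_DIR / user_id


def default_results_dir(root: str, user_id: str,
                        override: Optional[str] = None) -> PurePosixPath:
    """Job results location: a user-chosen override wins over the default."""
    if override:
        return PurePosixPath(override)
    return PurePosixPath(root) / RESULTS_DIR / user_id
```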
The product requires the following operating system-level permissions on the listed directories and sub-directories:
| Directory | Owner Min Permissions | Group Min Permissions | World Min Permissions |
|---|---|---|---|
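Because the minimum permission values are not reproduced here, the following sketch takes the required permissions as input and reports what each class (owner, group, world) lacks on a POSIX directory. The function name and the `"rwx"`-style strings are illustrative assumptions; only `os.stat` and the standard `stat` constants are real library calls.

```python
import os
import stat
from typing import Dict

# Read, write, and execute bits for each permission class.
_CLASS_BITS = {
    "owner": (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
    "group": (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
    "world": (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
}


def missing_permissions(path: str, required: Dict[str, str]) -> Dict[str, str]:
    """Return the permission letters that each class lacks on `path`.

    `required` maps "owner", "group", or "world" to a string such as "rwx" or
    "r-x" (illustrative format); an empty result means every minimum is met.
    """
    mode = os.stat(path).st_mode
    missing = {}
    for cls, wanted in required.items():
        lacking = "".join(
            letter
            for letter, bit in zip("rwx", _CLASS_BITS[cls])
            if letter in wanted and not mode & bit
        )
        if lacking:
            missing[cls] = lacking
    return missing
```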
The product supports the following base storage layers.
NOTE: In some deployments, the base storage layer is pre-configured for you and cannot be modified. After the base storage layer has been defined, you cannot change it.
NOTE: For all storage layers, the source data is never modified. Whenever a job is executed on a source dataset, its results are written to a separate location.
TFS is an S3-backed data storage service, provided with the product, for importing, storing, and sampling data and for generating results. TFS is enabled as part of setting up your product.
For more information, see Using TFS.
Amazon Simple Storage Service (S3) is an online data storage service that provides low-latency access through web services. For more information, see https://aws.amazon.com/s3/ .
For more information, see External S3 Connections.
Maintenance of the base storage layer must be in accordance with your enterprise policies.
Unless the base storage layer is managed by the product, the customer is responsible for maintaining access to, and performing any required backups of, the data stored in the base storage layer.
NOTE: Except for temporary files, the product does not perform any cleanup of the base storage layer.
You can create connections to external storage systems to integrate with an external datastore. Depending on the type of connection and your permissions, the connection can be:
You can create and edit connections between the product and external datastores. You can create either file-based or table-based connections to individual storage units, such as databases or buckets.
NOTE: In your environment, creation of connections may be limited to administrators only. For more information, contact your workspace administrator.
Tip: Administrators can edit any public connection.
NOTE: After you create a connection, you cannot change its connection type. To use a different type, delete the connection and create a new one.
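One way to picture this rule is to model connections as immutable records, so that changing the type means deleting the record and creating a new one. This is a hypothetical sketch; `Connection`, `ConnectionKind`, `change_type`, and the type identifiers are illustrative names, not part of the product's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class ConnectionKind(Enum):
    FILE_BASED = "file"    # for example, a bucket
    TABLE_BASED = "table"  # for example, a database


@dataclass(frozen=True)
class Connection:
    # Hypothetical model: frozen=True makes every field read-only after creation.
    name: str
    connection_type: str   # hypothetical identifier such as "postgres" or "redshift"
    kind: ConnectionKind
    is_public: bool = False


def change_type(registry: Dict[str, Connection], name: str,
                new_type: str, new_kind: ConnectionKind) -> Connection:
    """Replace a connection with a new one of a different type."""
    old = registry.pop(name)  # delete the existing connection
    new = Connection(old.name, new_type, new_kind, old.is_public)  # start again
    registry[name] = new
    return new
```

Assigning to a field of a frozen dataclass raises `dataclasses.FrozenInstanceError`, which mirrors the fact that the type cannot be edited in place.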
For more information, see Connection Types.
In addition to the base storage layer, you may be able to connect to other file-based systems. For example, if your base storage layer is HDFS, you can also connect to S3.
NOTE: If HDFS is specified as your base storage layer, you cannot publish to Redshift.
For more information, see Connection Types.
The product can be used to load and transform data in cloud data warehouses. These integrations offer high-performance access for reading datasets from these and other sources, performing transformations, and writing results back to the data warehouse as needed.
Through AWS infrastructure, the product can integrate with your existing Snowflake data warehouse.
Additional configuration may be required:
For more information, see Snowflake Connections.
When your base storage layer is S3, you can create connections to your Redshift data warehouse.
If you are using IAM roles, the IAM role for each user must include permissions to access Redshift. For more information, see Required AWS Account Permissions.
For more information, see Amazon Redshift Connections.
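The two Redshift statements on this page depend on the base storage layer. The hypothetical helper below encodes only those two statements and returns `None` for base storage layers that this page does not cover (for example, TFS), rather than guessing.

```python
from typing import Optional


def redshift_allowed(base_storage_layer: str) -> Optional[bool]:
    """Return True or False where this page states a rule, None where it is silent."""
    base = base_storage_layer.strip().lower()
    if base == "s3":
        # "When your base storage layer is S3, you can create connections to your
        # Redshift data warehouse."
        return True
    if base == "hdfs":
        # "If HDFS is specified as your base storage layer, you cannot publish to Redshift."
        return False
    return None  # not covered here; see Connection Types
```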
If you work with relational data, configure database connections after you have completed the platform configuration and verified that it works with locally uploaded files.
NOTE: Database connections cannot be deleted if their databases host imported datasets that are in use by the product. Remove these imported datasets before deleting the connection.
For more information, see Using Databases.
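The deletion rule for database connections amounts to a guard: a connection can be deleted only when no in-use imported datasets still depend on it. The sketch below is a hypothetical in-memory model of that check; `ConnectionRegistry` and its methods are illustrative names, not the product's API.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ConnectionRegistry:
    # Hypothetical model: connection name -> imported datasets that are in use
    imported_datasets: Dict[str, Set[str]] = field(default_factory=dict)

    def remove_dataset(self, connection: str, dataset: str) -> None:
        """Remove an imported dataset so it no longer depends on the connection."""
        self.imported_datasets.get(connection, set()).discard(dataset)

    def delete_connection(self, connection: str) -> None:
        """Delete a connection only if no in-use imported datasets depend on it."""
        in_use = self.imported_datasets.get(connection, set())
        if in_use:
            raise ValueError(
                f"Connection {connection!r} hosts imported datasets in use: "
                f"{sorted(in_use)}. Remove them before deleting the connection."
            )
        self.imported_datasets.pop(connection, None)
```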
To integrate with an external system, the product requires:
Except for cleaning up temporary files, the product does not maintain external storage systems.