Storage Maintenance

This page provides tips and guidelines for maintaining your backend storage.

Note

For safety reasons, Designer Cloud Powered by Trifacta Enterprise Edition does not remove files from the backend storage, except for temporary files that it creates as part of normal operations and storage used as part of feature execution. Unless resources have been provided to you by Alteryx, management of the backend datastore is the responsibility of the customer.

Note

Designer Cloud Powered by Trifacta Enterprise Edition does not store data on the Trifacta node where the software is installed.

Note

Designer Cloud Powered by Trifacta Enterprise Edition does not modify source data.

Alteryx Storage

Log files are stored by default in the following location on the Trifacta node:

/opt/trifacta/logs

Service logs

Service log files are automatically rotated at 50 MB. For more information on configuring log rotation, see Configure Logging for Services.

Job logs

Logs related to job execution are not automatically rotated.

Note

Job log files can accumulate over time. As a rule of thumb, set up a recurring job through an external scheduler to purge job logs that are older than six months.

Job log files are stored in the following directories:

/opt/trifacta/logs/jobs
/opt/trifacta/logs/jobgroups

They are organized by job identifier in sub-directories.
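
The purge recommended above could be scripted along the following lines. This is a minimal sketch, not a supported utility: the function names, the 183-day threshold, and the rule that a job's sub-directory is removed only when its newest file is older than the threshold are all assumptions made for illustration. It defaults to a dry run so that you can review the candidates before anything is deleted.

```python
import shutil
import time
from pathlib import Path

# Job log directories named on this page; adjust if your logs live elsewhere.
JOB_LOG_DIRS = [
    Path("/opt/trifacta/logs/jobs"),
    Path("/opt/trifacta/logs/jobgroups"),
]
MAX_AGE_DAYS = 183  # assumption: roughly six months


def newest_mtime(directory: Path) -> float:
    """Return the latest modification time of the directory or anything in it."""
    times = [directory.stat().st_mtime]
    times.extend(p.stat().st_mtime for p in directory.rglob("*"))
    return max(times)


def purge_old_job_logs(roots, max_age_days=MAX_AGE_DAYS, dry_run=True, now=None):
    """Remove per-job sub-directories whose newest file is older than max_age_days.

    Returns the sub-directories that were removed (or, in a dry run, would be).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            continue  # skip directories that do not exist on this node
        for job_dir in sorted(root.iterdir()):
            if job_dir.is_dir() and newest_mtime(job_dir) < cutoff:
                removed.append(job_dir)
                if not dry_run:
                    shutil.rmtree(job_dir)
    return removed
```

Calling purge_old_job_logs(JOB_LOG_DIRS) only reports which sub-directories qualify. Passing dry_run=False deletes them, so run the script from your external scheduler only after you have checked the dry-run output.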

For more information on job logs, see Diagnose Failed Jobs.

Base Storage Layer

Temp files

Job temp files

Temporary files may be written to the temporary directory on the backend datastore, particularly during job execution.

/tmp

Note

These files may be purged during restarts of the platform.

Spark temp files

During execution of jobs, Spark may use the following directories on backend storage for storage of temporary files:

/user/<UserID>
/trifacta/tempfiles

Samples and profile statistics

The Designer Cloud Powered by Trifacta platform generates your samples and profiling statistics in one of the following directories for each user:

  • The default directory:

    /trifacta/queryResults/.trifacta
  • The user-defined output directory

Note

These files should be removed on a periodic basis.

Datasets

While samples and job results may be retained on backend storage, the Designer Cloud Powered by Trifacta platform does not store your source data.

Note

When a dataset is removed from the Library, only the reference to it within the product is removed. The underlying data is not deleted.

Storage for features

The following features do store data on the base storage layer.

File conversion

Data sources that are stored in a binary format, such as PDF or Excel, or that require additional processing, such as JSON, must be converted to a file format that can be natively ingested by the Designer Cloud Powered by Trifacta platform. Typically, these converted files are stored in the base storage layer in CSV format.

This feature is enabled by default.

JDBC ingestion

When JDBC ingestion is enabled, some objects used in sampling that are sourced from JDBC sources may be stored in the base storage layer for faster retrieval. After job execution, these objects are deleted or, if datasource caching is enabled, moved to the appropriate datasource cache.

For more information, see Configure JDBC Ingestion.

Datasource caching

If datasource caching has been enabled, cached objects can be stored in either a global or user-specific cache. For more information, see Configure Data Source Caching.