This section provides information on various settings that you can specify to apply minimum and maximum limits on the Trifacta® application.

Operating System Limits

Raise ulimit setting

To perform normal operations, the Trifacta platform may need to maintain a high number of simultaneously open files, the count of which may exceed the default setting for the operating system (the ulimit). 

NOTE: If the Trifacta platform hits the ulimit and is unable to open additional files, jobs may fail, or the platform may be unable to access content. The log may contain something similar to the following error: Failed on local exception: java.net.SocketException: Too many open files.


By default, the operating system sets the limit on the number of open files at 1024. Please complete the following steps to raise this limit. 

Tip: Depending on the capacity of your hardware, the ulimit can be raised as high as 64000.

 

Steps:

  1. If it is running, stop the Trifacta platform. See Start and Stop the Platform.
  2. Verify the current ulimit:

    ulimit -Hn
  3. Edit the following file: /etc/security/limits.conf.
  4. At the bottom of the file, add the following entry, which overrides the defined limit with a value of 16000:

    *   hard    nofile  16000
  5. If you encounter the following error, add the line below after the previous entry: "java.lang.OutOfMemoryError: unable to create new native thread". This exception indicates that the ulimit for processes must also be increased:

    *   hard    nproc   16000
  6. Save the file and restart the platform. See Start and Stop the Platform.
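
As a complement to the ulimit command in the steps above, the following minimal Python sketch reads the open-file limits that apply to the process running it. It uses only the standard resource module, which is available on Linux and other Unix-like systems. The helper names are illustrative, and the 16000 target is simply the value used in the steps. The result reflects the user and login session that runs the script, which is assumed to be the same user that runs the platform.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open files for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_at_least(limit, target):
    """Return True if a limit is unlimited or is greater than or equal to target."""
    return limit == resource.RLIM_INFINITY or limit >= target


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
    # 16000 is the value written to /etc/security/limits.conf in the steps above.
    print("hard limit is at least 16000:", limit_at_least(hard, 16000))
```

Changes to /etc/security/limits.conf typically apply only to new login sessions, so run the check from a fresh session.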

Browser Limits

Change body limits

If you encounter log messages indicating that a request submitted from the client is too large, you can raise the limit on the size of body objects submitted from the client.

NOTE: Raising these values too high can overload the browser.

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

Setting | Description
"webapp.bodyParser.urlEncoded.limit": "10mb", | Maximum permitted size of the URL-encoded body of a request submitted from the client. Size is in MB.
"webapp.bodyParser.json.limit": "10mb", | Maximum permitted size of a JSON object submitted from the client. Size is in MB.
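
To reason about whether a given request would exceed a limit such as "10mb", it can help to convert the setting to bytes. The sketch below is an illustration only: the parse_size and body_fits helpers are hypothetical, and the assumption that 1 MB means 1024 * 1024 bytes is not confirmed by this page.

```python
import re

# Assumed unit multipliers (binary multiples); not confirmed for the platform.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_size(text):
    """Convert a size string such as '10mb' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(b|kb|mb|gb)\s*", text.lower())
    if match is None:
        raise ValueError(f"unrecognized size string: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def body_fits(body, limit):
    """Return True if the request body (bytes) is within the configured limit."""
    return len(body) <= parse_size(limit)


if __name__ == "__main__":
    print(parse_size("10mb"))
    print(body_fits(b"x" * 1024, "10mb"))
```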

Change maximum number of rows displayed in browser per join key

For each matching join key value, the Trifacta application displays a maximum of three rows in the browser for the current sample. As a result, when you join a dataset with repeating key values, you may see fewer rows of data than you expect.

NOTE: This issue is limited only to the sampled data that is displayed in the browser. When you run a job across the entire dataset, the proper number of rows are generated in the output.
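
The effect of the per-key cap can be illustrated with a small Python sketch. This is not the platform's join implementation; the sampled_join function and the choice to keep the first rows encountered are assumptions made for illustration only.

```python
from collections import defaultdict


def sampled_join(left, right, key, max_rows_per_key=3):
    """Inner join that keeps at most max_rows_per_key output rows per key value.

    Pass max_rows_per_key=None to disable the cap, as in a full job run.
    """
    right_by_key = defaultdict(list)
    for row in right:
        right_by_key[row[key]].append(row)

    emitted = defaultdict(int)  # output rows produced so far for each key value
    output = []
    for left_row in left:
        value = left_row[key]
        for right_row in right_by_key.get(value, []):
            if max_rows_per_key is not None and emitted[value] >= max_rows_per_key:
                break
            output.append({**left_row, **right_row})
            emitted[value] += 1
    return output


if __name__ == "__main__":
    customers = [{"id": 1, "name": "Ada"}]
    orders = [{"id": 1, "order": n} for n in range(5)]
    print(len(sampled_join(customers, orders, "id")))        # capped display
    print(len(sampled_join(customers, orders, "id", None)))  # full output
```

With one customer row and five matching order rows, the capped join returns three rows while the uncapped join returns all five, which mirrors the difference between the browser sample and the job output.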

For some users, this simplification may be confusing. As needed, you can use the following steps to change the maximum number of rows displayed in the browser for each join key.

Steps:

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

  1. Search for and modify the following parameter:

    "webapp.client.sampleOutputTuplesPerJoinKey": 3,
  2. Save your changes and restart the platform.

Change page preview limit

In the Flow and Dataset pages, you can preview the data in datasets that you have imported or are importing. For example, when you click the Eye icon next to a dataset's name, you can see a preview of the data in the dataset, which is useful for ensuring that you have the correct data. 

Depending on the size of the datasets, you may wish to increase the limit on the size of preview data. If you are working with wide datasets, you may need to increase the limit so that you can get a solid preview of the contents. 

NOTE: Increasing this preview size may have performance impacts, particularly on lower-quality desktops. You should make adjustments with caution.

Steps:

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

  1. Locate the following setting, which defines the number of bytes that are loaded by default in a preview. Maximum permitted value is 1024000 (1 MB).

    "webapp.client.previewLoadLimit": 128000,
  2. Save your changes and restart the platform.
  3. After the platform has restarted, you should preview a large dataset to verify that performance is acceptable.

Timeouts

Change application timeout limits

The front-end application respects the following timeout settings for queries issued to back-end datastores, including the Trifacta database.

Setting | Description
webapp.timeoutMilliseconds | Overall timeout limit in milliseconds for the front-end application. Default value is 120000 (2 minutes).
jsdata.remoteTransformTimeoutMilliseconds | Timeout limit in milliseconds for the Transformer Page. This setting is an override of the previous one. Default value is 180000 (3 minutes).

You can change the timeout settings if you are experiencing timeouts or other errors because of long-running queries to external data connections.

NOTE: In most environments, these settings should not be changed. Lowering them can cause reasonable queries to fail, and raising them too high can cause performance issues. Please adjust them only if you are experiencing very long query times to external sources, especially for database views.

Steps:

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

  1. Locate the following configuration. Specify new timeout values in milliseconds:

    "webapp.timeoutMilliseconds": 120000,
    "jsdata.remoteTransformTimeoutMilliseconds": 180000,
  2. Save your changes and restart the platform.
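
If you script configuration changes rather than using the Admin Settings Page, a helper along the following lines can update a setting by its dotted name. This is a sketch under a stated assumption: it treats each dotted name as a path into nested JSON objects, which this page does not confirm for trifacta-conf.json. The set_setting helper and the new timeout values are hypothetical.

```python
import json


def set_setting(config, dotted_key, value):
    """Set a value in a nested dict using a dotted name such as 'webapp.timeoutMilliseconds'.

    Assumes each dot-separated part is a nested JSON object (an assumption,
    not confirmed by the platform documentation on this page).
    """
    parts = dotted_key.split(".")
    node = config
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value
    return config


if __name__ == "__main__":
    # In-memory stand-in for the configuration; values are the documented defaults.
    config = {
        "webapp": {"timeoutMilliseconds": 120000},
        "jsdata": {"remoteTransformTimeoutMilliseconds": 180000},
    }
    # Hypothetical new values: 5 minutes and 6 minutes, in milliseconds.
    set_setting(config, "webapp.timeoutMilliseconds", 5 * 60 * 1000)
    set_setting(config, "jsdata.remoteTransformTimeoutMilliseconds", 6 * 60 * 1000)
    print(json.dumps(config, indent=2))
```

Back up the configuration file before applying any scripted change, and restart the platform afterward as described in the steps.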

Session timeout

By default, the maximum session duration is set to one week (10080 minutes). If needed, you can change the maximum session duration, as well as other session parameter values.

Steps:

  1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

  2. Modify the following parameters, as needed:

    Parameter | Description | Default
    webapp.session.refreshEmbeddedExpiryDateAfterMinutes | Refresh interval in minutes for the expiration date embedded in the session cookie. | 5
    webapp.session.cookieSecureFlag | Set a secure cookie in the client application. | false

  3. Apply the following change through the Workspace Settings Page. For more information, see Platform Configuration Methods.

    Setting | Description | Default
    Session duration | Maximum session duration in minutes. | 10080 (one week)
  4. Save your changes and restart the platform.
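
Because the session duration is entered in minutes, it can be convenient to compute the value from a more familiar unit. The following sketch uses only Python's standard datetime module; the duration_in_minutes helper is illustrative.

```python
from datetime import timedelta


def duration_in_minutes(**kwargs):
    """Convert a duration given as timedelta keyword arguments into whole minutes."""
    return timedelta(**kwargs) // timedelta(minutes=1)


if __name__ == "__main__":
    print(duration_in_minutes(weeks=1))  # the documented default of 10080
    print(duration_in_minutes(days=30))
    print(duration_in_minutes(hours=8))
```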

Timeout for suggestion cards

By default, the platform waits a specified length of time for the machine learning service to return suggestion cards. When more time is allowed, the service may be able to discover better suggestions based on the currently selected data.

If needed, you can change the delay limit from its default value of 80 milliseconds.

Steps:

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

  1. Locate the following setting and change its value:

    "feature.mlTransformSuggestions.delayThreshold": 80,
  2. Save your changes and restart the platform.

Jobs

Maximum number of flow jobs launched in parallel

By default, the Trifacta platform permits up to four jobs from the same flow to be launched in parallel for execution. If there are more flow job launches than this limit, the additional jobs are queued for execution after one or more of the launched jobs has completed. 

Tip: This limit is most relevant when you are running a scheduled job, which can execute all jobs in a flow at the same time.


Max parallel jobs setting | Description
4 | (Default) Up to four jobs from the same flow can be launched and in the process of execution at the same time. Additional jobs are queued for execution.
1 | Jobs from the flow are executed sequentially.
0 | No limit. All jobs from a flow can be executed at the same time.
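
The behavior of the three settings can be sketched in Python as follows. This is a simplified model, not the platform's scheduler: the split_launch function is hypothetical, and it shows only which jobs start immediately and which wait at the moment of launch.

```python
def split_launch(jobs, batch_size=4):
    """Split a flow's jobs into (launched, queued) lists for a given limit.

    A batch_size of 0 means no limit, matching the setting described above.
    """
    if batch_size < 0:
        raise ValueError("batch_size must be 0 (no limit) or a positive integer")
    if batch_size == 0:
        return list(jobs), []
    return list(jobs[:batch_size]), list(jobs[batch_size:])


if __name__ == "__main__":
    jobs = [f"job{i}" for i in range(1, 7)]
    for size in (4, 1, 0):
        launched, queued = split_launch(jobs, size)
        print(f"limit={size}: launched={launched} queued={queued}")
```

With six jobs and the default limit of 4, four jobs launch and two are queued until a launched job completes.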

Steps:

  1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
  2. Locate the following parameter. Modify it according to your needs:

    "webapp.jobLaunchingBatchSize": 4,
  3. Save your changes and restart the platform.

Job status polling interval

Periodically, the application polls the running environment to check the status of jobs in transit. This polling occurs in the following areas of the application:

  • Jobs page - Checks to see if running jobs have been resolved.
  • Flow View page - Checks to see if running jobs have been resolved.
  • Transformer page - Checks to see if sampling jobs have been resolved.

    NOTE: This setting does not apply to the initial sample, which is derived from the first N rows of the dataset.

As needed, you can modify the interval at which the application polls for job status from these areas. The default value is 5000 milliseconds (5 seconds).

NOTE: If this setting is lowered too much, polling requests can overlap, resulting in no updates to the application. Application performance can be impeded.

Steps:

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

  1. Locate the following setting and change its value:

    "webapp.polling.jobStatusInMillis" : 5000,
  2. Save your changes and restart the platform.
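
To illustrate why the interval matters, the following Python sketch shows a simple sequential polling loop. It is a generic illustration, not the application's client code: poll_until_done and its parameters are hypothetical. In this model, each check finishes before the next wait begins, so requests cannot overlap. A client that instead fires requests on a fixed timer can overlap them when the interval is shorter than the time a request takes.

```python
import time


def poll_until_done(check_status, interval_ms=5000, max_polls=10, sleep=time.sleep):
    """Call check_status() repeatedly until it returns True.

    Returns the number of polls made, or None if max_polls is reached first.
    The sleep function is injectable so that the loop can be tested without waiting.
    """
    for attempt in range(1, max_polls + 1):
        if check_status():
            return attempt
        if attempt < max_polls:
            sleep(interval_ms / 1000)  # convert milliseconds to seconds
    return None


if __name__ == "__main__":
    # Simulated job that completes on the third status check.
    statuses = iter([False, False, True])
    waits = []
    polls = poll_until_done(lambda: next(statuses), sleep=waits.append)
    print(f"completed after {polls} polls, waited {sum(waits)} seconds in total")
```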

Sampling

Sample load limit

By default, the Trifacta Photon running environment loads a maximum of 1 GB of data from the imported dataset for generating a new sample. This data comes from the top of the file, meaning that rows that are deeper than 1 GB in the source data cannot be included in any generated sample.

From this selection of data, a 10 MB sample of the data is derived for display in the data grid. As needed, you can configure the sample load limit to include a larger number of rows.

NOTE: Be careful making adjustments to this setting. If the volume of data is too large, you can crash the running environment.

Steps:

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

  1. Locate the following setting, which is listed in terms of bytes. The default value listed below corresponds to 1 GB of data:

    "webapp.sampleLoadLimit": 1073741824,
  2. Save your changes and restart the platform. 
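
Because the setting is expressed in bytes, a small conversion helper can reduce the chance of a typing error. The sketch below assumes binary units (1 GB = 1024^3 bytes), which matches the default value shown above; the gib_to_bytes name is illustrative.

```python
def gib_to_bytes(gib):
    """Convert a size in GB (binary, 1024**3 bytes each) into bytes."""
    return int(gib * 1024 ** 3)


if __name__ == "__main__":
    print(gib_to_bytes(1))  # the documented default for webapp.sampleLoadLimit
    print(gib_to_bytes(2))
```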

Photon sample size limits

When samples are created in Trifacta Photon, their size is also gated by some limits in the Trifacta Photon client. For more information, see Configure Photon Client.

Relational limits

See Configure Security for Relational Connections.

Miscellaneous limits

Date range limit

By default, the Trifacta platform supports the following date range for Datetime data type validation: 

January 1, 1400 - December 31, 2599

This date range is validated against the following default regular expression:

((?:1[4-9]|2[0-5])\d{2})

As needed, you can change the above regular expression to define your preferred date range for the Datetime data type. Your regular expression must be in the following format:

(<your_regular_expression>)

For example, the following regular expression allows dates up to December 31, 9999:

((?:1[4-9]|[0-9][0-9])\d{2})


NOTE: Use of Trifacta patterns in this field is not supported. The entry must be a valid regular expression.

Steps:

  1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
  2. Locate the following parameter:

    webapp.yearFourDigitRegex
  3. Insert your regular expression in the required format. 
  4. Save your changes and restart the platform.
  5. You should check your new Datetime date range validation against some sample data. 
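
Before applying a new expression, you can check it against sample years outside the platform. The sketch below uses Python's standard re module with the two expressions shown above. It assumes that the expression is matched against a complete four-digit year; how the platform applies the expression internally is not described on this page, and Python's regular expression dialect may differ from the platform's in edge cases.

```python
import re

# Expressions copied from this page.
DEFAULT_YEAR_REGEX = r"((?:1[4-9]|2[0-5])\d{2})"
EXTENDED_YEAR_REGEX = r"((?:1[4-9]|[0-9][0-9])\d{2})"


def year_is_valid(year, pattern=DEFAULT_YEAR_REGEX):
    """Return True if the four-digit form of year fully matches the pattern."""
    return re.fullmatch(pattern, f"{year:04d}") is not None


if __name__ == "__main__":
    for year in (1399, 1400, 2599, 2600, 9999):
        print(
            year,
            "default:", year_is_valid(year),
            "extended:", year_is_valid(year, EXTENDED_YEAR_REGEX),
        )
```

Under this assumption, the default expression accepts 1400 through 2599 and rejects 1399 and 2600. The extended example accepts 9999, and it also accepts years earlier than 1400, since its second alternative matches any two leading digits.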
