This section provides information on various settings that you can specify to apply minimum and maximum limits on the Trifacta application.

Operating System Limits

Raise ulimit setting

To perform normal operations, the Trifacta platform may need to maintain a high number of simultaneously open files, the count of which may exceed the default setting for the operating system (the ulimit). 

NOTE: If the Trifacta platform hits the ulimit and is unable to open additional files, jobs may fail, or the platform may be unable to access content. The log may contain something similar to the following error: Failed on local exception: java.net.SocketException: Too many open files.


By default, the operating system sets the limit on the number of open files at 1024. Please complete the following steps to raise this limit. 

Tip: Depending on the capacity of your hardware, the ulimit can be raised as high as 64000. The steps below use 16000 as an example value.

 

Steps:

  1. If it is running, stop the Trifacta platform. See Start and Stop the Platform.
  2. Verify the current ulimit:

    ulimit -Hn
  3. Edit the following file: /etc/security/limits.conf.
  4. At the bottom of the file, add the following entry, which overrides the defined limit with a value of 16000:

    *   hard    nofile  16000
  5. If you encounter the error "java.lang.OutOfMemoryError: unable to create new native thread", the ulimit for processes must also be increased. Add the following line after the previous one:

    *   hard    nproc   16000
  6. Save the file and restart the platform. See Start and Stop the Platform.
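To confirm from a script that the new limits are in effect, the values can be read with Python's standard `resource` module. This is a minimal sketch, not part of the platform: the helper name `describe_limit` is illustrative, and the sketch assumes it is run on Linux in a fresh login session of the account that runs the platform, since changes to limits.conf are normally applied at login.

```python
import resource


def describe_limit(name, limit_id):
    """Return a one-line summary of the soft and hard values of one limit."""
    soft, hard = resource.getrlimit(limit_id)

    def fmt(value):
        # RLIM_INFINITY is the sentinel the OS uses for "no limit".
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"{name}: soft={fmt(soft)} hard={fmt(hard)}"


if __name__ == "__main__":
    # nofile corresponds to "ulimit -n", nproc to "ulimit -u".
    print(describe_limit("open files (nofile)", resource.RLIMIT_NOFILE))
    print(describe_limit("processes (nproc)", resource.RLIMIT_NPROC))
```

The hard value reported for nofile should match the entry added to limits.conf.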

Browser Limits

Change body limits

If you encounter log messages indicating that a request submitted from the client is too large, you can raise the limit on the size of body objects submitted from the client.

NOTE: Raising these values too high can overload the browser.

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

Setting | Description
"webapp.bodyParser.urlEncoded.limit": "10mb", | Maximum permitted size of the URL-encoded body of a request submitted from the client. Size is in MB.
"webapp.bodyParser.json.limit": "10mb", | Maximum permitted size of a JSON object submitted from the client. Size is in MB.
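To reason about whether a given request would fit under a limit such as "10mb", it can help to convert the setting to bytes. The sketch below is illustrative only: `parse_size` and `fits_within_limit` are hypothetical helpers, and the assumption that "mb" means 1024 * 1024 bytes is not confirmed by this page.

```python
import re

# Assumption: units are binary multiples (1 mb = 1024 * 1024 bytes).
_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_size(text):
    """Convert a size string such as "10mb" to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(b|kb|mb|gb)\s*", text.lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def fits_within_limit(body, limit):
    """Return True if the request body (bytes) is no larger than the limit."""
    return len(body) <= parse_size(limit)
```

For example, `fits_within_limit(payload, "10mb")` reports whether a serialized request body would be accepted under the default value, given the unit assumption above.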

Change maximum number of rows displayed in browser per join key

For each matching join key value, the Trifacta application displays a maximum of three rows in the browser for the current sample. So, when you join a dataset with repeating key values, you may see fewer rows of data than you would expect.

NOTE: This issue is limited to the sampled data that is displayed in the browser. When you run a job across the entire dataset, the proper number of rows is generated in the output.

For some users, this simplification may be confusing. As needed, you can use the following steps to change the maximum number of rows displayed in the browser for each join key.

Steps:

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

  1. Search for and modify the following parameter:

    "webapp.client.sampleOutputTuplesPerJoinKey": 3,
  2. Save your changes and restart the platform.
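The effect of the per-key cap can be illustrated with a small inner join. This is a sketch of the idea only: the function `sample_join` is hypothetical, and it keeps the first rows encountered for each key, which may differ from how the application chooses the rows it displays.

```python
from collections import defaultdict


def sample_join(left_rows, right_rows, key, max_per_key=3):
    """Inner-join two lists of dicts, keeping at most max_per_key rows per key value."""
    # Index the right-hand rows by join key value.
    right_by_key = defaultdict(list)
    for row in right_rows:
        right_by_key[row[key]].append(row)

    emitted = defaultdict(int)  # output rows produced so far per key value
    joined = []
    for left in left_rows:
        for right in right_by_key.get(left[key], []):
            if emitted[left[key]] >= max_per_key:
                break
            joined.append({**left, **right})
            emitted[left[key]] += 1
    return joined
```

With two left rows and four right rows that share one key value, a full join yields eight rows, while the capped join above yields three.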

Change page preview limit

In the Flow and Dataset pages, you can preview the data in datasets that you have imported or are importing. For example, when you click the Eye icon next to a dataset's name, you can see a preview of the data in the dataset, which is useful for ensuring that you have the correct data. 

Depending on the size of the datasets, you may wish to increase the limit on the size of preview data. If you are working with wide datasets, you may need to increase the limit so that you can get a solid preview of the contents. 

NOTE: Increasing this preview size may have performance impacts, particularly on less powerful desktops. You should make adjustments with caution.

Steps:

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

  1. Locate the following setting, which defines the number of bytes that are loaded by default in a preview. Maximum permitted value is 1024000 (1 MB).

    "webapp.client.previewLoadLimit": 128000,
  2. Save your changes and restart the platform.
  3. After the platform has restarted, you should preview a large dataset to verify that performance is acceptable.
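When choosing a value for wide datasets, a rough estimate of how many rows fit in the preview can be useful. The sketch below is simple arithmetic under stated assumptions: the helper names are hypothetical, and the average row width in bytes is something you would measure from your own data.

```python
MAX_PREVIEW_LOAD_LIMIT = 1024000  # maximum permitted value stated in step 1


def clamp_preview_limit(requested_bytes):
    """Return the requested limit, capped at the maximum permitted value."""
    if requested_bytes <= 0:
        raise ValueError("preview limit must be a positive number of bytes")
    return min(requested_bytes, MAX_PREVIEW_LOAD_LIMIT)


def estimated_preview_rows(limit_bytes, avg_row_bytes):
    """Estimate how many complete rows fit within the preview limit."""
    return limit_bytes // avg_row_bytes
```

For example, with the default of 128000 bytes and rows that average 4000 bytes, roughly 32 rows fit in the preview.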

Timeouts

Change application timeout limits

The front-end application respects the following timeout settings for queries issued to back-end datastores, including the Trifacta database.

Setting | Description
webapp.timeoutMilliseconds | Overall timeout limit in milliseconds for the front-end application. Default value is 120000 (2 minutes).
jsdata.remoteTransformTimeoutMilliseconds | Timeout limit in milliseconds for the Transformer Page. This setting is an override of the previous one. Default value is 180000 (3 minutes).

You can change the timeout settings if you are experiencing timeouts or other errors because of long-running queries to external data connections.

NOTE: In most environments, these settings should not be changed. Lowering them can cause reasonable queries to fail, and raising them too high can cause performance issues. Please adjust them only if you are experiencing very long query times to external sources, especially for database views.

Steps:

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

  1. Locate the following configuration. Specify new timeout values in milliseconds:

    "webapp.timeoutMilliseconds": 120000,
    "jsdata.remoteTransformTimeoutMilliseconds": 180000,
  2. Save your changes and restart the platform.
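The relationship between the two settings can be sketched as follows, based on the description that the Transformer Page value overrides the overall value. The function `effective_timeout_ms` is hypothetical and only models that stated relationship; it is not the platform's implementation.

```python
DEFAULTS = {
    "webapp.timeoutMilliseconds": 120000,                 # 2 minutes
    "jsdata.remoteTransformTimeoutMilliseconds": 180000,  # 3 minutes
}


def effective_timeout_ms(settings, transformer_page=False):
    """Return the timeout that applies, given any overridden settings."""
    merged = {**DEFAULTS, **settings}
    if transformer_page:
        # The Transformer Page setting overrides the overall timeout.
        return merged["jsdata.remoteTransformTimeoutMilliseconds"]
    return merged["webapp.timeoutMilliseconds"]
```

Under this model, raising only `webapp.timeoutMilliseconds` leaves the Transformer Page timeout unchanged, so both values may need adjusting for long-running queries.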

Session timeout

By default, the maximum session duration is set to one week (10080 minutes). If needed, you can change the maximum session duration, as well as other session parameter values.

Steps:

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

  1. Modify the following parameters, as needed:

    Parameter | Description | Default
    webapp.session.durationInMins | Maximum session duration in minutes | 10080 (one week)
    webapp.session.refreshEmbeddedExpiryDateAfterMinutes | Refresh interval in minutes for the expiration date embedded in the session cookie | 5
    webapp.session.cookieSecureFlag | Set a secure cookie in the client application. | false

  2. Save your changes and restart the platform.
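Because webapp.session.durationInMins is expressed in minutes, it can be convenient to derive the value from a more readable duration. The helper below is a small illustrative sketch (the name `duration_in_mins` is hypothetical) that uses the standard library's `timedelta`.

```python
from datetime import timedelta


def duration_in_mins(**kwargs):
    """Convert a duration such as weeks=1 or days=30 to whole minutes."""
    return int(timedelta(**kwargs).total_seconds() // 60)
```

For example, `duration_in_mins(weeks=1)` gives the default of 10080, and `duration_in_mins(days=30)` gives 43200 for a 30-day session.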

Timeout for suggestion card suggestions

By default, the platform waits a specified length of time for the machine learning service to return suggestion cards. When more time is allowed, the service may be able to discover better suggestions based on the currently selected data.

If needed, you can change the delay limit from its default value of 80 milliseconds.

Steps:

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

  1. Locate the following setting and change its value:

    "feature.mlTransformSuggestions.delayThreshold": 80,
  2. Save your changes and restart the platform.
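The general pattern of waiting a bounded time for a result can be sketched as below. This is not the platform's code: `suggestions_within` and `fetch` are hypothetical names, and returning an empty list when the threshold is exceeded is an assumption made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def suggestions_within(fetch, delay_threshold_ms):
    """Return fetch()'s result if it arrives within the threshold, else []."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fetch)
    try:
        # Future.result takes its timeout in seconds.
        return future.result(timeout=delay_threshold_ms / 1000)
    except FutureTimeout:
        # Assumption: a late answer is simply not shown.
        return []
    finally:
        # Do not block the caller while a slow fetch finishes.
        executor.shutdown(wait=False)
```

A larger threshold gives a slow service more chance to answer, at the cost of the caller waiting longer before it proceeds.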

Jobs

Job status polling interval

Periodically, the application polls the running environment to check the status of jobs in progress. This polling occurs in the following areas of the application:

  • Jobs page - Checks to see if running jobs have been resolved.
  • Flow View page - Checks to see if running jobs have been resolved.
  • Transformer page - Checks to see if sampling jobs have been resolved.

    NOTE: This setting does not apply to the initial sample that is generated when a dataset is loaded into the Transformer page.

As needed, you can modify the interval at which the application polls for job status from these areas. The default value is 5000 milliseconds (5 seconds).

NOTE: If this setting is lowered too much, polling requests can overlap, resulting in no updates to the application. Application performance can be impeded.

Steps:

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

  1. Locate the following setting and change its value:

    "webapp.polling.jobStatusInMillis" : 5000,
  2. Save your changes and restart the platform.
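The overlap problem described in the note above can be avoided in a client by waiting for each response before scheduling the next request. The loop below is a generic sketch of that pattern, not the application's implementation: `poll_job_status`, `fetch_status`, and the terminal status names are hypothetical.

```python
import time

TERMINAL_STATUSES = ("Complete", "Failed")  # hypothetical status names


def poll_job_status(fetch_status, interval_ms=5000, sleep=time.sleep, max_polls=None):
    """Poll until the job reaches a terminal status or max_polls is exhausted."""
    polls = 0
    while max_polls is None or polls < max_polls:
        status = fetch_status()  # blocks until this response has arrived
        polls += 1
        if status in TERMINAL_STATUSES:
            return status
        # The wait starts only after the response, so requests never overlap.
        sleep(interval_ms / 1000)
    return None
```

Passing a custom `sleep` callable makes the loop easy to exercise without real delays.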

Sampling

Sample load limit

By default, the Photon running environment loads a maximum of 1 GB of data from the imported dataset for generating a new sample. This data comes from the top of the file, meaning that rows that are deeper than 1 GB in the source data cannot be included in any generated sample.

From this selection of data, a 10 MB sample of the data is derived for display in the data grid. As needed, you can configure the sample load limit to include a larger number of rows.

NOTE: Be careful making adjustments to this setting. If the volume of data is too large, you can crash the running environment.

Steps:

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

  1. Locate the following setting, which is listed in terms of bytes. The default value listed below corresponds to 1 GB of data:

    "webapp.sampleLoadLimit": 1073741824,
  2. Save your changes and restart the platform. 
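The value is a plain byte count, so larger limits are multiples of 1024 * 1024 * 1024. The sketch below shows the arithmetic and a simplified model of reading only from the top of a source; `load_for_sampling` is a hypothetical helper, not how the running environment actually reads data.

```python
import io

GIB = 1024 ** 3  # 1073741824 bytes, the default sample load limit


def load_for_sampling(stream, sample_load_limit=GIB):
    """Read at most sample_load_limit bytes from the top of a binary stream."""
    return stream.read(sample_load_limit)


if __name__ == "__main__":
    # A value for a 2 GB limit, expressed in bytes for the setting.
    print(2 * GIB)
    # Only the first 4 bytes are available to the sample; the rest is never seen.
    print(load_for_sampling(io.BytesIO(b"abcdefghij"), 4))
```

Anything beyond the limit in the source is invisible to sampling, which is why rows deep in a large file cannot appear in a generated sample.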

Relational limits

Configure relational read stream limits

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

The Data Service reads data from Hive, Redshift, and relational sources in streams of records. The size of these streams is defined by the following parameters:

"data-service.sqlOptions.maxReadStreamRecords": -1,
"data-service.sqlOptions.limitedReadStreamRecords": 1000000,
"data-service.sqlOptions.initialReadStreamRecords": 25,
"data-service.sqlOptions.hiveReadStreamRecords": 100000000,
Property | Description
maxReadStreamRecords | The maximum number of JDBC records pulled in per stream read during batch execution. If this value is set to -1, then no limit is applied.
limitedReadStreamRecords | Max number of records read for the initial sample and quick scan sampling.
initialReadStreamRecords | Initial number of records to read for client-side preview and for client-side transform. Set to -1 to apply no limit.
hiveReadStreamRecords | Max number of records that can be read from Hive, if maxReadStreamRecords is -1. NOTE: This value cannot be set to -1, which results in a Data Service error. Hive reads must be limited.
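One way to read the interaction between these properties during batch execution is sketched below. It is an interpretation of the descriptions above, not the Data Service's logic: the function names are hypothetical, and the assumption that a non-negative maxReadStreamRecords also applies to Hive is not stated on this page.

```python
from itertools import islice


def batch_read_limit(max_read, hive_read, source):
    """Return the record limit for a batch read, or None for no limit."""
    if hive_read == -1:
        # Stated above: Hive reads must be limited.
        raise ValueError("hiveReadStreamRecords cannot be -1")
    if source == "hive" and max_read == -1:
        return hive_read
    return None if max_read == -1 else max_read


def read_records(records, limit):
    """Read up to limit records from an iterable; None reads everything."""
    return list(islice(records, limit))
```

With the default values, a Hive source is capped at 100000000 records, while other relational sources are read without a cap during batch execution.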

Long load timeout limits

For long-loading relational sources, a timeout is applied to limit the permitted load time. As needed, you can modify this limit to account for longer load times.

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

  1. Locate and edit the following parameter:

    "webapp.connectivity.longLoadTimeoutMillis": 120000,
  2. Save your changes and restart the platform.

Property | Description
longLoadTimeoutMillis | Max number of milliseconds to wait for a long-loading data source. The default value is 120000 (2 minutes).

VARCHAR string length max

By default, when the Trifacta platform publishes to one of the following relational systems, String types are published to VARCHAR columns with a maximum length of 256 characters. This setting applies to the following relational systems, listed with their maximum string lengths:

Relational DB | Maximum string length
Hive | 65,535
Redshift | 65,535
SQL DW | 8000

NOTE: This setting applies to the data service, which is used for publication to all three systems.

As needed, you can change the maximum permitted length of strings published from the Trifacta platform to VARCHAR columns.

Steps:

  1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
  2. Modify the following property:

    "data-service.sqlOptions.stringSizeInBytes": 256,
  3. Save your changes and restart the platform.
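Before raising the value, it can help to measure how long the strings in a column actually are. The sketch below is illustrative: the helper names are hypothetical, the per-system maximums are taken from the table above, and measuring UTF-8 bytes rather than characters is an assumption based on the property name stringSizeInBytes.

```python
# Maximum string lengths from the table above.
MAX_STRING_LENGTH = {"hive": 65535, "redshift": 65535, "sqldw": 8000}


def values_exceeding(values, string_size=256):
    """Return the values whose UTF-8 encoding is longer than string_size bytes."""
    return [v for v in values if len(v.encode("utf-8")) > string_size]


def required_string_size(values, target):
    """Return the smallest size that holds every value on the target system."""
    longest = max((len(v.encode("utf-8")) for v in values), default=0)
    ceiling = MAX_STRING_LENGTH[target]
    if longest > ceiling:
        raise ValueError(f"{longest} bytes exceeds the {target} maximum of {ceiling}")
    return longest
```

Note that multi-byte characters count more than once under this assumption: a string of 200 accented characters can require 400 bytes.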
