In Release 4.1 and later, use the Admin Settings page for configuration changes whenever a parameter is available there. Most parameters are accessible through this interface.

What Has Changed

In Release 3.2.1 and earlier, the Admin Settings page contained a sampling of key parameters that administrators might want to change from within the web application.

In Release 4.0 and later, the Admin Settings page contains a searchable list of the parameters available through the web application. When parameter values are saved, they are written back to trifacta-conf.json on the Trifacta node.
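
As an illustration of how a dotted parameter name relates to the file, the following minimal sketch assumes that trifacta-conf.json stores settings as nested JSON objects, one level per dot-separated segment. The helper name get_param and the sample content are hypothetical and are not part of the platform.

```python
import json


def get_param(conf, dotted_name):
    """Walk a nested dict using a dot-separated parameter name.

    Assumption: each segment of the name is one level of nesting.
    Raises KeyError if any segment is missing.
    """
    node = conf
    for key in dotted_name.split("."):
        node = node[key]
    return node


# Hypothetical sample content; a real trifacta-conf.json is much larger.
sample = json.loads('{"webapp": {"client": {"loadLimit": 2097152}}}')
load_limit = get_param(sample, "webapp.client.loadLimit")
print(load_limit)
```

A read-only lookup like this can be useful for auditing values; it does not replace the Admin Settings page for making changes.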

This new version of the page allows administrators to make changes without having to manage access to the Trifacta node. Additionally, administrators can begin configuring the platform much earlier using the UI.

This section helps administrators who have upgraded from a previous version migrate their configuration workflows to this new feature.

NOTE: The Admin Settings page is the recommended method for changing configuration. Some parameters may not be available through this page. For more information, see Platform Configuration Methods.

 

In a running instance of the platform, select User menu > Admin Settings. For more information, see Admin Settings Page.

Accessing the Admin Settings Page

Minimum requirements

At a minimum, to use the Admin Settings page, please complete the following:

  1. Install and initialize the databases. See Set up the Databases.
  2. Install or upgrade the Trifacta software on the node. See Install.
  3. See below for access.

On running instance

If you have installed or upgraded the software on the Trifacta node and verified that the software is connected to the database, you can begin using the Admin Settings page in the web application for further configuration.

Steps:

  1. If you haven't done so already, start the platform. See Start and Stop the Platform.
  2. Log in to the application with an administrator account. See Login.
  3. In the application menu, select User menu > Admin Settings.

 

Limitations

Do not modify settings through the Admin Settings page and through trifacta-conf.json at the same time. Saving changes in one interface wipes out any unsaved changes in the other interface. Each requires a platform restart to apply the changes.

The following are known limitations of this interface.

  • Some parameters that are available in trifacta-conf.json are not available through the Admin Settings page.
  • When you save changes in the Admin Settings page, the platform is automatically restarted.
  • You can only edit parameters through this interface. You cannot add or delete parameters. You can set parameters to empty values.

Backend-only Parameters

The following parameters are not available through the Admin Settings page. These parameters must be changed through trifacta-conf.json on the Trifacta node. This list may not be complete.

hdfs.webhdfs.credential.password
smtp.*
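
A script that prepares configuration changes could use the entries above to decide which interface a parameter belongs to. The following is a minimal sketch: the helper name is_backend_only is hypothetical, smtp.host is an invented parameter name used only to exercise the wildcard, and because the list may not be complete, a False result does not guarantee that the parameter appears in the Admin Settings page.

```python
from fnmatch import fnmatchcase

# Patterns copied from the list above. "smtp.*" is treated as a
# shell-style wildcard, which is an assumption about its meaning.
BACKEND_ONLY_PATTERNS = [
    "hdfs.webhdfs.credential.password",
    "smtp.*",
]


def is_backend_only(param_name):
    """Return True if the name matches a known backend-only pattern."""
    return any(fnmatchcase(param_name, p) for p in BACKEND_ONLY_PATTERNS)


for name in ("smtp.host", "webapp.client.loadLimit"):
    where = "trifacta-conf.json" if is_backend_only(name) else "Admin Settings page"
    print(name, "->", where)
```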

Mapping

Below is the documentation from the Admin Settings page from Release 3.2.1.

For each section, a listing has been added to identify the relevant property names in the Release 4.0 version of the page.

Tip: The values in each Release 4.0 table can be pasted into the search box in the Admin Settings page.

Performance

Browser Sample Size

Release 4.0: webapp.client.loadLimit


Limits the size in bytes of the data sample that is served back to the browser. 

  • For data that features wide columns or a high number of columns, you may need to increase the sample size.
  • The hard limit of 2 MB (2097152 bytes) prevents overwhelming the browser with data.
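
The byte value of the hard limit follows from 2 × 1024 × 1024. The sketch below shows that arithmetic and one way a requested value might be checked before it is entered. The function name clamp_load_limit is hypothetical, and capping an oversized request at the limit is an assumption made for illustration; how the platform itself treats larger values is not stated here.

```python
# 2 MB expressed in bytes: 2 * 1024 * 1024 = 2097152
HARD_LIMIT_BYTES = 2 * 1024 * 1024


def clamp_load_limit(requested_bytes):
    """Cap a requested sample size at the 2 MB hard limit.

    Assumption: an oversized request is capped rather than rejected.
    """
    if requested_bytes <= 0:
        raise ValueError("sample size must be a positive number of bytes")
    return min(requested_bytes, HARD_LIMIT_BYTES)


print(HARD_LIMIT_BYTES)
print(clamp_load_limit(5_000_000))
```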

Max Result Download Size

Release 4.0: webapp.maxQueryResultsSize

Limits the maximum volume in bytes of any results downloaded from the application.

  • Default is 10 GB.

Pig Progress Timeout

Release 4.0: batchserver.polling.progressTimeoutSeconds


Maximum time that a Pig job is allowed to run without making forward progress before it is killed. 

NOTE: As of Release 4.1, the Pig running environment is no longer available.

  • Default is 3600 seconds (1 hour).

Max URL Encoded Upload

Release 4.0: webapp.bodyParser.urlEncoded.limit

This value defines the largest URL-encoded payload that can be sent from the client to the Trifacta platform.

NOTE: Max JSON Upload and Max URL Encoded Upload should be modified together.

  • Default value is 10 MB.

Max JSON Upload

Release 4.0: webapp.bodyParser.json.limit

By default, the maximum size for a JSON body that can be sent from the client to the Trifacta platform is set to 10 MB. In some environments, individual operations may exceed this limit, which can cause memory failures or performance issues.

NOTE: Max JSON Upload and Max URL Encoded Upload should be modified together.

In particular, this issue can appear when:

  • you are performing multiple joins in your datasets
  • you have a dataset with many columns
  • your dataset contains wide column names

If needed, you can raise the maximum size of the JSON body sent to the platform.

NOTE: Setting this value over 20 MB may cause requests to the platform to fail or may significantly degrade performance. You may need to experiment to find an appropriate value.
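
The two notes in this section can be expressed as a simple pre-change review. This is a minimal sketch, not a platform feature: the function name review_upload_limits is hypothetical, values are given in whole megabytes, and "modified together" is interpreted here as "either both limits differ from the 10 MB default or neither does", which is an assumption.

```python
DEFAULT_MB = 10      # default for both limits, per this page
WARN_ABOVE_MB = 20   # threshold from the NOTE on Max JSON Upload


def review_upload_limits(json_mb, url_encoded_mb):
    """Return a list of warnings for a proposed pair of upload limits."""
    warnings = []
    json_changed = json_mb != DEFAULT_MB
    url_changed = url_encoded_mb != DEFAULT_MB
    if json_changed != url_changed:
        warnings.append(
            "Max JSON Upload and Max URL Encoded Upload should be modified together."
        )
    if json_mb > WARN_ABOVE_MB:
        warnings.append(
            "Max JSON Upload over 20 MB may cause failed requests or degraded performance."
        )
    return warnings


for warning in review_upload_limits(25, 10):
    print(warning)
```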

Job Output

CSV Output Delimiter

Release 4.0: webapp.outputCsvDelimiter

When the application generates CSV output, the default field delimiter is the comma (,). As needed, you can change the delimiter.
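
If you change the delimiter, any downstream consumer of the generated files must be told to expect it. The sketch below uses the Python standard library to read output that was written with a pipe (|) delimiter. The pipe character and the sample data are arbitrary examples, not values taken from the platform.

```python
import csv
import io

# Hypothetical output generated after the delimiter was changed to "|".
generated = "id|name\n1|Ada Lovelace\n"

# The reader must be given the same delimiter that was configured.
rows = list(csv.reader(io.StringIO(generated), delimiter="|"))
print(rows)
```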

Tableau CSV Limit

Release 4.0: webapp.maxResultSizeForTDEDownloadInMB

The maximum size in megabytes of the CSV file used for TDE Download. Default value is 40 MB.

Hadoop Job Type

Please select 'yarn'

NOTE: As of Release 2.7, MapReduce 1 has been deprecated. For more information, please see End of Life and Deprecated Features.

Select yarn to execute your jobs, and then complete the appropriate settings in the following sections.

YARN

Resource Manager Host

Release 4.0: yarn.resourcemanager.host


Host name of the node hosting the YARN resource manager.

Resource Manager Port

Release 4.0: yarn.resourcemanager.port


Port of the YARN resource manager to use. 

Hive

Optionally, Hive can be used as a datastore for reading and writing datasets. See Configure for Hive.

Hive Server Host

Release 4.0: This value is managed through the connection that you create through the CLI to the enterprise Hive instance. For more information, see CLI for Connections.


Hostname of your Hive instance.

Hive Server Port

Release 4.0: This value is managed through the connection that you create through the CLI to the enterprise Hive instance. For more information, see CLI for Connections.


Port number through which to access your Hive instance. Default value is 10000.

Pig

NOTE: As of Release 4.1, the Pig running environment and Pig UDFs are no longer available in the platform.

UDF Jar

Release 4.0: batchserver.pig.udfJar


The Trifacta platform can be configured to use Pig scripts in a Hadoop environment. This setting must provide the relative path from the Trifacta deployment directory to the JAR file containing the Pig user-defined functions (UDFs).

ZooKeeper

Release 4.0: ZooKeeper is no longer required by the Trifacta platform.

NOTE: These settings will be removed in a future release.

Host

Host of the ZooKeeper node. If ZooKeeper was not available before Trifacta installation, this host is likely the Trifacta node.

Port

Port to use to access ZooKeeper.

License

License Location

Release 4.0: license.location

Specify the location of the Trifacta license key file. By default, this file is stored in the following location:

/license/license.json

This path must be specified relative to the top-level directory of the Trifacta deployment.
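
Because the default value begins with a slash but is interpreted relative to the deployment directory, it can help to see how the full path is formed. In this minimal sketch, the function name resolve_license_path is hypothetical and /opt/trifacta is only an assumed example of a deployment directory; substitute the top-level directory of your own installation.

```python
import posixpath


def resolve_license_path(deploy_dir, license_location):
    """Join a license location onto the deployment directory.

    The leading slash is stripped so the value is treated as relative
    to deploy_dir rather than as an absolute filesystem path.
    """
    return posixpath.join(deploy_dir, license_location.lstrip("/"))


# "/opt/trifacta" is an assumed example directory.
print(resolve_license_path("/opt/trifacta", "/license/license.json"))
```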

For more information, see License Key.

Enable/Disable Features

High Availability Hadoop Name Node

Release 4.0: feature.highAvailability.namenode


Toggles high availability for HDFS namenodes in the Hadoop cluster. Additional configuration is required. See Enable Integration with Cluster High Availability.

High Availability Hadoop Job Tracker

Release 4.0: feature.highAvailability.jobtracker


Toggles high availability for Jobtracker nodes in the Hadoop cluster. Additional configuration is required. See Enable Integration with Cluster High Availability.

Advanced automatic column split

Release 4.0: webapp.enableStructureDetection


For complex structured datasets, Trifacta Wrangler Enterprise can apply advanced algorithms to initially split the columns of your dataset.

Data Download

Release 4.0: webapp.enableDataDownload


By default, users can download generated results through the application. If needed, you can deselect this option to prevent data downloads. When it's disabled, results must be downloaded through the backend datastore.

NOTE: Disabling this setting prevents all users from downloading data through the application, including admin users.

Users

Release 4.0: No changes to this configuration area.

You can manage aspects of user accounts through the Admin Settings page. See Manage Users.

Services

Release 4.0: No changes to this configuration area.

You can review the overall status of the Trifacta platform.
