Configure Data Service

The Data Service enables the Designer Cloud Powered by Trifacta platform to stream metadata and records from JDBC sources for sampling and job execution in the Trifacta Photon running environment. This section describes how to enable and configure the service, including performance tweaks and connection-specific configuration.

Configure Service

The following properties enable the service and specify its location.

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

Property

Description

"data-service.enabled"

When true, the data service is enabled.

Note

When set to false, access to any relational connection is prevented.

Default is true.

"data-service.host"

Hostname for the service. Default is localhost.

"data-service.port"

Port number used by the service. Default is 41912.

Note

If you are changing the port number, avoid creating conflicts with existing ports in use. For more information, see System Ports.

"data-service.classpath"

The Java class path for the data service.

"data-service.autoRestart"

When true, the data service is automatically restarted if it crashes. Default is true.

"data-service.vendorPath"

Path to the vendor configuration files for relational connections. Default value:

%(topOfTree)s/services/data-service/build/conf/vendor
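Before you change "data-service.port", it can help to confirm that the candidate port is not already taken. The following is a minimal sketch, not part of the platform: the helper name port_in_use is hypothetical, and the check only detects a process that is currently accepting TCP connections on the port.

```python
import socket


def port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 41912 is the documented default for "data-service.port".
    candidate = 41912
    state = "in use" if port_in_use("localhost", candidate) else "free"
    print(f"Port {candidate} on localhost appears to be {state}")
```

A port that appears free here may still be reserved for another platform service that is not running at the moment, so check the result against System Ports as well.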

Configure SQL Options

Configure relational read stream limits

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

The Data Service reads data from relational sources in streams of records. The sizes of these streams are defined by the following parameters, which you can modify to limit SQL record streaming during read operations:

"data-service.sqlOptions.maxReadStreamRecords": -1,
"data-service.sqlOptions.limitedReadStreamRecords": 1000000,
"data-service.sqlOptions.initialReadStreamRecords": 25,
"data-service.sqlOptions.hiveReadStreamRecords": 100000000,

Property

Description

"data-service.sqlOptions.maxReadStreamRecords"

The maximum number of JDBC records pulled in per stream read during batch execution.

If this value is set to -1, then no limit is applied.

"data-service.sqlOptions.limitedReadStreamRecords"

Maximum number of records read for the initial sample and for quick scan sampling. If this value is set to -1, then no limit is applied.

"data-service.sqlOptions.initialReadStreamRecords"

Initial number of records to read for client-side preview and client-side transformation. If this value is set to -1, then no limit is applied.

"data-service.sqlOptions.hiveReadStreamRecords"

Maximum number of records that can be read from Hive when maxReadStreamRecords is set to -1.

Note

Do not set this value to -1, which results in a Data Service error. Hive reads must be limited.
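The -1 convention and the Hive restriction described above can be checked before you save a change. This is a minimal sketch under stated assumptions: the helper names check_sql_options and describe_limit are hypothetical, and the dictionary simply mirrors the documented settings rather than reading trifacta-conf.json.

```python
# Values mirror the read stream settings listed above.
sql_options = {
    "maxReadStreamRecords": -1,
    "limitedReadStreamRecords": 1000000,
    "initialReadStreamRecords": 25,
    "hiveReadStreamRecords": 100000000,
}

NO_LIMIT = -1  # documented sentinel meaning "no limit is applied"


def check_sql_options(options: dict) -> list:
    """Return a list of problems found in the read stream settings."""
    problems = []
    # The only rule stated in the documentation: Hive reads must be limited.
    if options.get("hiveReadStreamRecords") == NO_LIMIT:
        problems.append(
            "hiveReadStreamRecords must not be -1: Hive reads must be limited"
        )
    return problems


def describe_limit(value: int) -> str:
    """Render a setting value the way the documentation describes it."""
    return "no limit" if value == NO_LIMIT else f"at most {value} records"


if __name__ == "__main__":
    for name, value in sql_options.items():
        print(f"{name}: {describe_limit(value)}")
    print(check_sql_options(sql_options) or "no problems found")
```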

Configure Caching

The data service maintains a cache of JDBC objects that have been retrieved for use. You can configure the following properties to tune settings of the cache.

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

Property

Description

"data-service.cacheOptions.validationDelayMilliseconds"

Number of milliseconds to wait between checks validating cached pools. Default is 3600000 (1 hour).

"data-service.cacheOptions.maxSize"

Maximum number of objects in the cache. Default is 100.

Note

Set this value to 0 to disable data service caching.

"data-service.cacheOptions.expirySeconds"

Objects in the cache that are older than this number of seconds are automatically expired. Default is 86400 (1 day).
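The two time-based cache properties use different units (milliseconds and seconds), which is easy to mix up. The sketch below converts durations into the value each property expects. The helper name cache_settings is hypothetical and is only an illustration of the unit conversion.

```python
from datetime import timedelta


def cache_settings(validation_delay: timedelta, expiry: timedelta, max_size: int) -> dict:
    """Convert durations to the units each cache property expects."""
    return {
        # This property is expressed in milliseconds.
        "data-service.cacheOptions.validationDelayMilliseconds": int(
            validation_delay.total_seconds() * 1000
        ),
        "data-service.cacheOptions.maxSize": max_size,
        # This property is expressed in seconds.
        "data-service.cacheOptions.expirySeconds": int(expiry.total_seconds()),
    }


# Reproduces the documented defaults: 1 hour, 1 day, and 100 objects.
defaults = cache_settings(timedelta(hours=1), timedelta(days=1), 100)

if __name__ == "__main__":
    for name, value in defaults.items():
        print(f'"{name}": {value}')
```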

Enable Connection Pooling

By default, JDBC connection pooling is disabled. You can enable the feature, although this is not recommended.

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

Property

Description

"data-service.connectionPooling.enabled"

When set to true, connection pooling is enabled for JDBC-based connections. Connections are returned from a C3P0 connection pool. No other configuration is required.

When set to false, connection pooling is disabled. Connections are established on-demand for specific jobs.

By default, this flag is set to false.

Configure for Specific Integrations

Configure Data Service for Hive

The following properties apply to how the platform connects to Hive.

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

Property

Description

"data-service.hiveManagedTableFormat"

Managed table format for your Hive deployment. Default is PARQUET.

"data-service.hiveJdbcJar"

Path to the JAR to use for JDBC connectivity to Hive. Default path depends on your Hadoop distribution.

Configure Data Service for Tableau Server

The following properties apply to how the platform publishes to Tableau Server.

You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

Property

Description

"data-service.tableauBufferSizeInBytes"

Number of bytes of data to include in each HTTP request chunk when publishing to Tableau Server. When the Trifacta Application publishes a file to Tableau Server, the file is divided into chunks, and each chunk is attached as part of an HTTP request payload. This flag controls the chunk size in bytes.

When the chunk size is large, the number of HTTP requests required to send the whole file to Tableau Server is smaller. However, a large chunk size increases the risk of a RequestTimeoutException, which causes the publishing job to fail.

Default is 3000000 bytes.
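To get a feel for the tradeoff, you can estimate how many chunks a file of a given size produces at a given buffer size. This is a rough sketch only: the helper name chunk_count is hypothetical, and the estimate ignores any additional requests that the upload itself may require.

```python
DEFAULT_BUFFER_BYTES = 3000000  # documented default chunk size


def chunk_count(file_size_bytes: int, buffer_size_bytes: int = DEFAULT_BUFFER_BYTES) -> int:
    """Estimate the number of chunks needed to send a file of the given size."""
    if buffer_size_bytes <= 0:
        raise ValueError("buffer size must be positive")
    # Integer ceiling division: a partial final chunk still needs a request.
    return (file_size_bytes + buffer_size_bytes - 1) // buffer_size_bytes


if __name__ == "__main__":
    one_gb = 1_000_000_000
    for size in (1_000_000, DEFAULT_BUFFER_BYTES, 10_000_000):
        print(f"buffer {size} bytes: {chunk_count(one_gb, size)} chunks")
```

Under this estimate, a 10,000,000-byte file is sent in 4 chunks at the default buffer size.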

Additional Configuration

Data service shutdown timeout

If a shutdown or restart command is issued for the platform, the data service may be in the process of waiting for pending requests from various services before it can gracefully shut down. In some cases, the supervisord process, which governs platform starting and stopping, fails to restart while waiting for the data service to shut down.

By default, the data service waits for five seconds (5000 milliseconds) before automatically shutting down, regardless of the pending requests. If needed, you can adjust this shutdown timeout setting.

Warning

Do not modify this setting unless you are experiencing problems with how the data service interacts with other services during platform stop, start, or restart operations.

Notes:

  • This value should be lower than the stop-wait setting for the supervisord service. Otherwise, the supervisord service may not wait long enough for the data service to complete its shutdown. Note that this value is in milliseconds, while the supervisord value is in seconds.

  • To configure the supervisord setting:

    • Edit the following file:

      /conf/supervisord.conf
    • Locate the stopwaitsecs attribute.

    • Set the attribute to a value higher than the one you are configuring for the data service shutdown timeout below. The supervisord value is in seconds.

Steps:

  1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

  2. Locate the following setting:

    "data-service.shutdownTimeout": 5000,
  3. Modify the value in milliseconds.

  4. Save your changes and restart the platform.
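Because the data service timeout is in milliseconds and the supervisord stopwaitsecs attribute is in seconds, a quick comparison can catch a mismatch. The following minimal sketch lists any supervisord sections whose stopwaitsecs value is not higher than the shutdown timeout, following the supervisord instruction above. The helper name short_stop_waits and the section names in the sample are hypothetical. It assumes the file can be parsed as INI text by Python's configparser.

```python
import configparser


def short_stop_waits(supervisord_conf_text: str, shutdown_timeout_ms: int) -> dict:
    """Map section name -> stopwaitsecs for sections that do not wait longer than the timeout."""
    parser = configparser.ConfigParser(
        interpolation=None,              # leave supervisord's own %(...)s expansions untouched
        inline_comment_prefixes=(";",),  # allow " ; comment" after a value
        strict=False,
    )
    parser.read_string(supervisord_conf_text)
    too_short = {}
    for section in parser.sections():
        if parser.has_option(section, "stopwaitsecs"):
            wait_secs = parser.getint(section, "stopwaitsecs")
            # Compare in the same unit: seconds converted to milliseconds.
            if wait_secs * 1000 <= shutdown_timeout_ms:
                too_short[section] = wait_secs
    return too_short


if __name__ == "__main__":
    # Hypothetical sample content, not copied from a real installation.
    sample = (
        "[program:example-a]\n"
        "stopwaitsecs=3 ; shorter than the timeout below\n"
        "\n"
        "[program:example-b]\n"
        "stopwaitsecs=30\n"
    )
    print(short_stop_waits(sample, shutdown_timeout_ms=5000))
```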

Additional configuration areas

The following aspects of the data service can be configured outside of the application:

  • Connection pool size and retry parameters

  • Vendor field mappings

  • Oracle ciphers for SSL connections

  • JDBC fetch size by vendor

For more information, please contact Alteryx Customer Success and Services.

Logging

For more information on logging for the service, see Configure Logging for Services.

Other Topics

  • If you are reading large datasets from relational sources, you can enable JDBC ingestion, which reads source data in the background and stages it on the backend datastore for execution. For more information, see Configure JDBC Ingestion.

  • Optionally, SSO authentication can be applied to relational connections. For more information, see Enable SSO for Relational Connections.