The Data Service enables the Trifacta® platform to stream metadata and records from JDBC sources for sampling and job execution in the Trifacta Photon running environment. This section describes how to enable and configure the service, including performance tweaks and connection-specific configuration.
The following basic properties enable the service and specify where it runs.
NOTE: When set to
Hostname for the service. Default is
Port number used by the service. Default is
NOTE: If you are changing the port number, avoid creating conflicts with existing ports in use. For more information, see System Ports.
The Java class path for the data service.
Path to the vendor configuration files for relational connections. Default value:
Configure SQL Options
Configure relational read stream limits
The Data Service reads data from relational sources in streams of records. The size of these streams is defined by the following parameters, which you can modify to configure the limits of SQL record streaming during read operations:
The maximum number of JDBC records pulled in per stream read during batch execution.
If this value is set to
Maximum number of records read for the initial sample and for quick scan sampling. A value of -1 means there is no limit.
Initial number of records to read for client-side preview and for client-side transform. Set to
Max number of records that can be read from Hive, if maxReadStreamRecords is
NOTE: This value cannot be set to
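The batched streaming behavior described above can be sketched as follows. This is an illustration, not the Data Service implementation: the record source, the `batch_size` parameter, and the `max_records` cap (standing in for a limit such as `maxReadStreamRecords`, with -1 meaning no limit) are assumptions made for the example.

```python
def stream_records(rows, batch_size, max_records=-1):
    """Yield records in fixed-size batches, stopping after max_records.

    max_records = -1 mimics the 'no limit' convention described above.
    """
    batch = []
    count = 0
    for row in rows:
        if max_records != -1 and count >= max_records:
            break
        batch.append(row)
        count += 1
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, possibly short, batch

# Example: 10 source records, streamed in batches of 4, capped at 9 records.
batches = list(stream_records(range(10), batch_size=4, max_records=9))
# batches == [[0, 1, 2, 3], [4, 5, 6, 7], [8]]
```

The cap is applied across the whole read, not per batch, which is why the final batch may be short.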
The Data Service maintains a cache of JDBC objects that have been retrieved for use. You can configure the following properties to tune the cache.
Number of milliseconds to wait between checks validating cached pools. Default is
Maximum number of objects in the cache. Default is
NOTE: Set this value to
Objects in the cache that are older than this number of seconds are automatically expired. Default is
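The interaction of these cache properties can be sketched with a toy cache that enforces a maximum object count and an age-based expiry. The class and parameter names below are illustrative stand-ins, not the Data Service's actual internals.

```python
from collections import OrderedDict

class TtlCache:
    """Toy cache with a maximum size and a per-object expiry age in seconds."""

    def __init__(self, max_size, ttl_seconds):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self._items = OrderedDict()  # key -> (value, inserted_at)

    def put(self, key, value, now):
        self._evict_expired(now)
        if len(self._items) >= self.max_size:
            self._items.popitem(last=False)  # drop the oldest entry
        self._items[key] = (value, now)

    def get(self, key, now):
        self._evict_expired(now)
        item = self._items.get(key)
        return item[0] if item else None

    def _evict_expired(self, now):
        # Mirrors the periodic validation pass: expire objects older than ttl.
        expired = [k for k, (_, t) in self._items.items() if now - t > self.ttl]
        for k in expired:
            del self._items[k]

cache = TtlCache(max_size=2, ttl_seconds=600)
cache.put("conn-a", "pool-a", now=0)
cache.put("conn-b", "pool-b", now=0)
cache.put("conn-c", "pool-c", now=0)   # size limit hit: oldest entry evicted
print(cache.get("conn-a", now=0))      # None (evicted by the size limit)
print(cache.get("conn-b", now=700))    # None (expired: older than 600 seconds)
```

In the real service the validation pass runs on a timer rather than on each access; the effect on cached objects is the same.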
Configure for Specific Integrations
Configure Data Service for Hive
The following properties apply to how the platform connects to Hive.
Managed table format for your Hive deployment. Default is
Path to the JAR to use for JDBC connectivity to Hive. Default path depends on your Hadoop distribution.
Configure Data Service for Tableau Server
The following properties apply to how the platform publishes to Tableau Server.
Number of bytes of data to include in each HTTP request chunk when publishing to Tableau Server. When the Trifacta application publishes a file to Tableau Server, the file is divided into chunks, and each chunk is attached as part of an HTTP request payload.
A larger chunk size reduces the number of HTTP requests required to send the whole file to Tableau Server. However, it also increases the risk of a RequestTimeoutException, which causes the publishing job to fail.
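The tradeoff can be made concrete with a little arithmetic. The file size and chunk sizes below are made-up examples for illustration, not recommended values:

```python
import math

def request_count(file_size_bytes, chunk_size_bytes):
    """Number of HTTP requests needed to upload a file in fixed-size chunks."""
    return math.ceil(file_size_bytes / chunk_size_bytes)

file_size = 500 * 1024 * 1024            # a hypothetical 500 MB publish
small = 1 * 1024 * 1024                  # 1 MB chunks
large = 64 * 1024 * 1024                 # 64 MB chunks

print(request_count(file_size, small))   # 500 requests, each small and fast
print(request_count(file_size, large))   # 8 requests, each slower to transmit
```

Fewer, larger requests mean less per-request overhead but a longer transmission time per request, which is what raises the timeout risk described above.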
The following aspects of the Data Service can be configured outside of the application:
- Connection pool size and retry parameters
- Vendor field mappings
- Oracle ciphers for SSL connections
- JDBC fetch size by vendor
For more information, please contact Trifacta Customer Success Services.
For more information on logging for the service, see Configure Logging for Services.
- If you are reading large datasets from relational sources, you can enable JDBC ingestion, which reads source data in the background and stages on the backend datastore for execution. For more information, see Configure JDBC Ingestion.
- Optionally, SSO authentication can be applied to relational connections. For more information, see Enable SSO for Relational Connections.