
D toc

The

D s platform
uses the Batch Job Runner service to orchestrate jobs executed on the selected backend running environment. The service passes jobs to the backend and tracks their progress until success or failure. This service is enabled by default.

Configure Timeout

Setting | Default Value | Description
batchserver.spark.requestTimeoutMillis | 120000 (2 minutes) | Maximum number of milliseconds that the Batch Job Runner service waits for a response from the Spark Job service during job execution.
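
For example, if long-running Spark jobs time out before the Spark Job service responds, you can raise this value. The fragment below is a minimal sketch, assuming the dotted setting name maps to nested keys in the platform's JSON configuration file (for example, trifacta-conf.json); the five-minute value is illustrative, not a recommendation.

Code Block
{
  "batchserver": {
    "spark": {
      "requestTimeoutMillis": 300000
    }
  }
}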

Configure Job Threads

As needed, you can configure the number of worker threads assigned to each process that is managed by the Batch Job Runner. Depending on the volume and complexity of the jobs of each type that you run, you may choose to modify these settings to improve performance for key job types.

Tip: These settings can be configured through the Admin Settings Page in the

D s webapp
. See Admin Settings Page.


By running environment:

Setting | Default Value | Description
batchserver.workers.photon.max | 2 | Number of worker threads for running Photon jobs. This value corresponds to the maximum number of Photon jobs that can be queued at the same time. For more information, see Configure Photon Running Environment.
batchserver.workers.spark.max | 16 | Number of worker threads for running Spark jobs. For more information, see Configure Spark Running Environment.
batchserver.workers.wrangle.max | 16 | Number of worker threads for running transformation jobs.
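
For illustration, the fragment below sketches how these per-environment settings might appear as nested keys in the platform's JSON configuration file (an assumption about file layout, not a verbatim excerpt). Here batchserver.workers.photon.max is raised to 4 to allow more Photon jobs to be queued at once, while the other values remain at their defaults.

Code Block
{
  "batchserver": {
    "workers": {
      "photon": { "max": 4 },
      "spark": { "max": 16 },
      "wrangle": { "max": 16 }
    }
  }
}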


By job type:

Setting | Default Value | Description
batchserver.workers.ingest.max | 16 | Maximum number of worker threads for running ingest jobs, which are used for loading large-scale relational data into the Transformer page. After this maximum has been reached, subsequent requests are queued.
batchserver.workers.profile.max | 16 | Maximum number of worker threads for running profile jobs, which provide summary and detail statistics on job results.
batchserver.workers.publish.max | 16 | Maximum number of worker threads for running publish jobs, which deliver pre-generated job results to other datastores.
batchserver.workers.fileconverter.max | 16 | Maximum number of worker threads for running fileconverter jobs, which are used to convert source formats into output formats.
batchserver.workers.filewriter.max | 16 | Maximum number of worker threads for running filewriter jobs, which are used for writing file-based outputs to a specified storage location.
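
The per-job-type settings follow the same pattern. The sketch below shows them at their default values, again assuming the nested-key layout of the platform's JSON configuration file; for example, lowering batchserver.workers.publish.max throttles concurrent publish jobs, while raising batchserver.workers.ingest.max allows more relational loads to run in parallel.

Code Block
{
  "batchserver": {
    "workers": {
      "ingest": { "max": 16 },
      "profile": { "max": 16 },
      "publish": { "max": 16 },
      "fileconverter": { "max": 16 },
      "filewriter": { "max": 16 }
    }
  }
}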

 

Depending on your running environment, there may be additional parameters that you can configure to affect the Batch Job Runner for that specific environment:

Configure BJR for EMR

Multiple BJR instances

If the

D s platform
is connected to an EMR cluster, multiple instances of the Batch Job Runner are deployed to manage jobs across the cluster, so that YARN jobs are still tracked if one instance fails. No configuration is required.

YARN logs from EMR

The following properties can be modified for the Batch Job Runner:

Setting | Default Value | Description
aws.emr.getLogsOnFailure | false

When set to true, YARN logs from all nodes in the EMR cluster are collected from S3 and stored on the

D s node
in the following location:

Code Block
/opt/trifacta/logs/jobs/<jobId>/container

where: <jobId> is the

D s platform
internal identifier for the job that failed.

aws.emr.getLogsForAllJobs | false

When set to true, YARN logs from nodes in the EMR cluster are collected and stored in the above location for all jobs, whether they succeed or fail.

NOTE: This parameter is intended for debugging purposes only.
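
For example, to collect YARN logs only when jobs fail, you would enable the first setting and leave the second at its default. The fragment below is a sketch under the same assumption about the nested-key layout of the platform's JSON configuration file.

Code Block
{
  "aws": {
    "emr": {
      "getLogsOnFailure": true,
      "getLogsForAllJobs": false
    }
  }
}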

Configure Database

The Batch Job Runner uses its own Jobs database. For more information, see Configure the Databases.

Logging

For more information on logging for the service, see Configure Logging for Services.