The platform uses the batch job runner service to orchestrate jobs that are executed on the selected backend running environment. This service passes jobs to the backend and tracks their progress until success or failure. It is enabled by default.
|Maximum number of milliseconds that the Batch Job Runner service should wait for a response from the Spark Job service during job execution. Default is 2 minutes.|
The following parameters can be modified to change the Batch Job Runner polling intervals for various types of jobs.
|Polling interval in seconds for Batch Job Runner to check the status of ingest jobs.|
|Polling interval in seconds for Batch Job Runner to check the status of publishing jobs.|
|Polling interval in seconds for Batch Job Runner to check the status of wrangling jobs.|
|Duration in milliseconds for the service to wait for a job heartbeat before failing.|
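The interplay between a polling interval (in seconds) and a heartbeat timeout (in milliseconds) can be illustrated with a minimal sketch. This is not the Batch Job Runner's implementation: the function name, the status strings, and the callables passed in are all hypothetical, and the sketch only assumes that a job is failed once no heartbeat has been seen within the timeout window.

```python
import time
from typing import Callable


def poll_until_done(
    get_status: Callable[[], str],          # hypothetical: returns the job's current status
    last_heartbeat: Callable[[], float],    # hypothetical: time (seconds) of the last heartbeat
    poll_interval_s: float,                 # polling interval, in seconds
    heartbeat_timeout_ms: float,            # heartbeat timeout, in milliseconds
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a job until it finishes or its heartbeat goes stale."""
    while True:
        status = get_status()
        if status in ("Complete", "Failed"):
            return status
        # Treat the job as failed if no heartbeat arrived within the timeout.
        elapsed_ms = (clock() - last_heartbeat()) * 1000.0
        if elapsed_ms > heartbeat_timeout_ms:
            return "Failed"
        sleep(poll_interval_s)
```

A shorter polling interval detects completion sooner at the cost of more status requests. Note the unit difference: the interval is expressed in seconds and the timeout in milliseconds, as in the parameter descriptions above.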
As needed, you can configure the number of worker threads assigned to each process that is managed by the batch job runner. Depending on the volume and complexity of the jobs that you run of each type, you may choose to modify these settings to improve performance for key job types.
Tip: These settings can be configured through the Admin Settings page. See Admin Settings Page.
By running environment:
Number of worker threads for running Photon jobs. This value corresponds to the maximum number of Photon jobs that can be queued at the same time.
For more information, see Configure Photon Running Environment.
|16||Number of worker threads for running Spark jobs. For more information, see Configure Spark Running Environment.|
Number of worker threads for running transformation jobs.
By job type:
| ||16||Maximum number of worker threads for running ingest jobs, which are used for loading relational data into the platform. After this maximum number has been reached, subsequent requests are queued.|
| ||16||Maximum number of worker threads for running profile jobs, which provide summary and detail statistics on job results.|
| ||16||Maximum number of worker threads for running publish jobs, which deliver pre-generated job results to other datastores.|
| ||16||Maximum number of worker threads for running fileconverter jobs, which are used to convert source formats into output formats.|
| ||16||Maximum number of worker threads for running filewriter jobs, which are used for writing file-based outputs to a specified storage location.|
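The effect of a maximum worker thread count, where requests beyond the limit wait in a queue, can be sketched with a bounded thread pool. This is an illustration only, using Python's standard `concurrent.futures.ThreadPoolExecutor`. The function name `run_with_worker_limit` is hypothetical, and nothing here reflects how the batch job runner is actually built.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_with_worker_limit(jobs, max_workers):
    """Run callables on a bounded pool; return (results, peak concurrency)."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def tracked(job):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            return job()
        finally:
            with lock:
                active -= 1

    # Jobs beyond max_workers wait in the executor's internal queue
    # until a worker thread becomes free.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(tracked, jobs))
    return results, peak
```

With a limit of 16, for example, a seventeenth simultaneous request would not be rejected. It would wait until one of the 16 workers finishes. Raising the limit for one job type increases its throughput but also its share of machine resources.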
Depending on your running environment, there may be additional parameters that you can configure to affect Batch Job Runner for that specific environment:
If the platform is connected to an EMR cluster, multiple instances of the batch job runner are deployed to manage jobs across the cluster so that, if one instance fails, YARN jobs are still tracked. No configuration is required.
The following properties can be modified for the batch job runner:
When set to
When set to
By default, the Jobs database, which is used by the batch job runner, does not remove information about jobs after they have been executed.
batch-job-runner.log. For more information, see Configure Logging for Services.
job.logfile written in the job directory. For more information, see Diagnose Failed Jobs.
As needed, you can enable the batch job runner to perform cleanup operations on the Jobs database.
NOTE: If cleanup is not enabled, the Jobs database continues to grow. You should perform periodic cleanups in conjunction with your enterprise database policies.
To enable cleanup of the Jobs database, please complete the following steps.
Locate the following settings and set them accordingly:
|batch-job-runner.cleanup.enabled||Set this value to |
|batch-job-runner.cleanup.interval||The interval in ISO-8601 repeating intervals format at which batch-job-runner should clean outdated information about jobs. Default value is |
|batch-job-runner.cleanup.maxAge||The retention time for deployments in ISO-8601 interval format after which information about jobs is considered outdated. Default value is |
|batch-job-runner.cleanup.maxDelete||Maximum number of jobs whose information can be deleted per cleanup pass. Default value is |
For more information on ISO-8601 interval format, see https://en.wikipedia.org/wiki/ISO_8601#Repeating_intervals.
Save your changes and restart the platform.
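To make the roles of `maxAge` and `maxDelete` concrete, the following sketch shows how a single cleanup pass might select jobs. It is an assumption-laden illustration, not the service's logic: the helper names are hypothetical, the duration parser handles only a small subset of ISO-8601 (days, hours, minutes, seconds), the values used are examples rather than product defaults, and a job is assumed to be outdated when it finished earlier than the current time minus `maxAge`.

```python
import re
from datetime import datetime, timedelta

# Subset of ISO-8601 durations only: PnDTnHnMnS (no years, months, or weeks).
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def parse_duration(text):
    """Parse a simple ISO-8601 duration such as 'P14D' or 'PT1H30M'."""
    match = _DURATION.match(text)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def select_jobs_to_delete(jobs, now, max_age, max_delete):
    """Return IDs of outdated jobs, oldest first, capped at max_delete.

    jobs is a list of (job_id, finished_at) tuples (hypothetical shape).
    """
    cutoff = now - parse_duration(max_age)
    outdated = sorted((j for j in jobs if j[1] < cutoff), key=lambda j: j[1])
    return [job_id for job_id, _ in outdated[:max_delete]]
```

Under these assumptions, the cap means that a large backlog is removed gradually over several passes rather than in one large delete.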
The Batch Job Runner utilizes its own Jobs database. For more information, see Configure the Databases.
For more information on logging for the service, see Configure Logging for Services.