...
Setting | Description | Default |
---|---|---|
batchserver.spark.requestTimeoutMillis | Specifies the number of milliseconds that the Batch Job Runner service waits for a response from Spark. If this timeout is exceeded, the UI changes the job status to failed, although the underlying YARN job may continue to run. | 20000 (20 seconds) |
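To illustrate the timeout behavior described in the table, the following is a minimal Python sketch. It is not the Batch Job Runner's implementation. The names wait_for_spark and fake_spark_call are hypothetical, and REQUEST_TIMEOUT_MILLIS only mirrors the default value of batchserver.spark.requestTimeoutMillis. The sketch shows that reporting a failure on timeout does not stop the work that was already submitted.

```python
import concurrent.futures
import time

# Hypothetical stand-in for batchserver.spark.requestTimeoutMillis (default 20000).
REQUEST_TIMEOUT_MILLIS = 20000


def fake_spark_call(delay_seconds):
    """Hypothetical Spark request that takes delay_seconds to respond."""
    time.sleep(delay_seconds)
    return "ok"


def wait_for_spark(submit_call, timeout_millis=REQUEST_TIMEOUT_MILLIS):
    """Wait up to timeout_millis for submit_call to respond.

    Returns "complete" if a response arrives in time, otherwise "failed".
    The background call is not cancelled on timeout, mirroring the note
    that the YARN job may continue after the UI reports a failure.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(submit_call)
    try:
        future.result(timeout=timeout_millis / 1000.0)
        return "complete"
    except concurrent.futures.TimeoutError:
        return "failed"
    finally:
        # Do not block on the still-running call; it keeps running in the background.
        executor.shutdown(wait=False)
```

For example, wait_for_spark(lambda: fake_spark_call(0.3), timeout_millis=50) reports "failed" even though the simulated call still finishes afterward.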
Enable random sampling across the entire dataset
By default, the application generates random samples from the first set of rows in the dataset, up to a limit. The size of this sample set is determined by a configuration parameter. For more information, see Configure Application Limits.
For the Spark running environment, you can enable the generation of random samples across the entire dataset, which can produce samples that better represent the full dataset.
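The difference between sampling the first rows and sampling across the entire dataset can be sketched in Python. This example uses reservoir sampling (Algorithm R) as an illustrative technique only. The documentation does not say how the Spark running environment selects rows, and both function names are hypothetical.

```python
import itertools
import random


def first_rows_sample(rows, limit):
    """Default-style sample: only the first `limit` rows are considered."""
    return list(itertools.islice(rows, limit))


def reservoir_sample(rows, limit, seed=None):
    """Uniform random sample of up to `limit` rows drawn from the whole
    iterable (Algorithm R), reading each row exactly once."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < limit:
            # Fill the reservoir with the first `limit` rows.
            reservoir.append(row)
        else:
            # Keep row i with probability limit / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < limit:
                reservoir[j] = row
    return reservoir
```

With rows = range(1000) and limit = 10, first_rows_sample always returns rows 0 through 9, while reservoir_sample can return rows from anywhere in the range.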
Info |
---|
NOTE: This feature cannot be enabled if relational or JDBC sources are used in your deployment. |
Steps:
In platform configuration, locate the following property and set its value to true:
Code Block |
---|
"webapp.enableSparkSampling": false, |
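If you manage a copy of the platform configuration as JSON, the following Python sketch shows the change from the step above. It treats the property name exactly as it appears in the code block, as a single flat key. How the real configuration file stores or nests this setting is an assumption not covered here. The function enable_spark_sampling and its relational_sources_in_use flag are hypothetical. The flag reflects the note that this feature cannot be enabled when relational or JDBC sources are used.

```python
import json


def enable_spark_sampling(config, relational_sources_in_use):
    """Return a copy of a configuration dict with
    webapp.enableSparkSampling set to True.

    Assumption: the setting is a flat key, as shown in the documentation.
    relational_sources_in_use is a hypothetical input; the feature cannot
    be enabled when relational or JDBC sources are used.
    """
    if relational_sources_in_use:
        raise ValueError(
            "Spark sampling cannot be enabled with relational or JDBC sources"
        )
    updated = dict(config)  # leave the caller's dict unchanged
    updated["webapp.enableSparkSampling"] = True
    return updated


# Example with a hypothetical configuration fragment.
current = json.loads('{"webapp.enableSparkSampling": false}')
print(json.dumps(enable_spark_sampling(current, relational_sources_in_use=False)))
# prints {"webapp.enableSparkSampling": true}
```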
...
Configure Spark for Hortonworks
...