...

Setting | Description | Default
batchserver.spark.requestTimeoutMillis | Specifies the number of milliseconds that the Batch Job Runner service should wait for a response from Spark. If this timeout is exceeded, the UI changes the job status to failed. The YARN job may continue. | 20000 (20 seconds)
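As an illustration of the setting above, the fragment below raises the timeout to 60 seconds. It is a minimal sketch: the value 60000 is an assumed example rather than a recommendation, and the flat "key": value form is assumed to match the way properties are shown elsewhere on this page.

```json
"batchserver.spark.requestTimeoutMillis": 60000,
```

Because the YARN job may continue after the timeout is exceeded, a longer timeout only changes how long the Batch Job Runner service waits before the UI marks the job as failed.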

Enable random sampling across the entire dataset

By default, the application generates random samples from the first set of rows in the dataset, up to a limit. The size of this sample set is determined by a parameter. See Configure Application Limits.

For the Spark running environment, you can enable the generation of random samples across the entire dataset, which may increase the quality of your samples. 

NOTE: This feature cannot be enabled if relational or JDBC sources are used in your deployment.

 

Steps:

In platform configuration, locate the following property and set its value to true:

"webapp.enableSparkSampling": false,
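After the change, the property would read as follows. This is a fragment in the same form as the property shown above, not a complete configuration file:

```json
"webapp.enableSparkSampling": true,
```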

...

Configure Spark for Hortonworks

...