
...




NOTE: Spark 2.3.0 jobs may fail on S3-based datasets due to a known incompatibility. For details, see https://github.com/apache/incubator-druid/issues/4456.

If you encounter this issue, set spark.version to 2.1.0 in the platform configuration. For more information, see Admin Settings Page.

Pre-requisites

...

Parameter    Description
aws.s3.consistencyTimeout

S3 does not guarantee that files written to a directory are immediately available for reading. It guarantees only that the written files and the readable files eventually match (eventual consistency).

This matters for platform jobs that write data to S3 and then immediately attempt to read it back.

This timeout defines how long the platform waits for the written files to become consistent. If the timeout is exceeded, the job fails. The default value is 120.

Depending on your environment, you may need to modify this value.
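
The wait-then-fail behavior described for aws.s3.consistencyTimeout can be pictured as a polling loop. The following is a minimal Python sketch, not the platform's implementation: the function name wait_until_visible, its parameters, and the polling interval are hypothetical, and the sketch treats the default of 120 as seconds purely for illustration, since the unit is not stated here.

```python
import time
from typing import Callable


def wait_until_visible(
    is_visible: Callable[[], bool],
    timeout: float = 120.0,        # assumption: interpreted as seconds
    poll_interval: float = 1.0,    # hypothetical; not a platform setting
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_visible() until it returns True or the timeout elapses.

    Returns True if the written data became readable in time,
    False if the timeout was exceeded (the caller would fail the job).
    """
    deadline = clock() + timeout
    while True:
        if is_visible():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)


# Usage: a stand-in check that reports the data as readable on the third poll.
attempts = {"count": 0}


def fake_check() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


job_can_continue = wait_until_visible(fake_check, poll_interval=0.0)
```

In a real job, is_visible would be replaced by an actual check against S3. A larger timeout tolerates slower consistency at the cost of a longer wait before a job is failed.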

aws.s3.endpoint

This value should be the DNS name of the S3 endpoint.


NOTE: Do not include the protocol identifier.

Example value:

s3.us-east-1.amazonaws.com
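
To illustrate the rule about omitting the protocol identifier, here is a small Python sketch that reduces a pasted endpoint URL to its bare DNS name before it is entered as the setting value. The helper name normalize_s3_endpoint is hypothetical and is not part of the platform.

```python
def normalize_s3_endpoint(value: str) -> str:
    """Return only the DNS name from an S3 endpoint value."""
    value = value.strip()
    # Drop a leading protocol identifier such as "https://", if present.
    if "://" in value:
        value = value.split("://", 1)[1]
    # Drop any trailing slash or path so only the DNS name remains.
    host = value.split("/", 1)[0]
    if not host:
        raise ValueError("endpoint value contains no host name")
    return host


print(normalize_s3_endpoint("https://s3.us-east-1.amazonaws.com/"))
# prints: s3.us-east-1.amazonaws.com
print(normalize_s3_endpoint("s3.us-east-1.amazonaws.com"))
# prints: s3.us-east-1.amazonaws.com
```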

Specify this setting to point Java/Spark services to the S3 endpoint if either of the following applies to your S3 deployment:

  • it is located in a region that does not support the default endpoint, or
  • v4-only signature is enabled in the region

For more information on S3 endpoints by region, see https://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region.

Testing

Restart services. See Start and Stop the Platform.

...