
...

  1. on a specified set of rows (firstrows)
  2. on a quick scan across the dataset

    Tip: Quick Scan
    1. By default, Quick Scan samples are executed in the D s photon running environment.
    2. If D s photon is not available or is disabled, the D s webapp attempts to execute the Quick Scan sample on an available clustered running environment.
    3. If the clustered running environment is not available or does not support Quick Scan sampling, the Quick Scan sample job fails.
  3. on a full scan of the entire dataset

    Tip: Full Scan samples are executed in the clustered running environment.
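The Quick Scan fallback order described in the tip above can be sketched as follows. This is a hypothetical illustration of the documented behavior, not the product's actual implementation; the function and environment names are invented for the example.

```python
# Hypothetical sketch of the Quick Scan environment-selection order
# described above. All names here are illustrative, not product APIs.

def choose_quick_scan_environment(photon_available: bool,
                                  cluster_available: bool,
                                  cluster_supports_quick_scan: bool):
    """Return the environment a Quick Scan sample runs on, or None if it fails."""
    # 1. By default, Quick Scan runs on the Photon running environment.
    if photon_available:
        return "photon"
    # 2. Otherwise, fall back to an available clustered running environment.
    if cluster_available and cluster_supports_quick_scan:
        return "cluster"
    # 3. No suitable environment: the Quick Scan sample job fails.
    return None

print(choose_quick_scan_environment(True, True, True))    # photon
print(choose_quick_scan_environment(False, True, True))   # cluster
print(choose_quick_scan_environment(False, False, True))  # None
```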

Sampling mechanics

When a non-initial sample is executed for a single dataset-recipe combination, the following steps occur:

...

Important notes on sampling

  • A new sampling job is executed in D s dataflow, which may incur costs. These costs may vary between D s photon and your clustered running environments, depending on the type of sample and the cost of job execution.
  • If the source file is in Avro format, the D s dataflow job samples from the entire file. As a result, additional processing costs may be incurred. This is a known issue.

  • When sampling from compressed data, the data is uncompressed and then expanded. As a result, the sample size reflects the uncompressed data.
  • Changes to preceding steps that alter the number of rows or columns in your dataset can invalidate the current sample, which means that the sample is no longer a valid representation of the state of the dataset in the recipe. In this case, D s product automatically switches you back to the most recently collected sample that is currently valid. Details are below.

...

  • Some advanced sampling options are available only when the sample is executed as a full scan of the dataset.
  • Undo/redo do not change the sample state, even if the sample becomes invalid. 


  • Samples taken from a dataset with parameters are limited to a maximum of 50 files when executed on the D s photon running environment. You can modify parameters as they apply to sampling jobs. See Samples Panel.

Sample Invalidation

With each step that is added to or modified in your recipe, D s product checks whether the current sample is still valid. A sample's validity is based on the state of your flow and recipe at the step where the sample was collected. If you add steps before the step where the sample was created, the currently active sample can be invalidated. For example, if you change the source of data, the sample in the Transformer page no longer applies, and a new sample must be displayed.
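The validity check above can be sketched as follows: a sample stays valid only while the recipe steps preceding its collection point are unchanged, and the most recently collected valid sample wins. This is a hypothetical illustration under invented data structures (a sample as a dict with `step`, `collected_at`, and `fingerprint` keys), not the product's actual implementation.

```python
# Hypothetical sketch of sample invalidation: a sample collected at a given
# recipe step is valid only while the steps before that point are unchanged.
import hashlib

def steps_fingerprint(steps):
    """Fingerprint the recipe steps preceding a sample's collection point."""
    return hashlib.sha256("\n".join(steps).encode()).hexdigest()

def most_recent_valid_sample(samples, recipe):
    """Return the most recently collected sample that is still valid, or None."""
    valid = [s for s in samples
             if s["fingerprint"] == steps_fingerprint(recipe[:s["step"]])]
    return max(valid, key=lambda s: s["collected_at"], default=None)

recipe = ["import orders.csv", "remove empty rows", "split column date"]
samples = [
    {"step": 1, "collected_at": 1, "fingerprint": steps_fingerprint(recipe[:1])},
    {"step": 3, "collected_at": 2, "fingerprint": steps_fingerprint(recipe[:3])},
]
# Editing an earlier step invalidates every sample collected after it.
recipe[1] = "remove duplicate rows"
best = most_recent_valid_sample(samples, recipe)
print(best["collected_at"])  # falls back to the step-1 sample, which is still valid
```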

...