
...

  • The default sample is the initial sample.
  • By default, each sample is 10 MB in size, or the entire dataset if the dataset is smaller.
  • If your source of data is a directory containing multiple files, the initial sample for the combined dataset is generated from the first set of rows in the first file listed in the directory.
  • If you are wrangling a dataset with parameters, the initial sample that is loaded in the Transformer page is taken from the first matching dataset. Subsequent samples generated from the Transformer page are sampled across all datasets matched by parameter values.
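The initial-sample rules above can be modeled roughly as follows. This is a minimal sketch, not the product's sampler. The function name `initial_sample` and the `SAMPLE_LIMIT_BYTES` constant are hypothetical. Treating 10 MB as 10 * 1024 * 1024 bytes is an assumption, and so is the idea that the file listing arrives already in directory order. The sketch keeps whole rows from the first listed file until the size budget is reached, so a file smaller than the limit is sampled in its entirety.

```python
from pathlib import Path

# Hypothetical constant: assumes "10 MB" means 10 * 1024 * 1024 bytes.
SAMPLE_LIMIT_BYTES = 10 * 1024 * 1024


def initial_sample(paths, limit=SAMPLE_LIMIT_BYTES):
    """Return whole rows from the first file in `paths`, up to `limit` bytes.

    Hypothetical model: `paths` is the directory listing in the order the
    files are listed; only the first file contributes to the initial sample.
    If that file is smaller than `limit`, every row is returned.
    """
    first = Path(paths[0])
    rows, used = [], 0
    with first.open("rb") as fh:
        for line in fh:
            # Stop before a row would push the sample past the size limit.
            if used + len(line) > limit:
                break
            rows.append(line)
            used += len(line)
    return rows
```

Rows are never split in this model. A sample can therefore come in slightly under the limit, but never over it.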

  • When a source has been swapped, the previous initial sample becomes invalid, and a new initial sample is automatically generated for you.

...

  • When sampling from compressed data, the data is decompressed before it is sampled. As a result, the sample size reflects the uncompressed data.
  • Changes to preceding steps that alter the number of rows or columns in your dataset can invalidate the current sample. An invalid sample no longer represents the state of the dataset at that point in the recipe. In this case, D s product automatically switches you back to the most recently collected sample that is still valid. For details, see Sample Invalidation below.
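The sizing rule for compressed sources can be illustrated with gzip. This is a hedged sketch that assumes gzip input and reuses the whole-row cutoff idea. The function `sample_compressed` and the `SAMPLE_LIMIT_BYTES` value are hypothetical, and nothing here reflects how the product actually reads compressed files. The sketch shows where the limit applies: the data is decompressed first, and the size budget is then counted in uncompressed bytes.

```python
import gzip

# Hypothetical constant: assumes "10 MB" means 10 * 1024 * 1024 bytes.
SAMPLE_LIMIT_BYTES = 10 * 1024 * 1024


def sample_compressed(gz_bytes, limit=SAMPLE_LIMIT_BYTES):
    """Decompress gzip data, then keep whole rows up to `limit` uncompressed bytes."""
    raw = gzip.decompress(gz_bytes)  # the limit is measured after this step
    rows, used = [], 0
    for line in raw.splitlines(keepends=True):
        if used + len(line) > limit:
            break
        rows.append(line)
        used += len(line)
    return rows
```

Because the budget is counted after decompression, a small compressed file can still produce a full-size sample.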

Parameterization of samples

Any parameters that are associated with your dataset can be applied to sampling:

  • Parameters: Subsequent samples generated from the Transformer page are sampled across all datasets matched by parameter values.

Choosing Samples

After you have collected multiple samples of different types from your dataset, you can choose the appropriate sample for your current task, based on:

...

  • Some advanced sampling options are available only with execution across a scan of the full dataset.
  • Undo and redo operations do not change the sample state, even if the sample becomes invalid.
  • When a new sample is generated, any sort transforms that have been applied previously must be re-applied. Depending on the type of output, sort order may not be preserved. For more information, see Sort Transform.

  • Samples taken from a dataset with parameters are limited to a maximum of 50 files when executed on the Photon running environment. You can modify parameters as they apply to sampling jobs in Flow View. See Flow View Page Samples Panel.

Sample Invalidation

Each time a step is added to or modified in your recipe, D s product checks whether the current sample is still valid. A sample's validity depends on the state of your flow and recipe at the step where the sample was collected. If you add steps before the step where the sample was created, the currently active sample can be invalidated. For example, if you change the source of data, the sample in the Transformer page no longer applies, and a new sample must be displayed.
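The validity check and the fallback to the most recent valid sample can be sketched as follows. This is a simplified, hypothetical model. `Sample`, `is_valid`, and `active_sample` are illustrative names only, and each sample is assumed to record the recipe steps that existed when it was collected. The check is deliberately stricter than the behavior described above: it treats any change to the earlier steps as invalidating, not only changes that alter row or column counts, and it does not model source swaps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical record of a collected sample."""
    name: str
    collected_at: int      # collection order; higher means more recent
    recipe_prefix: tuple   # recipe steps that existed when the sample was taken


def is_valid(sample, recipe):
    # Simplification: the sample stays valid only if the steps before its
    # collection point are unchanged. The product's real check may be narrower.
    n = len(sample.recipe_prefix)
    return tuple(recipe[:n]) == sample.recipe_prefix


def active_sample(samples, recipe):
    """Return the most recently collected sample that is still valid, or None."""
    valid = [s for s in samples if is_valid(s, recipe)]
    return max(valid, key=lambda s: s.collected_at, default=None)
```

In this model, inserting a step ahead of a sample's collection point shifts the recipe prefix and invalidates that sample. The selection then falls back to an older sample, such as the initial sample, whose prefix still matches.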

...