- The default sample is the initial sample.
- By default, each sample is 10 MB in size or the entire dataset if it's smaller.
- If your source of data is a directory containing multiple files, the initial sample for the combined dataset is generated from the first set of rows in the first file listed in the directory.
- If you are wrangling a dataset with parameters, the initial sample that is loaded in the Transformer page is taken from the first matching dataset. Subsequent samples generated from the Transformer page are sampled across all datasets matched by parameter values.
- When a source has been swapped, the previous initial sample becomes invalid, and a new initial sample is automatically generated for you.
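The initial-sample rules above can be modeled in a few lines. This is a minimal sketch under stated assumptions, not the product's implementation: `initial_sample` and `DEFAULT_SAMPLE_LIMIT` are hypothetical names, "10 MB" is read as 10 × 1024 × 1024 bytes, a directory is represented as a dict of filename to contents in listing order, and a `.gz` suffix is assumed to mark gzip-compressed data. It reads rows only from the first listed file and, following the note on compressed data later in this section, applies the size limit after decompression.

```python
import gzip

# Assumed reading of the 10 MB default: 10 * 1024 * 1024 bytes.
DEFAULT_SAMPLE_LIMIT = 10 * 1024 * 1024


def initial_sample(files: dict[str, bytes], limit: int = DEFAULT_SAMPLE_LIMIT) -> list[bytes]:
    """Return the rows of a hypothetical initial sample.

    `files` maps filenames to raw contents in directory-listing order
    (dicts preserve insertion order). Only the first file is read.
    """
    if not files:
        return []
    first_name, data = next(iter(files.items()))
    # Hypothetical rule: a .gz suffix marks gzip data, which is decompressed
    # first so that the size limit applies to the uncompressed rows.
    if first_name.endswith(".gz"):
        data = gzip.decompress(data)
    rows: list[bytes] = []
    size = 0
    for row in data.splitlines(keepends=True):
        if size + len(row) > limit:
            break  # adding this row would exceed the sample size
        rows.append(row)
        size += len(row)
    return rows  # the whole file when it fits under the limit
```

For example, `initial_sample({"a.csv": b"x\ny\n", "b.csv": b"z\n"})` returns only the rows of `a.csv`, because the second file is never consulted for the initial sample.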
NOTE: When a flow is shared, its samples are shared with other users. However, if those users do not have access to the underlying files that back a sample, they do not have access to the sample and must create their own.
Tip: If you have added an expensive transformation step, such as a complex union or join, you can improve performance of the Transformer page by generating and using a new sample.
For more information on creating samples, see Samples Panel.
- When sampling from compressed data, the data is uncompressed and then expanded. As a result, the sample size reflects the uncompressed data.
- Changes to preceding steps that alter the number of rows or columns in your dataset can invalidate the current sample, which means that the sample no longer validly represents the state of the dataset at that point in the recipe. In this case, the application automatically switches you back to the most recently collected sample that is still valid. Details are below.
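The invalidation-and-fallback behavior described above can be sketched as follows. This is a hypothetical model, not the product's API: `Sample`, `edit_step`, and `fallback_sample` are illustrative names, and the fields `collected_order` (larger means collected more recently) and `recipe_step` (the step at which the sample was taken) are assumptions. A shape-changing edit to a step that precedes a sample's step invalidates that sample, and the fallback picks the most recently collected sample that is still valid.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    name: str
    collected_order: int  # hypothetical: larger means collected more recently
    recipe_step: int      # hypothetical: recipe step at which the sample was taken
    valid: bool = True


def edit_step(samples: list[Sample], edited_step: int, changes_shape: bool) -> None:
    """Invalidate samples taken after a step whose edit changes row/column counts."""
    if not changes_shape:
        return  # in this model, edits that keep the shape leave samples valid
    for sample in samples:
        if edited_step < sample.recipe_step:
            sample.valid = False


def fallback_sample(samples: list[Sample]) -> Optional[Sample]:
    """Return the most recently collected sample that is still valid, if any."""
    valid = [s for s in samples if s.valid]
    return max(valid, key=lambda s: s.collected_order, default=None)
```

With samples taken at steps 0, 3, and 5, a shape-changing edit at step 4 invalidates only the step-5 sample, so the model falls back to the step-3 sample.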
Parameterization of samples
Any parameters that are associated with your dataset can be applied to sampling:
- Parameters: Subsequent samples generated from the Transformer page are sampled across all datasets matched by parameter values.
- Variables: You can apply override values to the defaults for your dataset's variables at sample execution time. In this manner, you can draw your samples from specific source files within your dataset with parameters.
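A minimal sketch of how a variable override narrows the files a sample is drawn from, assuming a dataset with parameters can be modeled as a path pattern containing a `{name}` placeholder whose default is a wildcard. The function `matched_files`, the pattern syntax, and the example paths are hypothetical; the product's own pattern and variable syntax may differ. Glob matching uses the standard library's `fnmatchcase`.

```python
from fnmatch import fnmatchcase
from typing import Optional


def matched_files(
    files: list[str],
    pattern: str,
    defaults: dict[str, str],
    overrides: Optional[dict[str, str]] = None,
) -> list[str]:
    """Return the files matched by a hypothetical parameterized path pattern."""
    # Override values replace the defaults only for this sampling run.
    values = {**defaults, **(overrides or {})}
    glob = pattern.format(**values)  # e.g. "sales/{region}/*.csv" -> "sales/eu/*.csv"
    return [f for f in files if fnmatchcase(f, glob)]
```

In this model, the initial sample would come from the first entry of the returned list, while later samples would draw from every entry; an override such as `{"region": "eu"}` restricts both to the `eu` files.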
After you have collected multiple samples of multiple types on your dataset, you can choose the proper sample to use for your current task, based on:
- Some advanced sampling options are available only with execution across a scan of the full dataset.
- Undo/redo do not change the sample state, even if the sample becomes invalid.
When a new sample is generated, any Sort transformations that have been applied previously must be re-applied. Depending on the type of output, sort order may not be preserved. For more information, see Sort Transform.
Samples taken from a dataset with parameters are limited to a maximum of 50 files when executed on the Photon running environment. You can modify parameters as they apply to sampling jobs in Flow View. See Flow View Page and Samples Panel.
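The 50-file limit on Photon can be illustrated with a short sketch. The function `files_for_sampling` and its `environment` argument are hypothetical, and the text does not say which 50 files are used, so this sketch simply assumes the first 50 in matched order are kept. It also reports how many matched files a sampling job would leave out.

```python
# Limit stated in the text for sampling jobs on the Photon running environment.
PHOTON_MAX_SAMPLE_FILES = 50


def files_for_sampling(matched: list[str], environment: str) -> tuple[list[str], int]:
    """Return the files a sampling job would read and how many were left out.

    Assumption: when the cap applies, the first files in matched order are kept.
    """
    if environment.lower() == "photon":
        selected = matched[:PHOTON_MAX_SAMPLE_FILES]
    else:
        selected = list(matched)
    return selected, len(matched) - len(selected)
```

For a parameterized dataset matching 120 files, this model samples from 50 of them on Photon and reports 70 as skipped; narrowing the parameters is one way to keep the files you care about inside the limit.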
With each step that is added to or modified in your recipe, the application