NOTE: When a sample is executed from the Samples panel, it is launched based on the steps leading up to current location in the recipe steps. For example, if your recipe includes joining in other datasets, those steps are executed, and the sample is generated with dependencies on these other datasets. As a result, if you change your recipe steps that occur before the step where the sample was generated, you can invalidate your sample. More information is available below.
Depending on the type of sample you select, it may be generated based on one of the following methods, in increasing order of time to create:
- on a specified set of rows (firstrows)
- on a quick scan across the dataset
- on a full scan of the entire dataset
You can create a new sample at any time. When a sample is created, it is stored within your storage directory on the backend datastore. See User Profile Page.
NOTE: When a flow is shared, its samples are shared with other users. However, if those users do not have access to the underlying files that back a sample, they do not have access to the sample and must create their own.
For more information on creating samples, see Samples Panel.
Important notes on sampling
- When sampling from compressed data, the data is uncompressed and then expanded. As a result, the sample size reflects the uncompressed data.
- Changes to preceding steps that alter the number of rows or columns in your dataset can invalidate the current sample, which means that the sample is no longer a valid representation of the state of the dataset in the recipe. In this case,
automatically switches you back to the most recently collected sample that is currently valid. Details are below.
D s product
|D s product|
First rows samples
NOTE: The First rows sampling technique requires the Photon running environment. If the Photon running environment is disabled, please set
This sample is taken from the first set of rows in the transformed dataset based on the current cursor location in the recipe. The first N rows in the dataset are collected based on the recipe steps up to the configured sample size.
Tip: If you have chained together multiple recipes, all steps in all linked recipes must be run to provide visual updates. If you are experiencing performance problems related to this kind of updating, you can select a recipe in the middle of the chain of recipes and switch it off the initial sample to a different sample. When invoked, the recipes from the preceding datasets do not need to be executed, which can improve performance.
Random selection of a subset of rows in the dataset. These samples are comparatively fast to generate.You can apply quick scan or full scan to determine the scope of the sample.