...

To prevent overwhelming the client or significantly impacting performance, the product generates one or more samples of the data for display and manipulation in the client application. Since the product supports a variety of clients and use cases, you can change the size of samples, the scope of the sample, and the method by which the sample is created. This section provides background information on how the product manages dataset sampling.

How Sampling Works

Initial

...

Data

When a dataset is first created, a background job begins to generate a sample using the first set of rows of the dataset. This initial data sample is usually very quick to generate, so that you can get to work right away on your transformations.
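As an illustration only, a first-rows sample can be sketched as reading a bounded number of records from the start of the source and stopping. The function name `first_rows_sample` and the `max_rows` limit are hypothetical and are not taken from the product; the actual initial sample may be bounded by size rather than by row count.

```python
import csv
import io
from itertools import islice


def first_rows_sample(lines, max_rows):
    """Return the header and at most max_rows records from the start of the input.

    Hypothetical helper: only the beginning of the input is read,
    which is why this kind of sample is quick to generate.
    """
    reader = csv.reader(lines)
    header = next(reader, None)              # first line is treated as the header
    rows = list(islice(reader, max_rows))    # stop after max_rows records
    return header, rows


# Small in-memory dataset standing in for an imported file.
data = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
header, rows = first_rows_sample(data, max_rows=2)
print(header, rows)
```

Because the reader stops after `max_rows` records, the cost does not grow with the size of the dataset, but the sample only reflects the rows at the top of the file.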

...

  1. on a specified set of rows (firstrows)
  2. on a quick scan across the dataset 

    Tip:
    1. By default, Quick Scan samples are executed on the Photon running environment.
    2. If Photon is not available or is disabled, the web application attempts to execute the Quick Scan sample on an available clustered running environment.
    3. If the clustered running environment is not available or doesn't support Quick Scan sampling, then the Quick Scan sample job fails.
  3. on a full scan of the entire dataset

    Tip: Full Scan samples are executed in the cluster running environment.
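The fallback order described in the tips above can be summarized in a small sketch. This is not the product's implementation; the names `Env`, `SampleJobError`, `pick_quick_scan_env`, and `pick_full_scan_env` are hypothetical and exist only to make the decision order explicit.

```python
from enum import Enum


class Env(Enum):
    PHOTON = "photon"
    CLUSTER = "cluster"


class SampleJobError(Exception):
    """Raised when no running environment can execute the sample job."""


def pick_quick_scan_env(photon_enabled: bool,
                        cluster_available: bool,
                        cluster_supports_quick_scan: bool) -> Env:
    """Choose where a Quick Scan sample runs, following the documented order."""
    # 1. Photon is the default when it is available and enabled.
    if photon_enabled:
        return Env.PHOTON
    # 2. Otherwise, fall back to a clustered running environment if possible.
    if cluster_available and cluster_supports_quick_scan:
        return Env.CLUSTER
    # 3. No usable environment: the Quick Scan sample job fails.
    raise SampleJobError("no running environment available for Quick Scan")


def pick_full_scan_env() -> Env:
    """Full Scan samples always run in the cluster running environment."""
    return Env.CLUSTER


print(pick_quick_scan_env(photon_enabled=False,
                          cluster_available=True,
                          cluster_supports_quick_scan=True))
```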

Sampling mechanics

When a non-initial sample is executed for a single dataset-recipe combination, the following steps occur:

...

NOTE: When a flow is shared, its samples are shared with other users. However, if those users do not have access to the underlying files that back a sample, they do not have access to the sample and must create their own.

Changing sample sizes

If needed, you can change the size of the samples that are loaded into the browser for your current recipe. You may need to reduce these sizes if you are experiencing performance problems or memory issues in the browser. For more information, see Change Recipe Sample Size.

Important notes on sampling

  • Sampling jobs may incur costs. These costs may vary between Photon and your clustered running environments, depending on the type of sample and the cost of job execution.

  • When sampling from compressed data, the data is uncompressed and then expanded. As a result, the sample size reflects the uncompressed data.
  • Changes to preceding steps that alter the number of rows or columns in your dataset can invalidate the current sample, which means that the sample is no longer a valid representation of the state of the dataset in the recipe. In this case, the product automatically switches you back to the most recently collected sample that is currently valid. Details are below.
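The fallback rule in the last note, switching to the most recently collected sample that is still valid, can be sketched as follows. The `Sample` record and its `valid` flag are hypothetical; how the product actually tracks sample validity is not described here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Sample:
    name: str
    collected_at: datetime
    valid: bool  # hypothetical flag: still represents the dataset at this recipe step


def most_recent_valid(samples: List[Sample]) -> Optional[Sample]:
    """Return the newest sample that is still valid, or None if there is none."""
    candidates = [s for s in samples if s.valid]
    return max(candidates, key=lambda s: s.collected_at, default=None)


samples = [
    Sample("initial", datetime(2024, 1, 1), valid=True),
    Sample("quick-scan", datetime(2024, 1, 2), valid=True),
    # Invalidated by an earlier recipe step that changed the row count.
    Sample("full-scan", datetime(2024, 1, 3), valid=False),
]
chosen = most_recent_valid(samples)
print(chosen.name if chosen else "no valid sample")
```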

...

NOTE: The product does not delete samples after they have been created. If you are concerned about data accumulation, you should configure periodic purges of the appropriate directories on the base storage layer. For more information, please contact your IT administrator.
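A periodic purge could be sketched along these lines, assuming the sample directory is reachable as a local or mounted filesystem path. The function name `purge_old_files`, the age threshold, and the directory you pass in are all hypothetical; the actual location of sample files depends on your deployment and should be confirmed with your administrator before anything is deleted.

```python
import time
from pathlib import Path
from typing import List, Optional


def purge_old_files(root: Path,
                    max_age_days: float,
                    dry_run: bool = True,
                    now: Optional[float] = None) -> List[Path]:
    """Find files under root last modified more than max_age_days ago.

    With dry_run=True (the default) nothing is deleted; the matching
    paths are only returned so that they can be reviewed first.
    """
    reference = time.time() if now is None else now
    cutoff = reference - max_age_days * 86400  # seconds per day
    matched = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            matched.append(path)
            if not dry_run:
                path.unlink()  # permanently removes the file
    return matched
```

Running the function with `dry_run=True` first and reviewing the returned paths is the safer order of operations. A scheduler such as cron can then invoke it with `dry_run=False` at the interval your storage policy requires.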

For more information, see Sample Jobs Page.

Cancel Sample Jobs

Generating a sample can consume significant time and system resources and, in some deployments, incur cost. As needed, you can cancel a sample job that is in progress in either of the following ways:

  • Locate the in-progress sampling job in the Samples panel. Click X.
  • Click the Jobs icon in the left nav bar. Select Sample jobs. For more information, see Sample Jobs Page.

Choosing Samples

After you have collected multiple samples of multiple types on your dataset, you can choose the proper sample to use for your current task, based on:

...