Comment: Published by Scroll Versions from space DEV and version r089

...

A sample is a selection of rows from your dataset, which can be used as the basis for building the transformation steps in your recipe. The application automatically creates an initial data sample whenever you create a new recipe for a dataset and enables you to create additional samples at any time using a variety of sampling techniques.

Initial Data

When you create a new recipe and load it in the Transformer page, the application displays the initial data sample of the dataset. The initial data consists of the first X rows of the dataset, where X is determined by the following factors:

  • The number of columns in the dataset
  • The amount of data in each cell
  • The maximum permitted size of each sample
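
To illustrate how these three factors interact, here is a minimal sketch that keeps reading rows from the top of a dataset until a size cap is reached. The cap value (`MAX_SAMPLE_BYTES`), the function name, and the way row size is estimated are all assumptions made for illustration; the product's actual limit and sizing logic are not described here.

```python
import csv
import io

# Hypothetical cap, chosen small so the effect is easy to see.
MAX_SAMPLE_BYTES = 200


def initial_sample(lines, max_bytes=MAX_SAMPLE_BYTES):
    """Return the first X rows whose combined estimated size fits in max_bytes."""
    rows, used = [], 0
    for row in csv.reader(lines):
        # Estimated row size: bytes in each cell plus one separator byte per cell.
        size = sum(len(cell.encode("utf-8")) + 1 for cell in row)
        if used + size > max_bytes:
            break
        rows.append(row)
        used += size
    return rows


# Same number of source rows, but different column counts and cell sizes.
narrow = io.StringIO("\n".join("a,b" for _ in range(100)))
wide = io.StringIO("\n".join(",".join(["abcdefghi"] * 10) for _ in range(100)))

narrow_rows = initial_sample(narrow)
wide_rows = initial_sample(wide)
print(len(narrow_rows), len(wide_rows))
```

Under the same cap, the narrow dataset yields many more rows than the wide one, which is why X varies from dataset to dataset.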

...

  1. In the Transformer page, click the Eyedropper icon at the top of the page. 
  2. The Samples panel is displayed. 

    Figure: Samples panel
  3. At the top of the panel, you can review the current sample.

    Tip: In some cases, the entire dataset is displayed in the data grid. Unless you wish to use a specific sampling technique to filter down your data, additional sampling may not be useful when the entire dataset is already displayed.

  4. Below the current sample, you can see the available sample types. To take a new random sample:
    1. Click the Random card.
    2. Depending on your product edition, you may be able to select Quick Scan or Full Scan.

      NOTE: This option is not supported in some product editions.

      1. Quick Scan creates your sample by making some assumptions about the data when it scans.
      2. Full Scan creates your sample by scanning across all rows of the dataset. This option can take a while on a large dataset.
    3. Click Collect.
  5. The sampling job is queued for execution. When it completes, click Load Sample.
  6. The data grid is refreshed to display the rows gathered in the new random sample.
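
The trade-off between the two scan types can be sketched as follows. This is not the product's algorithm, which is not described here. It assumes, purely for illustration, that a quick scan draws only from the first `scan_limit` rows, while a full scan reads every row once using reservoir sampling. Both function names and the `scan_limit` parameter are hypothetical.

```python
import itertools
import random


def full_scan_sample(rows, k, seed=0):
    """Reservoir sampling: reads every row once; each row is equally likely to be kept."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


def quick_scan_sample(rows, k, scan_limit, seed=0):
    """Assumed shortcut: sample only from the first scan_limit rows."""
    rng = random.Random(seed)
    head = list(itertools.islice(rows, scan_limit))
    return rng.sample(head, min(k, len(head)))


dataset = range(100_000)  # stand-in for dataset rows
quick = quick_scan_sample(dataset, k=5, scan_limit=1_000)
full = full_scan_sample(dataset, k=5)
print(quick, full)
```

In this sketch the quick variant finishes sooner because it stops reading early, but rows beyond the scan limit can never appear in its sample. The full variant can return any row at the cost of reading the whole dataset.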

...

The above statement is best explained by example:

Action | Sampling
1. Create a new recipe and open it in the Transformer page. | The initial sample is generated and displayed.
2. Add 3 steps to your recipe. | The 3 new steps are applied to the initial sample in the browser's memory.
3. Generate a new random sample. | The random sample is generated. When you load the sample, it is displayed in the data grid.
4. Add 25 steps to your recipe. | The 25 new steps are applied to the random sample in the browser's memory.
5. Select one of the first 3 steps of your recipe. | The initial sample is loaded and displayed.
6. Insert a new step below the current one. | Now, the first 4 steps are displayed using the initial sample.
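
The walkthrough above can be modeled with a small bookkeeping structure that records the first recipe step each sample applies to. This is a simplified sketch of the described behavior, not the product's implementation; the class name `SampleTimeline` and its methods are hypothetical.

```python
from bisect import bisect_right


class SampleTimeline:
    """Tracks which sample is displayed for each recipe step."""

    def __init__(self):
        # (first recipe step displayed with this sample, sample name), sorted by step.
        self.samples = [(1, "initial")]
        self.step_count = 0

    def add_steps(self, n):
        """Append n steps to the end of the recipe."""
        self.step_count += n

    def take_sample(self, name):
        """A new sample applies to steps added after it is collected."""
        self.samples.append((self.step_count + 1, name))

    def insert_step_below(self, selected_step):
        """Insert a step after selected_step; later samples shift down by one."""
        new_step = selected_step + 1
        self.samples = [
            (first + 1 if first >= new_step else first, name)
            for first, name in self.samples
        ]
        self.step_count += 1

    def active_sample(self, selected_step):
        """Return the latest sample whose first step is at or before selected_step."""
        firsts = [first for first, _ in self.samples]
        index = max(bisect_right(firsts, selected_step) - 1, 0)
        return self.samples[index][1]


timeline = SampleTimeline()
timeline.add_steps(3)            # action 2
timeline.take_sample("random")   # action 3
timeline.add_steps(25)           # action 4
print(timeline.active_sample(2))   # action 5: one of the first 3 steps
timeline.insert_step_below(2)    # action 6
print(timeline.active_sample(4), timeline.active_sample(5))
```

After the insertion, steps 1 through 4 resolve to the initial sample and step 5 onward resolves to the random sample, matching the last row of the walkthrough.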

Implications:

  • As you add steps to your recipe without resampling, your recipe and sample consume more memory in your browser.
  • When you perform complex multi-dataset operations, such as joins or unions, your recipe/sample combination consumes a lot more memory.
  • If you continue adding steps:
    • Performance in the browser can be impacted. Basic operations such as selection of data or new recipe steps can become slow to respond.
    • The browser can crash.

...

Samples can become invalid. If your recipe steps change the number of rows or otherwise reshape your dataset, using transformations such as pivot or join, in the steps leading up to where you took the current sample, your existing sample may no longer be valid.
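
One way to reason about this condition is to compare the steps that preceded the sample when it was collected with the steps that precede it now. The sketch below is illustrative only: the function name, the representation of steps as transformation names, and the contents of `RESHAPING` (limited here to the two transformations named above) are assumptions, not the product's validation logic.

```python
from collections import Counter

# Transformations named in this section as reshaping the dataset (illustrative set).
RESHAPING = {"pivot", "join"}


def sample_may_be_invalid(steps_at_collection, steps_now):
    """Return True if a reshaping step was added or removed ahead of the sample."""
    before = Counter(steps_at_collection)
    now = Counter(steps_now)
    added = now - before      # steps present now but not at collection time
    removed = before - now    # steps present at collection time but not now
    changed = added + removed
    return any(step in RESHAPING for step in changed)


# Hypothetical step names for the part of the recipe that precedes the sample.
collected_with = ["rename", "replace"]
print(sample_may_be_invalid(collected_with, ["rename", "replace"]))
print(sample_may_be_invalid(collected_with, ["rename", "join", "replace"]))
```

The first call reports no change. The second reports a possible invalidation because a join now sits ahead of the point where the sample was taken.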

...

For more information, see Overview of Sampling. For more information on best practices, see https://community.trifacta.com/s/article/Best-Practices-Managing-Samples-in-Complex-Flows.