When you transform your data in the Transformer page, you are performing these transformations on a sample of the total dataset. As needed, you can generate new samples using a variety of algorithms to acquire other slices of your data.

The initial sample is always collected from the first rows of the dataset. Whenever you create a recipe and open the dataset in the Transformer page, the initial sample is automatically generated by loading the first 10 MB of your dataset.

Tip: On the Transformer page, this first sample is listed as Initial Data.
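The idea of an initial sample built from the first 10 MB can be sketched as follows. This is only an illustration, not the product's implementation: the function name take_initial_rows, the byte accounting per UTF-8 encoded row, and the reading of 10 MB as 10 * 1024 * 1024 bytes are all assumptions.

```python
import io

# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
INITIAL_SAMPLE_BYTES = 10 * 1024 * 1024


def take_initial_rows(lines, byte_limit=INITIAL_SAMPLE_BYTES):
    """Return the leading rows whose combined UTF-8 size fits within byte_limit.

    Hypothetical helper: rows are taken strictly in file order, so the result
    reflects only the start of the dataset.
    """
    rows = []
    used = 0
    for line in lines:
        size = len(line.encode("utf-8"))
        if used + size > byte_limit:
            break  # stop at the first row that would exceed the limit
        rows.append(line)
        used += size
    return rows


# Tiny demonstration with a 25-byte limit in place of 10 MB.
data = io.StringIO("id,region\n1,Atlantic\n2,North East\n3,West Coast\n")
initial = take_initial_rows(data, byte_limit=25)
print(initial)
```

Because the rows come only from the start of the dataset, values that first appear later in the data are absent from such a sample, which is one reason to collect additional samples.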

When to Take a New Sample

The initial sample allows you to get started immediately building your recipe steps. However, your recipe and dataset may require additional samples. For example:

Tip: You should use sampling as much as possible to improve browser performance and to get good coverage of the samples across recipes.

NOTE: Generation of a new sample is executed as a job. Quick scan jobs are executed in the default running environment, while Full scan jobs are executed on an available clustered running environment. Depending on your deployment, there may be costs associated with generating a sample.

You can generate a new sample when:

Limitations


Collect a New Sample

You can use the existing loaded sample, or you can collect a new sample to use.

Steps:

  1. In the Transformer page, click the Eyedropper icon at the top of the page. 
  2. From the Samples panel, select the required type of sample. For more information, see Sample Types.

  3. In the Collect new sample panel, select either Quick or Full scan.
    1. Quick: Creates a sample by partial scanning of the dataset and yields quicker results.

      Tip: Quick scan samples are executed in the default running environment. If that environment is not available, the application may attempt to run the Quick scan job on an available clustered running environment.

    2. Full: Creates a sample by scanning the full dataset. This method takes a longer time depending on the size of the dataset.

      Tip: Full scan samples are executed in the cluster running environment.

  4. Click Collect to collect the sample. When the sample is available, a status message is displayed in the Transformer page.
  5. Click Load Sample to start using the sample.

Cancel a Sample

To cancel a sample collection, click the X next to the progress bar. The interrupted sample is listed as unavailable in the Collected samples panel. 

Example - Random sample

Random samples can be generated from a quick or full scan of your dataset. 

Tip: A random sample is a fast way to get another randomized slice of your dataset. Often, this can be a first sample to generate after loading a new dataset into the Transformer page.

Steps:

  1. In the Transformer page, click the Eyedropper icon at the top of the page. 
  2. From the Samples panel, select Random sample.
  3. In the Collect new sample panel, select the type of scan: Quick or Full.
  4. Click Collect.
  5. When sample collection is complete, a confirmation message is displayed. Click Load sample.
  6. The random sample is loaded into the Transformer page.
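To make the difference between a full and a partial scan concrete, here is a minimal sketch of uniform random sampling in a single pass, using reservoir sampling (Algorithm R). It is a generic technique chosen for illustration; the source does not state which algorithm the product uses. The function name reservoir_sample and the choice to model a quick scan as reading only the first 100 rows are assumptions.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Uniformly sample up to k rows from an iterable in one pass (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for index, row in enumerate(rows):
        if index < k:
            reservoir.append(row)  # fill the reservoir with the first k rows
        else:
            # Keep the new row with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = row
    return reservoir


rows = [{"id": i} for i in range(1000)]

# Full scan: every row of the dataset is read before the sample is final.
full_scan = reservoir_sample(rows, k=10, seed=42)

# Quick scan (assumption): only part of the dataset is read, here the first 100 rows.
quick_scan = reservoir_sample(rows[:100], k=10, seed=42)

print(len(full_scan), len(quick_scan))
```

In this sketch the full scan can return rows from anywhere in the dataset, while the partial scan is faster but can only return rows from the portion it read.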

Example - Filter-based sample

The Filter-based sample is helpful when you want to filter the data based on specific values or formulas. For example, you may have a dataset with many values for Region, such as Atlantic, North East, and West Coast, and want to calculate discounts only for the North East region. The following example filters the Region column for the required value, and then generates a random sample from the matching rows only.

Steps:

  1. In the Transformer page, click the Eyedropper icon at the top of the page. 
  2. From the Samples panel, select Filter-based sample.
  3. In the Collect new sample panel, enter the following details:
    1. From the Scan column, select Quick. For more information, see "Collect a New Sample" above.
    2. In the Filter field, enter Region == 'North East'.
  4. Click Collect. A confirmation message is displayed.
  5. Click Load sample. The Filter-based sample is loaded with only the North East values for the Region column. 
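The logic of the steps above (filter first, then sample randomly from the matching rows only) can be sketched in plain Python. The function name filter_based_sample, the Sales column, and the row values are hypothetical; the predicate mirrors the filter Region == 'North East' from the example.

```python
import random


def filter_based_sample(rows, predicate, k, seed=None):
    """Keep rows that satisfy predicate, then draw up to k of them at random."""
    matching = [row for row in rows if predicate(row)]
    if len(matching) <= k:
        return matching  # fewer matches than requested: return them all
    return random.Random(seed).sample(matching, k)


# Hypothetical rows; only the Region column comes from the example above.
rows = [
    {"Region": "Atlantic", "Sales": 120.0},
    {"Region": "North East", "Sales": 80.0},
    {"Region": "West Coast", "Sales": 200.0},
    {"Region": "North East", "Sales": 95.5},
    {"Region": "Atlantic", "Sales": 60.0},
    {"Region": "North East", "Sales": 150.0},
    {"Region": "West Coast", "Sales": 75.0},
    {"Region": "North East", "Sales": 42.0},
]

sample = filter_based_sample(
    rows, lambda row: row["Region"] == "North East", k=2, seed=7
)
print(sample)
```

Every row in the result has Region equal to North East, so recipe steps such as a discount calculation can be developed against only the rows they apply to.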

Example - Anomaly-based sample

If your dataset has missing values or mismatched values, you can use the Anomaly-based sample type to isolate the rows that contain them. The following example is based on the missing values in a Discount column. When you apply the Anomaly-based sample, the sample displays only rows that have missing values for the Discount column.

Steps:

  1. In the Transformer page, click the Eyedropper icon at the top of the page. 
  2. From the Samples panel, select Anomaly-based sample.
  3. In the Collect new sample panel, enter the following details:
    1. From the Scan column, select Quick. For more information, see "Collect a New Sample" above.
    2. Select the required column: Discount.
    3. From the anomaly type, select Find missing values only.
  4. Click Collect. A confirmation message is displayed.
  5. Click Load sample. The Anomaly-based sample is loaded with only the rows that have missing values for the Discount column.
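As a rough sketch of what selecting Find missing values only amounts to, the following keeps only rows whose Discount value is missing. The helper names and the definition of "missing" (None, or an empty or whitespace-only string) are assumptions made for illustration; the product may define missing values differently.

```python
def is_missing(value):
    """Assumption: None and empty or whitespace-only strings count as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def missing_value_rows(rows, column):
    """Return only the rows whose value in the given column is missing."""
    return [row for row in rows if is_missing(row.get(column))]


# Hypothetical rows; only the Discount column comes from the example above.
rows = [
    {"OrderId": 1, "Discount": "0.10"},
    {"OrderId": 2, "Discount": ""},
    {"OrderId": 3, "Discount": None},
    {"OrderId": 4, "Discount": "0.25"},
    {"OrderId": 5},  # column absent for this row
]

anomalies = missing_value_rows(rows, "Discount")
print([row["OrderId"] for row in anomalies])
```

Working on a sample like this makes it easier to write and verify a step that fills or removes the missing values, since every row on screen is affected by that step.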

Load a Sample

You can create as many samples as required based on your dataset. All collected samples are available in the Collected samples panel, where you can review and load them as required.

Steps:

  1. In the Samples panel, click See all collected samples.
  2. From the Collected samples panel, select the required sample from the Available tab. For more information, see "Collected Samples" below.

    NOTE: Samples listed under the Unavailable tab are invalid for the current state of your recipe. You cannot select these samples for use.

  3. If you want to edit the sample name, click the Pencil icon next to the sample.

Delete a Sample

After you have created a sample, you cannot delete it through the application.

NOTE: The application does not support deletion of samples after they have been created. For more information, contact your IT administrator.

Sample Types

For more information, see Sample Types.

Invalid Samples

NOTE: Samples are valid based on the state of your flow and recipe at the step where the sample was collected.

Whenever you add or modify a step in the recipe, the application verifies whether the current sample is still valid. The current sample can become invalid if you add a new step before the step where the sample was created. For example, if you created a sample at the 30th step and then add a new step before the 30th step that breaks the sample, the sample becomes invalid.

After the sample becomes invalid, the Transformer page reverts to the most recently collected sample that is still valid.

NOTE: If the Transformer page reverts to an earlier sample, then more steps between the step where that sample was generated and your current location in the recipe must be computed in the browser's memory. Browser performance may be impacted.


NOTE: If you modify a SQL statement for an imported dataset, any samples based on the old SQL statement are invalidated.
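The validity and fallback behavior described in this section can be modeled with a small sketch. It uses a deliberately conservative rule, namely that any change before the step where a sample was collected invalidates that sample, whereas the text above says only that such a change can break it. The Sample class, its fields, and both functions are hypothetical and do not reflect the product's internals.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    name: str
    collected_at_step: int  # recipe step where the sample was collected
    collected_order: int    # larger means collected more recently


def is_valid(sample, changed_step):
    """Conservative assumption: a change before the collection step breaks the sample."""
    return changed_step >= sample.collected_at_step


def pick_fallback(samples, changed_step):
    """Return the most recently collected sample that is still valid, or None."""
    valid = [s for s in samples if is_valid(s, changed_step)]
    return max(valid, key=lambda s: s.collected_order, default=None)


samples = [
    Sample("Initial Data", collected_at_step=0, collected_order=0),
    Sample("Random", collected_at_step=12, collected_order=1),
    Sample("Filter-based", collected_at_step=30, collected_order=2),
]

# A new step is added at position 20, which is before the 30th step.
fallback = pick_fallback(samples, changed_step=20)
print(fallback.name)
```

In this model, the fallback sample was collected at step 12, so the steps from there up to the current location in the recipe have to be recomputed against it, which corresponds to the extra browser work mentioned in the NOTE above.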

Collected Samples

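The Collected samples panel stores the details of the samples collected for your dataset. To open it, click See all collected samples in the Samples panel. The panel contains the following tabs: Available, which lists the samples that you can load, and Unavailable, which lists samples that are invalid for the current state of your recipe or whose collection was canceled.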

Review Sample Jobs

You can review and manage all of your sample jobs in the same way as transformation jobs.

Tip: As needed, you can cancel jobs in progress through the Samples panel or the Sample Jobs page.

For more information, see Sample Jobs Page.

Best Practices

For more information on best practices, troubleshooting, and browser crashes, see https://community.trifacta.com/s/article/Best-Practices-Managing-Samples-in-Complex-Flows.