For smaller datasets, the Transformer page displays the entire dataset. For larger ones, the source data is sampled for use in the Transformer page. Through the Samples panel, you can create new samples and select them for display in the Transformer page.

At the top of the Transformer page, the type of the current sample is displayed next to the dataset name. To open the Samples panel, click the current sample indicator:

Figure: Click the current sample button.

In the example above, you can see that the current sample is a Random sample.

Initial Data: The sample is taken from the first set of rows in the first file or table that is part of the dataset. 
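The difference between an Initial Data sample and a Random sample can be sketched in a few lines of Python. This is only a conceptual illustration, not the application's implementation: the function names, the `limit` parameter (counted in rows), and the choice to keep randomly selected rows in source order are all assumptions made for the example.

```python
import random

def initial_data_sample(rows, limit):
    """Take the first `limit` rows, in source order (hypothetical helper)."""
    return rows[:limit]

def random_sample(rows, limit, seed=0):
    """Pick up to `limit` rows uniformly at random (hypothetical helper).

    Keeping the picked rows in source order is a choice of this sketch.
    """
    rng = random.Random(seed)
    k = min(limit, len(rows))
    picked = sorted(rng.sample(range(len(rows)), k))
    return [rows[i] for i in picked]

# Stand-in for the first file or table of a dataset.
rows = [{"id": i} for i in range(1000)]

first = initial_data_sample(rows, 5)   # always rows 0 through 4
rand = random_sample(rows, 5)          # rows drawn from anywhere in the source
```

An initial-data sample is fast to produce but reflects only the start of the source, while a random sample can surface values that appear anywhere in it.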

To create a new sample, click Collect a new sample.

The Samples panel is displayed on the right side of the screen:

Tip: You can also open the Samples panel by clicking the Eyedropper icon at the top of the page.

To review all samples that you have created, see Sample Jobs Page.


Samples Panel

Current sample:

At the top of the panel, you can review the currently loaded sample. Each user has their own active sample on a dataset.

NOTE: When a new sample is generated, any Sort transformations that have been applied previously must be re-applied. Depending on the type of output, sort order may not be preserved.


New samples:

Below the current sample, you can review the available options for creating new samples. Each type of sample reflects a different method of collection.


Status bar:

At the bottom of the Transformer page, you can review the number of rows and columns and the count of data types in the currently displayed sample.

NOTE: As you add transformation steps to your recipe, the values in the status bar change to reflect the current state of the loaded sample.

NOTE: Some operations, such as union, may change the row counts without invalidating the sample. If the operation increases the size of the dataset beyond the sample size limit enforced by the application, then a subset of those rows is displayed. This is a known issue.
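As a rough illustration of the behavior described in the note above, the following Python sketch unions two small row sets and then truncates the result for display. The helper names and the `size_limit` value are hypothetical, the limit is assumed to be counted in rows, and taking the first rows is an assumption, since the text does not say which subset is displayed.

```python
def union(a, b):
    """Append the rows of b to the rows of a (hypothetical helper)."""
    return a + b

def displayed_rows(rows, size_limit):
    """Return the subset shown in the grid.

    Assumption: the first `size_limit` rows are kept.
    """
    return rows[:size_limit]

sample = [{"id": i} for i in range(8)]
other = [{"id": i} for i in range(100, 108)]

combined = union(sample, other)                   # 16 rows after the union
shown = displayed_rows(combined, size_limit=10)   # only 10 rows are displayed
```

In this sketch the sample itself is still valid after the union; only the displayed portion is cut off at the limit.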

Collect new sample

A new sample is collected based on the current location in the recipe at the time of collection. If the recipe contains steps that join in other datasets, those joins are performed first to bring together the data from which the sample is drawn.

Figure: Collect new sample panel

NOTE: Except for the initial data sample, all samples are generated based on the steps leading up to the location of the cursor in the recipe. If earlier steps are deleted or modified, the collected sample can be invalidated.
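One way to picture why edits to earlier steps can invalidate a sample is to treat each sample as tied to a fingerprint of the steps that precede the cursor. The Python sketch below is a simplified mental model, not how the application tracks samples: the step dictionaries, the `cursor` index, and all function names are hypothetical. The model is also stricter than the product, where only some edits invalidate a sample.

```python
import hashlib
import json

def fingerprint(steps):
    """Hash an ordered list of recipe steps (hypothetical representation)."""
    payload = json.dumps(steps, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def collect_sample(recipe, cursor):
    """Record the steps the sample depends on: everything before the cursor."""
    return {"cursor": cursor, "fingerprint": fingerprint(recipe[:cursor])}

def is_valid(sample, recipe):
    """In this model, a sample is valid while its preceding steps are unchanged."""
    return fingerprint(recipe[:sample["cursor"]]) == sample["fingerprint"]

recipe = [
    {"op": "join", "with": "orders"},
    {"op": "derive", "col": "total"},
    {"op": "delete", "where": "total < 0"},
]
sample = collect_sample(recipe, cursor=2)
valid_at_collection = is_valid(sample, recipe)

# Editing a step after the cursor does not affect the sample.
recipe[2] = {"op": "delete", "where": "total <= 0"}
valid_after_later_edit = is_valid(sample, recipe)

# Editing a step before the cursor changes the data the sample was drawn from.
recipe[0] = {"op": "join", "with": "customers"}
valid_after_earlier_edit = is_valid(sample, recipe)
```

Here the sample survives the change to the third step but not the change to the first, because only the first two steps fed into it.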

NOTE: When sampling from compressed data, the source is uncompressed, and a new sample of the uncompressed data is loaded into the data grid. As a result, the sample size you see in the grid corresponds to the uncompressed data.

Steps:

Collected samples

In the Collected samples panel, you can review the available and unavailable samples. If applicable, you can review the variable override values that were applied during the sampling.

To use one of the available samples, click Load. The sample is loaded in the data grid. For more information, see Generate a Sample.

NOTE: If you add recipe steps that change the number of rows in your dataset (or a few other edge-case steps), some of your existing samples may no longer be valid. When you add a join, union, or delete step, or edit steps that come before such a step, you may be prompted with the Change Recipe dialog, which includes the following message:

Your change will invalidate some of the currently available samples for this source. The invalid samples will be deactivated.

For more information on the types of transformations that can invalidate samples, see Reshaping Steps.