When you transform your data in the Transformer page, you are performing these transformations on a sample of the total dataset. As needed, you can generate new samples using a variety of algorithms to acquire other slices of your data.
The initial sample is always collected from the first rows of the dataset. Whenever you create a recipe and open the dataset in the Transformer page, the application automatically generates the initial sample by loading the first 10 MB of your dataset.
Tip: On the Transformer page, this first sample is listed as Initial Data.
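The idea of an initial sample taken from the head of the dataset can be sketched in a few lines of Python. This is only an illustration, not the application's actual implementation: the function name head_sample and the choice to keep whole lines within a byte budget are assumptions made for the example.

```python
import io

def head_sample(stream, max_bytes=10 * 1024 * 1024):
    """Return complete lines from the start of a binary stream.

    Hypothetical helper: stops before the line that would push the
    total size past max_bytes (10 MB by default).
    """
    rows = []
    used = 0
    for line in stream:
        if used + len(line) > max_bytes:
            break
        rows.append(line)
        used += len(line)
    return rows

# Tiny in-memory dataset with a small budget so the cutoff is visible.
data = io.BytesIO(b"id,region\n1,Atlantic\n2,North East\n3,West Coast\n")
first_rows = head_sample(data, max_bytes=32)
print(first_rows)
```

Because the sample is read from the top of the file only, rows that appear later in the dataset are not represented, which is why additional sample types can be useful.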
The initial sample allows you to get started immediately building your recipe steps. However, your recipe and dataset may require additional samples. For example:
Tip: Use sampling as much as possible to improve browser performance and to get good sample coverage across recipes.
NOTE: Generation of a new sample is executed as a job. Quick scan jobs are executed in the application's running environment, while Full scan jobs are executed on an available clustered running environment. Depending on your deployment, there may be costs associated with generating a sample.
You can generate a new sample when:
You can use the existing loaded sample, or you can collect a new sample to use.
From the Samples panel, select the required type of sample. For more information, see Sample Types.
Quick: Creates a sample by scanning part of the dataset, which yields quicker results.
Tip: Quick scan samples are executed by default in the application's running environment. If that environment is not available, the application may attempt to run the Quick scan job on an available clustered running environment.
Full: Creates a sample by scanning the full dataset. This method takes longer, depending on the size of the dataset.
Tip: Full scan samples are executed in the cluster running environment.
To cancel a sample collection, click the X next to the progress bar. The interrupted sample is listed as unavailable in the Collected samples panel.
Random samples can be generated from a quick or full scan of your dataset.
Tip: A random sample is a fast way to get another randomized slice of your dataset. Often, this can be a first sample to generate after loading a new dataset into the Transformer page.
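The difference between a random sample taken from a quick scan and one taken from a full scan can be illustrated with a short Python sketch. It is not the application's algorithm: reservoir sampling is swapped in here as one common way to draw a uniform sample in a single pass, and modeling a quick scan as "read only the first quick_limit rows" is an assumption made for the example.

```python
import random
from itertools import islice

def reservoir_sample(rows, k, seed=None):
    """Draw up to k rows uniformly at random from an iterable in one pass."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample

def random_sample(rows, k, scan="full", quick_limit=1000, seed=None):
    """Hypothetical wrapper: a quick scan reads only part of the rows."""
    if scan == "quick":
        rows = islice(rows, quick_limit)
    return reservoir_sample(rows, k, seed)

full = random_sample(iter(range(10_000)), k=5, scan="full", seed=1)
quick = random_sample(iter(range(10_000)), k=5, scan="quick", seed=1)
print(full, quick)
```

In this sketch the quick variant can only ever return rows from the scanned portion, while the full variant can return any row, at the cost of reading the whole dataset.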
The Filter-based sample is helpful when you want to filter the data based on specific values or formulas. The following example filters the required values in the Region column for calculating discounts, and then generates a random sample from the matching rows only. For example, if your dataset has many values for Region, such as Atlantic, North East, and West Coast, and you want to calculate discounts only for the North East region, you can collect a Filter-based sample.
For the filter, specify Region == 'North East'.
The collected sample contains only rows with North East values for the Region column.
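As a rough illustration of the filter-then-sample behavior described above, the following Python sketch keeps only the rows that match a condition and then draws a random sample from them. The function name filter_based_sample, the Sales column, and the row values are hypothetical; the sketch does not reflect how the application evaluates filters internally.

```python
import random

def filter_based_sample(rows, predicate, k, seed=None):
    """Keep rows that match predicate, then sample up to k of them at random."""
    matching = [row for row in rows if predicate(row)]
    rng = random.Random(seed)
    return rng.sample(matching, min(k, len(matching)))

# Made-up rows for illustration only.
rows = [
    {"Region": "Atlantic", "Sales": 120},
    {"Region": "North East", "Sales": 300},
    {"Region": "West Coast", "Sales": 210},
    {"Region": "North East", "Sales": 150},
    {"Region": "North East", "Sales": 90},
]

# Equivalent in spirit to the filter Region == 'North East'.
sample = filter_based_sample(rows, lambda r: r["Region"] == "North East", k=2, seed=0)
print(sample)
```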
If your dataset has missing or mismatched values, you can use the Anomaly-based sample type to collect only the rows that contain them. The following example is based on the missing values in a Discount column. When you apply the Anomaly-based sample, the sample displays only rows that have missing values for the Discount column.
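The anomaly-based selection above can be sketched in Python as a filter on missing values. This is an illustrative approximation under stated assumptions: the helper names are hypothetical, and treating None and blank strings as "missing" is a simplification of whatever rules the application applies.

```python
def is_missing(value):
    """Treat None and blank strings as missing (a simplifying assumption)."""
    return value is None or (isinstance(value, str) and value.strip() == "")

def anomaly_sample(rows, column, limit=None):
    """Return rows whose value in column is missing, optionally capped at limit."""
    matches = [row for row in rows if is_missing(row.get(column))]
    return matches if limit is None else matches[:limit]

# Made-up rows for illustration only.
orders = [
    {"OrderId": 1, "Discount": "0.10"},
    {"OrderId": 2, "Discount": ""},
    {"OrderId": 3, "Discount": None},
    {"OrderId": 4, "Discount": "0.05"},
]

missing_discount = anomaly_sample(orders, "Discount")
print([row["OrderId"] for row in missing_discount])
```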
You can create as many samples as required based on your dataset. All collected samples are available in the Collected samples panel, where you can review and load them as required.
From the Collected samples panel, select the required sample from the Available tab. For more information, see "Collected Samples" below.
NOTE: Samples listed under the Unavailable tab are invalid for the current state of your recipe. You cannot select these samples for use.
After you have created a sample, you cannot delete it through the application.
NOTE: The application does not support deletion of samples after they have been created. For more information, contact your IT administrator.
For more information, see Sample Types.
NOTE: Samples are valid based on the state of your flow and recipe at the step where the sample was collected.
Whenever you add or modify a step in the recipe, the application verifies whether the current sample is still valid. The current sample can become invalid if you add a new step before the step where the sample was created. For example, if you created a sample at the 30th step and you then add a new step before the 30th step that breaks the sample, the sample becomes invalid.
After the sample becomes invalid, the Transformer page reverts to the most recently collected sample that is still valid.
NOTE: If the Transformer page reverts to an earlier sample, then all of the steps between the point where that sample was generated and your current location in the recipe must be computed in the browser's memory. Browser performance may be impacted.
NOTE: If you modify a SQL statement for an imported dataset, any samples based on the old SQL statement are invalidated.
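The validity rule and the fallback behavior described above can be modeled with a small Python sketch. It is a simplified illustration, not the application's logic: the Sample class and function names are hypothetical, and the sketch assumes that any change made before the step where a sample was collected invalidates that sample, whereas in practice only changes that break the sample do so.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    name: str
    collected_at_step: int  # recipe step where the sample was collected
    valid: bool = True

def invalidate_after_change(samples, changed_step):
    """Mark samples collected after the changed step as invalid (simplified rule)."""
    for s in samples:
        if changed_step < s.collected_at_step:
            s.valid = False

def current_sample(samples):
    """Return the most recently collected sample that is still valid.

    The list is assumed to be in collection order, oldest first.
    """
    for s in reversed(samples):
        if s.valid:
            return s
    return None

def steps_to_replay(sample, current_step):
    """Steps that must be recomputed in the browser on top of the sample."""
    return current_step - sample.collected_at_step

samples = [
    Sample("Initial Data", 0),
    Sample("Random", 12),
    Sample("Filter-based", 30),
]

# A step is added at position 20, before the step-30 sample was collected.
invalidate_after_change(samples, changed_step=20)
active = current_sample(samples)
print(active.name, steps_to_replay(active, current_step=31))
```

In this model, falling back from the step-30 sample to the step-12 sample raises the number of steps replayed in the browser, which mirrors the performance note above.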
The Collected samples panel stores the details of the samples collected for your dataset. In the Samples panel, click the See all collected samples link. The Collected samples panel contains the following tabs:
You can review and manage all of your sample jobs in the same way as transformation jobs.
Tip: As needed, you can cancel jobs in progress through the Samples panel or the Sample Jobs page.
For more information on best practices, troubleshooting, and browser crashes, see https://community.trifacta.com/s/article/Best-Practices-Managing-Samples-in-Complex-Flows.