When you transform your data in the Transformer page, you are performing these transformations on a sample of the total dataset. As needed, you can generate new samples using a variety of algorithms to acquire other slices of your data.
The initial data sample is collected from the first rows of the dataset. Whenever you create a recipe and open the dataset in the Transformer page, the application automatically generates the initial sample.
10 files.
Tip: On the Transformer page, this first sample is listed as Initial Data. For more information on how this special sampling type is generated, see Overview of Sampling.
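As a rough illustration of a sample taken from the initial rows, the following Python sketch reads only the first rows of a dataset and stops. The function name `initial_sample` and the row limit are hypothetical and do not reflect how the application is implemented or what limits it applies.

```python
from itertools import islice


def initial_sample(rows, max_rows):
    """Return the first max_rows rows of an iterable without reading the rest."""
    return list(islice(rows, max_rows))


# A large dataset modeled as a lazy generator of row dictionaries (illustrative data).
dataset = ({"id": i, "value": i * i} for i in range(1_000_000))

# Only the first five rows are consumed; the remaining rows are never read.
sample = initial_sample(dataset, 5)
print([row["id"] for row in sample])
```

Because such a sample comes only from the start of the dataset, it may not represent values that appear later, which is one reason to collect additional samples.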
The initial sample allows you to get started immediately building your recipe steps. However, your recipe and dataset may require additional samples. For example:
Tip: Use sampling as much as possible to improve browser performance and to get good coverage of your data across recipes.
NOTE: Generation of a new sample is executed as a job. Quick scan jobs are executed through
You can generate a new sample when:
If you are encountering low-memory conditions related to sampling or wish to improve the performance of the sampling process, you can adjust the size of the samples that are displayed in the browser for your current recipe. For more information, see Change Recipe Sample Size.
You can use the existing loaded sample, or you can collect a new sample to use.
Steps:
From the Samples panel, select the required type of sample. For more information, see Sample Types.
Quick: Creates a sample by partially scanning the dataset, which yields quicker results.
Tip: Quick scan samples are executed by default in the
Full: Creates a sample by scanning the full dataset. This method takes longer, depending on the size of the dataset.
Tip: Full scan samples are executed in the cluster running environment.
Random samples can be generated from a quick or full scan of your dataset.
Tip: A random sample is a fast way to get another randomized slice of your dataset. Often, this can be a first sample to generate after loading a new dataset into the Transformer page.
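The difference between a random sample from a full scan and one from a quick scan can be sketched in Python. This is a minimal sketch under stated assumptions: the full scan is modeled with reservoir sampling (a standard one-pass technique, not necessarily what the application uses), and the quick scan is modeled as sampling from only the first `scan_limit` rows. The function names and the `scan_limit` parameter are hypothetical.

```python
import random
from itertools import islice


def full_scan_random_sample(rows, k, seed=None):
    """Reservoir sampling: one pass over every row, uniform sample of up to k rows."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


def quick_scan_random_sample(rows, k, scan_limit, seed=None):
    """Assumed model of a partial scan: sample only from the first scan_limit rows."""
    return full_scan_random_sample(islice(rows, scan_limit), k, seed)


full = full_scan_random_sample(range(100_000), 10, seed=1)
quick = quick_scan_random_sample(range(100_000), 10, scan_limit=1_000, seed=1)
```

In this model, the quick scan reads far fewer rows but can only return rows from the scanned portion, while the full scan can return any row at the cost of reading the whole dataset.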
Steps:
The Filter-based sample is helpful when you want to filter the data based on specific values or formulas. For example, you may have a dataset with many values for Region, such as Atlantic, North East, and West Coast, and want to calculate discounts only for the North East region. In this case, you can collect a Filter-based sample. The following example filters for the required values in the Region column and then generates a random sample from the matching rows only.
Steps:
Region == 'North East'
The collected sample contains only rows with North East values for the Region column.
If your dataset has missing or mismatched values, you can use the Anomaly-based sample type to collect only the affected rows. The following example is based on the missing values in a Discount column. When you apply the Anomaly-based sample, the sample displays only rows that have missing values for the Discount column.
Steps:
Discount
The collected sample contains only rows with missing values in the Discount column.
To cancel a sample collection, click the X next to the progress bar. The interrupted sample is listed as unavailable in the Collected samples panel.
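The Filter-based and Anomaly-based examples above can be approximated outside the application with a short Python sketch: keep only the rows that match a condition, then draw a random sample from the matches. The sample data, the `sample_matching` helper, and the choice to model a missing value as `None` are all assumptions made for illustration. Mismatched values are not modeled here.

```python
import random

# Illustrative rows; a missing Discount is modeled as None (an assumption).
rows = [
    {"Region": "North East", "Discount": 0.10},
    {"Region": "Atlantic", "Discount": None},
    {"Region": "West Coast", "Discount": 0.05},
    {"Region": "North East", "Discount": None},
    {"Region": "North East", "Discount": 0.20},
]


def sample_matching(rows, predicate, k, seed=None):
    """Keep rows that satisfy predicate, then draw up to k of them at random."""
    matches = [row for row in rows if predicate(row)]
    rng = random.Random(seed)
    return rng.sample(matches, min(k, len(matches)))


# Filter-based: equivalent in spirit to the condition Region == 'North East'.
filter_sample = sample_matching(rows, lambda r: r["Region"] == "North East", k=2, seed=0)

# Anomaly-based: only rows where Discount is missing.
anomaly_sample = sample_matching(rows, lambda r: r["Discount"] is None, k=10, seed=0)
```

Both sample types follow the same pattern in this sketch. Only the predicate changes.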
You can create as many samples as required based on your dataset. All collected samples are available in the Collected samples panel, where you can review and load them as required.
Steps:
From the Collected samples panel, select the required sample from the Available tab. For more information, see "Collected Samples" below.
NOTE: Samples listed under the Unavailable tab are invalid for the current state of your recipe. You cannot select these samples for use.
After you have created a sample, you cannot delete it through the application.
NOTE: Samples are valid based on the state of your flow and recipe at the step where the sample was collected.
Whenever you add or modify a step in the recipe, the application verifies that the current sample is still valid. The current sample can become invalid if you add a new step before the step where the sample was created. For example, if you created a sample at step 30 and then add a new step before step 30 that breaks the sample, the sample becomes invalid.
After the sample becomes invalid, the Transformer page reverts to the most recently collected sample that is still valid.
NOTE: If the sample is reverted to an earlier sample, then the additional steps between the point where that sample was generated and your current location in the recipe must be computed in the browser's memory. Browser performance may be impacted.
NOTE: If you modify a SQL statement for an imported dataset, any samples based on the old SQL statement are invalidated.
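The invalidation and fallback behavior described above can be modeled with a small Python sketch. Everything here is hypothetical: the `Sample` class, the `add_step` and `active_sample` functions, and the `breaks_samples` flag are illustrative names, and the sketch assumes the caller already knows whether a new step breaks later samples. It does not describe how the application makes that determination.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    name: str
    step: int           # recipe step at which the sample was collected
    valid: bool = True


def add_step(samples, position, breaks_samples):
    """Insert a recipe step at `position` and update the samples.

    Samples collected at or after `position` shift one step later. They are
    marked invalid only when the new step is flagged as breaking them.
    """
    for s in samples:
        if s.step >= position:
            s.step += 1
            if breaks_samples:
                s.valid = False


def active_sample(samples):
    """Return the most recently collected sample that is still valid, or None."""
    valid = [s for s in samples if s.valid]
    return valid[-1] if valid else None  # the list is kept in collection order


samples = [
    Sample("Initial Data", 0),
    Sample("Random", 12),
    Sample("Filter-based", 30),
]

# A breaking step added before step 30 invalidates the sample collected there.
add_step(samples, position=20, breaks_samples=True)
print(active_sample(samples).name)  # Random
```

In this model, the fallback sample was collected at step 12, so every step after it up to the current location would need to be recomputed, which corresponds to the browser performance note above.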
The Collected samples panel stores the details of the samples collected for your dataset. To open it, click the See all collected samples link in the Samples panel.
Figure: Collected samples
The collected samples contain the following tabs:
You can click the sample name to view the sample details.
Figure: Sample details
You can review and manage all of your samples in the same way as transformation jobs. For more information, see Sample Jobs Page.
For more information on best practices, troubleshooting, and browser crashes, see https://community.trifacta.com/s/article/Best-Practices-Managing-Samples-in-Complex-Flows.