TBE leverages pattern-based matching and predictive transformation to derive transformations. When you provide explicit mappings of input value to output, the mapping is passed through predictive transformation to determine the best possible matching pattern.
- For more information on patterns, see Overview of Pattern Matching.
- Predictive Transformation is a core component of Dataprep by Trifacta. Based upon user input, the platform provides one or more suggestions of ways in which to transform the data.
- In TBE, these suggestions are rendered as elements of the transformation in progress.
- For more information, see Overview of Predictive Transformation.
Tip: TBE simplifies the process of defining patterns to match all values in your source column. Since you know and can specify the exact desired output, you can leave the details of defining the pattern or patterns required to match input to output to the product.
Transformation by Example works well in the following use cases:
- You are just getting started with the product and would like to get productive quickly to transform your data into known outputs.
- Your data has groups of values, each of which needs transformation in a different way. In a single recipe step, you can perform these transformations across all groups.
Your data has special-case exceptions that must be transformed.
Tip: You can use this feature as a final cleanup for other transformations. If you have a transformation that handles 90% of the cases in a column, you can use this transformation to handle the remainder.
When a TBE step is added to your recipe, the number of individual changes can be many megabytes of data. Instead of storing these objects within the recipe definition, they are stored as a set of artifacts in the artifact storage database and referenced from the recipe.
- These artifacts exist outside the scope of the recipe file.
These artifacts must be stored in a Dataprep by Trifacta database for the step to be editable and exportable.
NOTE: If the artifact storage service is disabled, this feature is unusable.
- When a flow is exported, an
artifact.datafile is included as part of the export. This file must be imported with the flow definition, or the TBE step in the imported flow is broken. For more information, see Export Flow.
- TBE works best for inputs that are text-based data types (e.g. String, State, URL, etc.).
- Non-text inputs are treated as String type and may result in unexpected outputs (Integer, Decimal, etc.).
You cannot use multi-value inputs, such as Arrays or Objects, or use the feature to create them.
Tip: If you have Array or Object input columns, convert them to String type before using TBE.
- TBE bases its transformations on the currently displayed sample.
- Even if you accurately map all values in your sample, some other values in the full dataset may not be mapped by the transformation.
- You may need to take additional samples of other parts of the entire dataset to generate a more accurate transformation.
Arithmetic operations or other numeric functions are not supported.
- You cannot create multiple columns from a single TBE step.
This feature can be enabled and disabled through the Settings page in the Admin console.
Locate the following setting:
Set this value to
Column by Example
In column-by-example transformations, you create a new column from an existing one by mapping input to output values.
- Select the column to use as input data.
- Change the column to String data type, if needed.
- From the column menu, select Create column from examples. See Transformation by Example Page.
- Transform by example:
- Locate a row containing an example value to transform.
- In the corresponding row in the Preview column, you can enter in the new value to which the input is mapped.
- The transformation in development is updated to accurately capture the mapping you just performed. Additional rows in the output column may be accurately mapped, as well.
- Repeat the above steps until all values in the output column appear to be accurately mapped.
- When satisfied, add the transformation to your recipe.
- Change the data type of the target and the source columns, if needed.
- Remove the source column, if needed.
For more information, see Create Column by Example.
CBE for Datetime
Column-by-example also works on Datetime columns. When you use a Datetime column as your input, you specify the output values in the date/time format that you wish to use. That input value and all similarly formatted inputs should be converted to the output format. You can then specify additional example outputs for input values in a different format to standardize all of the values in the output column.
NOTE: For Datetime formatting to work properly, the input column must be specified as Datetime data type.
For string-based inputs, the following options in Wrangle may assist in performing the same functions that you are trying to do in TBE:
|Extract Transform||You can use the extract transform to retrieve sub-strings from a column and insert into a new column.|
Wrangle supports a variety of string manipulation functions, which can be used to gather data from a string.
This page has no comments.