After you have created or selected When you edit your dataset's recipe, the Transformer page is opened, where you begin your wrangling tasks on a sample of the dataset. Through this interface, you build your transformation recipe and see the results in real-time as applied to the sample. When you are satisfied with what you see, you can execute a job against the entire dataset.
- Cleansed your data of invalid, missing, or inaccurate values
- Enhanced your dataset as needed with data from other datasets
- Modified your dataset to constrain its values to meet the target schemaEnhanced your dataset as needed with data from other
- datasetsExecuted job against the entire dataset
- Exported the results from your dataset and recipe for use in downstream systems
Tip: Before you begin transforming, you should know the target schema that your transformed data must match. A schema is the set of columns and their data types, which define the constraints of your dataset.
Tip: If you want to match up against the target schema, you can import a dataset to serve as the target schema to which you are mapping. For more information on this advanced feature, see Overview of RapidTarget.
Recommended Methods for Building Recipes
Select something. When you make a selection select elements of data in the Transformer page, you are prompted with a set of suggestions for steps that you can take on the selection or patterns matching the selection. You can select columns or one or more values within columns.
Tip: The easiest method for building recipes is to select items in the application. Over time, the application learns from your selections and prompts you with suggestions based on your previous use. For more information, see Overview of Predictive Transformation.
Toolbar and column menus: In the Transformer page, you can access pre-configured transformations through the Transformer toolbar or through the context menus for individual columns.
Tip: Use the toolbar for global transformations across your dataset and the column menu for transformations on an individual column.
- When a toolbar item is selected, the Transform Builder is pre-populated with settings and values to get you started. As needed you can modify the step to meet your needs. For more information, see Transformer Toolbar.
- If you need to apply a transformation to an individual column, use the menu items in the column menu. This menu contains the most common column transformations. Often, no additional configuration is required. For more information, see Column Menus.
Search and browse for transformations. Using the Search panel and the Transform Builder, you can rapidly assemble recipe steps through a simple, menu-driven interface. When you choose to add a step, you search for your preferred transformation in the Search panel. When one is selected, the transformation is pre-populated in the Transform Builder with parameter values based on your selections in the datafor you. See Search Panel.
Tip: Use the Transform Builder for performing modifications to the transform transformation you selected from the Search panel or a suggestion card. See Transform Builder.
The default sample is the first set of rows of source data in the dataset, the number of which is determined by the platform. For smaller datasets, the entire dataset can be used as your sample. In the Transformer page, it's listed as Full Data in the upper-left corner.
In some cases, the default sample might be inadequate or of the wrong type. To generate a new sample, click the name of the sample in the upper-left corner.
NOTE: Collecting new samples requires system resources and storage. You can collect In some environments, collecting samples incurs monetary cost.
Tip: You should consider collecting a new sample if you have included a step to change the number of rows in your dataset or have otherwise permanently modified data (keep, delete, lookup, join, or pivot operations). If you subsequently remove the step that made the modification, the generated sample is no longer valid and is removed. This process limits unnecessary growth in data samples.
- Consistency. Values that describe the same thing should agree with each other. Numeric For example, numeric values should have the same precision. String values should be consistently structured to mean the same thing.
- Validity. Values should be constrained to the requirements of each field's data type. For example, a DateOfSale field should be a valid date.
- Reliability. Values in the same field in different records should mean the same thing. For example, the value
15in the Temperature field of two different records should not mean Centigrade in one record and Fahrenheit in the other record.
In the above image, some initial parsing steps have been applied to structure the data in tabular, but these steps are not added as formal parts of the recipe. They are hidden from view in the recipe.
- By default, these steps are automatically added to the recipe when you permit the application to detect the structure of the imported data.
- For more information, see Initial Parsing Steps.
- Your recipe is displayed in the Recipe panel on the right side. You might have to open this panel to see it. See Recipe Panel.
- When you select items in the data grid, suggestion cards are displayed for you to begin building transform steps. See Suggestion Cards Panel.
- These suggestions can be modified to build more complex or subtle commands in the Transform Builder. See Transform Builder.
- Don't forget to use the Transformer toolbar, which pre-configures the Transform Builder with the configuration required for a useful transformation. See Transformer Toolbar.
- You can use the column context menu to apply changes to an individual column. See Column Menus.
Create Use a header row to create headers:
In most cases, the names of your columns are inferred from the first row of the data in the dataset.
NOTE: Some operations, such as unions and joins, may cause row information to be lost, which means that the following transformation is not available.
If you need to specify a different row, please complete the following:
Click the Search icon in the menu bar.
In the Search panel textbox, type:
- The transformation is displayed in the Transform Builder. Specify the following properties:
D trans Type step p01Name Option p01Value Use row as header p02Name Row p02Value 1 SearchTerm Rename columns
- If you need to specify a different row to use, you can specify a specific row number to use in the Row textbox.
- To add this or any transform in development to your recipe, click Add. This button is disabled if the step is invalid.
Generate row numbersmetadata:
On the left side of the data grid, you might notice a set of black dots. If you hover over one of these, the original row number from the source data is listed. Since the data transformation process can change the number of rows or their order, you might want to retain the original order of the rows.
Tip: Some operations, such as unions and joins, can invalidate source row number information. To capture this data into your dataset, it's best to add this transformation early in your recipe.
To retain the original row numbers in a column called,
rowId, please complete the following:
- In the Transformer toolbar above the data grid, select the Function menu.
- From the menu, select Metadata > SOURCEROWNUMBER.
- In the Transform Builder, enter
rowIdin the New column name textbox.
- Click Add.
You can use a similar transformation to generate the full path and filename to file-based sources:
For more information, see Insert Metadata.
Drop unused columns:
Your data might contain columns that are not of use to you, so it's in your interest to remove them to simplify the dataset. To drop a column, click the caret next to the column's title and select Delete.
Tip: Before you start performing transformations on your data based on mismatched values, you should check the data type for these columns to ensure that they are correct. For more information, see Supported Data Types.
For more information, see Change Column Data Type.
Display only columns of interest:
You can choose which columns you want to display in the data grid, which can be useful to narrow your focus to problematic areas.
In the Transformer toolbar Status bar at the top bottom of the screen, click the Column View icon.
For more information, see Column Browser Visible Columns Panel.
Review data quality:
After you have removed unused data, you can examine the quality of data within each column just below the column title.
Column header with data quality bar
The horizontal bar, known, as the data quality bar, identifies the quality of the data in the column by the following colors:
|green||These values are valid for the specified column data type.|
|red||These values do not match those of the specified column type.|
|black||There are no values for the column in these rows.|
For more information, see Data Quality Bars.
Based on your selections and its knowledge of common data patterns,
|D s product|
Tip: Where possible, you should try to create your transforms by selecting data and then selecting the appropriate suggestion card. In some cases, you might need to modify the details of the recipe.
For more background information, see Overview of Predictive Transformation.
In the following example, the missing values in the
SUBSCRIBER_AGE column have been selected, and a set of suggestion cards is displayed.
- To accept this suggest, click Add.
- You can modify the step if needed. An example is provided later.
For more information, see Explore Suggestions.
For more background information, see Overview of Predictive Transformation.
Change data types:
If a column contains a high concentration of mismatched data (red), the column might have been identified as the wrong data type. For example, your dataset includes internal identifiers that are primarily numeric data (e.g.
10000022) but have occasional alphabetical characters in some values (e.g.
1000002A). The column for this data might be typed for integer values, when it should be treated as string values. For more information on the available types, see Supported Data Types.
Tip: Where possible, you should set the data type for each column to the appropriate type.
- To change a column's data type, click the icon to the left of the column title.
- Select the new data type.
- Review the mismatched values for the column to verify that their count has dropped.
For more information, see
Explore column details:
As needed, you can explore details about the column's data, including statistical information such as outliers.
- From the caret drop-down next to a column name, select Column Details. See Column Details Panel.
- For more information on column stats, see Column Statistics Reference.
After you have performed initial cleansing of your data, you might need to perform modifications to the data to properly format it for the target system, specify the appropriate level of aggregation, or perform some other modification. When you select data, suggested transformations are presented to you as suggestion cards. Select one, or create your own transformation as needed.
Tip: Modification steps are often specific to the downstream use-case for the data. If your source dataset needs to satisfy multiple downstream uses, you might need to make modifications to satisfy each use case, which are in conflict with each other. It might be easier to cleanse first, create a reference for the recipe object, and then import the reference dataset in each flow for further modification. For more information, see Flow View Page.
Tip: When you select one of the suggestion cards, the implied changes are previewed in the Transformer page, so you can see the effects of the change. This previewing capability enables you to review and tweak your changes before they are formally applied. You can always remove a transform step if it is incorrect or even re-run the recipe to generate a corrected set of results, since source data is unchanged. For more information, see Transform Preview.
Tip: This process of selecting data in a column's data quality bar or histogram of values is the recommended method for identifying problematic data in your dataset. You can apply this method to mismatched (red), missing (black) values, or data outliers across all of the columns of your dataset.
In this case, select the
replace transformReplace transformation. However, there are a couple of minor issues with the provided suggestion.
The step is added to the recipe and automatically applied to the data sample displayed in the Transformer page. For more information, see Transform Builder.
See Cleanse Tasks.
Before you deliver your data to the target system, you might need to enhance or augment the dataset with new columns or values from other datasets. These multi-dataset operations can greatly expand the capabilities of your wrangling workflows.
You can append a dataset of identical structure to your currently loaded one to expand the data volume. For example, you can string together daily log data to build weeks of log information. See Union Page Append Datasets.
You can also join together two or more datasets based on a common set of values. For example, you are using raw sales data to build a sales commission dataset:
In some cases, you might need to include or replace values in your dataset with other columns from another dataset. For example, transactional data can reference product and customer by internal identifiers. You can create lookups into your master data set to retrieve user-friendly versions of customer and product IDs.