Excerpt |
---|
When you edit your dataset's recipe, the Transformer page is opened, where you begin your wrangling tasks on a sample of the dataset. Through this interface, you build your transformation recipe and see the results in real-time as applied to the sample. When you are satisfied with what you see, you can execute a job against the entire dataset. |
Goal
Your data transformation is complete when you have done the following:
...
Tip |
---|
Tip: Before you begin transforming, you should know the target schema that your transformed data must match. A schema is the set of columns and their data types, which define the constraints of your dataset. |
Tip |
Tip: To match against the target schema, you can import a dataset to serve as the target schema and use it during recipe development as a mapping for your transformations. For more information on this advanced feature, see Overview of RapidTarget. |
Recommended Methods for Building Recipes
Select something. When you select elements of data in the Transformer page, you are prompted with a set of suggestions for steps that you can take on the selection or patterns matching the selection. You can select columns or one or more values within columns.
Tip: The easiest method for building recipes is to select items in the application. Over time, the application learns from your selections and prompts you with suggestions based on your previous use. For more information, see Overview of Predictive Transformation.
Toolbar and column menus: In the Transformer page, you can access pre-configured transformations through the Transformer toolbar or through the column context menus.
Tip: Use the toolbar for global transformations across your dataset and the column menu for transformations on one or more selected columns.
- When a Transformer toolbar item is selected, the Transform Builder is pre-populated with settings and values to get you started. As needed you can modify the step to meet your needs. For more information, see Transformer Toolbar.
- The column menus contain the most common transformations for individual or multiple columns. Often, no additional configuration is required. For more information, see Column Menus.
- Select multiple columns. Continue selecting columns to be prompted with a different set of suggestions applicable to all of them.
Search and browse for transformations. Using the Search panel and the Transform Builder, you can rapidly assemble recipe steps through a simple, menu-driven interface. When you choose to add a step, you search for your preferred transformation in the Search panel. When you select one, the Transform Builder is pre-populated from your selection. See Search Panel.
Tip: Use the Transform Builder to modify the transformation that you selected from the Search panel or a suggestion card. See Transform Builder.
Sample
Loading very large datasets in
D s product |
---|
The default sample is the first set of rows of source data in the dataset, the number of which is determined by the platform. For smaller datasets, the entire dataset can be used as your sample. In the Transformer page, it's listed as Full Initial Data in the upper-left corner.
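The default sampling behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the platform's implementation: the function name `initial_sample` and the `max_rows` limit are hypothetical, since the actual number of rows is determined by the platform.

```python
from itertools import islice

def initial_sample(rows, max_rows):
    """Return the first max_rows rows; a smaller dataset is returned whole."""
    return list(islice(rows, max_rows))

# Hypothetical source data with ten rows.
dataset = [{"id": i} for i in range(10)]

# Larger dataset: only the first rows are kept.
sample = initial_sample(dataset, max_rows=4)

# Smaller dataset: the entire dataset becomes the sample.
small = initial_sample(dataset[:3], max_rows=4)
```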
...
Tip |
---|
Tip: You should consider collecting a new sample if you have included a step to change the number of rows in your dataset or have otherwise permanently modified data (keep, delete, lookup, join, or pivot operations). If you subsequently remove the step that made the modification, the generated sample is no longer valid and is removed. This process limits unnecessary growth in data samples. |
On the right side of the Transformer page, you can launch a new sampling job on your dataset from the Samples panel. You may have to open the panel first. For more information, see Samples Panel.
Cleanse
Data cleansing tasks address issues in data quality, which can be broadly categorized as follows:
...
In the above image, some initial parsing steps have been applied to structure the data into tabular format, but these steps are not added as formal parts of the recipe. They are hidden from view in the recipe. By default, these steps are automatically added to the recipe when you permit the application to detect the structure of the imported data.
...
The data resulting from these initial transforms is displayed in the data grid. See Data Grid Panel.
- Your recipe is displayed in the Recipe panel on the right side. You might have to open this panel to see it. See Recipe Panel.
- When you select items in the data grid, suggestion cards are displayed for you to begin building transform steps. See Selection Details Panel.
- These suggestions can be modified to build more complex or subtle commands in the Transform Builder. See Transform Builder.
- Don't forget to use the Transformer toolbar, which pre-configures the Transform Builder with the configuration required for a useful transformation. See Transformer Toolbar.
- You can use the column context menu to apply changes to an individual column. See Column Menus.
Use a row to create headers:
...
Delete unused columns:
Your data might contain columns that are not of use to you, so it's in your interest to remove them to simplify the dataset. To delete a column, click the caret next to the column's title and select Delete.
...
Tip |
---|
Tip: You can also delete multiple columns, including ranges of columns. See Remove Data. |
Check column data types:
When a dataset is imported,
D s product |
---|
Tip |
---|
Tip: Before you start performing transformations on your data based on mismatched values, you should check the data type for these columns to ensure that they are correct. For more information, see Supported Data Types. |
For more information, see Change Column Data Type.
Display only columns of interest:
...
In the Status bar at the bottom of the screen, click the Eye icon.
For more information, see Visible Columns Panel.
Review data quality:
After you have removed unused data, you can examine the quality of data within each column just below the column title.
...
Tip |
---|
Tip: When you select values in the data quality bar, those values are highlighted in the sample rows, and suggestions are displayed at the bottom of the screen in the suggestion cards to address the selected rows. |
...
. |
Suggestion Cards:
Based on your selections and its knowledge of common data patterns,
D s product |
---|
...
- To accept this suggestion, click Add.
- You can modify the step if needed. An example is provided later.
...
- .
For more background information, see Overview of Predictive Transformation.
Change data types:
If a column contains a high concentration of mismatched data (red), the column might have been identified as the wrong data type. For example, your dataset includes internal identifiers that are primarily numeric data (e.g. 10000022) but have occasional alphabetical characters in some values (e.g. 1000002A). The column for this data might be typed for integer values, when it should be treated as string values. For more information on the available types, see Supported Data Types.
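To see why the choice of data type drives the mismatch count, consider the following minimal Python sketch. It is not how the product validates types; the helper `count_mismatches` is hypothetical, and the value `10000023` is added for illustration alongside the two identifiers mentioned above.

```python
def count_mismatches(values, parser):
    """Count the values that the given type parser rejects."""
    mismatched = 0
    for value in values:
        try:
            parser(value)
        except ValueError:
            mismatched += 1
    return mismatched

# Identifiers that are mostly numeric, with one alphanumeric value.
ids = ["10000022", "10000023", "1000002A"]

as_integer = count_mismatches(ids, int)  # "1000002A" does not parse as an integer
as_string = count_mismatches(ids, str)   # every value is a valid string
```

Treating the column as a string type removes the mismatch without altering any of the values.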
Tip | |
---|---|
Tip: Where possible, you should set the data type for each column to the appropriate type.
|
- To change a column's data type, click the icon to the left of the column title.
- Select the new data type.
- Review the mismatched values for the column to verify that their count has dropped.
...
- .
Explore column details:
As needed, you can explore details about the column's data, including statistical information such as outliers. From the caret drop-down next to a column name, select Column Details.
...
Review histograms:
Just below a column's data quality bar, you can review a histogram of the values found in the column. In the following example, the data histogram on the left applies to the ZIP column, while the one on the right applies to the WEB_CHAT_ID column.
Figure: Column data histogram
When you mouse over the categories in the histogram, you can see the corresponding value, the count of instances in the sample's column, and the percentage of affected rows. In the left one, the bar with the greatest number of instances has been selected; the value 21202 occurs 506 times (21.28%) in the dataset. On the right, the darker shading indicates how rows with ZIP=21202 map to values in the WEB_CHAT_ID column.
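The count and percentage shown for each histogram category can be approximated as follows. This is a sketch only: the `histogram` function and the four-row sample column are hypothetical and do not reproduce the figures quoted above.

```python
from collections import Counter

def histogram(values):
    """Map each distinct value to (count, percentage of rows)."""
    counts = Counter(values)
    total = len(values)
    return {v: (n, round(100 * n / total, 2)) for v, n in counts.items()}

# Hypothetical sample of a ZIP column.
zips = ["21202", "21202", "21201", "21230"]

stats = histogram(zips)
count, pct = stats["21202"]  # count and share of rows for one category
```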
Tip |
---|
Tip: Similar to the data quality bar, you can click values in a data histogram to highlight the affected rows and to trigger a set of suggestions. In this manner, you can use the same data quality tools to apply even more fine-grained changes to individual values in a column. |
For a list of common tasks to cleanse your data, see Cleanse Tasks.
Assess Data Quality
You can create data quality rules to apply to the specifics of your dataset. For example, if your dataset includes square footage for commercial rental properties, you can create a data quality rule that tests the sqFt field for values that are less than 0. These values are flagged in red in a data quality bar for the rule for easy review and triage.
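Conceptually, a rule of this kind is a predicate evaluated against each row. The Python sketch below illustrates the idea under stated assumptions: the `failing_rows` helper and the property records are hypothetical, and the product defines rules through its own interface rather than through code like this.

```python
def failing_rows(rows, column, is_valid):
    """Return the rows whose value in the column fails the rule."""
    return [row for row in rows if not is_valid(row[column])]

# Hypothetical rental property records.
properties = [
    {"address": "A", "sqFt": 1200},
    {"address": "B", "sqFt": -50},
]

# Rule: square footage must not be less than 0.
flagged = failing_rows(properties, "sqFt", lambda v: v >= 0)
```

The check only reports the failing rows; the input records are left as they were.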
Tip |
---|
Tip: Data quality rules are not transformation steps. They can be used to assess the current state of the data and are helpful to reference as you build your transformation steps to clean up the data. |
For more information, see Overview of Data Quality.
Modify
After you have performed initial cleansing of your data, you might need to perform modifications to the data to properly format it for the target system, specify the appropriate level of aggregation, or perform some other modification.
...
In the following example, the improperly capitalized word BALTIMORE has been selected, so that you can change it to its proper-case spelling (Baltimore). Those rows are highlighted in the row data, and a set of suggestions for how to fix the values is provided in the cards at the bottom of the screen. See Selection Details Panel.
Figure: Selecting values to modify
...
Tip |
---|
Tip: When you select one of the suggestion cards, the implied changes are previewed in the Transformer page, so you can see the effects of the change. This previewing capability enables you to review and tweak your changes before they are formally applied. You can always remove a transform step if it is incorrect or even re-run the recipe to generate a corrected set of results, since source data is unchanged. For more information, see Transform Preview. |
In this case, select the Replace transformation. However, there are a couple of minor issues with the provided suggestion.
...
The step is added to the recipe and automatically applied to the data sample displayed in the Transformer page. You can continue to add new steps through the Transform Builder. For more information, see Transform Builder and Cleanse Tasks.
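The effect of the replacement in this section can be sketched in Python. This is not the syntax of the product's Replace transformation; the `replace_value` helper and the city values other than BALTIMORE and Baltimore are hypothetical.

```python
import re

def replace_value(values, pattern, replacement):
    """Return a new list with the pattern replaced in every value."""
    return [re.sub(pattern, replacement, v) for v in values]

# Hypothetical column containing one improperly capitalized value.
cities = ["BALTIMORE", "Baltimore", "Towson"]

# Match the whole word only, so that other values are left alone.
fixed = replace_value(cities, r"\bBALTIMORE\b", "Baltimore")
```

Because the function returns a new list, the original values remain available, which mirrors the point that a step can be removed or corrected while the source data stays unchanged.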
Enrichment
Before you deliver your data to the target system, you might need to enhance or augment the dataset with new columns or values from other datasets.
...
You can append a dataset of identical structure to your currently loaded one to expand the data volume. For example, you can string together daily log data to build weeks of log information using the Union page. See Append Datasets.
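As a rough illustration of appending datasets of identical structure, the following Python sketch concatenates two daily logs. The `union` helper, the column names, and the log rows are all hypothetical, and the strict column check is an assumption of this sketch rather than a description of the Union page.

```python
def union(first, second):
    """Append the rows of second to first; both must share the same columns."""
    if first and second and first[0].keys() != second[0].keys():
        raise ValueError("datasets must have identical structure")
    return first + second

# Hypothetical daily log datasets with the same columns.
monday = [{"day": "Mon", "event": "login"}]
tuesday = [{"day": "Tue", "event": "logout"}]

week = union(monday, tuesday)
```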
Join datasets:
You can also join together two or more datasets based on a common set of values. For example, you are using raw sales data to build a sales commission dataset:
...
This commission dataset is created by performing an inner join between the sales transaction dataset and the employee dataset. In the Search panel, enter join to join data. See Join Data.
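An inner join keeps only the rows whose key value appears in both datasets. The sketch below shows the idea in Python; the `inner_join` helper, the `emp_id` key, and the sample rows are hypothetical and stand in for the sales transaction and employee datasets.

```python
def inner_join(left, right, key):
    """Combine rows from left and right that share the same key value."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

# Hypothetical sales transactions and employee records.
sales = [{"emp_id": 1, "amount": 500}, {"emp_id": 3, "amount": 200}]
employees = [{"emp_id": 1, "name": "Ana"}, {"emp_id": 2, "name": "Raj"}]

# Only emp_id 1 appears in both datasets.
joined = inner_join(sales, employees, "emp_id")
```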
Lookup values:
In some cases, you might need to include or replace values in your dataset with other columns from another dataset. For example, transactional data can reference product and customer by internal identifiers. You can create lookups into your master data set to retrieve user-friendly versions of customer and product IDs.
...
To perform a lookup for a column of values, click the caret drop-down next to the column title and select Lookup.... See Lookup Wizard.
For a list of common workflows to enhance your dataset, see Enrichment Tasks.
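A lookup can be thought of as adding a column whose values are retrieved from a reference dataset by key. The Python sketch below illustrates this; the `lookup` helper, the product identifiers, and the choice to return `None` for unmatched keys are assumptions of the sketch, and the product's handling of unmatched values may differ.

```python
def lookup(rows, column, reference, new_column):
    """Add new_column to each row, using the value in column as the key."""
    return [{**row, new_column: reference.get(row[column])} for row in rows]

# Hypothetical master data mapping internal IDs to friendly names.
products = {"P-1": "Widget"}

# Hypothetical transactions that reference products by internal ID.
transactions = [{"product_id": "P-1"}, {"product_id": "P-9"}]

enriched = lookup(transactions, "product_id", products, "product_name")
```

Unlike the inner join, every input row is kept in this sketch, whether or not its key is found.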
Sampling
The data that you see in the Transformer page is a sample of your entire dataset.
- If your dataset is small enough, the sample is the entire dataset.
- For larger datasets, the application auto-generates an initial data sample from the first rows of your dataset.
...
Tip | ||
---|---|---|
Tip: Sampling is an important concept in
|
...
. |
Profile
As part of the transformation process, you can generate and review visual profiles of individual columns and your entire dataset. These interactive profiles can be very helpful in identifying anomalies, outliers, and other issues with your data. For more information, see Profiling Basics.