If you are wrangling datasets that represent transactional or serialized data, you can append slices of data together to build a larger dataset for richer analysis. For example, suppose that you cleanse log messages on a weekly basis. You can create a separate dataset for each day's log messages and then bring them all together into a single dataset for processing through a single recipe. This method works best for datasets that have identical or very similar structures.
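As a rough illustration outside the product, appending daily slices can be sketched in plain Python. The field names (`day`, `message`), the sample rows, and the helper `append_slices` are hypothetical; they only show that slices with the same structure stack into one dataset.

```python
from itertools import chain

# Hypothetical daily slices of log messages; every slice has the same fields.
monday = [
    {"day": "Mon", "message": "service started"},
    {"day": "Mon", "message": "cache warmed"},
]
tuesday = [
    {"day": "Tue", "message": "disk usage high"},
]


def append_slices(*slices):
    """Stack the rows of each slice, in the order given, into one list."""
    return list(chain.from_iterable(slices))


weekly = append_slices(monday, tuesday)
print(len(weekly))  # 3
```

Because every slice shares the same fields, the combined list can be processed by a single set of downstream steps.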
Below, you can see two datasets of contact information. These simplified datasets track customer contact records.
Dataset01:
Name | Email | Last Contact |
---|---|---|
Jack Jones | jack@example.com | 06/15/2015 |
Tina Toms | tinat@example.com | 08/02/2015 |
Larry Lyons | larry.lyons@example.com | 03/22/2015 |
Dataset02:
Name | Last Contact Date | Email |
---|---|---|
Amy Abrams | 07/24/2015 | amy.abrams@example.com |
Tina Toms | 05/12/2015 | tinat@example.com |
Samantha Smith | 04/22/2015 | samantha@example.com |
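Notes:
- The record for Tina Toms appears in both datasets.
- Both datasets contain the same three fields, but the email and contact date columns are in a different order.
- The contact date column is named Last Contact in Dataset01 and Last Contact Date in Dataset02.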
Steps:
1. Open Dataset01.
2. Add a union transform step.
3. Add Dataset02 to the union.
4. The Last_Contact_Date field from Dataset02 is not included. You can:
   - Click the Last_Contact_Date field in the left panel. The field is added as a separate field. However, it is not matched with the other contact date field from the original dataset.
   - From the Match columns drop-down menu, select By Position. In this case, you can see that there are only three fields, but the order is mismatched.

   Tip: When possible, rename or align the columns in your datasets before you build a union transform step. Otherwise, you might have to edit the columns after the union has been completed. To rename a column, click Rename from the column drop-down in the Transformer page. You can use the same drop-down to move a column.
5. Move the Email column after the Last Contact column in Dataset01.
6. Add Dataset02 to the union. Select By Position from the Match columns drop-down menu. Your columns are matched.

The Dataset02 records have now been added to Dataset01, which now contains all of the records from both datasets. Note that the record for Tina Toms appears twice in the appended dataset.
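The effect of reordering a column and then matching by position can be sketched in plain Python. This is only an illustration of the logic, not how the product implements the union; the helpers `move_column` and `union_by_position` are hypothetical names, and the rows are copied from the two example tables.

```python
# Rows as tuples, in each dataset's own column order.
dataset01_columns = ["Name", "Email", "Last Contact"]
dataset01 = [
    ("Jack Jones", "jack@example.com", "06/15/2015"),
    ("Tina Toms", "tinat@example.com", "08/02/2015"),
    ("Larry Lyons", "larry.lyons@example.com", "03/22/2015"),
]
dataset02_columns = ["Name", "Last Contact Date", "Email"]
dataset02 = [
    ("Amy Abrams", "07/24/2015", "amy.abrams@example.com"),
    ("Tina Toms", "05/12/2015", "tinat@example.com"),
    ("Samantha Smith", "04/22/2015", "samantha@example.com"),
]


def move_column(columns, rows, name, after):
    """Return new (columns, rows) with column `name` placed right after `after`."""
    order = [c for c in columns if c != name]
    order.insert(order.index(after) + 1, name)
    idx = [columns.index(c) for c in order]
    return order, [tuple(row[i] for i in idx) for row in rows]


def union_by_position(rows_a, rows_b):
    """Append rows_b to rows_a, matching columns purely by position."""
    return list(rows_a) + list(rows_b)


# Align Dataset01 with Dataset02's column order, then append.
columns, reordered01 = move_column(
    dataset01_columns, dataset01, "Email", after="Last Contact"
)
combined = union_by_position(reordered01, dataset02)

tina_rows = [row for row in combined if row[0] == "Tina Toms"]
print(columns)         # ['Name', 'Last Contact', 'Email']
print(len(combined))   # 6
print(len(tina_rows))  # 2
```

A union by position does not deduplicate, which is why Tina Toms appears twice in the combined rows.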
NOTE: Be sure to verify that the data type for each column is accurate.
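One way to picture this check, as a sketch in plain Python rather than a product feature: if the datasets are matched by position without first aligning the columns, the third column mixes dates and email addresses. The helper `is_date` and the assumed `MM/DD/YYYY` format are illustrative choices based on the sample values.

```python
from datetime import datetime


def is_date(value, fmt="%m/%d/%Y"):
    """Return True if `value` parses as a date in the assumed format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


# Third column after a by-position union WITHOUT reordering Dataset01 first:
# dates from Dataset01 followed by email addresses from Dataset02.
misaligned = [
    "06/15/2015", "08/02/2015", "03/22/2015",
    "amy.abrams@example.com", "tinat@example.com", "samantha@example.com",
]
# Contact date column after the columns have been aligned.
aligned = [
    "06/15/2015", "08/02/2015", "03/22/2015",
    "07/24/2015", "05/12/2015", "04/22/2015",
]

bad_misaligned = [v for v in misaligned if not is_date(v)]
bad_aligned = [v for v in aligned if not is_date(v)]
print(len(bad_misaligned))  # 3
print(len(bad_aligned))     # 0
```

Values that fail the check point to a column that was matched against the wrong source field.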