You can integrate data from other sources into your current dataset. Based on a key column that you identify in the lookup dataset, you can insert the corresponding values in other columns of the lookup dataset as new columns in your source dataset.
Tip: Column lookups are useful for adding reference data based on a column's values.
For example, your data contains the two-letter abbreviations for U.S. states, yet the target system is expecting the full name of each state. You need to replace the the
XY state state abbreviation with the full name of each state in each in each row.
Set up Your Lookup Data
Your data table should like the following:
Tip: You can download a version of this table, which also includes some timezone information. See Dict-TimezoneByState.csv.
Perform the Lookup
- In the Transformer page, click the drop-down on the column that contains your two-letter state abbreviations. Select Select Lookup ....
- In the Lookup Wizard, select the dataset to use for your lookup.
- For the lookup key, select the column in the dataset to use as the key value. In the above example, it is
- Click Click Execute Lookup.
- The lookup key value is used to locate all of the other column values in the reference dataset. These values are inserted in separate columns to the immediate right of the source column.
- You might need to drop some of the imported columns. In the above case, you might decide to drop the two-letter state identifier column, which has been replaced by the full state name column.
See See Lookup Wizard.
Example - Lookup for Timezones
The CSV linked above also contains timezone information for each state, which you can use to provide higher fidelity information on timestamps.
U.S. timezones are not consistently demarcated by state lines. Some states are split across multiple timezones. For more accurate representation of timezones, you should download and use a zipcode database, many of which are freely available online. This CSV is provided for demonstration purposes only.
- Complete steps 1-5 in the previous section.
- Delete all columns except the one containing timezone information. The The
Time Offsetscolumn identifies the predominant timezone in each state as an offset of the UTC timezone (Greenwich Mean Time).
- Move this column to the right of the column containing your timestamps.
NOTE: Depending on the requirements of your target system, you can use the