- Review your record counts. Before you specify the join, review the record counts and the uniqueness of the keys in each dataset to estimate how many records to expect in the output. The number of output records depends on the type of join and on how many join key values match.
- Review your join key values. If there are variations in the values in your join keys, you may end up with duplicate records in your joined dataset. Look for mismatched or missing values in your join keys, and correct if possible.
- Review the granularity of your data. If you bring together data at a lower fidelity than the source, you can end up with record matches that are not actually matching data. For example, if your timestamps are down-sampled from milliseconds to seconds as part of the join, you may have "matching" timestamps in seconds that were not matches at the millisecond level in the source data.
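The checks above can be sketched outside the application before you build the join. The following is a minimal Python illustration, not part of the product: the sample rows, column names (`customer_id`, `id`), and helper functions are all hypothetical. It estimates output row counts from key frequencies and shows how down-sampling timestamps can create matches that did not exist in the source.

```python
from collections import Counter

def key_counts(rows, key):
    """Count how often each join key value appears in a list of dict rows."""
    return Counter(row[key] for row in rows)

def expected_inner_join_rows(left, right, left_key, right_key):
    """Inner join: each shared key value yields (left count * right count) rows."""
    left_counts = key_counts(left, left_key)
    right_counts = key_counts(right, right_key)
    return sum(n * right_counts[k] for k, n in left_counts.items() if k in right_counts)

def expected_left_join_rows(left, right, left_key, right_key):
    """Left join: unmatched left rows are kept once; matched rows multiply."""
    left_counts = key_counts(left, left_key)
    right_counts = key_counts(right, right_key)
    return sum(n * max(right_counts.get(k, 0), 1) for k, n in left_counts.items())

# Hypothetical sample data: key 1 repeats on the left, key 2 repeats on the right.
orders = [{"customer_id": 1}, {"customer_id": 1}, {"customer_id": 2}, {"customer_id": 4}]
customers = [{"id": 1}, {"id": 2}, {"id": 2}, {"id": 3}]

inner_rows = expected_inner_join_rows(orders, customers, "customer_id", "id")
left_rows = expected_left_join_rows(orders, customers, "customer_id", "id")

# Granularity: two distinct millisecond timestamps collapse to the same second.
ts_a_ms = 1700000000123
ts_b_ms = 1700000000987
same_at_ms = ts_a_ms == ts_b_ms                     # compared at source fidelity
same_at_seconds = ts_a_ms // 1000 == ts_b_ms // 1000  # compared after down-sampling
```

In this sample, non-unique keys on either side multiply the matched rows, which is why the output can be larger than either input.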
Step 1 - Select Dataset
In the Search panel, enter
- To make changes to the two join keys, mouse over the specified keys.
- To remove the two columns as join keys, click the X icon.
- To edit the keys to use and other key options, click the Pencil icon. See below.
To add more join keys, click Add.
NOTE: Be careful when applying multiple join keys. Depending on the join type, a join on multiple keys can greatly expand the size of the generated data.
The following options are applied to the join key columns in both sources to attempt to find matches. After the join is executed, no data in either column is changed based on these selections.
Use a fuzzy matching algorithm for key value matching.
Fuzzy matching uses the Double Metaphone algorithm to match strings (keys). Both primary encodings of each key value must match. See DOUBLEMETAPHONEEQUALS Function.
|Ignore case|Ignore case differences between the join key values for matching purposes.|
|Ignore special characters|Ignore all characters that are not alphanumeric, accented Latin characters, or whitespace, prior to testing for a match.|
|Ignore whitespace|Ignore all whitespace characters, including spaces, tabs, carriage returns, and newlines.|
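To make the effect of the non-fuzzy options concrete, here is a rough Python sketch of key normalization applied only for comparison. It is an approximation under stated assumptions, not the product's implementation: `normalize_key` and `keys_match` are hypothetical names, and `str.isalnum()` accepts letters and digits from any script, which is broader than "alphanumeric or accented Latin characters". Fuzzy matching is not covered here.

```python
import re

def normalize_key(value, ignore_case=False, ignore_special=False, ignore_whitespace=False):
    """Return a comparison form of a key; the original value is left unchanged."""
    key = value
    if ignore_case:
        key = key.casefold()
    if ignore_special:
        # Approximation: keep letters, digits, and whitespace; drop everything else.
        key = "".join(ch for ch in key if ch.isalnum() or ch.isspace())
    if ignore_whitespace:
        # Remove spaces, tabs, carriage returns, and newlines.
        key = re.sub(r"\s+", "", key)
    return key

def keys_match(left, right, **options):
    """Compare two key values after applying the same options to both sides."""
    return normalize_key(left, **options) == normalize_key(right, **options)

# Hypothetical key values for illustration.
strict = keys_match("Acme Corp.", "acme corp")
relaxed = keys_match("Acme Corp.", "acme corp", ignore_case=True, ignore_special=True)
no_space = keys_match("New York", "New\tYork", ignore_whitespace=True)
```

As in the description above, normalization is used only to test for a match; the values stored in either column are not modified.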
Join Key Summary:
You can use these metrics to identify the likelihood of accurate matching between the join keys and the row count generated in the output.
- Include all columns from Current data: Dynamic updates always include the latest data from your current dataset.
- Include all columns from Joined-In data: Dynamic updates always include the latest data from the dataset that you are joining in.
NOTE: After you add your join to the recipe, if the data grid is empty, then the keys that you specified in the join may not have a match in the currently selected sample. You should revisit the keys used in your join. If the join still generates an empty grid on the current sample, you should collect a new sample. See Samples Panel.
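One way to diagnose an empty result is to compare the distinct key values on each side of the join. The sketch below is a hypothetical stand-alone check in Python, assuming you have exported the key columns from the sample; `key_overlap` is not a product function.

```python
def key_overlap(left_keys, right_keys):
    """Summarize how many distinct key values are shared or unique to each side."""
    left_set, right_set = set(left_keys), set(right_keys)
    return {
        "shared": len(left_set & right_set),
        "left_only": len(left_set - right_set),
        "right_only": len(right_set - left_set),
    }

# Hypothetical sampled key values: no value appears on both sides.
sample_overlap = key_overlap(["a", "b", "b"], ["c", "d"])
# A second hypothetical sample in which one key value is shared.
resample_overlap = key_overlap(["a", "b"], ["b", "c"])
```

A `shared` count of zero on the sample suggests either that the wrong keys were chosen or that the sample does not contain the matching rows.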
Tip: If you must freeze the data in the dataset that you are joining in, you should create a copy of the dataset as a snapshot and join in the copy. See Dataset Details Page.
To join in the copy, edit the join and change the source that is being joined. See Fix Dependency Issues.
To add the specified join to your recipe, click Add to Recipe.