...
For log data or other transactional data, the timestamp is typically a unique identifier.
Tip Tip: If you think you need a primary identifier for your dataset, you should try to identify it or create it before you drop delete potentially useful columns.
- Product information typically contains an SKU identifier. If that is not available, you may need brand, make, and model combinations, which can be created using the method described below.
...
Info |
---|
NOTE: Some transforms make original row order information unavailable.You cannot retrieve this information from relational sources. See Source Metadata References. |
Tip | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Tip: When you first load your dataset into the Transformer page, you should generate a column containing the original row information, such as the following:
This transform is useful to include after initial inference and structuring of each recipe for all of your datasets. |
...
origRowId | keyPrefix | primaryKey |
---|---|---|
1 | ORD000 | ORD0001 |
2 | ORD000 | ORD0001ORD0002 |
... | ORD000 | ... |
10 | ORD00 | ORD0010 |
... | ORD00 | ... |
99 | ORD00 | ORD0099 |
100 | ORD0 | ORD0100 |
...
- Change this column to be of String type. Select String from the data type drop-down for the column.
Create a column containing your prepended identifier and the proper number of zeroes. The following bit of logic generates a string with the proper number of zeroes depending on the length of the value in
origRowId
:D trans p03Value keyPrefix Type step p01Name Formula type p01Value Single row formula p02Name Formula p02Value IF(LEN(origRowId) > 3, 'ORD', IF(LEN(origRowId) > 2, 'ORD0',IF(LEN(origRowId) > 1, 'ORD00','ORD000'))) p03Name New column name SearchTerm New formula Info NOTE: The following works for up to 10,000 rows in the original dataset. You need to add additional
IF
clauses when your row counts exceed 10,000.Now, you can merge these columns together:
D trans Type step p01Name Columns p01Value keyPrefix,origRowId p02Name New column name p02Value primaryKey SearchTerm Merge columns You can now drop delete the prefix column:
D trans Type step p01Name Columns p01Value keyPrefix p02Name Action p02Value Delete selected columns SearchTerm Delete columns
...
- Load the dataset into the Transformer page.
- Identify the columns that together can uniquely identifier a row. In the TestScores-All example, these columns are the following:
- LastName
- FirstName
TestNumber
Info NOTE: It may be possible to set up a key using LastName and TestNumber, but that is not guaranteed. If the dataset changes over time, a working key based on these columns may become broken.
Use the
merge
transform to combine these columns together into a new column, such as the following:D trans p03Value TestID Type step p01Name Columns p01Value LastName,FirstName,TestNum p02Name Separator p02Value '-' p03Name New column name SearchTerm Merge columns The
with
clause identifies the delimiter between the merged column values.Values should look like the following:
TestID Smith-Joe-2 Doe-Jane-4 Jones-Jack-1 - In some cases, you may want to
drop
the delete the source columns for the primary key.