Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version next

...

  • For log data or other transactional data, the timestamp is typically a unique identifier.

    Tip

    Tip: If you think you need a primary identifier for your dataset, you should try to identify it or create it before you drop delete potentially useful columns.

  • Product information typically contains an SKU identifier. If that is not available, you may need brand, make, and model combinations, which can be created using the method described below.

...

Info

NOTE: Some transforms make original row order information unavailable.You cannot retrieve this information from relational sources.

See Source Metadata References.


Tip

Tip: When you first load your dataset into the Transformer page, you should generate a column containing the original row information, such as the following:

D trans
p03ValueorigRowId
Typestep
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueSOURCEROWNUMBER()
p03NameNew column name
SearchTermNew formula

This transform is useful to include after initial inference and structuring of each recipe for all of your datasets.

...

origRowIdkeyPrefixprimaryKey
1ORD000ORD0001
2ORD000ORD0001ORD0002
...ORD000...
10ORD00ORD0010
...ORD00...
99ORD00ORD0099
100ORD0ORD0100

...

  1. Change this column to be of String type. Select String from the data type drop-down for the column.
  2. Create a column containing your prepended identifier and the proper number of zeroes. The following bit of logic generates a string with the proper number of zeroes depending on the length of the value in origRowId:

    D trans
    p03ValuekeyPrefix
    Typestep
    p01NameFormula type
    p01ValueSingle row formula
    p02NameFormula
    p02ValueIF(LEN(origRowId) > 3, 'ORD', IF(LEN(origRowId) > 2, 'ORD0',IF(LEN(origRowId) > 1, 'ORD00','ORD000')))
    p03NameNew column name
    SearchTermNew formula

    Info

    NOTE: The following works for up to 10,000 rows in the original dataset. You need to add additional IF clauses when your row counts exceed 10,000.

  3. Now, you can merge these columns together:

    D trans
    Typestep
    p01NameColumns
    p01ValuekeyPrefix,origRowId
    p02NameNew column name
    p02ValueprimaryKey
    SearchTermMerge columns

  4. You can now drop delete the prefix column:

    D trans
    Typestep
    p01NameColumns
    p01ValuekeyPrefix
    p02NameAction
    p02ValueDelete selected columns
    SearchTermDelete columns

...

  1. Load the dataset into the Transformer page.
  2. Identify the columns that together can uniquely identifier a row. In the TestScores-All example, these columns are the following:
    1. LastName
    2. FirstName
    3. TestNumber

      Info

      NOTE: It may be possible to set up a key using LastName and TestNumber, but that is not guaranteed. If the dataset changes over time, a working key based on these columns may become broken.

  3. Use the merge transform to combine these columns together into a new column, such as the following:

    D trans
    p03ValueTestID
    Typestep
    p01NameColumns
    p01ValueLastName,FirstName,TestNum
    p02NameSeparator
    p02Value'-'
    p03NameNew column name
    SearchTermMerge columns

    The with clause identifies the delimiter between the merged column values.

  4. Values should look like the following:

    TestID
    Smith-Joe-2
    Doe-Jane-4
    Jones-Jack-1
  5. In some cases, you may want to drop the delete the source columns for the primary key.