Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version next

...

  1. Load the TestScores dataset into the Transformer page. It should already be split out into five separate columns.
  2. The three columns listed side by side are data that has been organized in a pivot table. To break down this data, you must unpivot the data, which breaks down the data into a key column (containing TestScore1TestScore2TestScore1) and a value column, which contains individual test scores.

    d-

    code
    showNotetrue
    unpivot col:

    trans
    Typestep
    p01NameColumns
    p01ValueTestScore1,TestScore2,TestScore3
    p02NameGroup size
    p02Value1
    SearchTermUnpivot columns

  3. Rename the generated column of test scores to TestScore.
  4. The numeric information in the key column values can be extracted using the following: 

    d-

    codeextract col: key on:

    trans

    hideNotetrue

    p03Value`{digit}`
    Typestep
    p01NameColumn to extract from
    p01Valuekey
    p02NameOption
    p02ValueCustom text or pattern
    p03NameText to extract
    SearchTermExtract text or pattern

  5. The key2 column contains just the numeric data now. Rename this column to TestNumber. You can drop the key column now.
  6. The dataset does not contain a primary key, which field containing a unique identifier for each row. The combination of last name, first name, and test number is a unique identifier for each row in the dataset:

    d-

    code
    hideNotetrue
    merge col:

    trans
    Typestep
    p01NameColumns
    p01ValueLastName,FirstName,TestNumber

    with:

    p02NameSeparator
    p02Value'-'
    SearchTermMerge columns

  7. Rename the new column to TestID. Typically, primary keys are listed as the first field in a dataset. You might want to move the column before the LastName column. 
  8. You may have noticed that the data is still organized by name (first and last) and test number, so that an individual's tests are scattered throughout the dataset. To reorganize the information, you can re-aggregate the data using the following:

    d-

    code
    hideNotetrue
    pivot value: SUM(TestScore) group:

    trans
    p03Value1
    Typestep
    p01NameRow labels
    p01ValueLastName,FirstName,TestNumber,TestID

    limit:1

    p02NameValues
    p02ValueSUM(TestScore)
    p03NameMax number of columns to create
    SearchTermPivot table

    Tip

    Tip: The above retains all instances of tests that have been taken. If you are only interested in the average test score, you can remove the TestNumber and TestID groupings and the change the SUM function to AVERAGE. In the results, you have one average for each test taker.

  9.  You may want to rename the aggregation column. Your final dataset should look like the following:

...

For more information on identifying outliers in this data, see Locate Outliers.

D s lang
 recipe:

The full 

D s lang
 recipe for this workflow is the following:

 

D code

unpivot col: TestScore1,TestScore2,TestScore3

D code
hideNotetrue

rename mapping: [value,'TestScore'] 

D code
hideNotetrue

extract col: key on: `{digit}`

D code
hideNotetrue

rename mapping:[key1,'TestNumber'] 

D code
hideNotetrue

drop col: key

D code
hideNotetrue

merge col: LastName,FirstName,TestNumber with: '-'

D code
hideNotetrue

rename mapping: [column1,'TestID']

D code
hideNotetrue

move col: TestID before: LastName 

D code
hideNotetrue

pivot value: SUM(TestScore) group: LastName,FirstName,TestNumber,TestID limit:1

...

hideNotetrue

...