...
- Load the TestScores dataset into the Transformer page. It should already be split out into five separate columns.
The three columns listed side by side are data that has been organized in a pivot table. To break down this data, you must unpivot the data, which breaks down the data into a key
column (containing TestScore1
, TestScore2
, TestScore1
) and a value
column, which contains individual test scores.
code |
unpivot col: trans |
---|
Type | step |
---|
p01Name | Columns |
---|
p01Value | TestScore1,TestScore2,TestScore3 |
---|
p02Name | Group size |
---|
p02Value | 1 |
---|
SearchTerm | Unpivot columns |
---|
|
- Rename the generated column of test scores to
TestScore
. The numeric information in the key
column values can be extracted using the following:
codeextract col: key on: |
p03Value | `{digit}` |
---|
Type | step |
---|
p01Name | Column to extract from |
---|
p01Value | key |
---|
p02Name | Option |
---|
p02Value | Custom text or pattern |
---|
p03Name | Text to extract |
---|
SearchTerm | Extract text or pattern |
---|
|
- The
key2
column contains just the numeric data now. Rename this column to TestNumber
. You can drop the key
column now. The dataset does not contain a primary key, which field containing a unique identifier for each row. The combination of last name, first name, and test number is a unique identifier for each row in the dataset:
code |
merge col: trans |
---|
Type | step |
---|
p01Name | Columns |
---|
p01Value | LastName,FirstName,TestNumber |
---|
|
with: | p02Name | Separator |
---|
p02Value | '-' |
---|
SearchTerm | Merge columns |
---|
|
- Rename the new column to
TestID
. Typically, primary keys are listed as the first field in a dataset. You might want to move the column before the LastName
column. You may have noticed that the data is still organized by name (first and last) and test number, so that an individual's tests are scattered throughout the dataset. To reorganize the information, you can re-aggregate the data using the following:
code |
pivot value: SUM(TestScore) group: trans |
---|
p03Value | 1 |
---|
Type | step |
---|
p01Name | Row labels |
---|
p01Value | LastName,FirstName,TestNumber,TestID |
---|
|
limit:1 | p02Name | Values |
---|
p02Value | SUM(TestScore) |
---|
p03Name | Max number of columns to create |
---|
SearchTerm | Pivot table |
---|
|
Tip |
---|
Tip: The above retains all instances of tests that have been taken. If you are only interested in the average test score, you can remove the TestNumber and TestID groupings and the change the SUM function to AVERAGE . In the results, you have one average for each test taker. |
- You may want to rename the aggregation column. Your final dataset should look like the following:
...
For more information on identifying outliers in this data, see Locate Outliers.
recipe:
The full
recipe for this workflow is the following:
D code |
---|
unpivot col: TestScore1,TestScore2,TestScore3 |
D code |
---|
|
rename mapping: [value,'TestScore'] |
D code |
---|
|
extract col: key on: `{digit}` |
D code |
---|
|
rename mapping:[key1,'TestNumber'] |
D code |
---|
|
merge col: LastName,FirstName,TestNumber with: '-' |
D code |
---|
|
rename mapping: [column1,'TestID'] |
D code |
---|
|
move col: TestID before: LastName |
D code |
---|
|
pivot value: SUM(TestScore) group: LastName,FirstName,TestNumber,TestID limit:1 |
...
...