NOTE: This function has been superseded by the $sourcerownumber reference. While this function is still usable in the product, it is likely to be deprecated in a future release. Please use $sourcerownumber instead. For more information, see Source Metadata References.

Returns the row number of the current row as it appeared in the original source dataset before any steps had been applied.

The following transforms might make original row information invalid or otherwise unavailable. In these cases, the function returns null values:

  • pivot
  • flatten
  • join
  • lookup
  • union
  • unnest
  • unpivot

NOTE: If the dataset is sourced from multiple files, a predictable original source row number cannot be guaranteed, and null values are returned.

Tip: If the source row information is still available, you can hover over the left side of a row in the data grid to see the source row number in the original source data.


Basic Usage

Example:

sourcerownumber()

Output: Returns the source row number for each row as it appeared in the original data.

Sort Example:

Transformation Name Sort rows
Parameter: Sort by sourcerownumber()

Output: Rows in the dataset are re-sorted according to the original order in the dataset.

Delete Example:


Transformation Name Filter rows
Parameter: Condition Custom formula
Parameter: Type of formula Custom single
Parameter: Condition sourcerownumber() > 101
Parameter: Action Delete matching rows

Output: Deletes the rows in the dataset that were after row #101 in the original source data.
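
The sort and delete examples above can be sketched outside the product in plain Python. This is only an illustration under stated assumptions: the `source_row` field, the sample values, and the shuffle step are hypothetical stand-ins for a dataset whose rows were tagged with their source row numbers before any steps were applied.

```python
import random

# Hypothetical source data: each record is tagged with its 1-based source
# row number at load time, before any other step runs.
source_values = [f"value_{n}" for n in range(1, 201)]
rows = [{"source_row": n, "value": v} for n, v in enumerate(source_values, start=1)]

# Simulate earlier recipe steps that reorder the data.
random.Random(0).shuffle(rows)

# Sort example: restore the original order by sorting on the source row number.
restored = sorted(rows, key=lambda r: r["source_row"])

# Delete example: drop every row whose source row number is greater than 101.
kept = [r for r in restored if not r["source_row"] > 101]
```

Because the row number is captured once at load time, it stays attached to each record no matter how the rows are reordered afterward.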

Syntax

There are no arguments for this function.

Examples

Example - Header from row that is not the first one

Source:

You have imported the following racer data on heat times from a CSV file. When loaded in the Transformer page, it looks like the following:

(rowId)  column2  column3  column4  column5
1        Racer    Heat 1   Heat 2   Heat 3
2        Racer X  37.22    38.22    37.61
3        Racer Y  41.33    DQ       38.04
4        Racer Z  39.27    39.04    38.85

In the above, the (rowId) column references the row numbers displayed in the data grid; it is not part of the dataset. This information is available when you hover over the black dot on the left side of the screen.

Transformation:

You have examined the best performance in each heat according to the sample. You then notice that the data contains headers, but you forget how it was originally sorted. The data now looks like the following:

(rowId)  column2  column3  column4  column5
1        Racer Y  41.33    DQ       38.04
2        Racer    Heat 1   Heat 2   Heat 3
3        Racer X  37.22    38.22    37.61
4        Racer Z  39.27    39.04    38.85

You can use the following transformation to use the third row as your header for each column:

Transformation Name Rename column with row(s)
Parameter: Option Use row(s) as column names
Parameter: Type Use a single row to name columns
Parameter: Row number 3


Results:

After you have applied the last header transform, your data should look like the following:

(rowId)  Racer    Heat_1  Heat_2  Heat_3
3        Racer Y  41.33   DQ      38.04
2        Racer X  37.22   38.22   37.61
4        Racer Z  39.27   39.04   38.85

You can sort by the Racer column in ascending order to return to the original sort order.

Example - Using sourcerownumber to create unique row identifiers

The following example demonstrates how to unpack nested data. The SOURCEROWNUMBER function is used as part of a method to create unique row identifiers.

Source:

You have the following data on student test scores. Scores on individual tests are stored in the Scores array, and you need to be able to track each test on a uniquely identifiable row. This example has two goals:

  1. One row for each student test
  2. Unique identifier for each student-score combination

 

LastName  FirstName  Scores
Adams     Allen      [81,87,83,79]
Burns     Bonnie     [98,94,92,85]
Cannon    Charles    [88,81,85,78]

Transformation:

When the data is imported from CSV format, you must add a header transform and remove the quotes from the Scores column:

Transformation Name Rename column with row(s)
Parameter: Option Use row(s) as column names
Parameter: Type Use a single row to name columns
Parameter: Row number 1

Transformation Name Replace text or pattern
Parameter: Column Scores
Parameter: Find '\"'
Parameter: Replace with ''
Parameter: Match all occurrences true

Validate test data: To begin, you might want to check to see if you have the proper number of test scores for each student. You can use the following transform to calculate the difference between the expected number of elements in the Scores array (4) and the actual number:

Transformation Name New formula
Parameter: Formula type Single row formula
Parameter: Formula (4 - arraylen(Scores))
Parameter: New column name 'numMissingTests'

When the transform is previewed, you can see in the sample dataset that all tests are included. You might or might not want to include this column in the final dataset, as you might identify missing tests when the recipe is run at scale.
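
As a rough illustration of the check above, the same calculation can be written in Python. The list of dictionaries is a hypothetical stand-in for the dataset, and `len` plays the role of `arraylen`.

```python
EXPECTED_TESTS = 4  # expected number of scores per student, from the text above

# Hypothetical in-memory copy of the sample dataset.
students = [
    {"LastName": "Adams", "FirstName": "Allen", "Scores": [81, 87, 83, 79]},
    {"LastName": "Burns", "FirstName": "Bonnie", "Scores": [98, 94, 92, 85]},
    {"LastName": "Cannon", "FirstName": "Charles", "Scores": [88, 81, 85, 78]},
]

for student in students:
    # Equivalent of (4 - arraylen(Scores)): 0 means no tests are missing.
    student["numMissingTests"] = EXPECTED_TESTS - len(student["Scores"])
```

A positive value in the new column flags a student with fewer scores than expected.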

Unique row identifier: The Scores array must be broken out into individual rows for each test. However, there is no unique identifier for the row to track individual tests. In theory, you could use the combination of LastName-FirstName-Scores values to do so, but if a student recorded the same score twice, your dataset would contain duplicate rows. In the following transform, you create a parallel array called Tests, which contains one index value for each value in the Scores column. Index values start at 0:

Transformation Name New formula
Parameter: Formula type Single row formula
Parameter: Formula range(0,arraylen(Scores))
Parameter: New column name 'Tests'

You also want to create an identifier for the source row using the sourcerownumber function:

Transformation Name New formula
Parameter: Formula type Single row formula
Parameter: Formula sourcerownumber()
Parameter: New column name 'orderIndex'

One row for each student test: Your data should look like the following:

LastName  FirstName  Scores         Tests      orderIndex
Adams     Allen      [81,87,83,79]  [0,1,2,3]  2
Burns     Bonnie     [98,94,92,85]  [0,1,2,3]  3
Cannon    Charles    [88,81,85,78]  [0,1,2,3]  4
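
The two new columns can be approximated in Python as follows. This is a sketch under assumptions: the dictionaries are a hypothetical copy of the sample data, and the source row number is simulated by counting from 2, since the header occupied source row 1 of the CSV file.

```python
# Hypothetical in-memory copy of the sample dataset, in source order.
students = [
    {"LastName": "Adams", "FirstName": "Allen", "Scores": [81, 87, 83, 79]},
    {"LastName": "Burns", "FirstName": "Bonnie", "Scores": [98, 94, 92, 85]},
    {"LastName": "Cannon", "FirstName": "Charles", "Scores": [88, 81, 85, 78]},
]

# Counting starts at 2 because source row 1 held the column headers.
for source_row, student in enumerate(students, start=2):
    # Equivalent of range(0, arraylen(Scores)): one index per score.
    student["Tests"] = list(range(0, len(student["Scores"])))
    # Stand-in for sourcerownumber().
    student["orderIndex"] = source_row
```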

Now, you want to bring together the Tests and Scores arrays into a single nested array using the arrayzip function:

Transformation Name New formula
Parameter: Formula type Single row formula
Parameter: Formula arrayzip([Tests,Scores])

Your dataset has been changed:

LastName  FirstName  Scores         Tests      orderIndex  column1
Adams     Allen      [81,87,83,79]  [0,1,2,3]  2           [[0,81],[1,87],[2,83],[3,79]]
Burns     Bonnie     [98,94,92,85]  [0,1,2,3]  3           [[0,98],[1,94],[2,92],[3,85]]
Cannon    Charles    [88,81,85,78]  [0,1,2,3]  4           [[0,88],[1,81],[2,85],[3,78]]
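
For readers more familiar with general-purpose languages, the pairing that arrayzip performs here resembles Python's built-in `zip`. The sketch below uses the first student's values and is only an analogy, not the product's implementation.

```python
tests = [0, 1, 2, 3]
scores = [81, 87, 83, 79]

# Pair each test index with the score at the same position, producing a
# nested list in which every inner list is [test index, score].
zipped = [list(pair) for pair in zip(tests, scores)]
```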

Use the following to unpack the nested array:

Transformation Name Expand arrays to rows
Parameter: Column column1

Each test-score combination is now broken out into a separate row. The nested Test-Score combinations must be broken out into separate columns using the following:

Transformation Name Unnest Objects into columns
Parameter: Column column1
Parameter: Paths to elements '[0]','[1]'

After you delete column1, which is no longer needed, you should rename the two generated columns:

Transformation Name Rename columns
Parameter: Option Manual rename
Parameter: Column column_0
Parameter: New column name 'TestNum'

Transformation Name Rename columns
Parameter: Option Manual rename
Parameter: Column column_1
Parameter: New column name 'TestScore'
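
The expand, unnest, delete, and rename steps above can be approximated together in Python. This is an illustrative sketch on one hypothetical input row, not a description of how the product executes the recipe.

```python
# One row of the dataset after the arrayzip step (hypothetical copy).
row = {
    "LastName": "Adams",
    "FirstName": "Allen",
    "orderIndex": 2,
    "column1": [[0, 81], [1, 87], [2, 83], [3, 79]],
}

expanded = []
for test_num, test_score in row["column1"]:
    # Expand arrays to rows: emit one output row per nested pair,
    # dropping column1 itself because it is no longer needed.
    new_row = {k: v for k, v in row.items() if k != "column1"}
    # Unnest and rename: element [0] becomes TestNum, element [1] TestScore.
    new_row["TestNum"] = test_num
    new_row["TestScore"] = test_score
    expanded.append(new_row)
```

Each output row keeps the student's orderIndex value, which is what later allows a per-test identifier to be built.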

Unique row identifier: You can do one more step to create unique test identifiers, which identify the specific test for each student. The following uses the original row identifier orderIndex as an identifier for the student and the TestNum value to create the TestId column value:

Transformation Name New formula
Parameter: Formula type Single row formula
Parameter: Formula (orderIndex * 10) + TestNum
Parameter: New column name 'TestId'

The above are integer values. To make your identifiers look prettier, you might add the following:

Transformation Name Merge columns
Parameter: Columns 'TestId00','TestId'
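
The identifier arithmetic can be checked with a small Python sketch. The function name `make_test_id` is hypothetical; the formula and the 'TestId00' prefix come from the steps above.

```python
def make_test_id(order_index: int, test_num: int) -> str:
    """Build a test identifier from a source row number and a test index."""
    # Same arithmetic as the formula (orderIndex * 10) + TestNum.
    numeric_id = (order_index * 10) + test_num
    # Same effect as merging the literal 'TestId00' with the numeric value.
    return "TestId00" + str(numeric_id)


first_id = make_test_id(2, 0)
last_id = make_test_id(4, 3)
```

Multiplying by 10 keeps identifiers unique only while each student has fewer than 10 tests; with 10 or more, values from adjacent source rows would overlap, so a larger multiplier would be needed.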

Extending: You might want to generate some summary statistical information on this dataset. For example, you might be interested in calculating each student's average test score. This step requires figuring out how to properly group the test values. Grouping by the LastName value alone is not reliable, because students can share a last name, and first names can also collide when the recipe is run at scale. So, you might need to create a kind of primary key using the following:

Transformation Name Merge columns
Parameter: Columns 'LastName','FirstName'
Parameter: Separator '-'
Parameter: New column name 'studentId'

You can now use this as a grouping parameter for your calculation:

Transformation Name New formula
Parameter: Formula type Single row formula
Parameter: Formula average(TestScore)
Parameter: Group rows by studentId
Parameter: New column name 'avg_TestScore'

Results:

After you delete unnecessary columns and move your columns around, the dataset should look like the following:

TestId      LastName  FirstName  TestNum  TestScore  studentId       avg_TestScore
TestId0020  Adams     Allen      0        81         Adams-Allen     82.5
TestId0021  Adams     Allen      1        87         Adams-Allen     82.5
TestId0022  Adams     Allen      2        83         Adams-Allen     82.5
TestId0023  Adams     Allen      3        79         Adams-Allen     82.5
TestId0030  Burns     Bonnie     0        98         Burns-Bonnie    92.25
TestId0031  Burns     Bonnie     1        94         Burns-Bonnie    92.25
TestId0032  Burns     Bonnie     2        92         Burns-Bonnie    92.25
TestId0033  Burns     Bonnie     3        85         Burns-Bonnie    92.25
TestId0040  Cannon    Charles    0        88         Cannon-Charles  83
TestId0041  Cannon    Charles    1        81         Cannon-Charles  83
TestId0042  Cannon    Charles    2        85         Cannon-Charles  83
TestId0043  Cannon    Charles    3        78         Cannon-Charles  83
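
The grouped average can be reproduced in Python to verify the avg_TestScore values. This is a sketch that assumes the studentId keys are built from the names in the source table; `scores_by_student` is a hypothetical helper used only to construct the sample rows.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical helper data used to build one row per student test.
scores_by_student = {
    "Adams-Allen": [81, 87, 83, 79],
    "Burns-Bonnie": [98, 94, 92, 85],
    "Cannon-Charles": [88, 81, 85, 78],
}
rows = [
    {"studentId": student_id, "TestScore": score}
    for student_id, scores in scores_by_student.items()
    for score in scores
]

# Group the scores by studentId, then average each group.
grouped = defaultdict(list)
for row in rows:
    grouped[row["studentId"]].append(row["TestScore"])
averages = {student_id: mean(values) for student_id, values in grouped.items()}

# Write the group average back onto every row of that group.
for row in rows:
    row["avg_TestScore"] = averages[row["studentId"]]
```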

Example - Delete rows based on source row numbers

Source:

Your dataset is the following set of orders.

CustId  FirstName  LastName  City           State  LastOrder
1001    Skip       Jones     San Francisco  CA     25
1002    Adam       Allen     Oakland        CA     1099
1003    David      Wiggins   Oakland        MI     125.25
1004    Amanda     Green     Detroit        MI     452.5
1005    Colonel    Mustard   Los Angeles    CA     950
1006    Pauline    Hall      Saginaw        MI     432.22
1007    Sarah      Miller    Cheyenne       WY     724.22
1008    Teddy      Smith     Juneau         AK     852.11
1009    Joelle     Higgins   Sacramento     CA     100


Transformation:

Initially, you want to review your list of orders by last name.

Transformation Name Sort rows
Parameter: Sort by LastName

During your review, you notice that two customer orders are no longer valid and need to be removed. They are:

  • LastName: Hall
  • LastName: Jones

You might hover over the left side of the screen to reveal the row numbers. You select the row numbers for each of these rows, and a delete suggestion is provided for you. When you click Modify, you see the following transformation:

Transformation Name Filter rows
Parameter: Condition Custom formula
Parameter: Type of formula Custom single
Parameter: Condition in(sourcerownumber(), [2,7])
Parameter: Action Delete matching rows

The above checks the results of the sourcerownumber function, which returns the original row number of each row. If a row's source row number matches a value in the [2,7] array, then the row is deleted.

Results:

When the preceding transform is added, your dataset looks like the following, and your sort order is maintained:

CustId  FirstName  LastName  City         State  LastOrder
1002    Adam       Allen     Oakland      CA     1099
1004    Amanda     Green     Detroit      MI     452.5
1009    Joelle     Higgins   Sacramento   CA     100
1007    Sarah      Miller    Cheyenne     WY     724.22
1005    Colonel    Mustard   Los Angeles  CA     950
1008    Teddy      Smith     Juneau       AK     852.11
1003    David      Wiggins   Oakland      MI     125.25
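
The whole example can be traced in a short Python sketch. It assumes the header line occupied source row 1 of the file, so the first order is source row 2; the `source_row` field is a hypothetical stand-in for the value that sourcerownumber returns.

```python
# (CustId, LastName) pairs in source order; other columns omitted for brevity.
orders = [
    (1001, "Jones"), (1002, "Allen"), (1003, "Wiggins"),
    (1004, "Green"), (1005, "Mustard"), (1006, "Hall"),
    (1007, "Miller"), (1008, "Smith"), (1009, "Higgins"),
]

# Tag each order with its source row number. Counting starts at 2
# because the header is assumed to be source row 1.
rows = [
    {"source_row": n, "CustId": cust_id, "LastName": last_name}
    for n, (cust_id, last_name) in enumerate(orders, start=2)
]

# Sort rows by LastName, as in the first transformation.
rows.sort(key=lambda r: r["LastName"])

# Delete rows whose source row number is in [2, 7] (Jones and Hall).
rows_to_delete = [2, 7]
remaining = [r for r in rows if r["source_row"] not in rows_to_delete]
remaining_ids = [r["CustId"] for r in remaining]
```

The filter matches on the row numbers from the original file, so it removes the same two customers regardless of the current sort order.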
