RANGE Function
Computes an array of integers, from a beginning integer to an end (stop) integer, stepping by a third parameter.
Note
If the function generates more than 100,000 values for a cell, the output is a null value.
Wrangle vs. SQL: This function is part of Wrangle, a proprietary data transformation language. Wrangle is not SQL. For more information, see Wrangle Language.
Basic Usage
Numeric literal example:
range(0,3,1)
Output: Returns the following array:
[0,1,2]
Column reference example:
range(0,MaxValue,stepValue)
Output: Returns an array of values from zero up to, but not including, the value in the MaxValue column, stepping by the value in the stepValue column.
Syntax and Arguments
range(column_integer_start, column_integer_end, column_integer_step)
Argument | Required? | Data Type | Description |
---|---|---|---|
column_integer_start | Y | string or integer | Name of column or Integer literal that represents the start of the range |
column_integer_end | Y | string or integer | Name of column or Integer literal that represents the end of the range |
column_integer_step | Y | string or integer | Name of column or Integer literal that represents the integer interval (step) between values in the range |
For more information on syntax standards, see Language Documentation Syntax Notes.
column_integer_start
Name of the column or value of the starting integer used to compute the range.
Note
This value is always included in the range, unless it is equal to the value for column_integer_end, which results in an empty array.
Missing input values generate missing results.
Multiple columns and wildcards are not supported.
Usage Notes:
Required? | Data Type | Example Value |
---|---|---|
Yes | Integer | 0 |
column_integer_end
Name of the column or value of the end integer used to compute the range.
Note
This value is not included in the output.
Missing input values generate missing results.
Multiple columns and wildcards are not supported.
Usage Notes:
Required? | Data Type | Example Value |
---|---|---|
Yes | Integer | 20 |
column_integer_step
Name of the column or value of the integer used to compute the integer interval (step) between each value in the range.
Note
This value must be a positive integer. If column_integer_start is greater than column_integer_end, the function steps by the negative of this value.
Missing input values generate missing results.
Multiple columns and wildcards are not supported.
Usage Notes:
Required? | Data Type | Example Value |
---|---|---|
Yes | Integer | 2 |
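The argument rules above can be modeled in a few lines of Python. This is only an illustrative sketch of the documented behavior, not the product's implementation: the name wrangle_range is hypothetical, None stands in for a missing or null value, and raising an error for a non-positive step is an assumption, since this page does not say what the function does in that case.

```python
from typing import List, Optional

MAX_VALUES = 100_000  # limit stated in the note at the top of this page


def wrangle_range(start: Optional[int],
                  end: Optional[int],
                  step: Optional[int]) -> Optional[List[int]]:
    """Approximate the documented RANGE rules (hypothetical helper)."""
    # Missing input values generate missing results.
    if start is None or end is None or step is None:
        return None
    # The step must be a positive integer (raising here is an assumption).
    if step <= 0:
        raise ValueError("step must be a positive integer")
    # When start is greater than end, the step is applied as a negative value.
    signed_step = step if start <= end else -step
    values = range(start, end, signed_step)  # start included, end excluded
    # More than 100,000 values for a cell produces a null value.
    if len(values) > MAX_VALUES:
        return None
    return list(values)


print(wrangle_range(0, 3, 1))  # [0, 1, 2]
print(wrangle_range(5, 5, 1))  # []
print(wrangle_range(5, 0, 2))  # [5, 3, 1]
```

The length check runs on the lazy range object before any list is built, so an oversized request is rejected without allocating the values.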
Examples
Tip
For additional examples, see Common Tasks.
Example - Breaking out log messages
Source:
Your dataset contains log data that is gathered each minute, yet each entry can contain multiple error messages in an array. The key fields might look like the following:
Timestamp | Errors |
---|---|
02/16/16 15:31 | ["Unable to connect","File not found","Proxy down","conn. timeout"] |
02/16/16 15:30 | [] |
02/16/16 15:29 | ["Access forbidden","Invalid password"] |
Transformation:
You can use the following steps to break out the array values into separate rows. The following transform generates a column containing the number of elements in each row's Errors array.
Transformation Name | |
---|---|
Parameter: Formula type | Single row formula |
Parameter: Formula | arraylen(Errors) |
Parameter: New column name | 'arraylength_Errors' |
This transform deletes rows that contain no errors:
Transformation Name | |
---|---|
Parameter: Condition | Custom formula |
Parameter: Type of formula | Custom single |
Parameter: Condition | (arraylength_Errors == 0) |
Parameter: Action | Delete matching rows |
For the remaining rows, you can generate a column containing an array of numbers to match the count of error messages:
Transformation Name | |
---|---|
Parameter: Formula type | Single row formula |
Parameter: Formula | range(0,arraylength_Errors,1) |
Parameter: New column name | 'range_Errors' |
You can then use the ARRAYZIP function to zip together the two arrays into a single one:
Transformation Name | |
---|---|
Parameter: Formula type | Single row formula |
Parameter: Formula | arrayzip([range_Errors,Errors]) |
Parameter: New column name | 'zipped_Errors' |
The unnest transform uses the values in an array column as key values to break out rows in your dataset:
Transformation Name | |
---|---|
Parameter: Column | zipped_Errors |
You might rename the above as individual_Errors. To clean up your dataset, you can now delete the following columns:
arraylength_Errors
range_Errors
zipped_Errors
Results:
Timestamp | Errors | individual_Errors |
---|---|---|
02/16/16 15:31 | ["Unable to connect","File not found","Proxy down","conn. timeout"] | [0, "Unable to connect"] |
02/16/16 15:31 | ["Unable to connect","File not found","Proxy down","conn. timeout"] | [1, "File not found"] |
02/16/16 15:31 | ["Unable to connect","File not found","Proxy down","conn. timeout"] | [2, "Proxy down"] |
02/16/16 15:31 | ["Unable to connect","File not found","Proxy down","conn. timeout"] | [3, "conn. timeout"] |
02/16/16 15:29 | ["Access forbidden","Invalid password"] | [0, "Access forbidden"] |
02/16/16 15:29 | ["Access forbidden","Invalid password"] | [1, "Invalid password"] |
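For readers who want to check the logic outside the product, the sequence of transforms in this example can be approximated in Python. This is a rough sketch, not Wrangle code: the function name break_out_errors is hypothetical, and Python's built-in range and zip stand in for the RANGE and ARRAYZIP functions.

```python
rows = [
    {"Timestamp": "02/16/16 15:31",
     "Errors": ["Unable to connect", "File not found", "Proxy down", "conn. timeout"]},
    {"Timestamp": "02/16/16 15:30", "Errors": []},
    {"Timestamp": "02/16/16 15:29",
     "Errors": ["Access forbidden", "Invalid password"]},
]


def break_out_errors(rows):
    """Return one output row per error message (hypothetical helper)."""
    result = []
    for row in rows:
        errors = row["Errors"]
        length = len(errors)                 # arraylen(Errors)
        if length == 0:                      # delete rows with no errors
            continue
        index = list(range(0, length, 1))    # range(0,arraylength_Errors,1)
        zipped = [list(pair) for pair in zip(index, errors)]  # arrayzip
        for pair in zipped:                  # unnest: one row per element
            result.append({
                "Timestamp": row["Timestamp"],
                "Errors": errors,
                "individual_Errors": pair,
            })
    return result


broken_out = break_out_errors(rows)
for row in broken_out:
    print(row["Timestamp"], row["individual_Errors"])
```

The sketch yields six rows, four for the 15:31 entry and two for the 15:29 entry, which matches the results table.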
Example - unnest test scores
The following example uses the range function to define a new index array. It also illustrates how to use the flatten and unnest transforms.
Source:
You have the following data on student test scores. Scores on individual tests are stored in the Scores array, and you need to be able to track each test on a uniquely identifiable row. This example has two goals:
One row for each student test
Unique identifier for each student-score combination
LastName | FirstName | Scores |
---|---|---|
Adams | Allen | [81,87,83,79] |
Burns | Bonnie | [98,94,92,85] |
Cannon | Charles | [88,81,85,78] |
Transformation:
When the data is imported from CSV format, you must add a header transform and remove the quotes from the Scores column:
Transformation Name | |
---|---|
Parameter: Option | Use row(s) as column names |
Parameter: Type | Use a single row to name columns |
Parameter: Row number | 1 |
Transformation Name | |
---|---|
Parameter: Column | Scores |
Parameter: Find | '\"' |
Parameter: Replace with | '' |
Parameter: Match all occurrences | true |
Validate test data: To begin, you might want to check whether you have the proper number of test scores for each student. You can use the following transform to calculate the difference between the expected number of elements in the Scores array (4) and the actual number:
Transformation Name | |
---|---|
Parameter: Formula type | Single row formula |
Parameter: Formula | (4 - arraylen(Scores)) |
Parameter: New column name | 'numMissingTests' |
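As a quick illustration of this check, the same arithmetic can be written in Python. The helper name num_missing_tests and the two-score array in the second call are hypothetical and are not part of the sample data.

```python
EXPECTED_TESTS = 4  # expected number of elements in the Scores array


def num_missing_tests(scores, expected=EXPECTED_TESTS):
    """Mirror the formula (4 - arraylen(Scores)); hypothetical helper."""
    return expected - len(scores)


print(num_missing_tests([81, 87, 83, 79]))  # 0: all four tests are present
print(num_missing_tests([81, 87]))          # 2: hypothetical row missing two tests
```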
When the transform is previewed, you can see in the sample dataset that all tests are included. You might or might not want to include this column in the final dataset, as you might identify missing tests when the recipe is run at scale.
Unique row identifier: The Scores array must be broken out into individual rows for each test. However, there is no unique identifier for the row to track individual tests. In theory, you could use the combination of LastName-FirstName-Scores values to do so, but if a student recorded the same score twice, your dataset would have duplicate rows. In the following transform, you create a parallel array called Tests, which contains an index array for the number of values in the Scores column. Index values start at 0:
Transformation Name | |
---|---|
Parameter: Formula type | Single row formula |
Parameter: Formula | range(0,arraylen(Scores),1) |
Parameter: New column name | 'Tests' |
You also want to create an identifier for the source row using the sourcerownumber function:
Transformation Name | |
---|---|
Parameter: Formula type | Single row formula |
Parameter: Formula | sourcerownumber() |
Parameter: New column name | 'orderIndex' |
One row for each student test: Your data should look like the following:
LastName | FirstName | Scores | Tests | orderIndex |
---|---|---|---|---|
Adams | Allen | [81,87,83,79] | [0,1,2,3] | 2 |
Burns | Bonnie | [98,94,92,85] | [0,1,2,3] | 3 |
Cannon | Charles | [88,81,85,78] | [0,1,2,3] | 4 |
Now, you want to bring together the Tests and Scores arrays into a single nested array using the arrayzip function:
Transformation Name | |
---|---|
Parameter: Formula type | Single row formula |
Parameter: Formula | arrayzip([Tests,Scores]) |
Your dataset has been changed:
LastName | FirstName | Scores | Tests | orderIndex | column1 |
---|---|---|---|---|---|
Adams | Allen | [81,87,83,79] | [0,1,2,3] | 2 | [[0,81],[1,87],[2,83],[3,79]] |
Burns | Bonnie | [98,94,92,85] | [0,1,2,3] | 3 | [[0,98],[1,94],[2,92],[3,85]] |
Cannon | Charles | [88,81,85,78] | [0,1,2,3] | 4 | [[0,88],[1,81],[2,85],[3,78]] |
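The index, row-number, and zip steps above can be approximated in Python as follows. This is a sketch under assumptions: the function name add_index_and_zip is hypothetical, and the row numbers start at 2 on the assumption that the header occupies source row 1, which is consistent with the orderIndex values shown in the tables.

```python
students = [
    {"LastName": "Adams", "FirstName": "Allen", "Scores": [81, 87, 83, 79]},
    {"LastName": "Burns", "FirstName": "Bonnie", "Scores": [98, 94, 92, 85]},
    {"LastName": "Cannon", "FirstName": "Charles", "Scores": [88, 81, 85, 78]},
]


def add_index_and_zip(rows, first_source_row=2):
    """Add Tests, orderIndex, and column1 to each row (hypothetical helper)."""
    out = []
    for offset, row in enumerate(rows):
        new_row = dict(row)
        # Index array starting at 0, one entry per score.
        new_row["Tests"] = list(range(0, len(row["Scores"])))
        # Stand-in for sourcerownumber(); assumes the header is source row 1.
        new_row["orderIndex"] = first_source_row + offset
        # Stand-in for arrayzip([Tests,Scores]): pair each index with its score.
        new_row["column1"] = [list(p) for p in zip(new_row["Tests"], row["Scores"])]
        out.append(new_row)
    return out


zipped = add_index_and_zip(students)
print(zipped[0]["column1"])  # [[0, 81], [1, 87], [2, 83], [3, 79]]
```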
Use the following to unpack the nested array:
Transformation Name | |
---|---|
Parameter: Column | column1 |
Each test-score combination is now broken out into a separate row. The nested Test-Score combinations must be broken out into separate columns using the following:
Transformation Name | |
---|---|
Parameter: Column | column1 |
Parameter: Paths to elements | '[0]','[1]' |
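The two steps above, unpacking the nested array into rows and then splitting each pair into columns, can be sketched in Python. The function name unnest_and_split is hypothetical. The output keys column_0 and column_1 are the generated column names that this example renames in the next step.

```python
student = {
    "LastName": "Adams",
    "FirstName": "Allen",
    "orderIndex": 2,
    "column1": [[0, 81], [1, 87], [2, 83], [3, 79]],
}


def unnest_and_split(row, column="column1"):
    """Emit one row per nested pair, with the pair split into two columns."""
    out = []
    for first, second in row[column]:
        # Copy every other column unchanged onto the new row.
        new_row = {k: v for k, v in row.items() if k != column}
        new_row["column_0"] = first   # element at path [0]
        new_row["column_1"] = second  # element at path [1]
        out.append(new_row)
    return out


split_rows = unnest_and_split(student)
for r in split_rows:
    print(r["LastName"], r["column_0"], r["column_1"])
```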
After you delete column1, which is no longer needed, you should rename the two generated columns:
Transformation Name | |
---|---|
Parameter: Option | Manual rename |
Parameter: Column | column_0 |
Parameter: New column name | 'TestNum' |
Transformation Name | |
---|---|
Parameter: Option | Manual rename |
Parameter: Column | column_1 |
Parameter: New column name | 'TestScore' |
Unique row identifier: You can do one more step to create unique test identifiers, which identify the specific test for each student. The following uses the original row identifier orderIndex as an identifier for the student and the TestNum value to create the TestId column value:
Transformation Name | |
---|---|
Parameter: Formula type | Single row formula |
Parameter: Formula | (orderIndex * 10) + TestNum |
Parameter: New column name | 'TestId' |
The above are integer values. To make your identifiers look prettier, you might add the following:
Transformation Name | |
---|---|
Parameter: Columns | 'TestId00','TestId' |
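A small Python sketch of the identifier arithmetic and the merge step may help when checking values by hand. The function name make_test_id is hypothetical, and string concatenation stands in for the merge of the literal 'TestId00' with the TestId column.

```python
def make_test_id(order_index: int, test_num: int) -> str:
    """Combine the student row number and test index (hypothetical helper)."""
    numeric_id = (order_index * 10) + test_num  # (orderIndex * 10) + TestNum
    return "TestId00" + str(numeric_id)         # merge 'TestId00' with TestId


# Identifiers for a student on source row 2 with four tests.
ids = [make_test_id(2, n) for n in range(0, 4)]
print(ids)
```

Multiplying by 10 keeps identifiers unique only while each student has fewer than ten tests. With longer Scores arrays, a larger multiplier would be needed to avoid collisions between students.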
Extending: You might want to generate some summary statistical information on this dataset. For example, you might be interested in calculating each student's average test score. This step requires figuring out how to properly group the test values. In this case, you cannot group by the LastName value alone, and there might be collisions between first names when this recipe is run at scale. So, you might need to create a kind of primary key using the following:
Transformation Name | |
---|---|
Parameter: Columns | 'LastName','FirstName' |
Parameter: Separator | '-' |
Parameter: New column name | 'studentId' |
You can now use this as a grouping parameter for your calculation:
Transformation Name | |
---|---|
Parameter: Formula type | Single row formula |
Parameter: Formula | average(TestScore) |
Parameter: Group rows by | studentId |
Parameter: New column name | 'avg_TestScore' |
Results:
After you delete unnecessary columns and move your columns around, the dataset should look like the following:
TestId | LastName | FirstName | TestNum | TestScore | studentId | avg_TestScore |
---|---|---|---|---|---|---|
TestId0020 | Adams | Allen | 0 | 81 | Adams-Allen | 82.5 |
TestId0021 | Adams | Allen | 1 | 87 | Adams-Allen | 82.5 |
TestId0022 | Adams | Allen | 2 | 83 | Adams-Allen | 82.5 |
TestId0023 | Adams | Allen | 3 | 79 | Adams-Allen | 82.5 |
TestId0030 | Burns | Bonnie | 0 | 98 | Burns-Bonnie | 92.25 |
TestId0031 | Burns | Bonnie | 1 | 94 | Burns-Bonnie | 92.25 |
TestId0032 | Burns | Bonnie | 2 | 92 | Burns-Bonnie | 92.25 |
TestId0033 | Burns | Bonnie | 3 | 85 | Burns-Bonnie | 92.25 |
TestId0040 | Cannon | Charles | 0 | 88 | Cannon-Charles | 83 |
TestId0041 | Cannon | Charles | 1 | 81 | Cannon-Charles | 83 |
TestId0042 | Cannon | Charles | 2 | 85 | Cannon-Charles | 83 |
TestId0043 | Cannon | Charles | 3 | 78 | Cannon-Charles | 83 |
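The grouped average can be verified with a short Python sketch that uses the source scores from this example. The function name add_student_average is hypothetical. Joining the two name columns with a hyphen stands in for the merge step, and the per-group mean stands in for average(TestScore) grouped by studentId.

```python
from collections import defaultdict

source = [
    ("Adams", "Allen", [81, 87, 83, 79]),
    ("Burns", "Bonnie", [98, 94, 92, 85]),
    ("Cannon", "Charles", [88, 81, 85, 78]),
]

# One row per student test, as produced by the unnest steps.
test_rows = [
    {"LastName": last, "FirstName": first, "TestNum": n, "TestScore": score}
    for last, first, scores in source
    for n, score in enumerate(scores)
]


def add_student_average(rows):
    """Add studentId and avg_TestScore to every row (hypothetical helper)."""
    scores_by_student = defaultdict(list)
    for row in rows:
        row["studentId"] = row["LastName"] + "-" + row["FirstName"]
        scores_by_student[row["studentId"]].append(row["TestScore"])
    for row in rows:
        group = scores_by_student[row["studentId"]]
        row["avg_TestScore"] = sum(group) / len(group)
    return rows


averaged = add_student_average(test_rows)
for student_id in sorted({r["studentId"] for r in averaged}):
    avg = next(r["avg_TestScore"] for r in averaged if r["studentId"] == student_id)
    print(student_id, avg)
```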