Excerpt |
---|
Fills any missing or null values in the specified column with the most recent non-blank value, as determined by the specified window of rows before and after the blank value. |
The row from which to extract a value is determined by the order in which the rows are organized at the time that the function is executed.
If you are working on a randomly generated sample of your dataset, the values that you see for this function might not correspond to the values that are generated on the full dataset during job execution.
- In addition to the column to which to apply the function, the function accepts two other parameters:
- You can use the
group
and order
parameters to define the groups of records and the order of those records to which this function is applied. - This function works with the following transforms:
D lang syntax |
---|
RawWrangle | true |
---|
Type | ref |
---|
showNote | true |
---|
WrangleText | window value:fill(myNumber,-1,0) order:'Date' |
---|
|
fill(myNumber,-1,0) |
Output: Returns all values from the myNumber
column with any null cells filled by the most recent non-blank value.
D lang syntax |
---|
RawWrangle | true |
---|
Type | ref |
---|
showNote | true |
---|
WrangleText | window value:fill(myNumber,-5,4) order:'Date' |
---|
|
fill(myNumber,-5,4) |
Output: Returns all values from the myNumber
column with any null cells filled by the most recent non-empty value within the window 5 rows before the current row and 4 rows after it.
D lang syntax |
---|
RawWrangle | true |
---|
Type | syntax |
---|
showNote | true |
---|
WrangleText | window value:fill(col_ref, int_rows_before, int_rows_after) order: order_col [group: group_col] |
---|
|
fill(col_ref, int_rows_before, int_rows_after) order: order_col [group: group_col] |
Argument | Required? | Data Type | Description |
---|
col_ref | Y | string | Name of column whose values are applied to the function |
int_rows_before | Y | integer | Number of rows before current row to scan for non-empty value. Default is -1 , which scans all rows before. |
int_rows_after | Y | integer | Number of rows after current row to scan for non-empty value. Default is 0 , which scans no rows after. |
For more information on the order
and group
parameters, see Window Transform.
col_ref
Name of the column whose values are filled when null.
- Multiple columns and wildcards are not supported.
Required? | Data Type | Example Value |
---|
Yes | String (column reference) | myColumn |
int_rows_before
Number of rows before the current row to scan for the most recent non-empty value.
- Default value is
-1
, which scans all preceding rows. 0
does not scan before the current row.
Required? | Data Type | Example Value |
---|
Yes | Integer | 5 |
int_rows_after
Number of rows after the current row to scan for the most recent non-empty value.
- Default value is
0
, which does not scan rows after the current row. -1
scans all following rows.
Required? | Data Type | Example Value |
---|
Yes | Integer | 5 |
Example - Fill with quarterly forecast values
Your monthly sales data includes amount sold for each month. However, the source system only provides the quarterly forecast as part of the data for the first month of each quarter. You can use the FILL
function to add the prior forecast to each month's data.
Source:
Date | Amount | Forecast_Qtr |
---|
1/31/15 | 523 | 1400 |
2/28/15 | 135 | |
3/31/15 | 824 | |
4/30/15 | 305 | 1500 |
5/31/15 | 763 | |
6/30/15 | 421 | |
7/31/15 | 606 | 1600 |
8/31/15 | 477 | |
9/30/15 | 785 | |
10/31/15 | 443 | 1700 |
11/30/15 | 622 | |
12/31/15 | 518 | |
Transformation:
You can use the following transformation to fill the prior forecast value for each blank month in the Forecast_Qtr
column. Note that the order
parameter must be set to Date
to establish the proper sorting:
D trans |
---|
RawWrangle | true |
---|
Type | step |
---|
WrangleText | window value: fill(Forecast_Qtr,-1,0) order: Date |
---|
p01Name | Formulas |
---|
p01Value | fill(Forecast_Qtr,-1,0) |
---|
p02Name | Order by |
---|
p02Value | Date |
---|
SearchTerm | Window |
---|
|
You can now delete the Forecast_Qtr
column and rename the generated window
column to the deleted name.
To see how you are progressing each month, you might use the following transformation, which computes the average forecast for each month:
D trans |
---|
RawWrangle | true |
---|
p03Value | 'Forecast_Month_Avg' |
---|
Type | step |
---|
WrangleText | derive type:single value:numformat((Forecast_Qtr/3),'####.##') as:'Forecast_Month_Avg' |
---|
p01Name | Formula type |
---|
p01Value | Single row formula |
---|
p02Name | Formula |
---|
p02Value | numformat((Forecast_Qtr/3),'####.##') |
---|
p03Name | New column name |
---|
SearchTerm | New formula |
---|
|
You can then compare this value to the actual Amount
value for each month:
D trans |
---|
RawWrangle | true |
---|
p03Value | 'MonthActualVForecast_Pct' |
---|
Type | step |
---|
WrangleText | derive type:single value:numformat(((Amount - Forecast_Month_Avg)/Forecast_Month_Avg)*100, '##.00') as:'MonthActualVForecast_Pct' |
---|
p01Name | Formula type |
---|
p01Value | Single row formula |
---|
p02Name | Formula |
---|
p02Value | numformat(((Amount - Forecast_Month_Avg)/Forecast_Month_Avg)*100, '##.00') |
---|
p03Name | New column name |
---|
SearchTerm | New formula |
---|
|
Results:
Date | Amount | Forecast_Qtr | Forecast_Month_Avg | MonthActualVForecast_Pct |
---|
1/31/15 | 523 | 1400 | 466.67 | 12.07 |
2/28/15 | 135 | 1400 | 466.67 | -71.07 |
3/31/15 | 824 | 1400 | 466.67 | 76.57 |
4/30/15 | 305 | 1500 | 500 | -39.00 |
5/31/15 | 763 | 1500 | 500 | 52.60 |
6/30/15 | 421 | 1500 | 500 | -15.80 |
7/31/15 | 606 | 1600 | 533.33 | 13.63 |
8/31/15 | 477 | 1600 | 533.33 | -10.56 |
9/30/15 | 785 | 1600 | 533.33 | 47.19 |
10/31/15 | 443 | 1700 | 566.67 | -21.82 |
11/30/15 | 622 | 1700 | 566.67 | 9.76 |
12/31/15 | 518 | 1700 | 566.67 | -8.59 |