Window Transformations
A window transformation performs calculations on a row based on row values that are related to it. Windowing functions can perform calculations based on time, relative row positions, and rolling windows.
For example, you might wish to calculate the average percentage of CPU usage over 24hour intervals based on log entries. From the rows of data, you can create a window function that calculates the average value in the CPU usage column over the 24hour period, as defined based on date values for each log entry.
Key distinction:
In a window function, the output of each row's calculation is specific to the row.
In an aggregate function, the output for a row is the same value for all rows that are used in the calculation.
For more information on aggregation, see Create Aggregations.
Basic Structure
You can use windowing functions with the following transformation types:
window  creates a new column called
window
New formula  creates a new column that you name
Edit with formula  modifies the values in a column based on a formula that you specify.
Group by parameter
You can use the Group by parameter to define the column of values by which rows of data are grouped for calculation purposes. For example, if your Group by column contains months, your calculations are computed for each month represented in the column values.
Note
Transforms that use the group
parameter can result in nondeterministic reordering in the data grid. However, you should apply the group
parameter, particularly on larger datasets, or your job may run out of memory and fail. To enforce row ordering, you can use the sort
transform. For more information, see Sort Transform.
Order by parameter
When using window functions, you can use the Order by parameter to specify the column or columns by which to sort the output.
Source:
The following table contains the sales data of a company for all the four regions in the last three months.
Month  Sales  Region 

20210101  800  East 
20210101  1500  West 
20210101  1000  North 
20210101  2000  South 
20210201  1250  East 
20210201  800  West 
20210201  1100  North 
20210201  700  South 
20210301  900  East 
20210301  1000  West 
20210301  1400  North 
20210301  800  South 
Transformation:
In the following transformation, you can calculate the rolling average of sales. You apply the ROLLINGAVERAGE
and specify that the results are to be ordered by the Sales column.
Transformation Name 


Parameter: Formulas  ROLLINGAVERAGE (Sales, 0,1) 
Parameter: Order by  Sales 
Results:
The following dataset shows the ROLLINGAVERAGE
ordered by Sales column.
Month  Sales  Region  RollingAverage 

20210201  700  South  750 
20210101  800  East  800 
20210201  800  West  800 
20210301  800  South  850 
20210301  900  East  950 
20210101  1000  North  1000 
20210301  1000  West  1050 
20210201  1100  North  1175 
20210201  1250  East  1325 
20210301  1400  North  1450 
20210101  1500  West  1750 
20210101  2000  South  2000 
Compute over Time Windows
You may need to create windows of time within your data that are not cleanly segmented by basic units of time measurement. For example, you may need to create a custom time period, called a session, based on timestamps recorded in eventbased data.
A session is usually defined as a group of events that occur within a given time frame. For example, you may need to perform calculations based on fiveminute intervals within your logging data. If a user opens your shopping website, logs in, searches items, and then logs out within a fiveminute interval, that can be grouped under a single session. However, if the user's interaction lasted six minutes, the logged events may span multiple windowed sessions in the data.
You can use the SESSION
function to create time boxes based on a time period that you specify. When the function is applied to your column of timestamp values, the application assigns an ID to events that belong to the same session.
From the following example, you can create a Session ID. After you create the session ID, you can find the volume of data consumed by the individual user.
Source:
User Name  TimeStamp  Activity  Volume (in Kb) 

Bob  02/11/21 08:01:13  Read  1024 
William  02/11/21 08:01:00  Read  1024 
John  02/11/21 08:01:17  Read  1024 
Christy  02/11/21 08:01:17  Read  1024 
William  02/11/21 08:03:33  Read  520 
Christy  02/11/21 08:02:01  Password change  1024 
Bob  02/11/21 08:07:23  Adding items to cart  2048 
William  02/11/21 08:05:45  Read  520 
William  02/11/21 08:11:56  Account settings  2048 
John  02/11/21 08:15:11  Password change  2048 
Bob  02/11/21 08:34:00  Proceeding to payment  2048 
Bob  02/11/21 08:43:03  logout  2048 
Christy  02/11/21 09:03:43  Read  1024 
Christy  02/11/21 09:10:00  logout  1024 
Transformation:
Transformation Name 


Parameter: Formulas  SESSION (TimeStamp, 5, minute) 
Parameter: Group by  User Name 
Parameter: Order by  TimeStamp 
Since the new column is named window
, you should rename it:
Transformation Name 


Parameter: Option  Manual rename 
Parameter: Column  window 
Parameter: New column name  SESSIONID 
With this session ID, you can calculate the maximum volume of data consumed by each session ID and by each user.
Transformation Name 


Parameter: Formula type  Multiple row formula 
Parameter: Formula  MAX(Volume (in Kb)) 
Parameter: Sort rows by  SessionID 
Parameter: Group rows by  User Name, SessionID 
Parameter: New column name  Volume_Consumed (in Kb) 
Results:
User Name  TimeStamp  Activity  Volume (in Kb)  SessionID  max_Volume (in Kb) 

William  02/11/21 08:01:00  Read  1024  1  1024 
William  02/11/21 08:03:33  Read  520  1  1024 
William  02/11/21 08:05:45  Read  520  1  1024 
William  02/11/21 08:11:56  Account settings  2048  2  2048 
Bob  02/11/21 08:01:13  Read  1024  1  1024 
Bob  02/11/21 08:07:23  Adding items to cart  2048  2  2048 
Bob  02/11/21 08:34:00  Proceeding to payment  2048  3  2048 
Bob  02/11/21 08:43:03  logout  2048  4  2048 
Christy  02/11/21 08:01:17  Read  1024  1  1024 
Christy  02/11/21 08:02:01  Password change  1024  1  1024 
Christy  02/11/21 09:03:43  Read  1024  2  1024 
Christy  02/11/21 09:10:00  logout  1024  3  1024 
John  02/11/21 08:01:17  Read  1024  1  1024 
John  02/11/21 08:15:11  Password change  2048  2  2048 
Calculate over preceding and following rows
The PREV
and NEXT
functions enable you to fetch data from a previous row or a subsequent row, which is helpful for identifying relative changes or trends in your data.
Source:
The following dataset contains orders for different product types over a given time period. You can apply the PREV
and NEXT
functions to calculate the previous orders and the next orders to analyze the trend of orders and derive the average of orders for a product group.
Product_Type  Order_date  Order 

Laptop  20210105  300 
Laptop  20210126  1780 
Laptop  20210109  500 
Laptop  20210131  1200 
SmartPhone  20210124  1400 
SmartPhone  20210126  2200 
SmartPhone  20210107  700 
Tablet  20210121  600 
Tablet  20210123  900 
Transformation:
You can also calculate the percentage of change in orders over time. The following transformation calculates the change between the current order and the previous one and then divides that value over the previous value to calculate the percent change between the rows:
Transformation Name 


Parameter: Formulas  (Order  PREV(Order, 1)) / PREV(Order, 1) * 100 
Parameter: Group by  Product_Type 
Parameter: Order by  Order 
After you rename the column to ChangeinOrder, you can apply the NUMFORMAT
function to clean up and format the ChangeinOrder values. The following transformation reformats the ChangeinOrder column to display two decimal places.
Transformation Name 


Parameter: Column  ChangeinOrder 
Parameter: Formula  NUMFORMAT(ChangeinOrder, '##.##') 
Similarly, you can apply the NEXT
function and calculate the Change in orders for upcoming months.
Results:
Product_Type  Order_date  Order  NEXTOrder  ChangeinOrder 

Laptop  20210105  300  500  
Laptop  20210109  500  1200  66.67 
Laptop  20210131  1200  1780  140 
Laptop  20210126  1780  48.33  
SmartPhone  20210107  700  400  
SmartPhone  20210124  1400  2200  100 
SmartPhone  20210126  2200  57.14  
Tablet  20210121  600  900  
Tablet  20210123  900  50 
Fill Empty Values
You can use the FILL
function to fill empty or null values in your data with the last nonempty value in the group.
Source:
For example, the following dataset contains the daily orders received. Note the missing values due to weekends. You can assume that the no orders were received for Saturday and Sunday ,
Date  DayOfWeek  OrdersDay  OrdersTotal 

20210310  Wednesday  100  100 
20210311  Thursday  112  212 
20210312  Friday  320  532 
20210313  Saturday  
20210314  Sunday  
20210315  Monday  300  832 
Transformation:
You have to clean up the data to fill the values for OrdersDay column. You can use the following function to fill the empty and null values. This function tests the the OrdersDay column to check if the column is empty or null. If so, the value '0
' is written in the column, else the value of the column ($col
) is written.
Transformation Name 


Parameter: Column  OrdersDay 
Parameter: Formula  IF(OrdersDay == ''  ISNULL(OrdersDay), '0', $col) 
You can see the values of Friday is taken for Saturday and Sunday and filled it accordingly as per theFILL
function.
Transformation Name 


Parameter: Column  OrdersTotal 
Parameter: Formula  IF (OrdersDay == '0', FILL (OrdersTotal, 1,0),$col) 
Parameter: Order by  Date 
Results:
Date  DayOfWeek  OrdersDay  OrdersTotal 

20210310  Wednesday  100  100 
20210311  Thursday  112  212 
20210312  Friday  320  532 
20210313  Saturday  0  532 
20210314  Sunday  0  532 
20210315  Monday  300  832 
Calculate Rank
The RANK
function enables you to create rankings in your data based on calculations by returning a ranking value for each row with the specified group of values. When used, some rows might receive the same value as other rows. For example, if there are three tie values in a group, the same rank is assigned to the rows and the next three ranks are skipped.
The DENSERANK
function enables you to generate a ranked order of values within a group. If there are tie values in a group, it does not skip rank in case of tie values. For example, if two rows are listed as rank 2, then the fourth row receives rank 3.
Source:
The following dataset contains total Sales information by quarter. You can use the RANK
and DENSERANK
to identify the quarters with the highest sales.
Year  Quarter  Sales 

2018  1  1000 
2018  2  2000 
2018  3  3000 
2018  4  2000 
2019  1  1000 
2019  2  500 
2019  3  9000 
2019  4  3000 
2020  1  500 
2020  2  500 
2020  3  200 
2020  4  400 
Transformation:
RANK:
Transformation Name 


Parameter: Formula type  Multiple row formula 
Parameter: Formula  RANK() 
Parameter: Sort rows by  Sales 
Parameter: New column name  SalesRank 
DENSERANK:
Transformation Name 


Parameter: Formula type  Multiple row formula 
Parameter: Formula  DENSERANK() 
Parameter: Sort rows by  Sales 
Parameter: New column name  SalesDenseRank 
Results:
For the RANK function, when multiple rows share the same rank, the next rank is not consecutive, whereas for the DENSERANK
function, the next rank is consecutive.
Year  Quarter  Sales  SalesDenseRank  SalesRank 

2020  3  200  1  1 
2020  4  400  2  2 
2020  2  500  3  3 
2020  1  500  3  3 
2019  2  500  3  3 
2019  1  1000  4  6 
2018  1  1000  4  6 
2018  4  2000  5  8 
2018  2  2000  5  8 
2019  4  3000  6  10 
2018  3  3000  6  10 
2019  3  9000  7  12 
Calculate Rolling Functions
Rolling calculations enable you to compute a function over a changing set of rows. Rolling calculations are useful for computing the current state of a measure within your data.
For example, in the above sample data, you can find the rolling sum and rolling average of the sales for the year. You can use the above example data to find the rolling sum and rolling average.
Source:
From the following dataset, you can calculate the rolling calculations such as ROLLINGSUM
, ROLLINGAVERAGE
, ROLLINGMAX
, and ROLLINGMIN
.
Year  Quarter  Sales 

2018  1  1000 
2018  2  2000 
2018  3  3000 
2018  4  2000 
2019  1  1000 
2019  2  500 
2019  3  9000 
2019  4  3000 
2020  1  500 
2020  2  500 
2020  3  200 
2020  4  400 
Transformation:
Transformation Name 


Parameter: Formulas  ROLLINGSUM (Sales, 0,1) 
Parameter: Formulas  ROLLINGAVERAGE (Sales, 0,1) 
Parameter: Formulas  ROLLINGMAX (Sales, 0, 1) 
Parameter: Formulas  ROLLINGMIN (Sales, 0,1) 
Parameter: Order by  Sales 
You can rename the required columns accordingly.
Results:
Year  Quarter  Sales  RollingSumSales  RollingAverageSales  RollingMinSales  RollingMaxSales 

2020  3  200  600  300  200  400 
2020  4  400  900  450  400  500 
2020  2  500  1000  500  500  500 
2020  1  500  1000  500  500  500 
2019  2  500  1500  750  500  1000 
2019  1  1000  2000  1000  1000  1000 
2018  1  1000  3000  1500  1000  2000 
2018  4  2000  4000  2000  2000  2000 
2018  2  2000  5000  2500  2000  3000 
2019  4  3000  6000  3000  3000  3000 
2018  3  3000  12000  6000  3000  9000 
2019  3  9000  9000  9000  9000  9000 
Rolling date functions
The Rolling date functions enable you to calculate forward or backward of the current row within the specified column. For example, when dealing with business calendars, you might want to know if the date falls on a holiday or weekend; based on that, you can roll the date forward or backward according to the business calendar.
Source:
The following example dataset shows the order date, order quantity that belongs to a product group. You are interested in finding the rolling minimum and maximum dates for the product group, as well as the rolling mode value. You can useROLLINGMINDATE
, ROLLINGMAXDATE
, and ROLLINGMODEDATE
functions.
Order_date  Order_quantity  Product_Group 

20210414  750  PG001 
20210713  1500  PG001 
20210831  355  PG002 
20210216  2000  PG002 
20210513  867  PG002 
20210618  1010  PG002 
20211115  909  PG003 
20211016  200  PG003 
20210909  200  PG004 
20210101  900  PG004 
20211207  707  PG004 
Transformation:
Transformation Name 


Parameter: Formulas  ROLLINGSUM (Sales, 0,1) 
Parameter: Formulas  ROLLINGMAXDATE (Order_date, 0,1) 
Parameter: Formulas  ROLLINGMINDATE (Order_date, 0, 1) 
Parameter: Formulas  ROLLINGMODEDATE (Order_date, 0,1) 
Parameter: Order by  Order_date 
Results:
Order_date  Order_quantity  Product_Group  RollingMaxdate  RollingMindate  RollingModedate 

20210101  900  PG004  20210216  20210101  20210101 
20210216  2000  PG002  20210414  20210216  20210216 
20210414  750  PG001  20210513  20210414  20210414 
20210513  867  PG002  20210618  20210513  20210513 
20210618  1010  PG002  20210713  20210618  20210618 
20210713  1500  PG001  20210831  20210713  20210713 
20210831  355  PG002  20210909  20210831  20210831 
20210909  200  PG004  20211016  20210909  20210909 
20211016  200  PG003  20211115  20211016  20211016 
20211115  909  PG003  20211207  20211115  20211115 
20211207  707  PG004  20211207  20211207  20211207 