Generates a new session identifier based on a sorted column of timestamps and a specified rolling timeframe. |
SESSION
function takes three parameters:Like other windowing, order
and group
parameters can be applied. You can use the group
and order
parameters to define the groups of records and the order of those records to which this transform is applied.
NOTE: While not explicitly required, you should use the |
In the generated column, identifiers are assigned to each row based on the computed session to which the row data belongs. Session IDs begin at 1
and increment, within each group.
1
. As a result, session identifiers are unique only within the group, and the combination of group identifier and session identifier is a unique key within the dataset. This function works with the following transforms:
window value:SESSION(Timestamp, 1, hour) order:'Timestamp' |
Output: Generates a new column, which contains session identifiers for groups of rows based on 1-hour segments.
window value:SESSION(col_ref, k_integer,unit_of_time) order: order_col [group: group_col] |
Argument | Required? | Data Type | Description |
---|---|---|---|
col_ref | Y | string | Name of column whose values are applied to the function |
k_integer | Y | integer (positive) | Length of a session, in combination with unit_of_time argument |
unit_of_time | Y | string | String literal that indicates the units of time to define a session's duration |
For more information on the order
and group
parameters, see Window Transform.
Name of the Datetime column whose values are used to determine sessions.
Required? | Data Type | Example Value |
---|---|---|
Yes | String (column reference to Datetime values) | myDates |
Defines the length of a session as this number of units, which are defined in the unit_of_time
parameter.
NOTE: The start of a new session is determined by the first record that is found outside the boundary of the previous session. It is not determined based on any interpretation of a fixed interval. |
Required? | Data Type | Example Value |
---|---|---|
Yes | Integer (positive) | 24 |
Defines the length of each unit of time for purposes of defining the length of a session.
Required? | Data Type | Example Value |
---|---|---|
Yes | String literal. See below for list. | hour |
Supported values: day
, hour
, millisecond
, minute
, second
The following dataset contains events from a web site, categorized by customer identifier. Timestamp information is stored in two separate columns, and the imported data is sorted by the Action
column. Your goal is to generate session identifiers based on sessions of five minute intervals for each customer.
Source:
Date | Time | CustId | Action |
---|---|---|---|
2/1/16 | 9:23:00 AM | C001 | change account settings |
2/1/16 | 9:23:58 AM | C003 | complete order |
2/1/16 | 9:20:00 AM | C002 | login |
2/1/16 | 9:20:22 AM | C003 | login |
2/1/16 | 9:20:41 AM | C001 | login |
2/1/16 | 9:24:52 AM | C004 | login |
2/1/16 | 11:24:21 AM | C001 | login |
2/1/16 | 9:24:18 AM | C001 | logout |
2/1/16 | 9:24:49 AM | C003 | logout |
2/1/16 | 9:26:22 AM | C002 | logout |
2/1/16 | 9:24:10 AM | C002 | search: bicycles |
2/1/16 | 9:23:50 AM | C002 | search: pennyfarthings |
2/1/16 | 11:56:09 PM | C004 | search: unicycles |
Transform:
This data makes more sense if it was organized by timestamp of each event. However, the timestamp information is spread across two fields: Date
and Time
. Your first step is to consolidate this data into a single field:
merge col: Date,Time with: ' ' as: 'Timestamp' |
You can now drop the two source columns. After they are dropped, you might notice that the Timestamp
column is still typed as String data. This typing issue is caused by the AM/PM designators, which you can remove with the following transform:
replace col: Timestamp on: ` {upper}{2}{end}` with: '' |
Now that you have valid Datetime data, you can create session identifiers using the following transform:
window value:SESSION(Timestamp, 5, minute) order: Timestamp group: CustId |
The above transform creates session identifiers from the data in the Timestamp
column for five-minute intervals. Data is initially grouped by CustId
and then sorted by Timestamp
before the SESSION
function is applied.
You can choose to rename the generated column:
rename mapping:[window,'SessionId'] |
Results:
Timestamp | CustId | Action | SessionId |
---|---|---|---|
2/1/2016 11:24:21 | C001 | login | 1 |
2/1/2016 9:20:41 | C001 | login | 2 |
2/1/2016 9:23:00 | C001 | change account settings | 2 |
2/1/2016 9:24:18 | C001 | logout | 2 |
2/1/2016 9:20:22 | C003 | login | 1 |
2/1/2016 9:23:58 | C003 | complete order | 1 |
2/1/2016 9:24:49 | C003 | logout | 1 |
2/1/2016 9:20:00 | C002 | login | 1 |
2/1/2016 9:23:50 | C002 | search: pennyfarthings | 1 |
2/1/2016 9:24:10 | C002 | search: bicycles | 1 |
2/1/2016 9:26:22 | C002 | logout | 1 |
1/31/2016 11:56:09 | C004 | search: unicycles | 1 |
2/1/2016 9:24:52 | C004 | login | 2 |
Notes:
CustId
, but the order of those groupings is determined by the first timestamp for each customer. So, C003
data appears before C002
. C001
and C004
data result in two separate sessions.