SESSION Function
Generates a new session identifier based on a sorted column of timestamps and a specified rolling timeframe.
The SESSION
function takes three parameters:
reference to column containing Datetime values used to identify sessions.
Numeric value to identify the length of the rolling timeframe that demarcates a session.
Unit of measure for the length of the rolling timeframe.
Like other windowing, order
and group
parameters can be applied. You can use the group
and order
parameters to define the groups of records and the order of those records to which this transform is applied.
Note
While not explicitly required, you should use the order
parameter to define a column used to sort the dataset rows. This column should match the column reference in the SESSION
function.
In the generated column, identifiers are assigned to each row based on the computed session to which the row data belongs. Session IDs begin at 1
and increment, within each group.
For each new group, session identifiers begin with the value
1
. As a result, session identifiers are unique only within the group, and the combination of group identifier and session identifier is a unique key within the dataset.
This function works with the following transforms:
Wrangle vs. SQL: This function is part of Wrangle, a proprietary data transformation language. Wrangle is not SQL. For more information, see Wrangle Language.
Basic Usage
session(Timestamp, 1, hour) order:'Timestamp'
Output: Returns session identifiers for groups of rows based on 1-hour segments.
Syntax and Arguments
session(col_ref, k_integer,unit_of_time) order: order_col [group: group_col]
Argument | Required? | Data Type | Description |
---|---|---|---|
col_ref | Y | string | Name of column whose values are applied to the function |
k_integer | Y | integer (positive) | Length of a session, in combination with |
unit_of_time | Y | string | String literal that indicates the units of time to define a session's duration |
For more information on the order
and group
parameters, see Window Transform.
For more information on syntax standards, see Language Documentation Syntax Notes.
col_ref
Name of the Datetime column whose values are used to determine sessions.
Multiple columns and wildcards are not supported.
Usage Notes:
Required? | Data Type | Example Value |
---|---|---|
Yes | String (column reference to Datetime values) | myDates |
k_integer
Defines the length of a session as this number of units, which are defined in the unit_of_time
parameter.
Note
The start of a new session is determined by the first record that is found outside the boundary of the previous session. It is not determined based on any interpretation of a fixed interval.
Usage Notes:
Required? | Data Type | Example Value |
---|---|---|
Yes | Integer (positive) | 24 |
unit_of_time
Defines the length of each unit of time for purposes of defining the length of a session.
Usage Notes:
Required? | Data Type | Example Value |
---|---|---|
Yes | String literal. See below for list. | hour |
Supported values: day
, hour
, millisecond
, minute
, second
Examples
Astuce
For additional examples, see Common Tasks.
Example - Assign session identifiers to timestamped events
The following dataset contains events from a web site, categorized by customer identifier. Timestamp information is stored in two separate columns, and the imported data is sorted by the Action
column. Your goal is to generate session identifiers based on sessions of five minute intervals for each customer.
Source:
Date | Time | CustId | Action |
---|---|---|---|
2/1/16 | 9:23:00 AM | C001 | change account settings |
2/1/16 | 9:23:58 AM | C003 | complete order |
2/1/16 | 9:20:00 AM | C002 | login |
2/1/16 | 9:20:22 AM | C003 | login |
2/1/16 | 9:20:41 AM | C001 | login |
2/1/16 | 9:24:52 AM | C004 | login |
2/1/16 | 11:24:21 AM | C001 | login |
2/1/16 | 9:24:18 AM | C001 | logout |
2/1/16 | 9:24:49 AM | C003 | logout |
2/1/16 | 9:26:22 AM | C002 | logout |
2/1/16 | 9:24:10 AM | C002 | search: bicycles |
2/1/16 | 9:23:50 AM | C002 | search: pennyfarthings |
2/1/16 | 11:56:09 PM | C004 | search: unicycles |
Transformation:
This data makes more sense if it was organized by timestamp of each event. However, the timestamp information is spread across two fields: Date
and Time
. Your first step is to consolidate this data into a single field:
Transformation Name | |
---|---|
Parameter: Columns | Date,Time |
Parameter: Separator | '' |
Parameter: New column name | 'Timestamp' |
You can now delete the two source columns. After they are deleted, you might notice that the Timestamp
column is still typed as String data. This typing issue is caused by the AM/PM designators, which you can remove with the following transformation:
Transformation Name | |
---|---|
Parameter: Column | Timestamp |
Parameter: Find | ` {upper}{2}{end}` |
Parameter: Replace with | '' |
Now that you have valid Datetime data, you can create session identifiers using the following transformation:
Transformation Name | |
---|---|
Parameter: Formulas | session(Timestamp, 5, minute) |
Parameter: Group by | CustId |
Parameter: Order by | Timestamp |
The above transform creates session identifiers from the data in the Timestamp
column for five-minute intervals. Data is initially grouped by CustId
and then sorted by Timestamp
before the SESSION
function is applied.
You can choose to rename the generated column:
Transformation Name | |
---|---|
Parameter: Option | Manual rename |
Parameter: Column | window |
Parameter: New column name | 'SessionId' |
Results:
Timestamp | CustId | Action | SessionId |
---|---|---|---|
2/1/2016 11:24:21 | C001 | login | 1 |
2/1/2016 9:20:41 | C001 | login | 2 |
2/1/2016 9:23:00 | C001 | change account settings | 2 |
2/1/2016 9:24:18 | C001 | logout | 2 |
2/1/2016 9:20:22 | C003 | login | 1 |
2/1/2016 9:23:58 | C003 | complete order | 1 |
2/1/2016 9:24:49 | C003 | logout | 1 |
2/1/2016 9:20:00 | C002 | login | 1 |
2/1/2016 9:23:50 | C002 | search: pennyfarthings | 1 |
2/1/2016 9:24:10 | C002 | search: bicycles | 1 |
2/1/2016 9:26:22 | C002 | logout | 1 |
1/31/2016 11:56:09 | C004 | search: unicycles | 1 |
2/1/2016 9:24:52 | C004 | login | 2 |
Notes:
Dataset is grouped by
CustId
, but the order of those groupings is determined by the first timestamp for each customer. So,C003
data appears beforeC002
.C001
andC004
data result in two separate sessions.