Page tree

Release 6.8.2


Contents:

   

Contents:


Generates a new session identifier based on a sorted column of timestamps and a specified rolling timeframe.

The SESSION function takes three parameters:

  • reference to column containing Datetime values used to identify sessions.
  • Numeric value to identify the length of the rolling timeframe that demarcates a session.
  • Unit of measure for the length of the rolling timeframe.

Like other windowing, order and group parameters can be applied. You can use the group and order parameters to define the groups of records and the order of those records to which this transform is applied. 

NOTE: While not explicitly required, you should use the order parameter to define a column used to sort the dataset rows. This column should match the column reference in the SESSION function.

In the generated column, identifiers are assigned to each row based on the computed session to which the row data belongs. Session IDs begin at 1 and increment, within each group. 

  • For each new group, session identifiers begin with the value 1. As a result, session identifiers are unique only within the group, and the combination of group identifier and session identifier is a unique key within the dataset. 

This function works with the following transforms:

Basic Usage

session(Timestamp, 1, hour) order:'Timestamp'

Output: Returns session identifiers for groups of rows based on 1-hour segments.

Syntax and Arguments

session(col_ref, k_integer,unit_of_time) order: order_col [group: group_col]


ArgumentRequired?Data TypeDescription
col_refYstringName of column whose values are applied to the function
k_integerYinteger (positive)Length of a session, in combination with unit_of_time argument
unit_of_timeYstringString literal that indicates the units of time to define a session's duration

For more information on the order and group parameters, see Window Transform.

For more information on syntax standards, see Language Documentation Syntax Notes.

col_ref

Name of the Datetime column whose values are used to determine sessions. 

  • Multiple columns and wildcards are not supported.

Usage Notes:

Required?Data TypeExample Value
YesString (column reference to Datetime values)myDates

k_integer

Defines the length of a session as this number of units, which are defined in the unit_of_time parameter.

NOTE: The start of a new session is determined by the first record that is found outside the boundary of the previous session. It is not determined based on any interpretation of a fixed interval.

Usage Notes:

Required?Data TypeExample Value
YesInteger (positive)24

unit_of_time

Defines the length of each unit of time for purposes of defining the length of a session.

Usage Notes:

Required?Data TypeExample Value
YesString literal. See below for list.hour

Supported values:  dayhour, millisecond, minutesecond

Examples


Tip: For additional examples, see Common Tasks.


Example - Assign session identifiers to timestamped events

The following dataset contains events from a web site, categorized by customer identifier. Timestamp information is stored in two separate columns, and the imported data is sorted by the Action column. Your goal is to generate session identifiers based on sessions of five minute intervals for each customer.

Source:

DateTimeCustIdAction
2/1/169:23:00 AMC001change account settings
2/1/169:23:58 AMC003complete order
2/1/169:20:00 AMC002login
2/1/169:20:22 AMC003login
2/1/169:20:41 AMC001login
2/1/169:24:52 AMC004login
2/1/1611:24:21 AMC001login
2/1/169:24:18 AMC001logout
2/1/169:24:49 AMC003logout
2/1/169:26:22 AMC002logout
2/1/169:24:10 AMC002search: bicycles
2/1/169:23:50 AMC002search: pennyfarthings
2/1/1611:56:09 PMC004search: unicycles

Transformation:

This data makes more sense if it was organized by timestamp of each event. However, the timestamp information is spread across two fields: Date and Time. Your first step is to consolidate this data into a single field:

Transformation Name Merge columns
Parameter: Columns Date,Time
Parameter: Separator ''
Parameter: New column name 'Timestamp'

You can now delete the two source columns. After they are deleted, you might notice that the Timestamp column is still typed as String data. This typing issue is caused by the AM/PM designators, which you can remove with the following transformation:

Transformation Name Replace text or pattern
Parameter: Column Timestamp
Parameter: Find ` {upper}{2}{end}`
Parameter: Replace with ''

Now that you have valid Datetime data, you can create session identifiers using the following transformation:

Transformation Name Window
Parameter: Formulas session(Timestamp, 5, minute)
Parameter: Group by CustId
Parameter: Order by Timestamp

The above transform creates session identifiers from the data in the Timestamp column for five-minute intervals. Data is initially grouped by CustId and then sorted by Timestamp before the SESSION function is applied.

You can choose to rename the generated column:

Transformation Name Rename columns
Parameter: Option Manual rename
Parameter: Column window
Parameter: New column name 'SessionId'

Results:

TimestampCustIdActionSessionId
2/1/2016 11:24:21C001login1
2/1/2016 9:20:41C001login2
2/1/2016 9:23:00C001change account settings2
2/1/2016 9:24:18C001logout2
2/1/2016 9:20:22C003login1
2/1/2016 9:23:58C003complete order1
2/1/2016 9:24:49C003logout1
2/1/2016 9:20:00C002login1
2/1/2016 9:23:50C002search: pennyfarthings1
2/1/2016 9:24:10C002search: bicycles1
2/1/2016 9:26:22C002logout1
1/31/2016 11:56:09 C004search: unicycles1
2/1/2016 9:24:52C004login2

Notes:

  • Dataset is grouped by CustId, but the order of those groupings is determined by the first timestamp for each customer. So, C003 data appears before C002
  • C001 and C004 data result in two separate sessions. 

 

This page has no comments.