Skip to main content

SESSION Function

Generates a new session identifier based on a sorted column of timestamps and a specified rolling timeframe.

The SESSION function takes three parameters:

  • reference to column containing Datetime values used to identify sessions.

  • Numeric value to identify the length of the rolling timeframe that demarcates a session.

  • Unit of measure for the length of the rolling timeframe.

Like other windowing, order and group parameters can be applied. You can use the group and order parameters to define the groups of records and the order of those records to which this transform is applied.

Note

While not explicitly required, you should use the order parameter to define a column used to sort the dataset rows. This column should match the column reference in the SESSION function.

In the generated column, identifiers are assigned to each row based on the computed session to which the row data belongs. Session IDs begin at 1 and increment, within each group.

  • For each new group, session identifiers begin with the value 1. As a result, session identifiers are unique only within the group, and the combination of group identifier and session identifier is a unique key within the dataset.

This function works with the following transforms:

Wrangle vs. SQL: This function is part of Wrangle, a proprietary data transformation language. Wrangle is not SQL. For more information, see Wrangle Language.

Basic Usage

session(Timestamp, 1, hour) order:'Timestamp'

Output: Returns session identifiers for groups of rows based on 1-hour segments.

Syntax and Arguments

session(col_ref, k_integer,unit_of_time) order: order_col [group: group_col]

Argument

Required?

Data Type

Description

col_ref

Y

string

Name of column whose values are applied to the function

k_integer

Y

integer (positive)

Length of a session, in combination with unit_of_time argument

unit_of_time

Y

string

String literal that indicates the units of time to define a session's duration

For more information on the order and group parameters, see Window Transform.

For more information on syntax standards, see Language Documentation Syntax Notes.

col_ref

Name of the Datetime column whose values are used to determine sessions.

  • Multiple columns and wildcards are not supported.

Usage Notes:

Required?

Data Type

Example Value

Yes

String (column reference to Datetime values)

myDates

k_integer

Defines the length of a session as this number of units, which are defined in the unit_of_time parameter.

Note

The start of a new session is determined by the first record that is found outside the boundary of the previous session. It is not determined based on any interpretation of a fixed interval.

Usage Notes:

Required?

Data Type

Example Value

Yes

Integer (positive)

24

unit_of_time

Defines the length of each unit of time for purposes of defining the length of a session.

Usage Notes:

Required?

Data Type

Example Value

Yes

String literal. See below for list.

hour

Supported values: day, hour, millisecond, minute, second

Examples

Astuce

For additional examples, see Common Tasks.

Example - Assign session identifiers to timestamped events

The following dataset contains events from a web site, categorized by customer identifier. Timestamp information is stored in two separate columns, and the imported data is sorted by the Action column. Your goal is to generate session identifiers based on sessions of five minute intervals for each customer.

Source:

Date

Time

CustId

Action

2/1/16

9:23:00 AM

C001

change account settings

2/1/16

9:23:58 AM

C003

complete order

2/1/16

9:20:00 AM

C002

login

2/1/16

9:20:22 AM

C003

login

2/1/16

9:20:41 AM

C001

login

2/1/16

9:24:52 AM

C004

login

2/1/16

11:24:21 AM

C001

login

2/1/16

9:24:18 AM

C001

logout

2/1/16

9:24:49 AM

C003

logout

2/1/16

9:26:22 AM

C002

logout

2/1/16

9:24:10 AM

C002

search: bicycles

2/1/16

9:23:50 AM

C002

search: pennyfarthings

2/1/16

11:56:09 PM

C004

search: unicycles

Transformation:

This data makes more sense if it was organized by timestamp of each event. However, the timestamp information is spread across two fields: Date and Time. Your first step is to consolidate this data into a single field:

Transformation Name

Merge columns

Parameter: Columns

Date,Time

Parameter: Separator

''

Parameter: New column name

'Timestamp'

You can now delete the two source columns. After they are deleted, you might notice that the Timestamp column is still typed as String data. This typing issue is caused by the AM/PM designators, which you can remove with the following transformation:

Transformation Name

Replace text or pattern

Parameter: Column

Timestamp

Parameter: Find

` {upper}{2}{end}`

Parameter: Replace with

''

Now that you have valid Datetime data, you can create session identifiers using the following transformation:

Transformation Name

Window

Parameter: Formulas

session(Timestamp, 5, minute)

Parameter: Group by

CustId

Parameter: Order by

Timestamp

The above transform creates session identifiers from the data in the Timestamp column for five-minute intervals. Data is initially grouped by CustId and then sorted by Timestamp before the SESSION function is applied.

You can choose to rename the generated column:

Transformation Name

Rename columns

Parameter: Option

Manual rename

Parameter: Column

window

Parameter: New column name

'SessionId'

Results:

Timestamp

CustId

Action

SessionId

2/1/2016 11:24:21

C001

login

1

2/1/2016 9:20:41

C001

login

2

2/1/2016 9:23:00

C001

change account settings

2

2/1/2016 9:24:18

C001

logout

2

2/1/2016 9:20:22

C003

login

1

2/1/2016 9:23:58

C003

complete order

1

2/1/2016 9:24:49

C003

logout

1

2/1/2016 9:20:00

C002

login

1

2/1/2016 9:23:50

C002

search: pennyfarthings

1

2/1/2016 9:24:10

C002

search: bicycles

1

2/1/2016 9:26:22

C002

logout

1

1/31/2016 11:56:09

C004

search: unicycles

1

2/1/2016 9:24:52

C004

login

2

Notes:

  • Dataset is grouped by CustId, but the order of those groupings is determined by the first timestamp for each customer. So, C003 data appears before C002.

  • C001 and C004 data result in two separate sessions.