Page tree

 

Support | BlogContact Us | 844.332.2821

 

Contents:

This documentation applies to Trifacta Wrangler. Download this free product.
Registered users of this product or Trifacta Wrangler Enterprise should login to Product Docs through the application.

Contents:


Generates a new session identifier based on a sorted column of timestamps and a specified rolling timeframe.The SESSION function takes three parameters:

  • reference to column containing Datetime values used to identify sessions.
  • Numeric value to identify the length of the rolling timeframe that demarcates a session.
  • Unit of measure for the length of the rolling timeframe.

Like other window functions, order and group parameters can be applied. You can use the group and order parameters to define the groups of records and the order of those records to which this transform is applied. 

NOTE: While not explicitly required, you should use the order parameter to define a column used to sort the dataset rows. This column should match the column reference in the SESSION function.

In the generated column, identifiers are assigned to each row based on the computed session to which the row data belongs. Session IDs begin at 1 and increment, within each group. 

  • For each new group, session identifiers begin with the value 1. As a result, session identifiers are unique only within the group, and the combination of group identifier and session identifier is a unique key within the dataset. 

This function works with the following transforms:

Basic Usage

window value:SESSION(Timestamp, 1, hour) order:'Timestamp'

Output: Generates a new column, which contains session identifiers for groups of rows based on 1-hour segments. 

Syntax

window value:SESSION(col_ref, k_integer,unit_of_time) order: order_col [group: group_col]

ArgumentRequired?Data TypeDescription
col_refYstringName of column whose values are applied to the function
k_integerYinteger (positive)Length of a session, in combination with unit_of_time argument
unit_of_timeYstringString literal that indicates the units of time to define a session's duration

For more information on the order and group parameters, see Window Transform.

For more information on syntax standards, see Language Documentation Syntax Notes.

col_ref

Name of the Datetime column whose values are used to determine sessions. 

  • Multiple columns and wildcards are not supported.

Usage Notes:

Required?Data TypeExample Value
YesString (column reference to Datetime values)myDates

k_integer

Defines the length of a session as this number of units, which are defined in the unit_of_time parameter.

NOTE: The start of a new session is determined by the first record that is found outside the boundary of the previous session. It is not determined based on any interpretation of a fixed interval.

Usage Notes:

Required?Data TypeExample Value
YesInteger (positive)24

unit_of_time

Defines the length of each unit of time for purposes of defining the length of a session.

Usage Notes:

Required?Data TypeExample Value
YesString literal. See below for list.hour

Supported values:  day hour millisecond minute second

Examples


Example - Assign session identifiers to timestamped events

The following dataset contains events from a web site, categorized by customer identifier. Timestamp information is stored in two separate columns, and the imported data is sorted by the Action column. Your goal is to generate session identifiers based on sessions of five minute intervals for each customer.

Source:

DateTimeCustIdAction
2/1/169:23:00 AMC001change account settings
2/1/169:23:58 AMC003complete order
2/1/169:20:00 AMC002login
2/1/169:20:22 AMC003login
2/1/169:20:41 AMC001login
2/1/169:24:52 AMC004login
2/1/1611:24:21 AMC001login
2/1/169:24:18 AMC001logout
2/1/169:24:49 AMC003logout
2/1/169:26:22 AMC002logout
2/1/169:24:10 AMC002search: bicycles
2/1/169:23:50 AMC002search: pennyfarthings
2/1/1611:56:09 PMC004search: unicycles

Transform:

This data makes more sense if it was organized by timestamp of each event. However, the timestamp information is spread across two fields: Date and Time. Your first step is to consolidate this data into a single field:

merge col: Date,Time with: ' ' as: 'Timestamp'

You can now drop the two source columns. After they are dropped, you might notice that the Timestamp column is still typed as String data. This typing issue is caused by the AM/PM designators, which you can remove with the following transform:

replace col: Timestamp on: ` {upper}{2}{end}` with: ''

Now that you have valid Datetime data, you can create session identifiers using the following transform:

window value:SESSION(Timestamp, 5, minute) order: Timestamp group: CustId

The above transform creates session identifiers from the data in the Timestamp column for five-minute intervals. Data is initially grouped by CustId and then sorted by Timestamp before the SESSION function is applied.

You can choose to rename the generated column:

rename col:window to:'SessionId'

Results:

TimestampCustIdActionSessionId
2/1/2016 11:24:21C001login1
2/1/2016 9:20:41C001login2
2/1/2016 9:23:00C001change account settings2
2/1/2016 9:24:18C001logout2
2/1/2016 9:20:22C003login1
2/1/2016 9:23:58C003complete order1
2/1/2016 9:24:49C003logout1
2/1/2016 9:20:00C002login1
2/1/2016 9:23:50C002search: pennyfarthings1
2/1/2016 9:24:10C002search: bicycles1
2/1/2016 9:26:22C002logout1
1/31/2016 11:56:09 C004search: unicycles1
2/1/2016 9:24:52C004login2

Notes:

  • Dataset is grouped by CustId, but the order of those groupings is determined by the first timestamp for each customer. So, C003 data appears before C002
  • C001 and C004 data result in two separate sessions. 

 

Your Rating: Results: PatheticBadOKGoodOutstanding! 2 rates

This page has no comments.