Page tree

 

Support | BlogContact Us | 844.332.2821

 

Contents:

This documentation applies to Trifacta Wrangler. Download this free product.
Registered users of this product or Trifacta Wrangler Enterprise should login to Product Docs through the application.

Finds the suffix value after the domain from a valid URL. Input values must be of URL or String type.This function is part of a set of functions for processing URL data.

 

Basic Usage

URL literal examples:

derive value: SUFFIX('http://www.example.com' ) as: 'myDomain'

Output: Generates a column containing the value com

derive value: SUFFIX('http://www.exampl.e.com' ) as: 'myDomain'

Output: Generates a column containing the value com

Column reference example:

derive value: SUFFIX(myURLs) as: 'myDomain'

Output: Generates the new myDomain column containing the suffix values extracted from the myURLs column. 

Syntax

derive value: SUFFIX(column_url)

ArgumentRequired?Data TypeDescription
column_urlYstringName of column or String or URL literal containing the suffix value to extract

For more information on syntax standards, see Language Documentation Syntax Notes.

column_url

Name of the column or URL or String literal whose values are used to extract the suffix value.

  • Missing input values generate missing results.
  • Multiple columns and wildcards are not supported.

Usage Notes:

 

Required?Data TypeExample Value
YesString literal or column reference (URL)http://www.example.com

Examples

Example - Domain, Subdomain, and Suffix functions

This examples illustrates how you can extract component parts of a URL using the following functions:

  • DOMAIN - extracts the domain value from a URL. See DOMAIN Function.
  • SUBDOMAIN - extracts the first group after the protocol identifier and before the domain value. See SUBDOMAIN Function.
  • SUFFIX - extracts the suffix of a URL. See SUFFIX Function.

Source:

Your dataset includes the following values for URLs:

URL
www.example.com
example.com/support
http://www.example.com/products/
http://1.2.3.4
https://www.example.com/free-download
https://www.example.com/about-us/careers
www.app.example.com
www.some.app.example.com
some.app.example.com
some.example.com
example.com

Transform:

When the above data is imported into the application, the column is recognized as a URL. All values are registered as valid, even the IPv4 address. 

To extract the domain and subdomain values:

derive value: DOMAIN(URL) as: 'domain_URL'

derive value: SUBDOMAIN(URL) as:'subdomain_URL'

derive value: SUFFIX(URL) as:'suffix_URL'

You can use the Trifacta® pattern in the following transform to extract protocol identifiers, if present, into a new column: 

extract col:URL on:`{start}%*://`

To clean this up, you might want to rename the column to protocol_URL.

To extract the path values, you can use the following regular expression:

NOTE: Regular expressions are considered a developer-level method for pattern matching. Please use them with caution. See Text Matching.

extract col: URL on: /[^*:\/\/]\/.*$/

The above transform grabs a little too much of the URL. If you rename the column to path_URL, you can use the following regular expression to clean it up:

extract col:path_URL on:/[!^\/].*$/


If you drop the path_URL column and rename the path_URL1 column to the dropped one, you should have the following results:

Results:

URLpath_URLprotocol_URLsubdomain_URLdomain_URLsuffix_URL
www.example.com  wwwexamplecom
example.com/support/support  examplecom
http://www.example.com/products//products/http://wwwexamplecom
http://1.2.3.4 http://   
https://www.example.com/free-download/free-downloadhttps://wwwexamplecom
https://www.example.com/about-us/careers/about-us/careershttps://wwwexamplecom
www.app.example.com  www.appexamplecom
www.some.app.example.com  www.some.appexamplecom
some.app.example.com  some.appexamplecom
some.example.com  someexamplecom
example.com   examplecom

 

Your Rating: Results: PatheticBadOKGoodOutstanding! 23 rates

This page has no comments.