Page tree

Trifacta Dataprep


Contents:

On January 27, 2021, Google is changing the required permissions for attaching IAM roles to service accounts. If you are using IAM roles for your Google service accounts, please see Changes to User Management.

 

Contents:


Returns true if a value contains a string or pattern. The value to search can be a string literal, a function returning a string, or a reference to a column of String type.

Since the MATCHES function returns a Boolean value, it can be used as both a function and as a conditional.

Tip: When you select values in a histogram for a column of Array type, the function that identifies the values on which to perform a transform is typically MATCHES.

Tip: If you need the location of the matched string within the source, use the FIND function. See FIND Function.

Wrangle vs. SQL: This function is part of Wrangle , a proprietary data transformation language. Wrangle is not SQL. For more information, see Wrangle Language.

Basic Usage

Column reference example:

matches(ProdId, 'Fun Toy')

Output: Returns true when the value in the ProdId column value contains the string literal Fun Toy.

String literal example:

matches('Hello, World', 'Hello')

Output: Returns true.

Syntax and Arguments

matches(column_string,string_pattern [,ignore_case])


ArgumentRequired?Data TypeDescription
column_stringYstringName of column or string literal to be searched
string_patternYstringName of column, function returning a string ,or string literal or pattern to find
ignore_caseNstringWhen true, matching is case-insensitive. Default is false.

For more information on syntax standards, see Language Documentation Syntax Notes.

column_string

Name of the column or string literal to be searched.

  • Missing string or column values generate missing string results.
    • String constants must be quoted ('Hello, World').
  • Multiple columns can be specified as an array ( matches([Col1,Col2],'hello').

Usage Notes:

Required?Data TypeExample Value
YesStringMyColumn

string_pattern

Column of strings, function returning a string, or string literal. Value can be a column reference, string literal,  Pattern , or regular expression to match against the source column-string.

Usage Notes:

Required?Data TypeExample Value
YesColumn reference or string literal or pattern'home page'

ignore_case

When true, matches are case-insensitive. Default is false

NOTE: This argument is not required. By default, matches are case-sensitive.

Required?Data TypeExample Value
NoString literal evaluating to a Boolean'true'


Examples

Tip: For additional examples, see Common Tasks.

Example - Filtering log data

In downloaded log files, you might see error messages of the following type:

  • INFO - status information on the process
  • WARNING - system encountered a non-fatal error during execution
  • ERROR - system encountered an error, which might have caused the job to fail.


For purposes of analysis, you might want to filter out the data for INFO and WARNING messages.

Source:

Here is example data from a log file of a failed job:

log
2016-01-29T00:14:24.924Z com.example.hadoopdata.monitor.spark_runner.ProfilerServiceClient [pool-13-thread-1] INFO com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - Spark Profiler URL - http://localhost:4006/
2016-01-29T00:14:40.066Z com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner [pool-13-thread-1] INFO com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - Spark process ID was null.
2016-01-29T00:14:40.067Z com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner [pool-13-thread-1] INFO com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - --------------------------------END SPARK JOB-------------------------------
2016-01-29T00:14:44.961Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] ERROR com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Job '128' threw an exception during execution
2016-01-29T00:14:44.962Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] INFO com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Making sure async worker is stopped
2016-01-29T00:14:44.962Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] INFO com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Notifying monitor for job '128', code 'FAILURE'
2016-01-29T00:14:44.988Z com.example.hadoopdata.monitor.client.MonitorClient [pool-4-thread-2] INFO com.example.hadoopdata.monitor.client.MonitorClient - Request succeeded to monitor ip-0-0-0-0.example.com:8001

Transformation:

When the above data is loaded into the application, you might want to break up the data into separate columns, which splits them on the Z character at the end of the timestamp:

Transformation Name Split column
Parameter: Column column1
Parameter: Option On pattern
Parameter: Match pattern `Z `

Then, you can rename the two columns: Timestamp and Log_Message. To filter out the INFO and WARNING messages, you can use the following transforms, which match on the string literals to identify these messages:

Transformation Name Filter rows
Parameter: Condition Custom formula
Parameter: Type of formula Custom single
Parameter: Condition matches(Log_Message, '] INFO ')
Parameter: Action Delete matching rows

Transformation Name Filter rows
Parameter: Condition Custom formula
Parameter: Type of formula Custom single
Parameter: Condition matches(Log_Message, '] WARNING ')
Parameter: Action Delete matching rows

Results:

After the above steps, the data should look like the following:

TimestampLog_Message
2016-01-29T00:14:44.961com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] ERROR com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Job '128' threw an exception during execution

This page has no comments.