MATCHES Function
Returns true
if a value contains a string or pattern. The value to search can be a string literal, a function returning a string, or a reference to a column of String type. You can also search an array of columns.
Since the MATCHES
function returns a Boolean value, it can be used as both a function and as a conditional.
Astuce
When you select values in a histogram for a column of Array type, the function that identifies the values on which to perform a transform is typically MATCHES
.
Astuce
If you need the location of the matched string within the source, use the FIND
function. See FIND Function.
Wrangle vs. SQL: This function is part of Wrangle, a proprietary data transformation language. Wrangle is not SQL. For more information, see Wrangle Language.
Basic Usage
Column reference example:
matches(ProdId, 'Fun Toy')
Output: Returns true
when the value in the ProdId
column value contains the string literal Fun Toy
.
matches([ProdId,ProdName], 'Fun Toy')
Output: Returns true
when the value in the ProdId
or ProdName
column values contain the string literal Fun Toy
.
String literal example:
matches('Hello, World', 'Hello')
Output: Returns true
.
Syntax and Arguments
matches(column_string,string_pattern <span>[,ignore_case]</span>)
Argument | Required? | Data Type | Description |
---|---|---|---|
column_string | Y | string | Name of column or string literal to be searched. This value can also be an array of column names to search, combined in a logical OR. |
string_pattern | Y | string | Name of column, function returning a string ,or string literal or pattern to find |
ignore_case | N | string | When |
For more information on syntax standards, see Language Documentation Syntax Notes.
column_string
Name of the column or string literal to be searched.
Missing string or column values generate missing string results.
String constants must be quoted (
'Hello, World'
).
Multiple columns can be specified as an array (
matches([Col1,Col2],'hello'
).
Usage Notes:
Required? | Data Type | Example Value |
---|---|---|
Yes | String | MyColumn |
string_pattern
Column of strings, function returning a string, or string literal. Value can be a column reference, string literal, Wrangle , or regular expression to match against the source column-string.
Usage Notes:
Required? | Data Type | Example Value |
---|---|---|
Yes | Column reference or string literal or pattern | 'home page' |
ignore_case
When true
, matches are case-insensitive. Default is false
.
Note
This argument is not required. By default, matches are case-sensitive.
Required? | Data Type | Example Value |
---|---|---|
No | String literal evaluating to a Boolean | 'true' |
Examples
Astuce
For additional examples, see Common Tasks.
Example - Filtering log data
In downloaded log files, you might see error messages of the following type:
INFO
- status information on the processWARNING
- system encountered a non-fatal error during executionERROR
- system encountered an error, which might have caused the job to fail.
For purposes of analysis, you might want to filter out the data for INFO
and WARNING
messages.
Source:
Here is example data from a log file of a failed job:
log |
---|
2016-01-29T00:14:24.924Z com.example.hadoopdata.monitor.spark_runner.ProfilerServiceClient [pool-13-thread-1] INFO com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - Spark Profiler URL - http://localhost:4006/ |
2016-01-29T00:14:40.066Z com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner [pool-13-thread-1] INFO com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - Spark process ID was null. |
2016-01-29T00:14:40.067Z com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner [pool-13-thread-1] INFO com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - --------------------------------END SPARK JOB------------------------------- |
2016-01-29T00:14:44.961Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] ERROR com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Job '128' threw an exception during execution |
2016-01-29T00:14:44.962Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] INFO com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Making sure async worker is stopped |
2016-01-29T00:14:44.962Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] INFO com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Notifying monitor for job '128', code 'FAILURE' |
2016-01-29T00:14:44.988Z com.example.hadoopdata.monitor.client.MonitorClient [pool-4-thread-2] INFO com.example.hadoopdata.monitor.client.MonitorClient - Request succeeded to monitor ip-0-0-0-0.example.com:8001 |
Transformation:
When the above data is loaded into the application, you might want to break up the data into separate columns, which splits them on the Z
character at the end of the timestamp:
Transformation Name | |
---|---|
Parameter: Column | column1 |
Parameter: Option | On pattern |
Parameter: Match pattern | `Z ` |
Then, you can rename the two columns: Timestamp
and Log_Message
. To filter out the INFO
and WARNING
messages, you can use the following transforms, which match on the string literals to identify these messages:
Transformation Name | |
---|---|
Parameter: Condition | Custom formula |
Parameter: Type of formula | Custom single |
Parameter: Condition | matches(Log_Message, '] INFO ') |
Parameter: Action | Delete matching rows |
Transformation Name | |
---|---|
Parameter: Condition | Custom formula |
Parameter: Type of formula | Custom single |
Parameter: Condition | matches(Log_Message, '] WARNING ') |
Parameter: Action | Delete matching rows |
Results:
After the above steps, the data should look like the following:
Timestamp | Log_Message |
---|---|
2016-01-29T00:14:44.961 | com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] ERROR com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Job '128' threw an exception during execution |