Excerpt |
---|
Returns true if a value contains a string or pattern. The value to search can be a string literal, a function returning a string, or a reference to a column of String type. |
Since the MATCHES
function returns a Boolean value, it can be used as both a function and as a conditional.
Tip |
---|
Tip: When you select values in a histogram for a column of Array type, the function that identifies the values on which to perform a transform is typically MATCHES . |
Tip |
---|
Tip: If you need the location of the matched string within the source, use the FIND function. See FIND Function. |
Column reference example:
D lang syntax |
---|
RawWrangle | true |
---|
Type | ref |
---|
showNote | true |
---|
WrangleText | delete row: matches(ProdId, 'Fun Toy') |
---|
|
matches(ProdId, 'Fun Toy') |
Output: Returns true
when the value in the ProdId
column value contains the string literal Fun Toy
.
String literal example:
D lang syntax |
---|
RawWrangle | true |
---|
Type | ref |
---|
showNote | true |
---|
WrangleText | derive type:single value: matches('Hello, World', 'Hello') |
---|
|
matches('Hello, World', 'Hello') |
Output: Returns true
.
D lang syntax |
---|
RawWrangle | true |
---|
Type | syntax |
---|
showNote | true |
---|
WrangleText | derive type:single value:matches(column_string,string_pattern) |
---|
|
matches(column_string,string_pattern) |
Argument | Required? | Data Type | Description |
---|
column_string | Y | string | Name of column or string literal to be searched |
string_pattern | Y | string | Name of column, function returning a string ,or string literal or pattern to find |
column_string
Name of the column or string literal to be searched.
- Missing string or column values generate missing string results.
- String constants must be quoted (
'Hello, World'
).
- Multiple columns can be specified as an array (
matches([Col1,Col2],'hello'
).
Required? | Data Type | Example Value |
---|
Yes | String | MyColumn |
string_pattern
Column of strings, function returning a string, or string literal. Value can be a column reference, string literal,
, or regular expression to match against the source column-string.
Required? | Data Type | Example Value |
---|
Yes | Column reference or string literal or pattern | 'home page' |
Example - Filtering log data
In downloaded log files, you might see error messages of the following type:
INFO
- status information on the processWARNING
- system encountered a non-fatal error during executionERROR
- system encountered an error, which might have caused the job to fail.
For purposes of analysis, you might want to filter out the data for INFO
and WARNING
messages.
Source:
Here is example data from a log file of a failed job:
log |
---|
2016-01-29T00:14:24.924Z com.example.hadoopdata.monitor.spark_runner.ProfilerServiceClient [pool-13-thread-1] INFO com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - Spark Profiler URL - http://localhost:4006/ |
2016-01-29T00:14:40.066Z com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner [pool-13-thread-1] INFO com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - Spark process ID was null. |
2016-01-29T00:14:40.067Z com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner [pool-13-thread-1] INFO com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - --------------------------------END SPARK JOB------------------------------- |
2016-01-29T00:14:44.961Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] ERROR com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Job '128' threw an exception during execution |
2016-01-29T00:14:44.962Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] INFO com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Making sure async worker is stopped |
2016-01-29T00:14:44.962Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] INFO com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Notifying monitor for job '128', code 'FAILURE' |
2016-01-29T00:14:44.988Z com.example.hadoopdata.monitor.client.MonitorClient [pool-4-thread-2] INFO com.example.hadoopdata.monitor.client.MonitorClient - Request succeeded to monitor ip-0-0-0-0.example.com:8001 |
Transformation:
When the above data is loaded into the application, you might want to break up the data into separate columns, which splits them on the Z
character at the end of the timestamp:
D trans |
---|
RawWrangle | true |
---|
p03Value | `Z ` |
---|
Type | step |
---|
WrangleText | split col: column1 on: `Z ` |
---|
p01Name | Column |
---|
p01Value | column1 |
---|
p02Name | Option |
---|
p02Value | On pattern |
---|
p03Name | Match pattern |
---|
SearchTerm | Split column |
---|
|
Then, you can rename the two columns: Timestamp
and Log_Message
. To filter out the INFO
and WARNING
messages, you can use the following transforms, which match on the string literals to identify these messages:
D trans |
---|
RawWrangle | true |
---|
p03Value | matches(Log_Message, '] INFO ') |
---|
Type | step |
---|
WrangleText | delete row: matches(Log_Message, '] INFO ') |
---|
p01Name | Condition |
---|
p01Value | Custom formula |
---|
p02Name | Type of formula |
---|
p02Value | Custom single |
---|
p03Name | Condition |
---|
p04Value | Delete matching rows |
---|
p04Name | Action |
---|
SearchTerm | Filter rows |
---|
|
D trans |
---|
RawWrangle | true |
---|
p03Value | matches(Log_Message, '] WARNING ') |
---|
Type | step |
---|
WrangleText | delete row: matches(Log_Message, '] WARNING ') |
---|
p01Name | Condition |
---|
p01Value | Custom formula |
---|
p02Name | Type of formula |
---|
p02Value | Custom single |
---|
p03Name | Condition |
---|
p04Value | Delete matching rows |
---|
p04Name | Action |
---|
SearchTerm | Filter rows |
---|
|
Results:
After the above steps, the data should look like the following:
Timestamp | Log_Message |
---|
2016-01-29T00:14:44.961 | com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] ERROR com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Job '128' threw an exception during execution |