Returns |
Since the MATCHES
function returns a Boolean value, it can be used as both a function and as a conditional.
Tip: When you select values in a histogram for a column of Array type, the function that identifies the values on which to perform a transform is typically |
Tip: If you need the location of the matched string within the source, use the |
Column reference example:
matches(ProdId, 'Fun Toy') |
Output: Returns true
when the value in the ProdId
column value contains the string literal Fun Toy
.
String literal example:
matches('Hello, World', 'Hello') |
Output: Returns true
.
matches(column_string,string_pattern [,ignore_case]) |
Argument | Required? | Data Type | Description |
---|---|---|---|
column_string | Y | string | Name of column or string literal to be searched |
string_pattern | Y | string | Name of column, function returning a string ,or string literal or pattern to find |
ignore_case | N | string | When true , matching is case-insensitive. Default is false . |
Name of the column or string literal to be searched.
'Hello, World'
).matches([Col1,Col2],'hello'
).
Required? | Data Type | Example Value |
---|---|---|
Yes | String | MyColumn |
Column of strings, function returning a string, or string literal. Value can be a column reference, string literal, , or regular expression to match against the source column-string.
Required? | Data Type | Example Value |
---|---|---|
Yes | Column reference or string literal or pattern | 'home page' |
When true
, matches are case-insensitive. Default is false
.
NOTE: This argument is not required. By default, matches are case-sensitive. |
Required? | Data Type | Example Value |
---|---|---|
No | String literal evaluating to a Boolean | 'true'
|
In downloaded log files, you might see error messages of the following type:
INFO
- status information on the processWARNING
- system encountered a non-fatal error during executionERROR
- system encountered an error, which might have caused the job to fail.
For purposes of analysis, you might want to filter out the data for INFO
and WARNING
messages.
Source:
Here is example data from a log file of a failed job:
log |
---|
2016-01-29T00:14:24.924Z com.example.hadoopdata.monitor.spark_runner.ProfilerServiceClient [pool-13-thread-1] INFO com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - Spark Profiler URL - http://localhost:4006/ |
2016-01-29T00:14:40.066Z com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner [pool-13-thread-1] INFO com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - Spark process ID was null. |
2016-01-29T00:14:40.067Z com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner [pool-13-thread-1] INFO com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - --------------------------------END SPARK JOB------------------------------- |
2016-01-29T00:14:44.961Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] ERROR com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Job '128' threw an exception during execution |
2016-01-29T00:14:44.962Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] INFO com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Making sure async worker is stopped |
2016-01-29T00:14:44.962Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] INFO com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Notifying monitor for job '128', code 'FAILURE' |
2016-01-29T00:14:44.988Z com.example.hadoopdata.monitor.client.MonitorClient [pool-4-thread-2] INFO com.example.hadoopdata.monitor.client.MonitorClient - Request succeeded to monitor ip-0-0-0-0.example.com:8001 |
Transformation:
When the above data is loaded into the application, you might want to break up the data into separate columns, which splits them on the Z
character at the end of the timestamp:
Then, you can rename the two columns: Timestamp
and Log_Message
. To filter out the INFO
and WARNING
messages, you can use the following transforms, which match on the string literals to identify these messages:
Results:
After the above steps, the data should look like the following:
Timestamp | Log_Message |
---|---|
2016-01-29T00:14:44.961 | com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] ERROR com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Job '128' threw an exception during execution |