
...


Returns true if a value contains a string or pattern. The value to search can be a string literal, a function returning a string, or a reference to a column of String type.

...

Column reference example:


delete row: MATCHES(ProdId, 'Fun Toy')

...

Output: Returns true when the value in the ProdId column contains the string literal Fun Toy.

String literal example:


derive type:single value: MATCHES('Hello, World', 'Hello')

...

Output: Returns true.
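
Function example:

The value to search can also be produced by a function that returns a string. The following is a minimal sketch, assuming the TRIM function is available and reusing the ProdId column from the earlier example:

derive type:single value: MATCHES(TRIM(ProdId), 'Fun Toy')

Output: Returns true when the trimmed value in the ProdId column contains the string literal Fun Toy.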

Syntax:

derive type:single value: MATCHES(column_string, string_pattern)


Argument         Required?   Data Type   Description
column_string    Y           string      Name of column or string literal to be searched
string_pattern   Y           string      Name of column, function returning a string, string literal, or pattern to find

...

Required?   Data Type   Example Value
Yes         String      MyColumn

string_pattern

String literal, column of strings, or function returning a string. The value can be a string literal, pattern, or regular expression to match against the source column-string.
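
For example, the following sketch deletes rows using a regular expression as the pattern. It assumes regular expressions are delimited with slashes and reuses the MyColumn example value from above:

delete row: MATCHES(MyColumn, /INFO|WARNING/)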

...

Example log data:
2016-01-29T00:14:24.924Z com.example.hadoopdata.monitor.spark_runner.ProfilerServiceClient [pool-13-thread-1] INFO com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - Spark Profiler URL - http://localhost:4006/
2016-01-29T00:14:40.066Z com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner [pool-13-thread-1] INFO com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - Spark process ID was null.
2016-01-29T00:14:40.067Z com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner [pool-13-thread-1] INFO com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - --------------------------------END SPARK JOB-------------------------------
2016-01-29T00:14:44.961Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] ERROR com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Job '128' threw an exception during execution
2016-01-29T00:14:44.962Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] INFO com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Making sure async worker is stopped
2016-01-29T00:14:44.962Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] INFO com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Notifying monitor for job '128', code 'FAILURE'
2016-01-29T00:14:44.988Z com.example.hadoopdata.monitor.client.MonitorClient [pool-4-thread-2] INFO com.example.hadoopdata.monitor.client.MonitorClient - Request succeeded to monitor ip-0-0-0-0.example.com:8001

Transformation:

When the above data is loaded into the application, you might want to break it up into separate columns by splitting on the `Z ` pattern at the end of the timestamp:


...

Transformation Name: Split column
Parameter: Column: column1
Parameter: Option: On pattern
Parameter: Match pattern: `Z `

Wrangle step:

split col: column1 on: `Z `

Then, you can rename the two columns to Timestamp and Log_Message; a sketch of the rename steps is shown below. To filter out the INFO and WARNING messages, you can use the two filter transformations that follow, which match on string literals to identify these messages.
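
A minimal sketch of the rename steps. The post-split column names (column2 and column3) are assumptions; check the generated names in your dataset before applying:

rename col: column2 to: 'Timestamp'
rename col: column3 to: 'Log_Message'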


...

Transformation Name: Filter rows
Parameter: Condition: Custom formula
Parameter: Type of formula: Custom single
Parameter: Condition: matches(Log_Message, '] INFO ')
Parameter: Action: Delete matching rows

Wrangle step:

delete row: matches(Log_Message, '] INFO ')

Transformation Name: Filter rows
Parameter: Condition: Custom formula
Parameter: Type of formula: Custom single
Parameter: Condition: matches(Log_Message, '] WARNING ')
Parameter: Action: Delete matching rows

Wrangle step:

delete row: matches(Log_Message, '] WARNING ')
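
If you prefer a single step, the two conditions can be combined. This sketch assumes the logical OR operator (||) is supported in custom formulas:

delete row: matches(Log_Message, '] INFO ') || matches(Log_Message, '] WARNING ')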

Results:

After the above steps, the data should look like the following:

...