
Returns true if a value contains a string or pattern. The value to search can be a string literal or a reference to a column of String type.

Because the MATCHES function returns a Boolean value, it can be used both as a function and as a conditional.


Tip: When you select values in a histogram for a column of Array type, the function that identifies the values on which to perform a transform is typically MATCHES.


Tip: If you need the location of the matched string within the source, use the FIND function. See FIND Function.
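As a rough analogy outside the application, the difference between a Boolean containment test and a position lookup can be sketched in Python. The helper names `matches_literal` and `find_literal` are hypothetical; they only approximate the behavior described above for string literals and are not the application's implementation.

```python
def matches_literal(value: str, needle: str) -> bool:
    """Approximate MATCHES for a string literal: True if needle occurs in value."""
    return needle in value


def find_literal(value: str, needle: str) -> int:
    """Approximate a position lookup: index of the first occurrence.

    Returning -1 when the needle is absent is Python's convention,
    not a statement about how the FIND function behaves.
    """
    return value.find(needle)


print(matches_literal("Hello, World", "Hello"))  # True
print(find_literal("Hello, World", "World"))     # 7
```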


Column reference example:


delete row: MATCHES(ProdId, 'Fun Toy')

Output: Deletes any row in which the value in the ProdId column contains the string literal Fun Toy.
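The effect of this transform can be approximated in Python as follows. This is a minimal sketch: the sample ProdId values are made up for illustration, and `delete_matching` is a hypothetical helper, not part of the application.

```python
# Made-up sample rows; only the ProdId column name comes from the example above.
rows = [
    {"ProdId": "Fun Toy 100"},
    {"ProdId": "Board Game"},
    {"ProdId": "Big Fun Toy"},
]


def delete_matching(rows, column, needle):
    """Keep only rows whose value in `column` does not contain `needle`."""
    return [row for row in rows if needle not in row[column]]


remaining = delete_matching(rows, "ProdId", "Fun Toy")
print(remaining)  # [{'ProdId': 'Board Game'}]
```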

String literal example:


derive type:single value: MATCHES('Hello, World', 'Hello')

Output: Returns true for every row in the dataset, because the string literal Hello, World contains Hello.

derive type:single value: MATCHES(column_string, string_pattern)

| Argument | Required? | Data Type | Description |
| --- | --- | --- | --- |
| column_string | Y | string | Name of column or string literal to be searched |
| string_pattern | Y | string | String literal or pattern to find |



Name of the column or string literal to be searched.

  • Missing string or column values generate missing string results.
    • String constants must be quoted ('Hello, World').
  • Multiple columns can be specified as an array: matches([Col1,Col2],'hello').


Required? | Data Type | Example Value


String literal, pattern, or regular expression to match against the source column_string.

  • Column references are not supported.


| Required? | Data Type | Example Value |
| --- | --- | --- |
| Yes | String literal or pattern | 'home page' |
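To illustrate how matching against a regular expression differs from matching a string literal, here is a minimal Python sketch. It uses Python's `re` syntax, which is an assumption for illustration only and may differ from the pattern syntax the application accepts; `matches_pattern` is a hypothetical helper.

```python
import re


def matches_pattern(value: str, pattern: str) -> bool:
    """True if the regular expression matches anywhere in value."""
    return re.search(pattern, value) is not None


# The pattern allows one or more whitespace characters between the words.
print(matches_pattern("visit our home page today", r"home\s+page"))  # True
print(matches_pattern("homepage", r"home\s+page"))                   # False
```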


Example - Filtering log data

When the feature is enabled, you can download the log files for any job that fails to execute, which can help in debugging issues related to the dataset, recipe, or job execution. In the downloaded logs, you might see messages of the following types:

  • INFO - status information on the process
  • WARNING - system encountered a non-fatal error during execution
  • ERROR - system encountered an error, which might have caused the job to fail.

For purposes of analysis, you might want to filter out the data for INFO and WARNING messages.


Here is example data from a log file of a failed job: 

2016-01-29T00:14:24.924Z com.example.hadoopdata.monitor.spark_runner.ProfilerServiceClient [pool-13-thread-1] INFO  com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - Spark Profiler URL - http://localhost:4006/
2016-01-29T00:14:40.066Z com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner [pool-13-thread-1] INFO  com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - Spark process ID was null.
2016-01-29T00:14:40.067Z com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner [pool-13-thread-1] INFO  com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - --------------------------------END SPARK JOB-------------------------------
2016-01-29T00:14:44.961Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] ERROR com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Job '128' threw an exception during execution
2016-01-29T00:14:44.962Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] INFO  com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Making sure async worker is stopped
2016-01-29T00:14:44.962Z com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] INFO  com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Notifying monitor for job '128', code 'FAILURE'
2016-01-29T00:14:44.988Z com.example.hadoopdata.monitor.client.MonitorClient [pool-4-thread-2] INFO  com.example.hadoopdata.monitor.client.MonitorClient - Request succeeded to monitor


After the above data is loaded into the application, you might want to break it up into separate columns. The following transform splits each row on the Z character, and the space that follows it, at the end of the timestamp:


split col: column1 on: `Z `
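For readers who want to check the split outside the application, the same step can be sketched in Python on one line of the sample log. Splitting only at the first occurrence of the delimiter (`maxsplit=1`) is an assumption made for this sketch.

```python
# One line from the sample log above, written as adjacent string literals.
line = (
    "2016-01-29T00:14:44.961Z "
    "com.example.hadoopdata.joblaunch.server.BatchPollingWorker "
    "[pool-4-thread-2] ERROR "
    "com.example.hadoopdata.joblaunch.server.BatchPollingWorker - "
    "Job '128' threw an exception during execution"
)

# Split on "Z " (the Z plus the following space); the delimiter is dropped.
timestamp, log_message = line.split("Z ", 1)

print(timestamp)  # 2016-01-29T00:14:44.961
print(log_message)
```

Because the delimiter itself is discarded, the resulting timestamp no longer ends in Z.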

Then, you can rename the two resulting columns to Timestamp and Log_Message. To filter out the INFO and WARNING messages, you can use the following transforms, which match on string literals to identify these messages:


delete row: MATCHES(Log_Message, '] INFO  ')


delete row: MATCHES(Log_Message, '] WARNING ')
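The two deletions can be approximated in Python as a single filter. This is a sketch under the assumption that each transform simply drops rows whose Log_Message contains the given literal; `drop_matching` is a hypothetical helper, and the messages are taken from the sample log above.

```python
# Log_Message values from the sample data (after the split step).
messages = [
    "com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner "
    "[pool-13-thread-1] INFO  "
    "com.example.hadoopdata.monitor.spark_runner.BatchProfileSparkRunner - "
    "Spark process ID was null.",
    "com.example.hadoopdata.joblaunch.server.BatchPollingWorker "
    "[pool-4-thread-2] ERROR "
    "com.example.hadoopdata.joblaunch.server.BatchPollingWorker - "
    "Job '128' threw an exception during execution",
    "com.example.hadoopdata.joblaunch.server.BatchPollingWorker "
    "[pool-4-thread-2] INFO  "
    "com.example.hadoopdata.joblaunch.server.BatchPollingWorker - "
    "Making sure async worker is stopped",
]


def drop_matching(lines, needles):
    """Remove every line that contains at least one of the literal needles."""
    return [ln for ln in lines if not any(n in ln for n in needles)]


# Same literals as the transforms above, including the two spaces after INFO.
kept = drop_matching(messages, ["] INFO  ", "] WARNING "])
print(len(kept))  # 1
```

Including the closing bracket and trailing spaces in each literal keeps the match anchored to the log-level field, so a message that merely mentions the word INFO in its text is less likely to be removed.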


After the above steps, the data should look like the following:

| Timestamp | Log_Message |
| --- | --- |
| 2016-01-29T00:14:44.961 | com.example.hadoopdata.joblaunch.server.BatchPollingWorker [pool-4-thread-2] ERROR com.example.hadoopdata.joblaunch.server.BatchPollingWorker - Job '128' threw an exception during execution |
