Page tree

Release 5.0.1


Contents:

   

Contents:


The IFMISMATCHED function writes out a specified value if the input expression does not match the specified data type or typing array. Otherwise, it writes the source value. Input can be a literal, a column reference, or a function. 

The ISMISMATCHED function simply tests if a value is mismatched. See ISMISMATCHED Function.

Basic Usage

Basic data type:

derive type:single value:IFMISMATCHED(my_ssn,'SSN', 'XXX-XX-XXXX') as:'final_SSN'

Output: Generates a new column called, final_SSN, which contains the value XXX-XX-XXXX if the value in my_ssn does not match the SSN data type. 

Data type with formatting options:

For data types with formatting options, such as Datetime, you can specify the format using an array, as in the following:

set col: month_Date value: IFMISMATCHED(month_Date, ['Datetime','mm-dd-yy','mm*dd*yy'], null())

Output: Sets values in the month_Date column to be null if they are mismatched against Datetime values in the mm-dd-yy or mm*dd*yy formats.

Syntax and Arguments

derive type:single value:IFMISMATCHED(column_string, data_type_literal, computed_value)

ArgumentRequired?Data TypeDescription
source_valueYstringName of column, string literal or function to be tested
datatype_literalYstringString literal or array that identifies the data type against which to validate the source values
output_valueystringString literal value to write

For more information on syntax standards, see Language Documentation Syntax Notes.

source_value

Name of the column, string literal, or function to be tested for data type matches.

  • Missing literals or column values generate missing string results.
  • Multiple columns and wildcards are not supported.

Usage Notes:

Required?Data TypeExample Value
YesString literal, column reference, or functionmyColumn

datatype_literal

Literal value or string array that identifies data type to which to validate the source column or string.

  • Column references are not supported.

Usage Notes:

Required?Data TypeExample Value
YesString literal'Integer'

Valid data type strings:

When referencing a data type within a transform, you can use the following strings to identify each type:

NOTE: In Wrangle transforms, these values are case-sensitive.

Data TypeString
String'String'
Integer'Integer'
Decimal'Float'
Boolean'Bool'
Social Security Number'SSN'
Phone Number'Phone'
Email Address'Emailaddress'
Credit Card'Creditcard'
Gender'Gender'
Object'Map'
Array'Array'
IP Address'Ipaddress'
URL'Url'
HTTP Code'Httpcodes'
Zip Code'Zipcode'
State'State'
Date / Time'Datetime'

For custom types, you should reference the name of the type in the string value. For more information, see Create Custom Data Types.

output_value

The output value to write if the tested value is mismatched for the specified data type.

Usage Notes:

Required?Data TypeExample Value
YesString or numeric literal'Data type mismatch'

Examples


Tip: For additional examples, see Common Tasks.

Example - IF* functions for data type validation

This section provides simple examples for how to use the IF* functions for data type validation. These functions include the following:
  • IFNULL - For an input expression or value, this function returns the specified value if the input is a null value. See IFNULL Function.
  • IFMISSING - Returns the specified value if the input value or expression is a missing value. See IFMISSING Function.
  • IFMISMATCHED - Returns the specified value if the input value or expression is mismatched against the column's data type. See IFMISMATCHED Function.
  • IFVALID - Returns the specified value if the input value or expression is valid against the column's data type. See IFVALID Function

Source:

The following simple table lists zip codes by customer identifier:

custIdcustZip
C00198123
C00294105
C00312415
C00412451-2234
C00512441-298
C006 
C007 
C0081242
C0091104


Transform:

When the above is imported into the Transformer page, you notice the following:

  • The custZip column is typed as Integer.
  • There are two missing and two mismatched values in the custZip column.

First, you test for valid values in the custZip column. Using the IFVALID function, you can validate against any data type:

derive type:single value:IFVALID(custZip, 'Zipcode', 'ok') as:'status'

Fix four-digit zips: In the status column are instances of ok for the top four rows. You notice that the bottom two rows contain four-digit codes.

Since the custZip values were originally imported as Integer, any leading 0 values are dropped. In this case, you can add back the leading zero. Before the previous step, change the data type of zip to String and insert the following:

derive type:single value:IF(LEN(custZip)==4,'0','') as:'FourDigitZip'

derive type:single value: merge([FourDigitZip,custZip]) as:'custZip2'

set col:zip value:custZip2

drop col:FourDigitZip,custZip2

Now, when you click the last recipe step, you should see that two more rows in status are listed as Ok.

For the zip code with the three-digit extension, you can simply remove that extension to make it valid. Click the step above the last one. In the data grid, highlight the value. Click the Replace suggestion card. Select the option that uses the following for the matching pattern:

'-{digit}{3}{end}'

The above means that all three-digit extensions are dropped from the zip. You can do the same for any two- and one-digit extensions, although there are none in this sample. 

Missing and null values: Now, you need to address how to handle missing and null values.  The IFMISSING tests for both missing and null values, while the IFNULL tests just for null values. In this example, you want to delete null values, which could mean that the data for that row is malformed and to write a status of missing for missing values.

Click above the last line in the recipe to insert the following:

set col:custZip value:IFNULL(custZip, 'xxxxx')

set col:custZip value:IFMISSING(custZip, '00000')

Now, when you click the last line of the recipe, only the null value is listed as having a status other than ok. You can use the following to remove this row and all like it:

delete row:(status == 'xxxxx')


Results:

custIdcustZipstatus
C00198123ok
C00294105ok
C00312415ok
C00412451-2234ok
C00512441-298ok
C006 00000ok
C0081242ok
C0091104ok

As an exercise, you might repeat the above steps starting with the IFMISMATCHED function determining the value in the status column:

derive type:single value:IFMISMATCHED(custZip, 'Zipcode', 'mismatched') as:'status'

 

 

This page has no comments.