Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version r092
Excerpt

This example illustrates how to use the IF* functions for data type validation.

Functions:

D generate list excerpts
pagesIFNULL Function,IFMISSING Function,IFMISMATCHED Function,IFVALID Function,MERGE Function

Source:

The following simple table lists zip codes by customer identifier:

custIdcustZip
C00198123
C00294105
C00312415
C00412451-2234
C00512441-298
C006 
C007 
C0081242
C0091104


Transformation:

When the above is imported into the Transformer page, you notice the following:

  • The custZip column is typed as Integer.
  • There are two missing and two mismatched values in the custZip column.

First, you test for valid values in the custZip column. Using the IFVALID function, you can validate against any data type:

D trans
RawWrangletrue
p03Value'status'
Typestep
WrangleTextderive type:single value:IFVALID(custZip, 'Zipcode', 'ok') as:'status'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueIFVALID(custZip, 'Zipcode', 'ok')
p03NameNew column name
SearchTermNew formula

Fix four-digit zips: In the status column are instances of ok for the top four rows. You notice that the bottom two rows contain four-digit codes.

Since the custZip values were originally imported as Integer, any leading 0 values are deleted. In this case, you can add back the leading zero. Before the previous step, change the data type of zip to String and insert the following:

D trans
RawWrangletrue
p03Value'FourDigitZip'
Typestep
WrangleTextderive type:single value:IF(LEN(custZip)==4,'0','') as:'FourDigitZip'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueIF(LEN(custZip)==4,'0','')
p03NameNew column name
SearchTermNew formula

D trans
RawWrangletrue
p03Value'custZip2'
Typestep
WrangleTextderive type:single value: merge([FourDigitZip,custZip]) as:'custZip2'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueMERGE([FourDigitZip,custZip])
p03NameNew column name
SearchTermNew formula

D trans
RawWrangletrue
Typestep
WrangleTextset col:zip value:custZip2
p01NameColumns
p01Valuezip
p02NameFormula
p02ValuecustZip2
SearchTermEdit column with formula

D trans
RawWrangletrue
Typestep
WrangleTextdrop col:FourDigitZip,custZip2
p01NameColumns
p01ValueFourDigitZip,custZip2
p02NameAction
p02ValueDelete selected columns
SearchTermDelete columns

Now, when you click the last recipe step, you should see that two more rows in status are listed as Ok.

For the zip code with the three-digit extension, you can simply remove that extension to make it valid. Click the step above the last one. In the data grid, highlight the value. Click the Replace suggestion card. Select the option that uses the following for the matching pattern:

Code Block
'-{digit}{3}{end}'

The above means that all three-digit extensions are deleted from the zip. You can do the same for any two- and one-digit extensions, although there are none in this sample. 

Missing and null values: Now, you need to address how to handle missing and null values.  The IFMISSING tests for both missing and null values, while the IFNULL tests just for null values. In this example, you want to delete null values, which could mean that the data for that row is malformed and to write a status of missing for missing values.

Click above the last line in the recipe to insert the following:

D trans
RawWrangletrue
Typestep
WrangleTextset col:custZip value:IFNULL(custZip, 'xxxxx')
p01NameColumns
p01ValuecustZip
p02NameFormula
p02ValueIFNULL(custZip, 'xxxxx')
SearchTermEdit column with formula

D trans
RawWrangletrue
Typestep
WrangleTextset col:custZip value:IFMISSING(custZip, '00000')
p01NameColumns
p01ValuecustZip
p02NameFormula
p02ValueIFMISSING(custZip, '00000')
SearchTermEdit column with formula

Now, when you click the last line of the recipe, only the null value is listed as having a status other than ok. You can use the following to remove this row and all like it:

D trans
RawWrangletrue
p03Value(status == 'xxxxx')
Typestep
WrangleTextdelete row:(status == 'xxxxx')
p01NameCondition
p01ValueCustom formula
p02NameType of formula
p02ValueCustom single
p03NameCondition
p04ValueDelete matching rows
p04NameAction
SearchTermFilter rows


Results:

custIdcustZipstatus
C00198123ok
C00294105ok
C00312415ok
C00412451-2234ok
C00512441-298ok
C006 00000ok
C0081242ok
C0091104ok

As an exercise, you might repeat the above steps starting with the IFMISMATCHED function determining the value in the status column:

D trans
RawWrangletrue
p03Value'status'
Typestep
WrangleTextderive type:single value:IFMISMATCHED(custZip, 'Zipcode', 'mismatched') as:'status'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueIFMISMATCHED(custZip, 'Zipcode', 'mismatched')
p03NameNew column name
SearchTermNew formula

D s also
labelexample_if_data_type_validation_functions