Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version next

D toc

Excerpt

Returns true if the first parameter is contained in the array of values in the second parameter.

  • The value to match can be a literal or a reference to a column.
  • The second parameter must be in array format. 

Since the IN function returns a Boolean value, it can be used as a function or a conditional.

Tip

Tip: When you select values in a histogram for a column of String type, the function that identifies the values on which to perform a transform is typically IN.

Tip

Tip: If you need the location of the matched value within the source, use the FIND function. See FIND Function.

D s lang vs sql

D s
snippetBasic

D lang syntax
RawWrangletrue
Typeref
showNotetrue
WrangleTextkeep row: in(brand, ['discount','mid','high-end'])

in(brand, ['discount','mid','high-end'])

Output: Returns true if the value in the brand column is either discountmid, or high-end.

D s
snippetSyntax

D lang syntax
RawWrangletrue
Typesyntax
showNotetrue
WrangleTextderive type:single value:in(column_string, values_array)

in(column_string, values_array)


ArgumentRequired?Data TypeDescription
column_stringYstringName of column or literal to locate in the column specified in the second parameter
values_arrayYarray literalArray literal of values to search

D s lang notes

column_string

Name of the column or literal to find in the second parameter.

  • Missing values generate missing string results.
    • String constants must be quoted ('Hello, World').
  • Multiple columns and wildcards are not supported.

D s
snippetusage

Required?Data TypeExample Value
YesColumn reference or any valuemyColumn

values_array

Array of values to search for the first parameter.

  • Column references are not supported.

D s
snippetusage

Required?Data TypeExample Value
YesArray literal'Steve'

D s
snippetExamples

Example - Replace T-shirt color

Source:

You have the following source data on your products:

ProdIdProductNameColorBrand
P001T-shirtwhitediscount
P002pantsbeigediscount
P003hatblackdiscount
P004T-shirtwhitemid
P005pantsblackmid
P006hatredmid
P007T-shirtwhitehigh-end
P008pantswhitehigh-end
P009hatbluehigh-end

In the data, you notice an error. For the discount and mid brands, T-shirt color should be orange. You need to fix it.

Transformation:

In the Transformer page, you select the white value from the histogram at the top of the Color column. Among the suggestion cards, select the Set transform. For the first variant, all values are missing. Click Modify. The current transform is the following:

D trans
RawWrangletrue
p03ValueColor == 'white'
Typestep
WrangleTextset col: Color value: null() row: Color == 'white'
p01NameColumns
p01ValueColor
p02NameFormula
p02Valuenull()
p03NameGroup rows by
SearchTermEdit column with formula

In the Preview, you can see that this transform matches all white values in the column and replaces them with a null value. Since the replacement value is orange, you can edit the transform so it looks like the following:

D trans
RawWrangletrue
p03ValueColor == 'white'
Typestep
WrangleTextset col: Color value: 'orange' row: Color == 'white'
p01NameColumns
p01ValueColor
p02NameFormula
p02Value'orange'
p03NameGroup rows by
SearchTermEdit column with formula

This step looks better. However, it is replacing all instances of white, including those for white pants (P008) and high-end T-shirts (p007), which should not be replaced. To fix, you must add conditions to the row expression. First, add the following, which ensures that the transform only replaces for T-shirts:

D trans
RawWrangletrue
p03Value(Color == 'white' && ProductName == 'T-shirt')
Typestep
WrangleTextset col: Color value: 'orange' row: (Color == 'white' && ProductName == 'T-shirt')
p01NameColumns
p01ValueColor
p02NameFormula
p02Value'orange'
p03NameGroup rows by
SearchTermEdit column with formula

Now, the Preview shows that only T-shirt values are being changed. The transform needs to be further modified to restrict only to the appropriate brands (discount and mid):

D trans
RawWrangletrue
p03Value(Color == 'white' && ProductName == 'T-shirt' && IN(Brand, ["discount","mid"]))
Typestep
WrangleTextset col: Color value: 'orange' row: (Color == 'white' && ProductName == 'T-shirt' && IN(Brand, ["discount","mid"]))
p01NameColumns
p01ValueColor
p02NameFormula
p02Value'orange'
p03NameGroup rows by
SearchTermEdit column with formula

Info

NOTE: It's possible to specify the brand restriction as (Brand <> 'high-end'). However, if there are other brand values in the full dataset, this restriction fails.

Results:

ProdIdProductNameColorBrand
P001T-shirtorangediscount
P002pantsbeigediscount
P003hatblackdiscount
P004T-shirtorangemid
P005pantsblackmid
P006hatredmid
P007T-shirtwhitehigh-end
P008pantswhitehigh-end
P009hatbluehigh-end

D s also
labelcomparison