Skip to main content

IN Function

Returns true if the first parameter is contained in the array of values in the second parameter.

  • The value to match can be a literal or a reference to a column.

  • The second parameter must be in array format.

Since the IN function returns a Boolean value, it can be used as a function or a conditional.

Astuce

When you select values in a histogram for a column of String type, the function that identifies the values on which to perform a transform is typically IN.

Astuce

If you need the location of the matched value within the source, use the FIND function. See FIND Function.

Wrangle vs. SQL: This function is part of Wrangle, a proprietary data transformation language. Wrangle is not SQL. For more information, see Wrangle Language.

Basic Usage

in(brand, ['discount','mid','high-end'])

Output: Returns true if the value in the brand column is either discount, mid, or high-end.

Syntax and Arguments

in(column_string, values_array)

Argument

Required?

Data Type

Description

column_string

Y

string

Name of column or literal to locate in the column specified in the second parameter

values_array

Y

array literal

Array literal of values to search

For more information on syntax standards, see Language Documentation Syntax Notes.

column_string

Name of the column or literal to find in the second parameter.

  • Missing values generate missing string results.

    • String constants must be quoted ('Hello, World').

  • Multiple columns and wildcards are not supported.

Usage Notes:

Required?

Data Type

Example Value

Yes

Column reference or any value

myColumn

values_array

Array of values to search for the first parameter.

  • Column references are not supported.

Usage Notes:

Required?

Data Type

Example Value

Yes

Array literal

'Steve'

Examples

Astuce

For additional examples, see Common Tasks.

Example - Replace T-shirt color

Source:

You have the following source data on your products:

ProdId

ProductName

Color

Brand

P001

T-shirt

white

discount

P002

pants

beige

discount

P003

hat

black

discount

P004

T-shirt

white

mid

P005

pants

black

mid

P006

hat

red

mid

P007

T-shirt

white

high-end

P008

pants

white

high-end

P009

hat

blue

high-end

In the data, you notice an error. For the discount and mid brands, T-shirt color should be orange. You need to fix it.

Transformation:

In the Transformer page, you select the white value from the histogram at the top of the Color column. Among the suggestion cards, select the Set transform. For the first variant, all values are missing. Click Modify. The current transform is the following:

Transformation Name

Edit column with formula

Parameter: Columns

Color

Parameter: Formula

null()

Parameter: Group rows by

Color == 'white'

In the Preview, you can see that this transform matches all white values in the column and replaces them with a null value. Since the replacement value is orange, you can edit the transform so it looks like the following:

Transformation Name

Edit column with formula

Parameter: Columns

Color

Parameter: Formula

'orange'

Parameter: Group rows by

Color == 'white'

This step looks better. However, it is replacing all instances of white, including those for white pants (P008) and high-end T-shirts (p007), which should not be replaced. To fix, you must add conditions to the row expression. First, add the following, which ensures that the transform only replaces for T-shirts:

Transformation Name

Edit column with formula

Parameter: Columns

Color

Parameter: Formula

'orange'

Parameter: Group rows by

(Color == 'white' && ProductName == 'T-shirt')

Now, the Preview shows that only T-shirt values are being changed. The transform needs to be further modified to restrict only to the appropriate brands (discount and mid):

Transformation Name

Edit column with formula

Parameter: Columns

Color

Parameter: Formula

'orange'

Parameter: Group rows by

(Color == 'white' && ProductName == 'T-shirt' && IN(Brand, ["discount","mid"]))

Note

It's possible to specify the brand restriction as (Brand <> 'high-end'). However, if there are other brand values in the full dataset, this restriction fails.

Results:

ProdId

ProductName

Color

Brand

P001

T-shirt

orange

discount

P002

pants

beige

discount

P003

hat

black

discount

P004

T-shirt

orange

mid

P005

pants

black

mid

P006

hat

red

mid

P007

T-shirt

white

high-end

P008

pants

white

high-end

P009

hat

blue

high-end