Manage String Lengths
In this example, your target system has a limit on the maximum length for the First Name and Last Name fields. You can use the following transforms to evaluate and truncate your strings based on their length.
Test String Length
You can use the following command to write a TOO LONG
message when the length of the first_name
field exceeds 32 characters:
Transformation Name | |
---|---|
Parameter: Columns | String_test |
Parameter: Formula | IF(LEN(first_name) > 32, 'TOO LONG',String_test) |
Truncate Strings
The above test allows you to evaluate individual strings that are too long to see if they are errors or can somehow be shortened. For a large dataset in which you cannot easily solve these problems, you can simply choose to cut off the length of a string at 32 characters:
Transformation Name | |
---|---|
Parameter: Columns | * |
Parameter: Formula | LEFT($col,32) |
In the above, you can use a wildcard to match all columns in the dataset. The replacement value is defined to be the first 32 characters of the source column ($col
). By definition of the LEFT
function, columns that are shorter than 32 characters in length are untouched.
Astuce
If the field you are truncating is used as a key to your dataset, you should verify that your key still contains unique values after you have applied the truncation. For example, if the combination of first_name
and last_name
is a unique identifier in your dataset, you should verify that the column containing these identifiers contains unique values.
Specialized String Lengths
In some cases, you might want to limit the lengths of text strings. In this example, your dataset contains a column of zip code values, some of which are in Zip+4 format. Your source data might look like the following:
zip_code |
---|
94104 |
94104-2218 |
94105 |
For consistency, you might want to limit the column to use just the first five digits of the zip code.
Steps:
Select the first five digits of one of the nine-digit zip codes.
In the suggestion cards, select the Extract card.
Select the following variation:
Transformation Name
Extract text or pattern
Parameter: Column to extract from
zipcode
Parameter: Option
Custom text or pattern
Parameter: Text to extract
`{zip}`
Parameter: Start extracting after
`{start}`
Click Add.
The above solution references two Wrangle to identify elements of the cell value. For more information, see Text Matching.
For a more generalized approach, you can use some of the following string functions to limit your data length. Values that are shorter than the designated string length are left untouched.
Note
Transforms that cut down the size of a value might generate mismatched or missing values based on the column's data type. You should verify that you are not creating new missing or mismatched values.
Use Rightmost Values
Use the following transform to reduce a string to the rightmost 6 characters in any value:
Transformation Name | |
---|---|
Parameter: Columns | prodID |
Parameter: Formula | RIGHT(prodID, 6) |
Substring Values
The SUBSTRING
function enables you to designate a specific subset of the string's characters to use. You specify the index of the first character in the values and the number of subsequent characters to include. For example, when applied to the value United States of America
in the countries
column, the following transform sets the new value to be States
.
Transformation Name | |
---|---|
Parameter: Columns | countries |
Parameter: Formula | SUBSTRING(countries, 7, 6) |
Note that the index value begins at zero; to extract from the beginning of the value, replace 7
above with 0
.
Additional String Functions
Wrangle supports other functions, which can be used to transform string values. See String Functions.