- A missing value is any value that either contains no content or is non-existent.
- An example of a non-existent value is a cell in a column of integers that has no value in it. In this special case, the missing value is called a null value.
- Null values are converted to missing values during import. For more information, see Manage Null Values.
Values that are spaces (one or more presses of the
SPACEBAR) or tabs (one or more presses of the
TABkey) are not missing values.
Tip: To trim whitespace out of a column, use the following transform:
d- set col:column1 value: showNote true trans Type step p01Name Columns p01Value column1 p02Name Formula p02Value TRIM(column1) SearchTerm Edit column with formula
This step may increase the number of missing values (for values that contain only whitespace characters) and the number of instances of matching values (for values that have spaces before and after an alphanumeric value).
- Return (
\n) and newline (
\l) are considered missing.
Click the black missing values segment of the data quality bar for the column to fix.
Tip: Select a missing value in the data grid. Then, select the
replacesuggestion and then modify it to include the replacement value.
- In the suggestion cards, click the
- By default, this transform sets the missing value to be a null value. Click Edit.
You might seem something like the following:codeset col: country value: NULL() row:
d- trans Type step p01Name Columns p01Value country p02Name Formula p02Value IF(ISMISSING([country]),NULL(),country) SearchTerm Edit column with formula
The missing data is identified using thecodeset col: country value: 'USA' row:
row:ISMISSINGreference. To apply a constant, replace the
NULL()reference with a constant value, as in the following:
d- trans Type step p01Name Columns p01Value country p02Name Formula p02Value IF(ISMISSING([country]) ,'USA',country) SearchTerm Edit column with formula
Note that the single quotes around the value are required, since it identifies the value as a constant.
You can populate missing values with values from another column. In the following example, the
nickname column is populated with the value of
first_name if it is missing:
trans Type step p01Name Columns p01Value nickname p02Name Formula p02Value IF(ISMISSING([nickname]),first_name,nickname) SearchTerm Edit column with formula
Use functions to populate missing values
Tip: Be careful using functions such as averages to compute missing values. These computations may factor outliers that have not yet been removed or may fail to account for local trends relative to the data. Study the values and their meaning in the column before performing replacements. When in doubt, a median value may be your best best, assuming outliers and spurious data have been properly addressed.
Manage Missing Metadata
In some cases, a column may contain valid values, but the meaning of those values is missing from the data. For example, your data contains the following Timestamp information:
The easiest way to handle the insertion of year information is to split out the timestamp data into separate components and then to merge back the content together with the inserted year information. Since the above timestamp data essentially contains three separate fields (Day of Month, Month, and Time), you can use a split command to break this information into three separate columns. Highlight one of the spaces between Day of Month and Month and select the
split suggestion. The Wrangle step should look similar to the following:
split col: column1 on: ' ' limit: 2
Now, your data should be stored in three separate columns.
You may notice that the data has been formatted without spaces (
19May02:45:38), and there is no year information yet. Click Edit.
You need to modify the list of columns to insert spaces and the year identifier back into the data. It should look similar to the following:
You can create new columns containing a year value (
myYear) then merge the columns together:
After you have inserted the year information and merged the columns, you should be able to change the column data type to the appropriate version of Date/Time.