When data is imported from another system, you might discover that some values are missing in it. In some cases, these values simply contain no content. In other cases, these values are non-existent. Depending on how the missing values entered the data, you may end up processing them in different ways. This section describes how to identify and manage missing data in your datasets.
NOTE: If you are unsure of the meaning of a column of data that contains missing values, you should attempt to review the source data or contact the individual who generated the data to identify why values may be missing and how to effectively manage them in
- A missing value is any value that either contains no content or is non-existent.
- An example of a non-existent value is a cell in a column of integers that has no value in it. In this special case, the missing value is called a null value.
- Null values are converted to missing values during import. For more information, see Manage Null Values.
Values that are spaces (one or more presses of the
SPACEBAR) or tabs (one or more presses of the
TABkey) are not missing values.
Tip: To trim whitespace out of a column, use the following transformation:
D trans Type step p01Name Columns p01Value column1 p02Name Formula p02Value TRIM(column1) SearchTerm Edit column with formula
This step may increase the number of missing values (for values that contain only whitespace characters) and the number of instances of matching values (for values that have spaces before and after an alphanumeric value).
- Return (
\n) and newline (
\l) are considered missing.
In the data quality bar, missing values are identified in blackgray:
Missing values in blackgray
Tip: From the Transformer page, click the missing values in a column to see their count, highlight them in the rows of the data grid, and trigger a set of suggestions for your review.
NOTE: Generally speaking, inserting constants in place of missing values is not a recommended practice, especially if downstream consuming applications and individuals may not be known. In particular, you should not replace missing numeric values with a fixed numeric value, which will skew analysis. Use this method only if your entire data chain is aware of the constants.
Click the black gray missing values segment of the data quality bar for the column to fix.
Tip: Select a missing value in the data grid. Then, select the
replacesuggestion and then modify it to include the replacement value.
- In the suggestion cards, click the
- By default, this transform sets the missing value to be a null value. Click Edit.
You might seem something like the following:
D trans Type step p01Name Columns p01Value country p02Name Formula p02Value IF(ISMISSING([country]),NULL(),country) SearchTerm Edit column with formula
The missing data is identified using the
row:ISMISSINGreference. To apply a constant, replace the
NULL()reference with a constant value, as in the following:
D trans Type step p01Name Columns p01Value country p02Name Formula p02Value IF(ISMISSING([country]),'USA',country) SearchTerm Edit column with formula
Note that the single quotes around the value are required, since it identifies the value as a constant.
Tip: You can also use the
Copy values from another column
NOTE: If missing metadata is not supported as part of the value in the target system, you can insert the metadata as a separate column and then apply the metadata to the data inside the target system.
|D s also|