...
Info | ||||
---|---|---|---|---|
NOTE: If you are unsure of the meaning of a column of data that contains missing values, you should attempt to review the source data or contact the individual who generated the data to identify why values may be missing and how to effectively manage them in
|
Locate missing values
...
- A missing value is any value that either contains no content or is non-existent.
- An example of a non-existent value is a cell in a column of integers that has no value in it. In this special case, the missing value is called a null value.
- Null values are converted to missing values during import. For more information, see Manage Null Values.
Values that are spaces (one or more presses of the
SPACEBAR
) or tabs (one or more presses of theTAB
key) are not missing values.Tip Tip: To trim whitespace out of a column, use the following transformtransformation:
D trans Type step p01Name Columns p01Value column1 p02Name Formula p02Value TRIM(column1) SearchTerm Edit column with formula This step may increase the number of missing values (for values that contain only whitespace characters) and the number of instances of matching values (for values that have spaces before and after an alphanumeric value).
- Return (
\n
) and newline (\l
) are considered missing.
In the data quality bar, missing values are identified in blackgray:
D caption | ||
---|---|---|
| ||
Missing values in blackgray |
Tip |
---|
Tip: From the Transformer page, click the missing values in a column to see their count, highlight them in the rows of the data grid, and trigger a set of suggestions for your review. |
...
Tip |
---|
Tip: When cleaning up missing data, you should look to work from bigger problems to smaller problems. If a higher percentage of a column's values have been categorized as missing data, you should look across affected rows to see if it's a wider problem. If other records look ok, you should consider dropping deleting the column or figuring out how to manage the missing values, including populating them. |
...
Identify if the column values are required.
Check the target system to determine if the field must have a value. If values are not required, don't worry about it. Consider dropping the deleting the column.
Remember that null values imported into
are exported as missing values, which are easier to consume in most systems.D s product Check the column header and data type to determine if values are required. For example, in transactional data, a field called
coupon_code
requires data only if every transaction is processed with one.- If it's available, check the source system to see if it requires entry into the field. If an entry is required and your data contains missing values, then there is an issue in how the data was exported from the source system.
Insert a constant value. You can replace a missing value with a constant, which may make it easier to locate more important issues in the application.
Use a function. Particularly if the missing data can be computed, you can use one of the available functions to populate the missing values.
Copy values from another column. If a value from another column or a modified form of it can be used for the missing value, you can use the
set
transform to overwrite the missing values.Delete rows. Select the missing values bar and use the
delete
transform to remove the problematic rows.Info NOTE: Since missing data may not be an explicit problem, you should avoid deleting rows or the column itself until other options have been reviewed.
- Hide the column for now. You can remove the column from display if you want to focus on other things. Select Hide from the column drop-down. Note that hidden columns still appear in any generated output.
Drop Delete the column. If the column data is unnecessary or otherwise unusable, you can drop delete the entire column from your dataset. Select DropDelete from the column drop-down.
Tip Tip: Drop Delete unnecessary columns as early as possible. Less data is easier to work with in the application and increases job execution performance.
...
Info |
---|
NOTE: Generally speaking, inserting constants in place of missing values is not a recommended practice, especially if downstream consuming applications and individuals may not be known. In particular, you should not replace missing numeric values with a fixed numeric value, which will skew analysis. Use this method only if your entire data chain is aware of the constants. |
Steps:
Click the black gray missing values segment of the data quality bar for the column to fix.
Tip Tip: Select a missing value in the data grid. Then, select the
replace
suggestion and then modify it to include the replacement value.- In the suggestion cards, click the
set
suggestion. - By default, this transform sets the missing value to be a null value. Click Edit.
You might seem something like the following:
D trans Type step p01Name Columns p01Value country p02Name Formula p02Value IF(ISMISSING([country]),NULL(),country) SearchTerm Edit column with formula
The missing data is identified using the
row:ISMISSING
reference. To apply a constant, replace theNULL()
reference with a constant value, as in the following:D trans Type step p01Name Columns p01Value country p02Name Formula p02Value IF(ISMISSING([country]),'USA',country) SearchTerm Edit column with formula Note that the single quotes around the value are required, since it identifies the value as a constant.
Click Add.
Tip |
---|
Tip: You can also use the |
Copy values from another column
...