Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version next

...

Info

NOTE: If you are unsure of the meaning of a column of data that contains missing values, you should attempt to review the source data or contact the individual who generated the data to identify why values may be missing and how to effectively manage them in

D s product
rtrue
and downstream systems.

Locate missing values

...

  • missing value is any value that either contains no content or is non-existent. 
    • An example of a non-existent value is a cell in a column of integers that has no value in it. In this special case, the missing value is called a null value
    • Null values are converted to missing values during import. For more information, see Manage Null Values.
  • Values that are spaces (one or more presses of the SPACEBAR) or tabs (one or more presses of the TAB key) are not missing values.

    Tip

    Tip: To trim whitespace out of a column, use the following transformtransformation:

    D trans
    Typestep
    p01NameColumns
    p01Valuecolumn1
    p02NameFormula
    p02ValueTRIM(column1)
    SearchTermEdit column with formula

    This step may increase the number of missing values (for values that contain only whitespace characters) and the number of instances of matching values (for values that have spaces before and after an alphanumeric value).

  • Return (\n) and newline (\l)  are considered missing.

In the data quality bar, missing values are identified in blackgray:

D caption
typefigure
Missing values in blackgray
Tip

Tip: From the Transformer page, click the missing values in a column to see their count, highlight them in the rows of the data grid, and trigger a set of suggestions for your review.

...

Tip

Tip: When cleaning up missing data, you should look to work from bigger problems to smaller problems. If a higher percentage of a column's values have been categorized as missing data, you should look across affected rows to see if it's a wider problem. If other records look ok, you should consider dropping deleting the column or figuring out how to manage the missing values, including populating them.

...

  1. Identify if the column values are required. 

    1. Check the target system to determine if the field must have a value. If values are not required, don't worry about it. Consider dropping the deleting the column.

    2. Remember that null values imported into

      D s product
       are exported as missing values, which are easier to consume in most systems. 

    3. Check the column header and data type to determine if values are required. For example, in transactional data, a field called coupon_code requires data only if every transaction is processed with one. 

    4. If it's available, check the source system to see if it requires entry into the field. If an entry is required and your data contains missing values, then there is an issue in how the data was exported from the source system.
  2. Insert a constant value. You can replace a missing value with a constant, which may make it easier to locate more important issues in the application. 

  3. Use a function. Particularly if the missing data can be computed, you can use one of the available functions to populate the missing values. 

  4. Copy values from another column. If a value from another column or a modified form of it can be used for the missing value, you can use the set transform to overwrite the missing values.

  5. Delete rows. Select the missing values bar and use the delete transform to remove the problematic rows. 

    Info

    NOTE: Since missing data may not be an explicit problem, you should avoid deleting rows or the column itself until other options have been reviewed.

  6. Hide the column for now. You can remove the column from display if you want to focus on other things. Select Hide from the column drop-down. Note that hidden columns still appear in any generated output. 
  7. Drop Delete the column. If the column data is unnecessary or otherwise unusable, you can drop delete the entire column from your dataset. Select DropDelete from the column drop-down.

    Tip

    Tip: Drop Delete unnecessary columns as early as possible. Less data is easier to work with in the application and increases job execution performance.

...

Info

NOTE: Generally speaking, inserting constants in place of missing values is not a recommended practice, especially if downstream consuming applications and individuals may not be known. In particular, you should not replace missing numeric values with a fixed numeric value, which will skew analysis. Use this method only if your entire data chain is aware of the constants.

Steps:

  1. Click the black gray missing values segment of the data quality bar for the column to fix.

    Tip

    Tip: Select a missing value in the data grid. Then, select the replace suggestion and then modify it to include the replacement value.

  2. In the suggestion cards, click the set suggestion. 
  3. By default, this transform sets the missing value to be a null value. Click Edit.
  4. You might seem something like the following:

    D trans
    Typestep
    p01NameColumns
    p01Valuecountry
    p02NameFormula
    p02ValueIF(ISMISSING([country]),NULL(),country)
    SearchTermEdit column with formula

  5. The missing data is identified using the row:ISMISSING reference. To apply a constant, replace the NULL() reference with a constant value, as in the following:

    D trans
    Typestep
    p01NameColumns
    p01Valuecountry
    p02NameFormula
    p02ValueIF(ISMISSING([country]),'USA',country)
    SearchTermEdit column with formula

    Note that the single quotes around the value are required, since it identifies the value as a constant.

  6. Click Add.

...