Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version r093

D toc

Excerpt

When data is imported from another system, you might discover that some values are missing in it. In some cases, these values simply contain no content. In other cases, these values are non-existent. Depending on how the missing values entered the data, you may end up processing them in different ways. This section describes how to identify and manage missing data in your datasets.

Info

NOTE: If you are unsure of the meaning of a column of data that contains missing values, you should attempt to review the source data or contact the individual who generated the data to identify why values may be missing and how to effectively manage them in

D s product
rtrue
and downstream systems.

...

  • missing value is any value that either contains no content or is non-existent. 
    • An example of a non-existent value is a cell in a column of integers that has no value in it. In this special case, the missing value is called a null value
    • Null values are converted to missing values during import. For more information, see Manage Null Values. 
  • Values that are spaces (one or more presses of the SPACEBAR) or tabs (one or more presses of the TAB key) are not missing values.

    Tip

    Tip: To trim whitespace out of a column, use the following transformation:

    D trans
    Typestep
    p01NameColumns
    p01Valuecolumn1
    p02NameFormula
    p02ValueTRIM(column1)
    SearchTermEdit column with formula

    This step may increase the number of missing values (for values that contain only whitespace characters) and the number of instances of matching values (for values that have spaces before and after an alphanumeric value).

  • Return (\n) and newline (\l)  are considered missing.

In the data quality bar, missing values are identified in blackgray:

D caption
typefigure
Missing values in blackgray
Tip

Tip: From the Transformer page, click the missing values in a column to see their count, highlight them in the rows of the data grid, and trigger a set of suggestions for your review.

...

Info

NOTE: Generally speaking, inserting constants in place of missing values is not a recommended practice, especially if downstream consuming applications and individuals may not be known. In particular, you should not replace missing numeric values with a fixed numeric value, which will skew analysis. Use this method only if your entire data chain is aware of the constants.

Steps:

  1. Click the black gray missing values segment of the data quality bar for the column to fix.

    Tip

    Tip: Select a missing value in the data grid. Then, select the replace suggestion and then modify it to include the replacement value.

  2. In the suggestion cards, click the set suggestion. 
  3. By default, this transform sets the missing value to be a null value. Click Edit.
  4. You might seem something like the following:

    D trans
    Typestep
    p01NameColumns
    p01Valuecountry
    p02NameFormula
    p02ValueIF(ISMISSING([country]),NULL(),country)
    SearchTermEdit column with formula

  5. The missing data is identified using the row:ISMISSING reference. To apply a constant, replace the NULL() reference with a constant value, as in the following:

    D trans
    Typestep
    p01NameColumns
    p01Valuecountry
    p02NameFormula
    p02ValueIF(ISMISSING([country]),'USA',country)
    SearchTermEdit column with formula

    Note that the single quotes around the value are required, since it identifies the value as a constant.

  6. Click Add.

Tip

Tip: You can also use the IFMISSING function to test for empty values. Unlike the above construction, however, IFMISSING does not support an else clause when the value is present. For more information, see IFMISSING Function.

...

Copy values from another column

...

Info

NOTE: If missing metadata is not supported as part of the value in the target system, you can insert the metadata as a separate column and then apply the metadata to the data inside the target system.

...

D s also
inCQLtrue
label((label = "data_type") OR (label = "missing") OR (label = "transformation_ui"))