Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version next

D toc

Datetime values can be imported into 

D s product
rtrue
 in a variety of formats. Below are just a few examples of one date in different acceptable formats:

myDate
Mar-14-2018
03/14/2018
2018-Mar-03
3/14/18
03/14/2018 00:00:00
March 14, 2018


This section describes the tools and approaches for standardizing and formatting your date values.

Recommended Approaches

When you are formatting a column of date values, you can attempt to standardize the values in the following order. 

Option 1 - Patterns in the Column Details panel

Through the Column Details panel, you can review the set of patterns that match the values in your date column and select the ones to apply to standardize the values.

Steps:

  1. From the column menu for your date column, select Column Details.
  2. In the Column Details panel, click the Patterns tab.
  3. In the Patterns tab, you can review the set of patterns that describe all values that appear in the column. Select one that needs to be corrected. 
  4. In the right panel, select the Convert card.

    Tip

    Tip: If you do not see the Convert card, you might try to generate a new random sample, in which example patterns are more evenly distributed throughout the sample.


    D caption
    typefigure
    Select the Convert card in the Patterns tab
  5. Click Add.
  6. The number of patterns displayed in the Patterns tab is reduced. You can continue to select patterns to standardize values. 
  7. Iterate until there is only one pattern displayed in the panel. For more information on Datetime patterns, see Standardize Using Patterns.

Option 2 - Patterns based on date format

In some cases, you may not be able to simply select patterns, which generates sufficient suggestions to standardize your date values. A second approach involves keying on mismatched values in the column.

Tip

Tip: This technique works for columns in which all values are valid Datetime values but are in different date formats. If you have values that are invalid for any date format, you must use Option 3 to correct the syntax errors using patterns first. See below.

In this case, you set the data type for the column to Datetime and use the DATEFORMAT function to match the format of the values that you want to change. Next to the values from the preceding table, you can see the corresponding date format token:

myDateDATEFORMAT value
Mar-14-2018MMM-dd-yyyy
03/14/2018MM/dd/yyyy
2018-Mar-03yyyy-MMM-dd
3/14/18M/d/yy
03/14/2018 00:00:00MM/dd/yyyy HH:mm:ss
March 14, 2018MMMM dd, yyyy

For purposes of this example, suppose your myDate column contains values in MM/dd/yyyy and M/d/yy format. You wish to standardize on MMMM dd, yyyy format.

Steps:

  1. From the Data Type menu at the top of the myDate column, select Date/Time
  2. In the dialog, select the Date format that matches values you wish to fix:

    D caption
    typefigure
    Date/Time format selector

    For more information on the supported Datetime formats, see Datetime Data Type.

  3. Click Save.
  4. Now, you need to modify the values that match this format to match the target format (MMMM dd, yyyy). Click the green bar in the column, which matches the values for the currently valid Datetime format., Then click the Set suggestion. Click Modify.

  5. In the Transform Builder, you have a predefined transformation that sets values based on whether the column values are valid for the currently specified data type and format. You must replace the NULL() entry with the DATEFORMAT function which changes these values to the proper format:

    D trans
    Typestep
    p01NameColumns
    p01ValuemyDate
    p02NameFormula
    p02Valueifvalid($col, ['Datetime','yy','yyyy'], dateformat($col, 'MMMM dd, yyyy'))
    SearchTermEdit with formula

  6. Click Add. All values that matched the MM/dd/yyyy format are converted to the MMMM dd, yyyy format.

  7. Repeat the previous steps:
    1. Set the column's Datetime format to: M/d/yyyy.
    2. Select the green bar in the column data quality bar. 
    3. Select the Set suggestion and modify it. 
    4. For the value in the transformation, insert the following function:

      Code Block
      ifvalid($col, ['Datetime','M/d/yyyy'], dateformat(myDate, 'MMMM dd, yyyy'))
    5. Add the transformation to you recipe.
  8. Repeat Step 7 for any other mismatched formats.
  9. You may have some manual fixups to complete at the end. See below.

Option 3 - Transformation by Example

You can reformat dates by providing example output values for a listed source value. For a column of date values, you can begin providing example outputs for individual values, and 

D s product
 can perform pattern-based transformations to similarly formatted values. For more information, see Overview of TBE.

Option 4 - Manual fixups

Steps:

  1. Now that you have selected a specific format for your Datetime values, the rows that do not match this format are now identified as mismatched in the column. Click the red bar at the top of the column.
  2. In the Status bar at the bottom of the screen, click Show only affected rows.
  3. You can now see only the rows that remain mismatched with respect to the preferred Datetime format. 
  4. Select one of these values. For example, suppose you have quite a few values that are only four-digit year values (YYYY). Select one of the values. Then, select the Replace card. Click Edit.
  5. Your transformation should look like the following:

    D trans
    p03Value''
    Typestep
    p01NameColumn
    p01ValueUpdateTime
    p02NameFind
    p02Value`{start}{digit}{4}{end}`
    p03NameReplace with
    SearchTermReplace text or patterns

     

  6. You can modify the search and replace patterns to capture and write back the year value:
    1. In the Find value, put parentheses around the pattern that captures the four digits in a row. Adding parentheses around a matching pattern identifies that sub-pattern as a capture group, which can be referenced in any replacement.
    2. The capture group should look like the following:

      Code Block
      ({digit}{4})
    3. For the Replace with value, you must insert a month and day value according to the format selected for the column (MM/DD/YYYY), followed by a reference back to the capture group.

    4. Capture groups from the matching pattern can be referenced in the replacement value using references such as $1$2$3, and so on. These tokens refer to the first, second, and third capture groups in the Find value.
    5.  The Replace value should look like the following:

      Code Block
      01/01/$1
    6. Your transformation should look like the following when done:

      D trans
      p03Value01/01/$1
      Typestep
      p01NameColumn
      p01ValueUpdateTime
      p02NameFind
      p02Value`{start}({digit}{4}){end}`
      p03NameReplace with
      SearchTermReplace text or patterns

  7. Click Add.
  8. You can repeat these steps for the remaining mismatched values.

Custom Datetime Formats

You can create your own customized Datetime formats using the DATEFORMAT function. For example, the following changes the format of the lastDate function to use the yyyy:MM:dd format:

D trans
Typestep
p01NameColumns
p01ValuelastDate
p02NameFormula
p02ValueDATEFORMAT(lastDate, 'yyyy:MM:dd')
SearchTermEdit with formula

For more information on the supported codes for specifying your own Datetime formats, see Datetime Data Type.

Normalize Regional Differences

The following date values correspond to the same date but vary in format in different regions of the world:

Date ValueRegion
03/14/2018U.S.
14/03/2018E.U.
2014-03-14China

In the above examples, the delimiters for the U.S. and E.U. values are identical, which makes parsing these values more challenging. 

Tip

Tip: If your dataset contains date values from different regions of the world, you should find or create a separate column to identify the applicable region.

Suppose the previous set of dates was represented in your dataset with the following values:

contractDateregion
03/14/2018USA
14/03/2018EU
2014-03-14CHN

In this case, you might try the following generalized solution. You can use conditional transformations to extract the day, month, and year values from the contractDate column based on the value in the region column.

Info

NOTE: This solution assumes that all date values within for a specific region (e.g. USA) are consistently formatted. You should perform those formatting actions first.

Steps:

  1. First, you must split the column based on the cell value's delimiter. Note that the following transformation uses the 

    D s item
    itempattern
     {delim} to locate the delimiter in the cell value. This delimiter is either a dash or a slash.

    D trans
    p03Value`{delim}`
    Typestep
    p01NameColumn
    p01ValuecontractDate
    p02NameOption
    p02Valueby Delimiter
    p03NameDelimiter
    SearchTermSplit by delimiter

  2. Create the following three conditional transformations for extracting the day, month, or year values based on the value in the Region column. Here is the transformation to acquire the year values:

    D trans
    p03Value'EU'
    p06NameValue 2
    p01NameCondition type
    p06ValuecontractDate3
    p03NameCase 1
    p07Value'CHN'
    p04ValuecontractDate3
    SearchTermconditions
    p07NameCase 3
    Typestep
    p05NameCase 2
    p01ValueCase on single column
    p02NameColumn to evaluate
    p02ValueRegion
    p05Value'USA'
    p04NameValue 1
    p08ValuecontractDate1
    p08NameValue 1

  3. For month:

    D trans
    p03Value'EU'
    p06NameValue 2
    p01NameCondition type
    p06ValuecontractDate1
    p03NameCase 1
    p07Value'CHN'
    p04ValuecontractDate2
    SearchTermconditions
    p07NameCase 3
    Typestep
    p05NameCase 2
    p01ValueCase on single column
    p02NameColumn to evaluate
    p02ValueRegion
    p05Value'USA'
    p04NameValue 1
    p08ValuecontractDate2
    p08NameValue 1

     

  4. For day:

    D trans
    p03Value'EU'
    p06NameValue 2
    p01NameCondition type
    p06ValuecontractDate2
    p03NameCase 1
    p07Value'CHN'
    p04ValuecontractDate1
    SearchTermconditions
    p07NameCase 3
    Typestep
    p05NameCase 2
    p01ValueCase on single column
    p02NameColumn to evaluate
    p02ValueRegion
    p05Value'USA'
    p04NameValue 1
    p08ValuecontractDate3
    p08NameValue 1

  5. You can now bring together these three columns:

    D trans
    p03ValuenewDate
    Typestep
    p01NameColumns
    p01Valueday, month, year
    p02NameSeparator
    p02Value'/'
    p03NameNew column name
    SearchTermMerge columns

  6. You now have your new date column. You may need to reformat it into a preferred format.
  7. Delete the columns that were created during this process.