Page tree

 

Support | BlogContact Us | 844.332.2821

 

Contents:

The cloud-based version of Trifacta Wrangler is now available! Read all about it, and register for your free account.

Contents:


Datetime values can come in a variety of formats. Below are just a few examples of one date in different acceptable formats:

myDate
Mar-14-2018
03/14/2018
2018-Mar-03
3/14/18
03/14/2018 00:00:00
March 14, 2018


This section describes the tools and approaches for standardizing and formatting your date values.

Recommended Approaches

When you are formatting a column of date values, you can attempt to standardize the values in the following order. 

Option 1 - Patterns in the Column Details panel

Through the Column Details panel, you can review the set of patterns that match the values in your date column and select the ones to apply to standardize the values.

Steps:

  1. From the column menu for your date column, select Column Details.
  2. In the Column Details panel, click the Patterns tab.
  3. In the Patterns tab, you can review the set of patterns that describe all values that appear in the column. Select one that needs to be corrected. 
  4. In the right panel, select the Convert card.

    Tip: If you do not see the Convert card, you might try to generate a new random sample, in which example patterns are more evenly distributed throughout the sample.


    Figure: Select the Convert card in the Patterns tab

  5. Click Add.
  6. The number of patterns displayed in the Patterns tab is reduced. You can continue to select patterns to standardize values. 
  7. Iterate until there is only one pattern displayed in the panel. For more information on Datetime patterns, see Standardize Using Patterns.

Option 2 - Patterns based on date format

In some cases, you may not be able to simply select patterns, which generates sufficient suggestions to standardize your date values. A second approach involves keying on mismatched values in the column.

Tip: This technique works for columns in which all values are valid Datetime values but are in different date formats. If you have values that are invalid for any date format, you must use Option 3 to correct the syntax errors using patterns first. See below.

In this case, you set the data type for the column to Datetime and use the DATEFORMAT function to match the format of the values that you want to change. Next to the values from the preceding table, you can see the corresponding date format token:

myDateDATEFORMAT value
Mar-14-2018MMM-dd-yyyy
03/14/2018MM/dd/yyyy
2018-Mar-03yyyy-MMM-dd
3/14/18M/d/yy
03/14/2018 00:00:00MM/dd/yyyy HH:mm:ss
March 14, 2018MMMM dd, yyyy

For purposes of this example, suppose your myDate column contains values in MM/dd/yyyy and M/d/yy format. You wish to standardize on MMMM dd, yyyy format.

Steps:

  1. From the Data Type menu at the top of the myDate column, select Date/Time
  2. In the dialog, select the Date format that matches values you wish to fix:

    Figure: Date/Time format selector

    For more information on the supported Datetime formats, see Datetime Data Type.

  3. Click Save.
  4. Now, you need to modify the values that match this format to match the target format (MMMM dd, yyyy). Click the green bar in the column, which matches the values for the currently valid Datetime format., Then click the Set suggestion. Click Modify.

  5. In the Transform Builder, you have a predefined transformation that sets values based on whether the column values are valid for the currently specified data type and format. You must replace the NULL() entry with the DATEFORMAT function which changes these values to the proper format:

    Transformation Name Edit with formula
    Parameter: Columns myDate
    Parameter: Formula ifvalid($col, ['Datetime','yy','yyyy'], dateformat($col, 'MMMM dd, yyyy'))

  6. Click Add. All values that matched the MM/dd/yyyy format are converted to the MMMM dd, yyyy format.

  7. Repeat the previous steps:
    1. Set the column's Datetime format to: M/d/yyyy.
    2. Select the green bar in the column data quality bar. 
    3. Select the Set suggestion and modify it. 
    4. For the value in the transformation, insert the following function:

      ifvalid($col, ['Datetime','M/d/yyyy'], dateformat(myDate, 'MMMM dd, yyyy'))
    5. Add the transformation to you recipe.
  8. Repeat Step 7 for any other mismatched formats.
  9. You may have some manual fixups to complete at the end. See below.

Option 3 - Manual fixups

Steps:

  1. Now that you have selected a specific format for your Datetime values, the rows that do not match this format are now identified as mismatched in the column. Click the red bar at the top of the column.
  2. In the Status bar at the bottom of the screen, click Show only affected rows.
  3. You can now see only the rows that remain mismatched with respect to the preferred Datetime format. 
  4. Select one of these values. For example, suppose you have quite a few values that are only four-digit year values (YYYY). Select one of the values. Then, select the Replace card. Click Edit.
  5. Your transformation should look like the following:

    Transformation Name Replace text or patterns
    Parameter: Column UpdateTime
    Parameter: Find `{start}{digit}{4}{end}`
    Parameter: Replace with ''

     

  6. You can modify the search and replace patterns to capture and write back the year value:
    1. In the Find value, put parentheses around the pattern that captures the four digits in a row. Adding parentheses around a matching pattern identifies that sub-pattern as a capture group, which can be referenced in any replacement.
    2. The capture group should look like the following:

      ({digit}{4})
    3. For the Replace with value, you must insert a month and day value according to the format selected for the column (MM/DD/YYYY), followed by a reference back to the capture group.

    4. Capture groups from the matching pattern can be referenced in the replacement value using references such as $1$2$3, and so on. These tokens refer to the first, second, and third capture groups in the Find value.
    5.  The Replace value should look like the following:

      01/01/$1
    6. Your transformation should look like the following when done:

      Transformation Name Replace text or patterns
      Parameter: Column UpdateTime
      Parameter: Find `{start}({digit}{4}){end}`
      Parameter: Replace with 01/01/$1

  7. Click Add.
  8. You can repeat these steps for the remaining mismatched values.

Custom Datetime Formats

You can create your own customized Datetime formats using the DATEFORMAT function. For example, the following changes the format of the lastDate function to use the yyyy:MM:dd format:

Transformation Name Edit with formula
Parameter: Columns lastDate
Parameter: Formula DATEFORMAT(lastDate, 'yyyy:MM:dd')

For more information on the supported codes for specifying your own Datetime formats, see Datetime Data Type.

Normalize Regional Differences

The following date values correspond to the same date but vary in format in different regions of the world:

Date ValueRegion
03/14/2018U.S.
14/03/2018E.U.
2014-03-14China

In the above examples, the delimiters for the U.S. and E.U. values are identical, which makes parsing these values more challenging. 

Tip: If your dataset contains date values from different regions of the world, you should find or create a separate column to identify the applicable region.

Suppose the previous set of dates was represented in your dataset with the following values:

contractDateregion
03/14/2018USA
14/03/2018EU
2014-03-14CHN

In this case, you might try the following generalized solution. You can use conditional transformations to extract the day, month, and year values from the contractDate column based on the value in the region column.

NOTE: This solution assumes that all date values within for a specific region (e.g. USA) are consistently formatted. You should perform those formatting actions first.

Steps:

  1. First, you must split the column based on the cell value's delimiter. Note that the following transformation uses the Trifacta pattern {delim} to locate the delimiter in the cell value. This delimiter is either a dash or a slash.

    Transformation Name Split by delimiter
    Parameter: Column contractDate
    Parameter: Option by Delimiter
    Parameter: Delimiter `{delim}`

  2. Create the following three conditional transformations for extracting the day, month, or year values based on the value in the Region column. Here is the transformation to acquire the year values:

    Transformation Name conditions
    Parameter: Condition type Case on single column
    Parameter: Column to evaluate Region
    Parameter: Case 1 'EU'
    Parameter: Value 1 contractDate3
    Parameter: Case 2 'USA'
    Parameter: Value 2 contractDate3
    Parameter: Case 3 'CHN'
    Parameter: Value 1 contractDate1

  3. For month:

    Transformation Name conditions
    Parameter: Condition type Case on single column
    Parameter: Column to evaluate Region
    Parameter: Case 1 'EU'
    Parameter: Value 1 contractDate2
    Parameter: Case 2 'USA'
    Parameter: Value 2 contractDate1
    Parameter: Case 3 'CHN'
    Parameter: Value 1 contractDate2

     

  4. For day:

    Transformation Name conditions
    Parameter: Condition type Case on single column
    Parameter: Column to evaluate Region
    Parameter: Case 1 'EU'
    Parameter: Value 1 contractDate1
    Parameter: Case 2 'USA'
    Parameter: Value 2 contractDate2
    Parameter: Case 3 'CHN'
    Parameter: Value 1 contractDate3

  5. You can now bring together these three columns:

    Transformation Name Merge columns
    Parameter: Columns day, month, year
    Parameter: Separator '/'
    Parameter: New column name newDate

  6. You now have your new date column. You may need to reformat it into a preferred format.
  7. Drop the columns that were created during this process.

Your Rating: Results: PatheticBadOKGoodOutstanding! 2 rates

This page has no comments.