Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version r0711

D toc

This section provides information on improvements to the

D s item
itemtype system
rtrue
.
Warning

If you have upgraded from a

D s item
Release
Release
3.0 or earlier to Release 3.1 or later, you should review this page, as some type-related behaviors have changed in the platform.

Release 7.1.1

PII - Improved matching for social security numbers

In prior releases, Personally Identifiable Information (PII) for social security numbers was identified based only on the length of values, which matched too broadly.

In this release, the constraints on matching of SSN values has been tightened when applied to PII. 

Tip

Tip: PII detection is applied in generated log entries and in collaborative suggestions. When matching PII patterns are detected in data that is surface in these two areas, a mask is applied over the values for security reasons.

For more information, see Social Security Number Data Type.

For more information, see Data Type Validation Patterns.

PII - Improved and expanded matching for credit card numbers

In prior releases, PII for credit card numbers was identified base on 16-digit values. 

In this release, the matching constraints have been expanded to include 14-digit credit card values. 

Also, the constraints around valid 16-digit numbers have been improved with better recognition around values for different types of credit cards. In the following table, you can see lists of valid test numbers for different credit card types and can see how detection of these values has changed between releases.

Test NumberCredit Card TypeIs Detected 7.4?Is detected 7.5?
2222 4053 4324 8877MastercardNoYes
2222 9909 0525 7051MastercardNoYes
2223 0076 4872 6984MastercardNoYes
2223 5771 2001 7656MastercardNoYes
5105 1051 0510 5100MastercardYesYes
5111 0100 3017 5156MastercardYesYes
5204 7400 0990 0014MastercardYesYes
5420 9238 7872 4339MastercardYesYes
5455 3307 6000 0018MastercardYesYes
5506 9004 9000 0444MastercardYesYes
5553 0422 4198 4105MastercardYesYes
5555 5555 5555 4444MastercardYesYes
4012 8888 8888 1881VisaYesYes
4111 1111 1111 1111VisaYesYes
6011 0009 9013 9424DiscoverYesYes
6011 1111 1111 1117DiscoverYesYes
3714 496353 98431American ExpressYesYes
3782 822463 10005American ExpressYesYes
3056 9309 0259 04DinersNoYes
3852 0000 0232 37DinersNoYes
3530 1113 3330 0000JCBYesYes
3566 0020 2036 0505JCBYesYes

For more information, see Credit Card Data Type.

For more information, see Data Type Validation Patterns.

Release 6.0 and earlier

Mismatched data types

Where there are mismatches between inputs and the expected input data type, the following values are generated for the mismatches:

Source data typeOutput if mismatched

Primitive data types:

  • Integer
  • Decimal
  • Boolean
  • Arrays
  • Maps

 

null value, if mismatched
Datetimenull value, if mismatched

Other non-primitive data types, including:

  • SSN
  • Phone Number
  • Email Address
  • Credit Card
  • Gender
  • IP Address
  • URL
  • HTTP Code
  • Zip Code

 

Converted to string values, if mismatched
StringAnything can be a String value.


State values and custom data types are converted to string values, if they are mismatched.


Three-value logic for null values

The 

D s photon
 running environment has been augmented to use three-value logic for null values.

When values are compared, the result can be true or false in most cases.

If a null value was compared to a null value in the

D s photon
running environment:

  • In Release 3.0 and earlier, this evaluated to true.
  • In Release 3.1 and later, this evaluates to an unknown (null) value.

This change aligns the behavior of the running environment with that of SQL and Hadoop Pig.


Improved handling of null values

Assume that the column nuller contains null values and that you have the following transform:

D code

derive value:(nuller >= 0)


Prior to Release 3.1, the above transform generated a column of true values.

In Release 3.1 and later, the transform generates a column of null values.


More consistent evaluation of null values in ternaries

In the following example, a_null_expression always evaluates to a null value.

D code

derive value: (a_null_expression ? 'a' : 'b')


In Release 3.0, this expression generated b for all inputs on the

D s photon
running environment and a null value on Hadoop Pig.

In Release 3.1 and later, this expression generates a null value for all inputs on both running environments.

Tip

Tip: Beginning in Release 3.1, you can use the if function instead of ternary expressions. Ternaries may be deprecated at some point in the future. For more information, see IF Function.



For example, you have the following dataset:

MyStringCol
This works.
You can't break this.
Not broken yet.

You test each row for the presence of the string can't:

D code

derive value: if(find(MyStringCol, 'can\'t',true,0) > -1, true, false) as:'MyFindResults'

The above transform results in the following:

MyStringColMyFindResults
This works. 
You can't break this.true
Not broken yet. 

In this case, the value of false is not written to the other columns, since the find function returns a null value. This null value, in turn, nullifies the entire expression, resulting in a null value written in the new column.

You can use the following to locate the null values:

D code

derive value:isnull(MyFindResults) as:'nullInMyFindResults'

Datetime changes

Raw date and time values must be properly formatted

Info

NOTE: Upgraded recipes continue to function properly. However, if you edit the recipe step in an upgraded system, you are forced to fix the formatting issue before saving the change.

 

Before this release, you could create a transform like the following:

D code

derive value:date(2016,2,15)

This transform generated a column of map values, like the following:

Code Block
{"year":"2016","month":"2","date":"15"}

Beginning this release, the above command is invalid, as the date values must be properly formatted prior to display. The following works:

D code

derive value:dateformat(date(2016,2,15),'yyyy-MM-dd')

This transform generates a column of Datetime values in the following format:

Code Block
2016-02-15

Time:

Before this release:

D code

derive value:time(11,34,58)

Prior release output:

Code Block
{"hours":"11","minutes":"34","seconds":"58"}

This release:

D code

derive value:dateformat(time(11,34,58), 'HH-mm-ss')

This release's output:

Code Block
11-34-58

Date formatting functions supports 12-hour time only if AM/PM indicator is included

Beginning in this release, the unixtimeformat and dateformat functions requires an AM/PM indicator (a) if the date formatting string uses a 12-hour time indicator (h or hh).

Valid for earlier releases:

D code

derive value: unixtimeformat(myDate, 'yyyy-MM-dd hh:mm:ss') as:'myUnixDate'

Valid for this release and later:

D code

derive value: unixtimeformat(myDate, 'yyyy-MM-dd hh:mm:ss a') as:'myUnixDate'

These references in recipes fail to validate in this release or later and must be fixed.

Un-inferrable formats from dateformat and unixtimeformat functions are written as strings

If a formatting string is not a datetime format recognized by the 

D s platform
, the output is generated as a string value.

This change was made to provide clarity to some ambiguous conditions.

Colon as a delimiter for date values is no longer supported

Beginning in this release, the colon (:) is no longer supported as a delimiter for date values. It is still supported for time values.

myDateValueRecognized?
02:03:2016No
02:03:16Recognized as a time value

When data such as the above is imported, it may not be initially recognized by the 

D s webapp
 as Datetime type.

To fix, you might apply the following transform:

D code

replace col:myDateValue with:'-' on:`-` global:true

The new column values are more likely to be inferred as Datetime values. If not, you can choose the appropriate Datetime format from the data type drop-down for the column. See Data Grid Panel.