This section provides information on improvements to the .
In prior releases, Datetime values were written as String values for Parquet outputs.
Beginning in this release, you can optionally enable the generation of Datetime/Timestamp values as the outputs of Datetime columns.
NOTE: To ensure consistency with prior releases, this feature is disabled by default.
For more information on enabling this feature, see Miscellaneous Configuration.
When a dataset is imported, a larger volume of data is read from it for the following processes:
NOTE: The following is applied to datasets that do not contain schema information.
NOTE: An increased data volume should result in more accurate split row and data type inference. For pre-existing datasets, this increased volume may change the row and column type definitions when a dataset is imported.
Tip: For datasets that are demarcated by quoted values, you may experience a change in how columns are typed.
If you notice unexpected changes in column data types or in row splits in your datasets:
Review the splitrows transform to see if it is splitting the rows appropriately. For more information, see Import Data Page.
As needed, you can modify the limits that are used during the data type and split row inference processes. For more information, see Configure Type Inference.
In prior releases, Personally Identifiable Information (PII) for social security numbers was identified based only on the length of values, which matched too broadly.
In this release, the constraints on matching SSN values have been tightened when applied to PII.
Tip: PII detection is applied in generated log entries and in collaborative suggestions. When matching PII patterns are detected in data that is surfaced in these two areas, a mask is applied over the values for security reasons.
For more information, see Social Security Number Data Type.
For more information, see Data Type Validation Patterns.
In prior releases, PII for credit card numbers was identified based on 16-digit values.
In this release, the matching constraints have been expanded to include 14-digit credit card values.
Also, the constraints around valid 16-digit numbers have been improved with better recognition around values for different types of credit cards. In the following table, you can see lists of valid test numbers for different credit card types and can see how detection of these values has changed between releases.
|Test Number||Credit Card Type||Is Detected in 7.4?||Is Detected in 7.5?|
|2222 4053 4324 8877||Mastercard||No||Yes|
|2222 9909 0525 7051||Mastercard||No||Yes|
|2223 0076 4872 6984||Mastercard||No||Yes|
|2223 5771 2001 7656||Mastercard||No||Yes|
|5105 1051 0510 5100||Mastercard||Yes||Yes|
|5111 0100 3017 5156||Mastercard||Yes||Yes|
|5204 7400 0990 0014||Mastercard||Yes||Yes|
|5420 9238 7872 4339||Mastercard||Yes||Yes|
|5455 3307 6000 0018||Mastercard||Yes||Yes|
|5506 9004 9000 0444||Mastercard||Yes||Yes|
|5553 0422 4198 4105||Mastercard||Yes||Yes|
|5555 5555 5555 4444||Mastercard||Yes||Yes|
|4012 8888 8888 1881||Visa||Yes||Yes|
|4111 1111 1111 1111||Visa||Yes||Yes|
|6011 0009 9013 9424||Discover||Yes||Yes|
|6011 1111 1111 1117||Discover||Yes||Yes|
|3714 496353 98431||American Express||Yes||Yes|
|3782 822463 10005||American Express||Yes||Yes|
|3056 9309 0259 04||Diners||No||Yes|
|3852 0000 0232 37||Diners||No||Yes|
|3530 1113 3330 0000||JCB||Yes||Yes|
|3566 0020 2036 0505||JCB||Yes||Yes|
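The table above can be approximated with a small prefix-and-length check. The following is a minimal Python sketch, not the product's actual validation patterns: the `CARD_PATTERNS` rules and the `classify_card` helper are assumptions based on commonly published issuer prefix ranges, chosen so that the test numbers in the table are classified as listed.

```python
import re
from typing import Optional

# Hypothetical prefix/length rules (assumptions, not the product's patterns).
CARD_PATTERNS = {
    # 16 digits: prefixes 51-55, or the newer 2221-2720 range
    "Mastercard": re.compile(
        r"^(5[1-5]\d{14}|2(22[1-9]|2[3-9]\d|[3-6]\d{2}|7[01]\d|720)\d{12})$"
    ),
    # 16 digits starting with 4
    "Visa": re.compile(r"^4\d{15}$"),
    # 16 digits starting with 6011
    "Discover": re.compile(r"^6011\d{12}$"),
    # 15 digits starting with 34 or 37
    "American Express": re.compile(r"^3[47]\d{13}$"),
    # 14 digits starting with 300-305, 36, or 38
    "Diners": re.compile(r"^3(0[0-5]|[68]\d)\d{11}$"),
    # 16 digits starting with 3528-3589
    "JCB": re.compile(r"^35(2[89]|[3-8]\d)\d{12}$"),
}


def classify_card(value: str) -> Optional[str]:
    """Return the card type whose pattern matches, or None if nothing matches."""
    digits = re.sub(r"[ -]", "", value)  # drop spaces and dashes used for grouping
    if not digits.isdigit():
        return None
    for card_type, pattern in CARD_PATTERNS.items():
        if pattern.match(digits):
            return card_type
    return None
```

For example, `classify_card("3056 9309 0259 04")` matches the 14-digit Diners rule, which corresponds to the 14-digit values that are newly detected in this release.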
For more information, see Credit Card Data Type.
For more information, see Data Type Validation Patterns.
In prior releases, when you generated outputs, the typecasting for the outputs was determined by the data types that were inferred by the application. So, if a column contained only "Yes" or "No" values, then the application was likely to infer the column data type as Boolean.
The above presented problems for relational sources for the following reasons:
Beginning in this release, the schemas from relational datasources that are ingested to the backend datastore are used for generated outputs, unless a recipe step forcibly sets the type to something else. At the time of original import, the schema of the relational datasource is stored separately as part of the ingest process.
NOTE: If you created recipe steps that forcibly change a column's data type from within the application to be different from the source data type of your relational source, you may need to revise these recipe steps or remove them altogether.
During publication, internal data types are mapped to the data types of the publishing target using an internal mapping per vendor. For more information, see Type Conversions.
Where there are mismatches between inputs and the expected input data type, the following values are generated for the mismatches:
|Source data type||Output if mismatched|
Primitive data types:
|null value, if mismatched|
|Datetime||null value, if mismatched|
Other non-primitive data types, including:
|Converted to string values, if mismatched|
|String||Anything can be a String value.|
State values and custom data types are converted to string values, if they are mismatched.
The running environment has been augmented to use three-value logic for null values.
When values are compared, the result can be
false in most cases.
If a null value was compared to a null value in the running environment:
This change aligns the behavior of the running environment with that of SQL and Hadoop Pig.
Assume that the column nuller contains null values and that you have the following transform:
derive value:(nuller >= 0)
Prior to Release 3.1, the above transform generated a column of
In Release 3.1 and later, the transform generates a column of null values.
In the following example, a_null_expression always evaluates to a null value.
derive value: (a_null_expression ? 'a' : 'b')
In Release 3.0, this expression generated b for all inputs on the running environment and a null value on Hadoop Pig.
In Release 3.1 and later, this expression generates a null value for all inputs on both running environments.
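The null-propagating behavior described above can be illustrated outside the application. The following is a minimal Python sketch in which `None` stands in for a null value; the helper names `gte` and `ternary` are hypothetical and are not part of the transform language.

```python
from typing import Optional


def gte(left: Optional[float], right: Optional[float]) -> Optional[bool]:
    """Three-value comparison: a null operand makes the whole result null."""
    if left is None or right is None:
        return None
    return left >= right


def ternary(condition: Optional[bool], if_true, if_false):
    """Ternary expression that yields null when its condition is null."""
    if condition is None:
        return None
    return if_true if condition else if_false


# A column of null values, as in the nuller example.
nuller = [None, None, None]

# Mirrors: derive value:(nuller >= 0)
derived = [gte(value, 0) for value in nuller]

# Mirrors: derive value: (a_null_expression ? 'a' : 'b')
ternary_result = ternary(None, "a", "b")
```

Under three-value logic, every entry in `derived` is null and `ternary_result` is null, which matches the Release 3.1 behavior described for both examples.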
Tip: Beginning in Release 3.1, you can use the
For example, you have the following dataset:
|You can't break this.|
|Not broken yet.|
You test each row for the presence of the string can't:
derive value: if(find(MyStringCol, 'can\'t',true,0) > -1, true, false) as:'MyFindResults'
The above transform results in the following:
|You can't break this.||true|
|Not broken yet.|
In this case, the value of false is not written for the second row, since the find function returns a null value there. This null value, in turn, nullifies the entire expression, resulting in a null value written in the new column.
You can use the following to locate the null values:
derive value:isnull(MyFindResults) as:'nullInMyFindResults'
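The find example and the follow-up isnull check can be traced row by row. The following Python sketch is an illustration only: `find` and `found_flag` are hypothetical stand-ins that model the null-returning behavior described above, with `None` representing null, and they ignore the case-sensitivity and start-index arguments of the original transform.

```python
from typing import List, Optional


def find(text: str, pattern: str) -> Optional[int]:
    """Hypothetical stand-in: index of the match, or None (null) if not found."""
    index = text.find(pattern)
    return index if index >= 0 else None


def found_flag(text: str, pattern: str) -> Optional[bool]:
    """Mirrors if(find(...) > -1, true, false) with null propagation."""
    position = find(text, pattern)
    if position is None:
        return None  # the null from find nullifies the whole expression
    return position > -1


rows: List[str] = ["You can't break this.", "Not broken yet."]

# Mirrors the MyFindResults column.
my_find_results = [found_flag(row, "can't") for row in rows]

# Mirrors: derive value:isnull(MyFindResults) as:'nullInMyFindResults'
null_in_my_find_results = [result is None for result in my_find_results]
```

The first row yields true and the second yields null rather than false, so the isnull column flags only the second row.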
NOTE: Upgraded recipes continue to function properly. However, if you edit the recipe step in an upgraded system, you are forced to fix the formatting issue before saving the change.
Before this release, you could create a transform like the following:
This transform generated a column of map values, like the following:
Beginning in this release, the above command is invalid, as the date values must be properly formatted prior to display. The following works:
This transform generates a column of Datetime values in the following format:
Before this release:
Prior release output:
derive value:dateformat(time(11,34,58), 'HH-mm-ss')
This release's output:
Beginning in this release, the dateformat functions require an AM/PM indicator (a) if the date formatting string uses a 12-hour time indicator (for example, hh).
Valid for earlier releases:
derive value: unixtimeformat(myDate, 'yyyy-MM-dd hh:mm:ss') as:'myUnixDate'
Valid for this release and later:
derive value: unixtimeformat(myDate, 'yyyy-MM-dd hh:mm:ss a') as:'myUnixDate'
These references in recipes fail to validate in this release or later and must be fixed.
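The reason for requiring an AM/PM indicator with a 12-hour format can be shown with a short Python sketch. This is an analogy, not the application's formatter: it assumes that Python's `%I` directive plays the role of `hh` and `%p` plays the role of `a`, and the two timestamps are made up for illustration.

```python
from datetime import datetime

# Two hypothetical timestamps twelve hours apart.
morning = datetime(2020, 1, 1, 11, 34, 58)
evening = datetime(2020, 1, 1, 23, 34, 58)

# 12-hour clock without and with an AM/PM marker.
without_marker = "%Y-%m-%d %I:%M:%S"
with_marker = "%Y-%m-%d %I:%M:%S %p"

# Without the marker, both timestamps render as the same string.
ambiguous = morning.strftime(without_marker) == evening.strftime(without_marker)

# With the marker, the two renderings can be told apart.
distinct = morning.strftime(with_marker) != evening.strftime(with_marker)
```

Both `ambiguous` and `distinct` evaluate to true in a standard locale, which is the ambiguity that the added indicator removes.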
If a formatting string is not a recognized datetime format, the output is generated as a string value.
This change was made to provide clarity to some ambiguous conditions.
Beginning in this release, the colon (:) is no longer supported as a delimiter for date values. It is still supported for time values.
|02:03:16||Recognized as a time value|
When data such as the above is imported, it may not be initially recognized as Datetime type.
To fix, you might apply the following transform:
replace col:myDateValue with:'-' on:`:` global:true
The new column values are more likely to be inferred as Datetime values. If not, you can choose the appropriate Datetime format from the data type drop-down for the column. See Data Grid Panel.
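The effect of swapping the colon delimiter for a dash can be sketched in Python. The sample value, the `normalize_date_delimiter` helper, and the `%Y-%m-%d` layout are assumptions for illustration; the application's own inference rules are not modeled here.

```python
from datetime import datetime


def normalize_date_delimiter(value: str) -> str:
    """Replace every colon with a dash, like a global replace on the column."""
    return value.replace(":", "-")


# Hypothetical date value that uses colons as the delimiter.
raw_value = "2016:03:16"

normalized = normalize_date_delimiter(raw_value)

# Once dashes are used, the value parses as a year-month-day date.
parsed = datetime.strptime(normalized, "%Y-%m-%d")
```

Apply this kind of replacement only to columns that hold date values, since colons remain valid delimiters for time values.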