This section provides information on improvements to the Trifacta® type system.

Release 8.2


Release 8.0

Data type inference and row split inference run on more data

When a dataset is imported into the Trifacta application, a larger volume of data is now read from it for the following processes:

NOTE: The following applies only to datasets that do not contain schema information.

  • Split row inference: Patterns in the data are used to determine the end of a row of data. When a larger volume of data is read, there should be more potential rows to review, resulting in better precision on where to split the data into separate rows in the application.
  • Type inference: Patterns of data in the same column are used to determine the best Trifacta data type to assign to that column in the imported dataset. A larger volume of data means that the application has more values for the same column from which to infer the appropriate data type.

NOTE: An increased data volume should result in more accurate split row and data type inference. For pre-existing datasets, this increased volume may result in changes to the row and column type definitions when a dataset is imported.
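The effect of sample size on type inference can be illustrated with a minimal Python sketch. It is not the Trifacta implementation: the `infer_type` function, the three type names it returns, and the sample column are all assumptions made for illustration. It only shows why reading more values from a column can change the inferred type.

```python
def infer_type(values):
    """Return the narrowest of three illustrative types that fits every sampled value."""

    def all_parse(cast):
        # True when every sampled value can be converted by `cast`.
        for value in values:
            try:
                cast(value)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "Integer"
    if all_parse(float):
        return "Decimal"
    return "String"


# Hypothetical column whose later values differ from the first few.
column = ["1", "2", "3", "4.5", "n/a"]

small_sample = infer_type(column[:3])   # only integer-like values are seen
medium_sample = infer_type(column[:4])  # a decimal value enters the sample
full_sample = infer_type(column)        # a non-numeric value enters the sample
```

In this sketch the three samples yield different types for the same column, which mirrors how a pre-existing dataset may be typed differently once a larger volume of data is read.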

Tip: For datasets that are demarcated by quoted values, you may experience a change in how columns are typed.

If you notice unexpected changes in column data types or in row splits in your datasets:

  1. Type inference: You should move your recipe panel cursor to the top of the dataset to see if you must reassign data types.
  2. Split row inference: Create a new imported dataset, disabling type inference in the import settings. Check the splitrows transform to see if it is splitting the rows appropriately. For more information, see Import Data Page.

Release 7.5

PII - Improved matching for social security numbers

In prior releases, Personally Identifiable Information (PII) for social security numbers was identified based only on the length of values, which matched too broadly.

In this release, the constraints on matching SSN values have been tightened when applied to PII.
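The difference between length-only matching and tighter matching can be sketched as follows. The exact pattern used by the product is not given here, so both functions are hypothetical. The stricter one assumes the publicly documented rules for U.S. social security numbers: the area number cannot be 000, 666, or 900 through 999, the group number cannot be 00, and the serial number cannot be 0000.

```python
import re

# Assumed stricter pattern; not taken from the Trifacta documentation.
STRICT_SSN = re.compile(
    r"^(?!000|666|9\d\d)\d{3}"  # area number: not 000, 666, or 900-999
    r"-?(?!00)\d{2}"            # group number: not 00
    r"-?(?!0000)\d{4}$"         # serial number: not 0000
)


def matches_by_length(value):
    """Loose check: any value that contains exactly nine digits."""
    return sum(ch.isdigit() for ch in value) == 9


def matches_strict(value):
    """Tighter check: nine digits that also satisfy the range rules above."""
    return STRICT_SSN.match(value) is not None
```

Under these assumptions, a value such as 000-12-3456 passes the length-only check but is rejected by the stricter check, while 123-45-6789 passes both.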

Tip: PII detection is applied in generated log entries and in collaborative suggestions. When matching PII patterns are detected in data that is surfaced in these two areas, a mask is applied over the values for security reasons.

For more information, see Social Security Number Data Type.

For more information, see Data Type Validation Patterns.

PII - Improved and expanded matching for credit card numbers

In prior releases, PII for credit card numbers was identified based on 16-digit values.

In this release, the matching constraints have been expanded to include 14-digit credit card values. 

Also, the constraints around valid 16-digit numbers have been improved with better recognition around values for different types of credit cards. In the following table, you can see lists of valid test numbers for different credit card types and can see how detection of these values has changed between releases.

Test Number          Credit Card Type  Detected in 7.4?  Detected in 7.5?
2222 4053 4324 8877  Mastercard        No                Yes
2222 9909 0525 7051  Mastercard        No                Yes
2223 0076 4872 6984  Mastercard        No                Yes
2223 5771 2001 7656  Mastercard        No                Yes
5105 1051 0510 5100  Mastercard        Yes               Yes
5111 0100 3017 5156  Mastercard        Yes               Yes
5204 7400 0990 0014  Mastercard        Yes               Yes
5420 9238 7872 4339  Mastercard        Yes               Yes
5455 3307 6000 0018  Mastercard        Yes               Yes
5506 9004 9000 0444  Mastercard        Yes               Yes
5553 0422 4198 4105  Mastercard        Yes               Yes
5555 5555 5555 4444  Mastercard        Yes               Yes
4012 8888 8888 1881  Visa              Yes               Yes
4111 1111 1111 1111  Visa              Yes               Yes
6011 0009 9013 9424  Discover          Yes               Yes
6011 1111 1111 1117  Discover          Yes               Yes
3714 496353 98431    American Express  Yes               Yes
3782 822463 10005    American Express  Yes               Yes
3056 9309 0259 04    Diners            No                Yes
3852 0000 0232 37    Diners            No                Yes
3530 1113 3330 0000  JCB               Yes               Yes
3566 0020 2036 0505  JCB               Yes               Yes
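Recognition by card type, as in the table above, is commonly based on the leading digits and length of the number. The following Python sketch illustrates that idea only. The `card_network` function and its prefix ranges are assumptions drawn from publicly documented issuer ranges, not from the Trifacta validation patterns, and the sketch does not attempt to reproduce the detection behavior of either release.

```python
import re
from typing import Optional


def card_network(value: str) -> Optional[str]:
    """Classify a card number by prefix and length (hypothetical rules)."""
    digits = re.sub(r"[ -]", "", value)  # drop spaces and hyphens
    if not re.fullmatch(r"[0-9]{14,16}", digits):
        return None

    length = len(digits)
    p2, p3, p4 = int(digits[:2]), int(digits[:3]), int(digits[:4])

    if length == 15 and p2 in (34, 37):
        return "American Express"
    if length == 14 and (300 <= p3 <= 305 or p2 in (36, 38)):
        return "Diners"
    if length == 16:
        if digits[0] == "4":
            return "Visa"
        # Assumed Mastercard ranges: 51-55 and the newer 2221-2720 series.
        if 51 <= p2 <= 55 or 2221 <= p4 <= 2720:
            return "Mastercard"
        if p4 == 6011 or p2 == 65:
            return "Discover"
        if 3528 <= p4 <= 3589:
            return "JCB"
    return None
```

For example, `card_network("2222 4053 4324 8877")` falls in the assumed 2221-2720 Mastercard range, and `card_network("3056 9309 0259 04")` is a 14-digit value in the assumed Diners range. These are the two groups of rows whose detection changed between releases in the table.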

For more information, see Credit Card Data Type.

For more information, see Data Type Validation Patterns.

