
Trifacta Dataprep


If you licensed Dataprep by Trifacta before Oct. 14, 2020, you are using the Dataprep by Trifacta Legacy product edition. On October 14, 2022, Google will decommission this product edition, and it will no longer be available for use. Current customers of this product edition are encouraged to transition to one of the product editions hosted by Trifacta. See Product Editions.


Collation is the organization of written content into a standardized order. String comparison functions use the collation rules for Latin, which are summarized below:

  • Comparisons are case-sensitive.
    • Uppercase letters are greater than lowercase versions of the same letter.
    • However, lowercase letters that are later in the alphabet are greater than the uppercase version of the previous letter.
  • Two strings are equal if they match identically.
    • If two strings are identical except that the second string contains one additional character at the end, the second string is greater.
  • A normalized version of a letter is the unaccented, lowercase version of the letter. In string comparison, it is the lowest value of all of its variants.
    • a is less than ă.
    • However, when either character is compared to b, a and ă are treated as equal.
    • The set of Latin normalized characters contains more than 26 characters.
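
The rules above can be sketched in Python as a multi-level sort key. This is an illustrative approximation only, not the product's implementation: the names collation_key and compare are hypothetical, and the choice to break ties on accents before case is an assumption that the rules above do not state. Python's built-in string comparison orders by code point and does not follow these rules, so the sketch builds its own key with the standard unicodedata module.

```python
import unicodedata


def collation_key(text):
    """Hypothetical sort key approximating the Latin rules listed above."""
    bases, accents, cases = [], [], []
    # NFD splits an accented letter into its base letter plus combining marks.
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and bases:
            # Attach the combining mark to the letter it follows.
            accents[-1] += ch
        else:
            bases.append(ch.lower())    # normalized letter: unaccented, lowercase
            accents.append("")
            cases.append(ch.isupper())  # False (lowercase) sorts before True
    # Level 1: normalized letters. Level 2: accents. Level 3: case.
    # Placing accents before case is an assumption of this sketch.
    return ("".join(bases), tuple(accents), tuple(cases))


def compare(left, right):
    """Return -1, 0, or 1 when left is less than, equal to, or greater than right."""
    key_left, key_right = collation_key(left), collation_key(right)
    return (key_left > key_right) - (key_left < key_right)


A_BREVE = "\u0103"  # the character ă

print(compare("a", "A"))       # -1: uppercase is greater than its lowercase form
print(compare("A", "b"))       # -1: a later letter wins regardless of case
print(compare("abc", "abc"))   # 0: identical strings are equal
print(compare("abc", "abcd"))  # -1: the longer string is greater
print(compare("a", A_BREVE))   # -1: the normalized letter is the lowest variant
print(collation_key("a")[0] == collation_key(A_BREVE)[0])  # True: equal at the letter level
```

Comparing the first element of the key alone corresponds to the statement that a and ă are equal when compared to b. The accent and case levels only come into play when the normalized letters match.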

This table illustrates some generalized rules of Latin collation.

Order | Description | Lesser Example | Greater Example
5 | | A | b


NOTE: In the charts linked below, values nearer the top of a page are lower than values farther down the page. Similarly, the charts in the left nav bar are listed in ascending order.

For more information on the applicable collation rules, see

