This documentation applies to installation from a supported Marketplace. Please use the installation instructions provided with your deployment.


If you are installing or upgrading a Marketplace deployment, please use the available PDF content. You must use the install and configuration PDF available through the Marketplace listing.

This guide steps through the requirements and process for upgrading Trifacta® Wrangler Enterprise from the Azure Marketplace.

Please complete the instructions in this section if you are upgrading from version 5.0 of Trifacta® Wrangler Enterprise.

NOTE: These instructions apply only to Trifacta® Wrangler Enterprise available through the Azure Marketplace.

 

Before You Begin

Before you upgrade, please do the following:

Steps:

  1. Log in to the application as an admin.
  2. From the left nav bar, select Settings > Admin Settings.
  3. Locate the following parameter:

    azure.wasb.defaultStore.sasTokenId
  4. Store the current value for safekeeping. It must be applied again to a new parameter after upgrade.

Overview

Upgrading your instance of Trifacta Wrangler Enterprise for Azure follows these basic steps:

  1. Back up the databases and configuration for your existing platform instance.
  2. Download the latest version of Trifacta Wrangler Enterprise.
  3. Uninstall the existing version of Trifacta Wrangler Enterprise. Install the version you downloaded in the previous step.
  4. Upgrade the databases.
  5. Start up Trifacta Wrangler Enterprise. This step automatically upgrades the configurations.
  6. Within the application, perform required configuration updates in the upgraded instance.

Instructions for these steps are provided below. 

Backup

Before you begin, you should back up your current instance.

  1. SSH to your current Marketplace instance.
  2. Stop the Trifacta platform on your current Marketplace instance:

    sudo service trifacta stop
  3. Update the backup script with a more current version.
    1. If you have not done so already, download the backup script from the following location:

      NOTE: Please modify the following URL for the version from which you are upgrading.


      https://raw.githubusercontent.com/trifacta/trifacta-utils/release/5.0/azure/trifacta-backup-config-and-db.sh

    2. Example command to download the script:

      NOTE: Below, some values are too long for a single line. Single lines that overflow to additional lines are marked with a \. The backslash should not be included if the line is used as input.

      curl --output trifacta-backup-config-and-db.sh \
      https://raw.githubusercontent.com/trifacta/trifacta-utils/release/5.0/azure/trifacta-backup-config-and-db.sh
    3. Copy the downloaded script to the following location, overwriting the existing file:

      /opt/trifacta/bin/setup-utils/trifacta-backup-config-and-db.sh
    4. Make sure that the script is executable:

      sudo chmod 775 /opt/trifacta/bin/setup-utils/trifacta-backup-config-and-db.sh
  4. Run the backup script:

    sudo /opt/trifacta/bin/setup-utils/trifacta-backup-config-and-db.sh

     

    1. When the script is complete, the output identifies the location of the backup. Example: /opt/trifacta-backups/trifacta-backup-5.0.0+110.20180615100731.e4ded4d-20181211213601.tgz

  5. Store the backup in a safe location. 
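
Before storing the backup, it can help to confirm that the archive is readable. The following is a minimal sketch, not part of the Trifacta tooling: archive_backup is a hypothetical helper, and the destination directory is whatever safe location you choose. It lists the archive to detect corruption, copies it, and writes a checksum next to the copy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify a backup archive, then copy it to a safe directory.
archive_backup() {
  local archive="$1" dest_dir="$2" name
  name=$(basename "$archive")
  # Listing the contents fails if the gzip stream or tar structure is damaged.
  tar -tzf "$archive" > /dev/null 2>&1 || { echo "unreadable archive: $archive" >&2; return 1; }
  mkdir -p "$dest_dir" || return 1
  cp -p "$archive" "$dest_dir/$name" || return 1
  # Store a checksum beside the copy so it can be re-verified later with sha256sum -c.
  ( cd "$dest_dir" && sha256sum "$name" > "$name.sha256" )
}

# Example (paths are placeholders):
#   archive_backup /opt/trifacta-backups/<backup file>.tgz /mnt/safe-location
```

The checksum file can be re-checked at restore time by running sha256sum -c on it from the destination directory.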

Download software

Download the latest version by using the command below:

wget 'https://trifactamarketplace.blob.core.windows.net/artifacts/trifacta-server-5.1.0m5-120~xenial_amd64.deb?sr=c&si=trifacta-deploy-public-read&sig=ksMPhDkLpJYPEXnRNp4vAdo6QQ9ulpP%2BM4Gsi/nea%2Bg%3D&sv=2016-05-31' -O trifacta-server-5.1.0m5-120~xenial_amd64.deb
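
If the download fails, wget -O can leave behind an empty or partial file with the expected name. As a rough sanity check before uninstalling anything, the sketch below tests that the file is non-empty and begins with the ar archive signature that Debian packages use. looks_like_deb is a hypothetical helper; it does not validate the package contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper: cheap check that a file is plausibly a .deb package.
looks_like_deb() {
  local file="$1"
  # Reject missing or zero-length files.
  [ -s "$file" ] || return 1
  # Debian packages are ar archives, which start with the bytes "!<arch>".
  [ "$(head -c 7 "$file")" = '!<arch>' ]
}

# Example:
#   looks_like_deb trifacta-server-5.1.0m5-120~xenial_amd64.deb && echo "download looks OK"
```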

Uninstall and install software

Uninstall old version

To uninstall the older version of Trifacta® Wrangler Enterprise, execute the following command as the root user on the Trifacta node:

apt-get remove --purge trifacta

Upgrade components

Upgrade the supervisor package:

apt-get install supervisor=3.2.4

Upgrade for Ubuntu 16.04 (Xenial)

Steps:

  1. Stop the platform:

    service trifacta stop
  2. Install the upgraded version of PostgreSQL. The following command installs PostgreSQL 9.6:

    sudo apt-get install postgresql-9.6 postgresql-server-dev-9.6 postgresql-contrib-9.6 -y
  3. Stop PostgreSQL. Use either of the following commands:

    sudo systemctl stop postgresql
    sudo service postgresql stop
  4. Upgrade the PostgreSQL 9.3 version to PostgreSQL 9.6, using the newly installed version:

    NOTE: Below, some values are too long for a single line. Single lines that overflow to additional lines are marked with a \. The backslash should not be included if the line is used as input.

    sudo su - postgres -c '/usr/lib/postgresql/9.6/bin/pg_upgrade \
    -b /usr/lib/postgresql/9.3/bin -B /usr/lib/postgresql/9.6/bin \
    -d /var/lib/postgresql/9.3/main/ -D /var/lib/postgresql/9.6/main/ \
    -O "-c config_file=/etc/postgresql/9.6/main/postgresql.conf" \
    -o "-c config_file=/etc/postgresql/9.3/main/postgresql.conf"'
  5. Remove the old version of PostgreSQL (9.3):

    sudo apt-get remove postgresql-9.3 -y
  6. Restart PostgreSQL and the platform:

    service postgresql start
    service trifacta start
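
pg_upgrade supports a --check option, which reports incompatibilities between the two clusters without changing any data. The sketch below rebuilds the command from step 4 so that the same arguments can be run first with --check and then without it. pg_upgrade_cmd is a hypothetical helper that only prints the command; the paths are the ones used in the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the pg_upgrade command for the given old and new
# PostgreSQL versions. An optional third argument (for example --check) is appended.
pg_upgrade_cmd() {
  local old="$1" new="$2" extra="${3:-}"
  printf '%s' "/usr/lib/postgresql/$new/bin/pg_upgrade"
  printf ' %s' \
    "-b /usr/lib/postgresql/$old/bin" \
    "-B /usr/lib/postgresql/$new/bin" \
    "-d /var/lib/postgresql/$old/main/" \
    "-D /var/lib/postgresql/$new/main/" \
    "-O \"-c config_file=/etc/postgresql/$new/main/postgresql.conf\"" \
    "-o \"-c config_file=/etc/postgresql/$old/main/postgresql.conf\""
  if [ -n "$extra" ]; then
    printf ' %s' "$extra"
  fi
  printf '\n'
}

# Example: dry run first, then the real upgrade.
#   sudo su - postgres -c "$(pg_upgrade_cmd 9.3 9.6 --check)"
#   sudo su - postgres -c "$(pg_upgrade_cmd 9.3 9.6)"
```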

Install new version

Install the latest version that was downloaded. Execute the following command as root:

dpkg -i <location of the 5.1 deb file>

 

Start up platform

The platform must be started to migrate the databases and upgrade the configuration from the older version of Trifacta® Wrangler Enterprise to the latest version. To start the Trifacta platform on the instance:

service trifacta start

Post-Upgrade Configuration

Upgrade from Release 5.0 or earlier

Change database ports

The database version is PostgreSQL 9.6, which uses a different port number than the previous version. You must verify that the platform is using the appropriate port number to access the databases.

 

Steps:

  1. Log in to the Trifacta node.
  2. Edit the following file:

    /etc/postgresql/9.6/main/postgresql.conf
  3. Locate the listed database port number. Typically, this value is 5433.
  4. Close the file.
  5. To apply this configuration change, log in as an administrator to the Trifacta node. Then, edit trifacta-conf.json. Some of these settings may not be available through the Admin Settings Page. For more information, see Platform Configuration Methods.
  6. Locate the following parameters, which define the port numbers used by each Trifacta database. The values below are the defaults but may be different:

    NOTE: All databases should use the same port number.

    "webapp.db.port" = 5433;
    ...
    "batch-job-runner.db.port" = 5433;
    ...
    "scheduling-service.database.port" = 5433;
    ...
    "time-based-trigger-service.database.port" = 5433;
  7. If the above values differ from the port number in the postgresql.conf file, update them to match.

  8. Save and close the file.
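
Instead of opening postgresql.conf in an editor, the port can be read from the command line. The following is a minimal sketch; pg_conf_port is a hypothetical helper. It ignores commented-out lines and, if the setting appears more than once, prints the last value, since later entries in postgresql.conf take precedence.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the active port setting from a postgresql.conf file.
pg_conf_port() {
  local conf="$1"
  # Match uncommented "port = NNNN" lines (the equals sign is optional in
  # postgresql.conf) and keep only the last match.
  sed -n -E 's/^[[:space:]]*port[[:space:]]*=?[[:space:]]*([0-9]+).*/\1/p' "$conf" | tail -n 1
}

# Example:
#   pg_conf_port /etc/postgresql/9.6/main/postgresql.conf
```

Compare the printed value against the four port parameters listed in step 6.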

Reconfigure use of Key Vault for WASB access

NOTE: This section applies only if you are upgrading to Release 5.1 or later.

Beginning in Release 5.1, you can configure the platform to access WASB using settings stored in the Trifacta configuration, including the SAS token.

In your pre-upgrade environment, if you enabled WASB access through a Key Vault, you must set the following property to true. This property is new in Release 5.1 and defaults to false.

Steps:

  1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
  2. Locate and set the following property to true:

    "azure.wasb.fetchSasTokensFromKeyVault": true
  3. Locate the value for azure.wasb.defaultStore.sasTokenId from your pre-upgrade version. Set the following property to the saved value:

    "azure.wasb.defaultStore.keyVaultSasTokenSecretName"
  4. Save your changes.

No other configuration is required to enable your previous WASB access.

 

Storage protocol wasb replaced by wasbs

NOTE: This section applies to upgrades to Release 5.1 and later.

Beginning in Release 5.1, the wasb: storage protocol is no longer supported. All interactions with WASB are now managed through wasbs:, using the SAS token that must be created for all environments using WASB as the base storage layer.

NOTE: If you created any External Datasources for a SQL DW connection using the wasb: protocol before you upgraded, then you must recreate them using the wasbs: protocol in your upgraded instance. See Create SQL DW Connections.
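
If you keep a list of the storage locations used by those External Datasources, the new locations can be prepared in advance. The sketch below is only a text transformation on such a list; to_wasbs is a hypothetical helper, and the example URI is made up. It does not modify anything in the platform or in SQL DW, where the datasources must still be recreated as described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite wasb:// URIs read from standard input to wasbs://.
# Lines that already use wasbs:// or another scheme pass through unchanged.
to_wasbs() {
  sed -E 's#^wasb://#wasbs://#'
}

# Example with a made-up container and storage account:
#   echo 'wasb://mycontainer@myaccount.blob.core.windows.net/data' | to_wasbs
```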

Upgrade from Release 5.1 or earlier

Relational writeback enabled

NOTE: Upon upgrade to Release 6.0 and later, relational writeback is automatically enabled.

New and existing native platform connections (PostgreSQL, Oracle, SQL Server, and Teradata) are no longer read-only.

For more information, see Release Notes 6.0.

Spark using native libraries

For Hortonworks (HDI) 3.x, jobs executed on the Spark running environment now utilize the native libraries in the connected Spark cluster. 

NOTE: The Spark version that is referenced in the platform now must match the Spark version in the cluster. Additional configuration is required.

For more information, see Configure for Spark.

For earlier versions of HDP, the Trifacta platform shipped its local libraries to the cluster for use with the job execution.

Photon scaling factor has been removed

In Release 6.0, the Photon scaling factor parameter (photon.loadScalingFactor) has been removed. As part of this change, the following parameters are automatically set to new values as part of the upgrade. The new values are listed below:

"webapp.client.loadLimit": 10485760,
"webapp.client.maxResultsBytes": 41943040,

NOTE: If you had not modified either of the above values previously, then no action is required. If you had changed these values before upgrading, the upgrade overwrites your settings with the new default values above.
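
For reference, the new values are easier to read when converted to larger units. The sketch below assumes both parameters are expressed in bytes (the name maxResultsBytes suggests this; for loadLimit it is an assumption). to_mib is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a byte count to whole mebibytes.
to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

to_mib 10485760   # webapp.client.loadLimit        -> prints 10
to_mib 41943040   # webapp.client.maxResultsBytes  -> prints 40
```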

Update Data Service properties

After you have upgraded, the Data Service fails to start until the following properties are updated.

In Release 6.0.0, some configuration files related to the Data Service were relocated, so the classpath values pointing to these files need to be updated. 

Steps:


  1. To apply this configuration change, log in as an administrator to the Trifacta node. Then, edit trifacta-conf.json. Some of these settings may not be available through the Admin Settings Page. For more information, see Platform Configuration Methods.
  2. Locate the data-service.classpath setting. Change the class path value to point to the correct directory:

    /opt/trifacta/conf/data-service
  3. Locate the webapp.connectivity.kerberosDelegateConfigPath setting. If you are enabling Kerberos-based SSO for relational connections, please add the following value to the path:

    "%(topOfTree)s/services/data-service/build/conf/kerberosdelegate.config"

    For more information, see Enable SSO for Relational Connections.

  4. Save the file.

Download and reinstall Tableau SDK

Due to licensing issues, all existing customers who are using the Tableau integration must license, download, and install the Tableau SDK when upgrading to Release 6.0.2 or later.

Update MySQL JAR for Configuration Service

NOTE: This issue applies only if the Trifacta databases have been installed on MySQL. PostgreSQL environments are unaffected.

When upgrading to Release 6.0.x, there is a known issue in which the MySQL driver JAR is not properly installed for the new Configuration Service. This causes a No suitable driver found error for the trifactaconfigurationservice.

The fix is to copy the MySQL driver to the correct location for the Configuration Service in Release 6.0.x.

Steps:

  1. Log in to the Trifacta node.
  2. Locate the MySQL driver. A version of it should be available in one of the following locations:

    /opt/trifacta/services/batch-job-runner/build/install/batch-job-runner/lib/mysql-connector-java-6.0.6.jar
    /opt/trifacta/services/scheduling-service/server/build/install/scheduling-service/lib/mysql-connector-java-6.0.6.jar
    /opt/trifacta/services/time-based-trigger-service/server/build/install/time-based-trigger-service/lib/mysql-connector-java-6.0.6.jar
  3. Copy the driver to the following location:

    /opt/trifacta/services/configuration-service/build/install/configuration-service/lib/mysql-connector-java-6.0.6.jar

For more information on MySQL installation, see Install Databases for MySQL.
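
The steps above can be sketched as a small script. install_mysql_driver is a hypothetical helper; the paths are the ones listed in the steps, and the root directory is a parameter only so that the function can be tried against a scratch directory first. It copies the first driver JAR that it finds and leaves the source file in place.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the MySQL driver JAR from the first service that has it
# into the Configuration Service lib directory.
install_mysql_driver() {
  local root="${1:-/opt/trifacta}"
  local jar="mysql-connector-java-6.0.6.jar"
  local dest="$root/services/configuration-service/build/install/configuration-service/lib"
  local src
  for src in \
    "$root/services/batch-job-runner/build/install/batch-job-runner/lib/$jar" \
    "$root/services/scheduling-service/server/build/install/scheduling-service/lib/$jar" \
    "$root/services/time-based-trigger-service/server/build/install/time-based-trigger-service/lib/$jar"
  do
    if [ -f "$src" ]; then
      # Stop at the first match; report which copy was used.
      cp -p "$src" "$dest/$jar" && echo "copied $src"
      return
    fi
  done
  echo "no $jar found under $root" >&2
  return 1
}

# Example (run as a user with write access to /opt/trifacta):
#   install_mysql_driver
```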

SSO signout updates

After upgrading to this release, signing out of the Trifacta application may not work when SSO is enabled. This issue applies to the reverse proxy method of SSO for AD/LDAP. 

NOTE: Release 6.0 introduces a platform-native integration to enterprise SSO. See Release Notes for details.

Some properties related to the reverse proxy must be updated. Please complete the following:

Steps:

  1. Log in to the Trifacta node.
  2. Edit the following file:

    /opt/trifacta/pkg3p/src/tripache/conf/conf.d/trifacta.conf
  3. Add the following rule for the /unauthorized path:

  4. Modify the redirection for /sign-out from / to /unauthorized. Remove the rewrite rule:
  5. Save the file and restart the platform.

Troubleshooting - disambiguating email accounts

In Release 6.0.1, the Trifacta platform permitted case-sensitive email addresses, so addresses that differ only in case could be created as separate userIds in the platform. Pre-upgrade, the People table might look like the following:

| <Id> | <Email> | other columns |
| 1 | foobar@trifacta.com | * |
| 2 | FOOBAR@trifacta.com | * |
| 3 | FooBar@trifacta.com | * |

Beginning in Release 6.0.2, all email addresses (userIds) are case-insensitive, so the above distinctions are no longer permitted in the platform. 

As of Release 6.0.2, all email addresses are converted to lower-case. As part of the upgrade to Release 6.0.2, any email addresses that are case-insensitive matches (foobar and FOOBAR) are disambiguated. After upgrade the People table might look like the following:

| <Id> | <Email> | other columns |
| 1 | foobar@trifacta.com_duplicate_1 | * |
| 2 | foobar@trifacta.com_duplicate_2 | * |
| 3 | foobar@trifacta.com | * |

Notes:

  • The email address with the highest Id value in the People table is assumed to be the original user account.
  • The format for disambiguated email addresses is:

    <orig_userId>_duplicate_<row_id>

    where <row_id> is the row in the table where the duplicate was detected.
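
To see in advance which accounts would be affected, a list of email addresses can be checked for case-insensitive collisions. The sketch below assumes you have exported the addresses to plain text, one per line; how you export them is not covered here. find_case_duplicates is a hypothetical helper that prints each lower-cased address that occurs more than once.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read email addresses from standard input (one per line)
# and print, in lower case, each address that appears more than once when
# case is ignored.
find_case_duplicates() {
  tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Example using the addresses from the table above:
#   printf '%s\n' foobar@trifacta.com FOOBAR@trifacta.com FooBar@trifacta.com | find_case_duplicates
```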

After all migrations have completed, you should review the migration logs. Search the logs for the following: all-emails-for-people-to-lower-case.

A set of users without duplicates has the following entry:

== 20181107131647-all-emails-for-people-to-lower-case: migrating =======
== 20181107131647-all-emails-for-people-to-lower-case: migrated (0.201s)

Entries like the following indicate that duplicate addresses were found for separate accounts. The new duplicate Ids are listed as part of the message:

== 20181107131647-all-emails-for-people-to-lower-case: migrating =======
warn: List of duplicated emails: foobar@trifacta.com_duplicate_1, foobar@trifacta.com_duplicate_2
== 20181107131647-all-emails-for-people-to-lower-case: migrated (0.201s)

NOTE: The above log entry indicates that there are duplicated user accounts.
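
The log review can be scripted. The following is a minimal sketch that relies only on the message format shown above; list_duplicated_emails is a hypothetical helper, and the log file path is left as a parameter because it depends on your installation. It prints each disambiguated address on its own line and prints nothing when no duplicates were reported.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the addresses from "List of duplicated emails:"
# warnings in a migration log, one address per line.
list_duplicated_emails() {
  local log="$1"
  grep 'List of duplicated emails:' "$log" \
    | sed -E 's/^.*List of duplicated emails:[[:space:]]*//' \
    | tr ',' '\n' \
    | sed -E 's/^[[:space:]]+//'
}

# Example:
#   list_duplicated_emails /path/to/migration.log
```

The resulting list identifies the secondary accounts to review under the suggested approach below.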

Suggested approach:

  1. Change ownership on all flows in secondary accounts to the primary account.
  2. Delete secondary accounts.

Troubleshooting - Spark jobs succeed on cluster but appear to fail in the platform

For more information on this known issue, see the Troubleshooting section in Configure for Spark.

Verify 

The upgrade is complete. To verify:

Steps:

  1. Restart the platform:

    sudo service trifacta restart
  2. Run a simple job with profiling. 
  3. Verify that the job has successfully completed. 
    1. In the Jobs page, locate the job results. See Jobs Page.
    2. Click View Details next to the job to review the profile.

Documentation

You can access complete product documentation online and in PDF format. From within the product, select Help menu > Product Docs.

After you have accessed the documentation, the following topics are relevant to Azure deployments. Please review them in order.

| Topic | Description |
| Supported Deployment Scenarios for Azure | Matrix of supported Azure components. |
| Configure for Azure | Top-level configuration topic on integrating the platform with Azure. Tip: You should review this page. |
| Configure for HDInsight | Review this section if you are integrating the Trifacta platform with a pre-existing HDI cluster. |
| Enable ADLS Access | Configuration to enable access to ADLS. |
| Enable WASB Access | Configuration to enable access to WASB. |
| Configure SSO for Azure AD | How to integrate the Trifacta platform with Azure Active Directory for Single Sign-On. |


