This guide steps through the requirements and process for upgrading Trifacta® Wrangler Enterprise through the AWS Marketplace.

Upgrade

Process preview

Upgrading to the latest version of Trifacta Wrangler Enterprise via the AWS Marketplace terminates your existing instance and creates a new instance with the latest software in its place. Please follow these instructions carefully.

Your Trifacta instance is deployed as part of a CloudFormation Stack. When you upgrade this Stack, CloudFormation informs you of the resources that it plans to modify and then manages the modifications.

  • If any problem occurs during these modifications, CloudFormation automatically rolls back its changes.
  • Your existing instance is preserved until the new one has been brought up successfully.

Upgrade flow:

  1. Review the Upgrade Prep before beginning.
  2. Perform two different backups.
  3. Perform the CloudFormation Stack upgrade, which makes the required changes to your environment, including bringing up a new instance with the latest version of Trifacta Wrangler Enterprise.
  4. Restore your backups onto the new version of the product.
  5. Perform any required changes to complete the upgrade.

Supported paths

This upgrade process supports upgrade for the following versions:

| Source Version | Target Upgrade Version |
| Trifacta Wrangler Enterprise 5.1 | Trifacta Wrangler Enterprise 6.0.2 |

If you are upgrading from a version that is earlier than the supported Source Version listed above for this upgrade process, please use the links below to acquire the AMI(s) and documentation to upgrade to the earliest supported Source Version. Then, return to these instructions to complete the upgrade to this version.

| Your Version | Target Version | AMI | Documentation |
| Trifacta Wrangler Enterprise 4.2.x | Trifacta Wrangler Enterprise 5.0.x | Please see the AWS Marketplace listing for the product. The AMI is accessible from there. | Trifacta Install Guide for AWS Marketplace with EMR v5.0 |

Upgrade Prep

  • You must copy your license file back onto the server after the upgrade. You should either back it up or copy the contents of /opt/trifacta/license/license.json to restore it after the upgrade.
  • This process requires broad permissions on your AWS account. If you do not have Administrator access, you may encounter errors when CloudFormation tries to modify AWS objects such as IAM resources or Security Groups.
  • The EMR cluster is replaced with a new one featuring autoscaling groups and configurable sizing.
  • If you have made additional tweaks to the default installation, these changes are likely to be lost. Please review and note any changes, so you can replicate them after upgrade:
    • IAM role or policy changes
    • Changes on the Trifacta Server OS itself
    • SSL certificates and configuration
    • The IP address of your Trifacta Server will change. This change requires a DNS update after the upgrade is complete.
    • Your Trifacta license file
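Because the existing instance is terminated during the upgrade, the license file must be stored off the server. The following is a minimal sketch of one way to do that, assuming the AWS CLI is configured on the instance. The function name `backup_license_to_s3` and the `DRY_RUN` switch are hypothetical and are not part of the product; the license path and the `trifacta-backups/` prefix are taken from this guide.

```shell
#!/usr/bin/env bash
# Sketch only: backup_license_to_s3 is a hypothetical helper, not a Trifacta utility.
# Set DRY_RUN=1 to print the command instead of running it.

backup_license_to_s3() {
  local bucket="$1"
  local src="${2:-/opt/trifacta/license/license.json}"  # path from this guide
  local dest

  if [ -z "$bucket" ]; then
    echo "usage: backup_license_to_s3 <bucket> [license-file]" >&2
    return 2
  fi
  if [ ! -f "$src" ]; then
    echo "License file not found: $src" >&2
    return 1
  fi

  dest="s3://$bucket/trifacta-backups/license.json"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print what would be executed, without touching S3.
    echo "aws s3 cp $src $dest"
  else
    aws s3 cp "$src" "$dest"
  fi
}

# Example:
#   DRY_RUN=1 backup_license_to_s3 <my-trifacta-s3-bucket>
```

Running with `DRY_RUN=1` first lets you confirm the source and destination before anything is copied.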

Backup

This process creates two backups: one of the Trifacta software and one of the entire EC2 Instance.

  1. SSH to your current Marketplace instance. Example:

    ssh -i MyKey.pem centos@TrifactaServer.MyCompany.net
  2. Switch to Root user on the server:

    sudo su
  3. Stop the Trifacta platform on your current Marketplace instance:

    service trifacta stop
  4. Run the Trifacta backup script:

    /opt/trifacta/bin/setup-utils/trifacta-backup-config-and-db.sh
    1. When the script is complete, the output identifies the location of the backup. Example:

      /opt/trifacta-backups/trifacta-backup-5.0+126.20171217124021.a8ed455-20180514213601.tgz
  5. Store the backup in a safe location. You should copy it to either S3 or to your local computer via SCP.

    1. To copy the backup to the S3 bucket used by your installation, you can use this example:

      aws s3 cp /opt/trifacta-backups/trifacta-backup-5.0+126.20171217124021.a8ed455-20180514213601.tgz s3://<my-trifacta-s3-bucket>/trifacta-backups/
    2. If you choose to use SCP, please note that the AMI does not allow root login. You must copy the files to the CentOS user's home directory and modify any permissions to allow the CentOS user to read them. After that, you can use SCP to copy the files. Example:

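      # On the server: copy the backup to the CentOS user's home directory and make it readable
      cp /opt/trifacta-backups/<my-backup-file.tgz> ~centos/
      chown centos:centos ~centos/<my-backup-file.tgz>
      # On your workstation: download the backup
      scp -i <my-key.pem> centos@<my-server-ip>:./<my-backup-file.tgz> ./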
  6. You should also take a snapshot of the EBS volume backing your EC2 instance.
    1. This backup is not necessary to restore the Trifacta platform, but it can be useful if you find that you had additional files or configurations to replicate on your new Trifacta instance.
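Before leaving the old instance, it can help to confirm that the newest backup archive is readable. The following is a minimal sketch, not a Trifacta utility: the helper names `latest_backup` and `verify_backup` are hypothetical, and the default directory is the one shown in the example output above.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for checking the backup before the upgrade.

# Print the most recently modified backup archive in a directory.
latest_backup() {
  local dir="${1:-/opt/trifacta-backups}"
  # ls -t sorts newest first; backup names in this guide contain no spaces.
  ls -t "$dir"/trifacta-backup-*.tgz 2>/dev/null | head -n 1
}

# Return 0 if the archive can be listed by tar, non-zero otherwise.
verify_backup() {
  tar -tzf "$1" >/dev/null 2>&1
}

# Example:
#   backup="$(latest_backup)"
#   verify_backup "$backup" && echo "OK: $backup"
#
# For the EBS snapshot in step 6, the AWS CLI offers create-snapshot.
# The volume ID is a placeholder that you must look up for your instance:
#   aws ec2 create-snapshot --volume-id <my-volume-id> \
#     --description "Trifacta pre-upgrade snapshot"
```

A listing check does not prove that the backup is complete, only that the archive is not truncated or corrupted in a way that tar can detect.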


Perform the CloudFormation Stack upgrade

Please complete the following steps to upgrade the CloudFormation Stack. These steps create an instance of the latest version of Trifacta Wrangler Enterprise and then perform any necessary adjustments to your existing resources.

Steps:

  1. Visit the Marketplace listing page for your product.

  2. Under View Usage Instructions, expand the View CloudFormation Template section.
  3. Right-click Download CloudFormation Template. Copy its URL.
  4. In your AWS Console, go to CloudFormation.
  5. Select your Trifacta Stack. Click Update.
  6. Select Replace Current Template.
  7. Select Amazon S3 URL, and paste the link to the template in the textbox.
  8. Click Next to review the parameters. The latest version of the CloudFormation template updates the EMR cluster with the following new features. Defaults are provided, but feel free to modify these values as necessary.

    1. Configurable autoscaling groups
    2. Configurable instance sizes
  9. Wait until the CloudFormation Stack indicates the upgrade is finished.
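The steps above use the AWS Console. If you prefer to monitor the final step from a terminal, the following sketch shows one possible approach with the AWS CLI, assuming the CLI is installed and has permission to read the Stack. The function names are hypothetical; the `describe-stacks` and `wait stack-update-complete` subcommands are standard AWS CLI commands.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for watching a CloudFormation Stack update.

# Map a CloudFormation stack status to: done, failed, pending, or unknown.
classify_status() {
  case "$1" in
    UPDATE_COMPLETE)    echo done ;;
    *ROLLBACK*|*FAILED) echo failed ;;   # a rollback means the upgrade did not apply
    *IN_PROGRESS)       echo pending ;;
    *)                  echo unknown ;;
  esac
}

# Print the current status of the named Stack.
stack_status() {
  aws cloudformation describe-stacks --stack-name "$1" \
    --query 'Stacks[0].StackStatus' --output text
}

# Block until the update completes; non-zero exit if it fails or rolls back.
wait_for_update() {
  aws cloudformation wait stack-update-complete --stack-name "$1"
}

# List the Stack outputs as key/value pairs.
stack_outputs() {
  aws cloudformation describe-stacks --stack-name "$1" \
    --query 'Stacks[0].Outputs[].[OutputKey,OutputValue]' --output text
}

# Example (the stack name is a placeholder):
#   wait_for_update <my-trifacta-stack> && stack_outputs <my-trifacta-stack>
```

The outputs listed by `stack_outputs` are the same values shown on the Outputs tab of the Stack in the console.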

Restore your backup onto the new instance

  1. Connect to your new Trifacta instance via SSH.

    1. If you receive an SSH fingerprint warning, it is expected when connecting to a new instance.

    2. To suppress this warning, remove the relevant entry from the following file: ~/.ssh/known_hosts.
  2. Switch to Root user on the Trifacta instance:

    sudo su
    cd
  3. Restore your license file or create a new license file with the contents you copied earlier. Review and update the permissions and ownership appropriately:

    chown trifacta:trifacta /opt/trifacta/license/license.json
    chmod 755 /opt/trifacta/license/license.json
  4. Download the latest copy of the restore script to pick up additional fixes that have been made.

    NOTE: This step is required.

    curl --output trifacta-restore-from-backup.sh https://raw.githubusercontent.com/trifacta/trifacta-utils/release/6.0/trifacta-restore-from-backup.sh
    
    mv trifacta-restore-from-backup.sh /opt/trifacta/bin/setup-utils/trifacta-restore-from-backup.sh
    
    chown trifacta:trifacta /opt/trifacta/bin/setup-utils/trifacta-restore-from-backup.sh
    chmod 775 /opt/trifacta/bin/setup-utils/trifacta-restore-from-backup.sh
  5. Download the backup from your storage location and extract its contents. Example:

    mkdir -p /root/trifacta-restore-files
    cd /root/trifacta-restore-files
    aws s3 cp s3://<my-trifacta-s3-bucket>/trifacta-backups/<my-backup-file.tgz> .
    tar xzf <my-backup-file.tgz>
  6. Execute the restore script. Pass in the path to your unzipped backup as a parameter. Example:

    /opt/trifacta/bin/setup-utils/trifacta-restore-from-backup.sh -r /root/trifacta-restore-files/trifacta-backup-5.1+126.20171217124021.a8ed455-20180514213601
  7. Start up the platform:

    service trifacta start
  8. Log in to the Trifacta application.
  9. In the menu, navigate to Settings > Admin Settings.

    1. In the External Service Settings area, update your EMR cluster ID. This value is available from the Outputs tab of your CloudFormation Stack.
    2. Under the Platform Settings area, search for spark.version. Verify that it is 2.3.0.
  10. In some circumstances, old EMR support .jar files are not correctly overwritten by the Trifacta software, leading to EMR job failures. These files should be removed to ensure that there are no problems:
    1. In the AWS S3 console, navigate to the following:

      <Resource Bucket>/<Resource Path>/trifacta/libs/
    2. The Resource Bucket and Resource Path values are located in the External Service Settings section.
    3. You should see about 6 files. Select all the files within this folder and delete them.
    4. When a job is next run in the Trifacta platform, these files are replaced.
  11. Verify that the product is working as expected by running jobs.
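The cleanup of old EMR support files in step 10 can also be done from the command line. The following is a minimal sketch, assuming the AWS CLI is configured with access to the resource bucket. The function names and the `CONFIRM` switch are hypothetical; `aws s3 rm --recursive` and its `--dryrun` option are standard AWS CLI features.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for clearing the EMR libs folder in S3.

# Build the S3 URI of the libs folder from the Resource Bucket and Resource Path.
libs_uri() {
  local bucket="$1" path="$2"
  path="${path#/}"   # drop one leading slash, if present
  path="${path%/}"   # drop one trailing slash, if present
  if [ -n "$path" ]; then
    echo "s3://$bucket/$path/trifacta/libs/"
  else
    echo "s3://$bucket/trifacta/libs/"
  fi
}

# Delete everything under the libs folder.
# By default only a dry run is performed; set CONFIRM=1 to delete for real.
purge_emr_libs() {
  local target
  target="$(libs_uri "$1" "$2")"
  if [ "${CONFIRM:-0}" = "1" ]; then
    aws s3 rm "$target" --recursive
  else
    aws s3 rm "$target" --recursive --dryrun
  fi
}

# Example (values are placeholders from External Service Settings):
#   purge_emr_libs <resource-bucket> <resource-path>
#   CONFIRM=1 purge_emr_libs <resource-bucket> <resource-path>
```

Defaulting to a dry run is a deliberate choice here: the listing should show only the handful of files in the libs folder before you confirm the deletion.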

Appendix - Post-Upgrade Fixes

Upgrade from Release 5.1 or earlier

Relational writeback enabled

NOTE: Upon upgrade to Release 6.0 and later, relational writeback is automatically enabled.


  • New and existing native platform connections (PostgreSQL, Oracle, SQL Server, and Teradata) are no longer read-only.
  • Existing relational connections are not automatically enabled with writeback.

Photon scaling factor has been removed

Applies if: You modified the Photon scaling properties in your pre-upgrade environment.

In Release 6.0, the Photon scaling factor parameter (photon.loadScalingFactor) has been removed. As part of this change, the following parameters are automatically set to new values as part of the upgrade. The new values are listed below:

"webapp.client.loadLimit": 10485760,
"webapp.client.maxResultsBytes": 41943040,

NOTE: If you had not modified either of the above values previously, then no action is required. If you had changed these values before upgrading, your changes are replaced by the new default values above.
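For reference, the two byte values above correspond to round numbers of mebibytes. The short sketch below only does the arithmetic; the `to_mib` helper is hypothetical and reads nothing from the platform.

```shell
#!/usr/bin/env bash
# Convert a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes).
to_mib() {
  echo $(( $1 / (1024 * 1024) ))
}

echo "webapp.client.loadLimit:       $(to_mib 10485760) MiB"   # prints 10 MiB
echo "webapp.client.maxResultsBytes: $(to_mib 41943040) MiB"   # prints 40 MiB
```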

Update Data Service properties

Applies if: You modified the Data Service classpath in your pre-upgrade environment.


After you have upgraded, the Data Service fails to start.

In Release 6.0.0, some configuration files related to the Data Service were relocated, so the classpath values pointing to these files need to be updated. 

Steps:


  1. To apply this configuration change, log in as an administrator to the Trifacta node. Then, edit trifacta-conf.json. Some of these settings may not be available through the Admin Settings Page. For more information, see Platform Configuration Methods.
  2. Locate the data-service.classpath setting. Change the class path value to point to the correct directory:

    /opt/trifacta/conf/data-service
  3. Locate the webapp.connectivity.kerberosDelegateConfigPath setting. If you are enabling Kerberos-based SSO for relational connections, please add the following value to the path:

    "%(topOfTree)s/services/data-service/build/conf/kerberosdelegate.config"

    For more information, see Enable SSO for Relational Connections.

  4. Save the file.

Download and reinstall Tableau SDK

Applies if: You were using or plan to use the Tableau Server integration.

Due to licensing issues, all existing customers who are using the Tableau integration must license, download and install the Tableau SDK if upgrading to Release 6.0.2 and onward.

Update MySQL JAR for Configuration Service

Applies if: You have installed the Trifacta databases on MySQL.

When upgrading to Release 6.0.x, there is a known issue in which the MySQL driver JAR is not properly installed for the new Configuration Service. This causes a No suitable driver found error for the trifactaconfigurationservice.

The fix is to copy the MySQL driver to the correct location for the Configuration Service in Release 6.0.x.

Steps:

  1. Log in to the Trifacta node.
  2. Locate the MySQL driver. A version of it should be available in one of the following locations:

    /opt/trifacta/services/batch-job-runner/build/install/batch-job-runner/lib/mysql-connector-java-6.0.6.jar
    /opt/trifacta/services/scheduling-service/server/build/install/scheduling-service/lib/mysql-connector-java-6.0.6.jar
    /opt/trifacta/services/time-based-trigger-service/server/build/install/time-based-trigger-service/lib/mysql-connector-java-6.0.6.jar
  3. Copy the driver to the following location:

    /opt/trifacta/services/configuration-service/build/install/configuration-service/lib/mysql-connector-java-6.0.6.jar

For more information on MySQL installation, see Install Databases for MySQL.
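The steps above can be sketched as a small script that takes the first driver it finds and copies it into place. This is an illustration under assumptions, not a supported utility: the function names and the `TRIFACTA_HOME` override are hypothetical, and the paths and JAR version are the ones listed in this section.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for copying the MySQL driver JAR.

# Print the first argument that is an existing regular file.
first_existing() {
  local f
  for f in "$@"; do
    if [ -f "$f" ]; then
      echo "$f"
      return 0
    fi
  done
  return 1
}

install_mysql_driver() {
  # TRIFACTA_HOME is an assumed override for testing; the guide uses /opt/trifacta.
  local base="${TRIFACTA_HOME:-/opt/trifacta}/services"
  local jar="mysql-connector-java-6.0.6.jar"
  local dest="$base/configuration-service/build/install/configuration-service/lib"
  local src

  src="$(first_existing \
    "$base/batch-job-runner/build/install/batch-job-runner/lib/$jar" \
    "$base/scheduling-service/server/build/install/scheduling-service/lib/$jar" \
    "$base/time-based-trigger-service/server/build/install/time-based-trigger-service/lib/$jar")" || {
    echo "MySQL driver $jar not found in any known location" >&2
    return 1
  }

  # -p preserves the mode and timestamps of the source JAR.
  cp -p "$src" "$dest/$jar" && echo "Copied $src to $dest/$jar"
}

# Example (run on the Trifacta node):
#   install_mysql_driver
```

The original JAR is copied rather than moved, so the services that already depend on it are unaffected.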

SSO signout updates

Applies if: You have enabled SSO for AD/LDAP and have noticed that logout is not working.

After upgrading to this release, signing out of the Trifacta application may not work when SSO is enabled. This issue applies to the reverse proxy method of SSO for AD/LDAP. 

NOTE: Beginning in Release 6.0, a platform-native method of SSO is available. This new method is recommended.

Some properties related to the reverse proxy must be updated. Please complete the following:

Steps:

  1. Log in to the Trifacta node.
  2. Edit the following file:

    /opt/trifacta/pkg3p/src/tripache/conf/conf.d/trifacta.conf
  3. Add the following rule for the /unauthorized path:

  4. Modify the redirection for /sign-out from / to /unauthorized. Remove the rewrite rule:
  5. Save the file and restart the platform.

Merge custom relational configurations

Applies if: You customized any relational connections through modifications of the configuration files in the pre-upgrade environment.

If you have created custom relational connections, the configurations need to be re-applied to the default configurations in the upgraded software. Depending on the pre- and post-upgrade versions, the properties and expected values may have changed significantly. 

Error - cannot start the platform on EC2 instance after upgrade

Applies if: Your EC2 installation is integrated with a PostgreSQL database.

If you have upgraded the Trifacta platform on an EC2 instance to Release 6.0.0 or later, the platform may not start due to a problem authenticating to the Trifacta database when it is hosted on PostgreSQL.

The problem is that the password for the main Trifacta database was reset to the EC2 instance ID.

The solution is to reset the password for the trifacta user to trifacta, which is the default password.

NOTE: A DBA should execute the following. The database, user, and password values can be modified as needed. You should avoid using default passwords for your databases.

At the command line, connect to the database. When prompted for the password, enter the EC2 instance ID. Then, use the \password command to set the new password:

psql -d trifacta -U trifacta
pwd: <instanceID>
\password trifacta


After the password is reset, the platform should restart successfully.

Troubleshooting - disambiguating email accounts

Applies if: You are upgrading to Release 6.0.1 or later.

In Release 6.0.1, the Trifacta platform permitted case-sensitive email addresses. For purposes of creating user accounts, the following addresses could therefore be different userIds in the platform. Pre-upgrade, the People table might look like the following:

| <Id> | <Email> | other columns |
| 1 | foobar@trifacta.com | * |
| 2 | FOOBAR@trifacta.com | * |
| 3 | FooBar@trifacta.com | * |

Beginning in Release 6.0.2, all email addresses (userIds) are case-insensitive, so the above distinctions are no longer permitted in the platform. 

As of Release 6.0.2, all email addresses are converted to lower-case. As part of the upgrade to Release 6.0.2, any email addresses that are case-insensitive matches (foobar and FOOBAR) are disambiguated. After upgrade the People table might look like the following:

| <Id> | <Email> | other columns |
| 1 | foobar@trifacta.com_duplicate_1 | * |
| 2 | foobar@trifacta.com_duplicate_2 | * |
| 3 | foobar@trifacta.com | * |

Notes:

  • The email address with the highest Id value in the People table is assumed to be the original user account.
  • The format for disambiguated email addresses is:

    <orig_userId>_duplicate_<row_id>

    where <row_id> is the row in the table where the duplicate was detected.
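If you want to know before upgrading which accounts would be affected, you can check a list of email addresses for case-insensitive duplicates. The following is a minimal sketch that assumes you have already exported the addresses to a text file, one per line. How you export them is up to you, and the function name `find_case_duplicates` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: find addresses that collide once case is ignored.
# Reads one email address per line on stdin and prints each lower-cased
# address that appears more than once.
find_case_duplicates() {
  tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Example with the addresses from the table above:
printf '%s\n' \
  'foobar@trifacta.com' \
  'FOOBAR@trifacta.com' \
  'FooBar@trifacta.com' \
  'someone.else@trifacta.com' \
  | find_case_duplicates
# prints: foobar@trifacta.com
```

Any address printed by this check corresponds to a set of accounts that would be disambiguated during the upgrade.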

After all migrations have completed, you should review the migration logs. Search the logs for the following: all-emails-for-people-to-lower-case.

A set of users without duplicates has the following entry:

== 20181107131647-all-emails-for-people-to-lower-case: migrating =======
== 20181107131647-all-emails-for-people-to-lower-case: migrated (0.201s)

Entries like the following indicate that duplicate addresses were found for separate accounts. The new duplicate Ids are listed as part of the message:

== 20181107131647-all-emails-for-people-to-lower-case: migrating =======
warn: List of duplicated emails: foobar@trifacta.com_duplicate_1, foobar@trifacta.com_duplicate_2
== 20181107131647-all-emails-for-people-to-lower-case: migrated (0.201s)

NOTE: The above log entry indicates that there are duplicated user accounts.
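To pull the renamed accounts out of a log file, you can filter for the warning line shown above. The following sketch assumes the warning has exactly the format in the example entry. The function name is hypothetical, and the log file path is a parameter because its location is not specified here.

```shell
#!/usr/bin/env bash
# Sketch only: list the disambiguated addresses reported by the migration.
# Usage: duplicated_emails_from_log <path-to-migration-log>
duplicated_emails_from_log() {
  local log="$1"
  grep 'List of duplicated emails:' "$log" \
    | sed 's/.*List of duplicated emails: *//' \
    | tr ',' '\n' \
    | sed 's/^ *//; s/ *$//' \
    | sed '/^$/d'
}

# Example:
#   duplicated_emails_from_log <path-to-migration-log>
```

The function prints one address per line, or nothing when the log contains no duplicate warning, which makes the result easy to use as a checklist for the suggested approach below.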

Suggested approach:

  1. Change ownership on all flows in secondary accounts to the primary account.

  2. Delete secondary accounts.

Troubleshooting - Spark jobs succeed on cluster but appear to fail in the platform

Applies if: Your Hadoop cluster sends trash files to an encrypted zone.

For more information on this known issue, see the Troubleshooting section in Configure for Spark.

Documentation

You can access complete product documentation online and in PDF format. From within the product, select Help menu > Product Docs.
