Welcome to Trifacta® Wrangler Pro!

  1. Administrators should complete the first section to set up the product for use.
  2. After setup is complete, individual users should complete the second section to get started using the product.

Step 1 - Begin process

You can begin using the product in either of the following ways, which are described in the following sections:

  • Register for Free Trial: Sign up for a free trial of the product, which provides limited access to the full product. See below.
  • Create Workspace: If you have licensed the full Trifacta Wrangler Pro product, you begin by submitting a request to Trifacta Support. See below.


Register for Free Trial

To begin the process, an administrator should complete the registration form available here: https://www.trifacta.com/gated-form/free-trial-redshift/.

Limitations:

  • 100 Trifacta Compute Units
  • 10 users

After you submit the registration form, an email is sent to your provided email address to confirm registration. 

NOTE: This process can take up to 24 hours to complete.

Key fields:

  • Email: This email address will receive a registration email, which contains a link that you must follow to complete registration.
  • Current AWS services utilized: Please add a comma-separated list of the AWS services that are currently used by your organization. Example: AWS, S3, EC2, Redshift, VPC
  • Primary AWS region: The region you select should be the same as your S3 and Redshift storage locations, if possible.

    NOTE: If you are integrating with Redshift, the region for your Redshift resources must be in the same location as your default S3 bucket, which is specified later.

Create Workspace

When you are ready to create your workspace, please contact Trifacta Support.

Key considerations:

  • Number of workspace members
  • Data volumes
  • Primary AWS region

After the workspace has been created, an email is sent to your registered address with next steps.

Step 2 - Choose an AWS access mode

Trifacta Wrangler Pro supports the following access modes:

  • workspace: Access to AWS resources is granted through a single set of credentials, which are configured by an administrator and shared by everyone in the workspace.

    Tip: This configuration is easiest to manage. After the administrator configures credentials, all invited members can immediately access the product. However, all workspace users have the same permissions, which may be problematic for security reasons.

  • per-user: Each user must enter their own configuration settings in the Storage Config page after login.

    Tip: This method is more secure. However, each user must enter his or her own AWS credentials to access the product, which requires extra steps. These steps are described later for non-admin users.

Each of the above modes can be managed through one of the following credential methods:

  • IAM role that provides access to the designated bucket(s)

    Tip: This method is recommended.

  • Access Key / Secret Key pair

    NOTE: For this method, you should create a new service account. Avoid generating credentials using your existing AWS account, since it grants access to more resources than required by the Trifacta service.

Step 3 - Take note of S3 encryption method (if in use)

Trifacta Wrangler Pro supports the following types of encryption. Review whether you have enabled any of the following encryption methods in your S3 environment.

  • None
  • SSE-S3
  • SSE-KMS

NOTE: If some form of S3 encryption is enabled, additional configuration is required. The method of encryption must be provided to the product so that it can communicate with your S3 resources. If per-user authentication is in use, individual users must configure the appropriate setting in their accounts.

Step 4 - Create policy to grant access to S3 bucket

To use your own S3 bucket(s) with Trifacta Wrangler Pro, create a policy and assign it to either the user or IAM Role selected to grant access to AWS resources. In this section, you create the policy. Later, it will be applied.

Below is an example policy template. You should use this template to create the policy. 

NOTE: Do not simply reuse a predefined AWS policy or one of your existing policies, since it is likely to grant access to more resources than required.

Template Notes:

  1. One of the statements grants access to the trifacta-public-datasets bucket, which contains resources used for the onboarding tour.  
  2. Replace <my_default_S3_bucket> with the name of your default S3 bucket.
  3. To grant access to multiple buckets within your account, you can extend the resources list to accommodate the additional buckets.

Policy Template

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:DeleteObject",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::<my_default_S3_bucket>",
                "arn:aws:s3:::<my_default_S3_bucket>/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::trifacta-public-datasets",
                "arn:aws:s3:::trifacta-public-datasets/*"
            ]
        }
    ]
}
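
Template note 3 (extending the resources list to additional buckets) can be sketched in Python. This is only an illustration and not part of the product: the helper name build_s3_policy and the example bucket names are hypothetical, and the actions are copied from the template above.

```python
import json

# Actions copied from the policy template above.
BUCKET_ACTIONS = [
    "s3:PutObject",
    "s3:GetObject",
    "s3:ListBucket",
    "s3:DeleteObject",
    "s3:GetBucketLocation",
]
PUBLIC_DATASET_ACTIONS = ["s3:GetObject", "s3:ListBucket"]


def build_s3_policy(bucket_names):
    """Return the template policy as a dict (hypothetical helper).

    Each bucket contributes two resources: the bucket ARN itself and
    the ARN pattern that matches every object inside it.
    """
    resources = []
    for name in bucket_names:
        resources.append(f"arn:aws:s3:::{name}")
        resources.append(f"arn:aws:s3:::{name}/*")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": BUCKET_ACTIONS,
                "Resource": resources,
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": PUBLIC_DATASET_ACTIONS,
                "Resource": [
                    "arn:aws:s3:::trifacta-public-datasets",
                    "arn:aws:s3:::trifacta-public-datasets/*",
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Example bucket names are placeholders; substitute your own.
    policy = build_s3_policy(["my-default-bucket", "my-second-bucket"])
    print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the AWS policy editor in place of the hand-edited template.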

Step 5 - Update policy to accommodate SSE-KMS if necessary

If any accessible bucket is encrypted with SSE-KMS, another policy must be deployed. See https://docs.aws.amazon.com/kms/latest/developerguide/iam-policies.html.
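
As a rough illustration of what such an additional policy statement can look like, the sketch below builds one in Python. It rests on assumptions that are not stated in this guide: that kms:Decrypt (for reading encrypted objects) and kms:GenerateDataKey (for writing them) are the actions required, and that access is scoped to a single key. The helper name, the Sid, and the key ARN are placeholders. Treat the AWS guide linked above as the authority.

```python
import json


def build_kms_statement(key_arn):
    """Return an IAM statement allowing use of one KMS key (hypothetical helper).

    Assumption: kms:Decrypt and kms:GenerateDataKey are the actions
    needed to read and write SSE-KMS encrypted objects. Confirm the
    exact set against the AWS KMS documentation before use.
    """
    return {
        "Sid": "AllowUseOfKmsKey",
        "Effect": "Allow",
        "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
        "Resource": [key_arn],
    }


if __name__ == "__main__":
    # Placeholder ARN; substitute the ARN of the key that encrypts your bucket.
    example_arn = "arn:aws:kms:us-west-2:123456789012:key/<my_kms_key_id>"
    print(json.dumps(build_kms_statement(example_arn), indent=4))
```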

Step 6 - Add policy for Redshift access

If you are connecting to Redshift databases through your workspace, you can enable access by creating a GetClusterCredentials policy. This policy is additive to the S3 access policies. All of these policies can be captured in a single IAM role.

Example:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GetClusterCredsStatement",
            "Effect": "Allow",
            "Action": [
                "redshift:GetClusterCredentials"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecluster/${redshift:DbUser}",
                "arn:aws:redshift:us-west-2:123456789012:dbname:examplecluster/testdb",
                "arn:aws:redshift:us-west-2:123456789012:dbgroup:examplecluster/common_group"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:userid": "AIDIODR4TAW7CSEXAMPLE:${redshift:DbUser}@yourdomain.com"
                }
            }
        }
    ]
}

For more information on these permissions, see Required AWS Account Permissions.
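
The three resource ARNs in the Redshift example share one prefix and differ only in resource type and name. The sketch below, which uses a hypothetical helper name and takes the region, account ID, cluster, database, and group as parameters, shows how they can be assembled for your own cluster. The values in the usage section are the placeholder values from the example above.

```python
def redshift_credential_resources(region, account_id, cluster, db_name, db_group):
    """Return the dbuser, dbname, and dbgroup ARNs for one cluster.

    Hypothetical helper for illustration. The dbuser ARN keeps the
    literal ${redshift:DbUser} policy variable, as in the example policy.
    """
    prefix = f"arn:aws:redshift:{region}:{account_id}"
    return [
        # Doubled braces produce literal { and } in an f-string.
        f"{prefix}:dbuser:{cluster}/${{redshift:DbUser}}",
        f"{prefix}:dbname:{cluster}/{db_name}",
        f"{prefix}:dbgroup:{cluster}/{db_group}",
    ]


if __name__ == "__main__":
    for arn in redshift_credential_resources(
        "us-west-2", "123456789012", "examplecluster", "testdb", "common_group"
    ):
        print(arn)
```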

Step 7 - Whitelist the IP address range of the Trifacta Service, if necessary

If you are enabling any relational source, including Redshift, you must whitelist the IP address range of the Trifacta service in the relevant security groups.  The IP address range of the Trifacta service is:

35.245.35.240/28

For Redshift:

There are two ways to whitelist the IP range, depending on whether you are using EC2-VPC or EC2-Classic (not common).

For details on this process with RDS in general, see https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.RDSSecurityGroups.html.

For more information, please contact Trifacta Support.
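
When reviewing security group rules or connection logs, it can help to check whether a given address falls inside the Trifacta range listed above. The following is a minimal sketch using Python's standard ipaddress module. The function name is hypothetical, and only the CIDR block comes from this guide.

```python
import ipaddress

# CIDR block of the Trifacta service, as listed in this step.
TRIFACTA_RANGE = ipaddress.ip_network("35.245.35.240/28")


def is_trifacta_address(ip):
    """Return True if the given IPv4 address lies inside the Trifacta range."""
    return ipaddress.ip_address(ip) in TRIFACTA_RANGE


if __name__ == "__main__":
    # A /28 block contains 16 addresses.
    print("Addresses in range:", TRIFACTA_RANGE.num_addresses)
    print("First:", TRIFACTA_RANGE[0], "Last:", TRIFACTA_RANGE[-1])
    print(is_trifacta_address("35.245.35.247"))
```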

Step 8 -  Storage configuration for the administrator

  1. Log in to your workspace.
  2. If this is your first login, a message similar to the following is displayed:

    Figure: Missing Storage Settings

  3. Click the here link. 

Define AWS access settings

In the AWS Config page, you can specify the following high-level settings to define your AWS access method. These settings were also listed in Step 2. 

AWS Mode:

  • Workspace: In Workspace mode, the workspace administrator applies a single set of AWS credentials for all users in the workspace. These credentials are used by each member of the workspace to authenticate with AWS and to gain access to S3 buckets.

    Tip: This mode requires more up-front setup but is easy to manage. However, all members of the workspace have the same set of access controls.

  • Per User: In Per User mode, individual members of the workspace must apply their AWS credentials to their accounts.

    Tip: This mode is easy to set up but turns responsibility for access controls over to the individual members. If members encounter problems gaining access to S3 assets, the workspace administrator may not be able to troubleshoot them.

Credential Provider:

For workspace or per-user mode, the following provider methods can be used to manage authentication with AWS. 

  • IAM Role: Trifacta Wrangler Pro can use any IAM roles that have been defined for workspace members to access AWS data sources, such as S3 and Redshift.

    Tip: This credential provider method is recommended.

  • AWS Key and Secret: You can apply key and secret combinations to gate access to AWS data sources. These combinations can be applied in workspace mode or in per-user mode by individual members.

After you have made your selections for the above settings, you can review the following sections, which contain some common configuration workflows. Please populate the settings according to your needs. 

Common configurations for Workspace mode

Using an IAM role to grant access to your S3 bucket (Recommended method)

  • Mode: Workspace
  • Credential Provider: IAM Role


  1. Note the following two values from the screen. They must be applied in AWS:
    1. Account ID
    2. External ID
  2. Log into your AWS account and create a new IAM Role:
    1. When you create the role, the prompt "Select type of trusted entity" appears. Choose "Another AWS account".
      1. Enter the Account ID that you acquired from the Trifacta screen.
      2. Select the "Require external ID" checkbox. Enter the External ID provided to you from the Trifacta screen.
    2. Proceed to the Permissions page.
      1. Select the policy that you already created.
    3. Proceed to the Tags page. Enter tags, if desired.
    4. Proceed to the Review page. Select a name for your role.
    5. Finish creating the role.
  3. You must insert a trust relationship in this IAM role. For more information, see Insert Trust Relationship in AWS IAM Role.
  4. Select IAM>Roles. Select your new role, and copy the Role ARN.
  5. In the Trifacta screen:
    1. Paste this value into the "Available IAM Role ARNs" textbox and press ENTER.
    2. Enter the name of your default S3 bucket. 

      NOTE: This bucket should already be granted access through the policy that you created.

    3. Select the encryption type.
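
For orientation, selecting "Another AWS account" with "Require external ID" in the AWS console typically yields a trust policy of the shape sketched below. This is an illustration only: the helper name and the example values are hypothetical, and the trust relationship that the product requires is the one described in Insert Trust Relationship in AWS IAM Role.

```python
import json


def build_trust_policy(account_id, external_id):
    """Return a cross-account trust policy as a dict (hypothetical helper).

    account_id and external_id are the Account ID and External ID
    values shown in the Trifacta screen.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The other account that is allowed to assume this role.
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
                # The role can be assumed only when this external ID is supplied.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; use the ones displayed in your own workspace.
    print(json.dumps(build_trust_policy("123456789012", "example-external-id"), indent=4))
```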

Using AWS Access Key and Secret Key to grant access to your S3 bucket

  • Mode: Workspace
  • Credential Provider: AWS Key and Secret


  1. Log into your AWS account and create a new user.
    1. Select the "Programmatic access" checkbox.
  2. Proceed to the Permissions page.
    1. Select the "Attach existing policies directly" checkbox.
    2. Select the policy that you have already created.
  3. Proceed to the Tags page. Enter tags, if desired.
  4. Copy your Access Key and Secret Key.
  5. Paste these values into the appropriate textboxes in the Trifacta screen.

Common configurations for User mode

  1. For per-user mode, the administrator must still select the encryption type.
  2. When each user logs in, the user must configure their storage settings.

    NOTE: Depending on the required IAM permissions, non-admin users may not be able to complete this configuration without assistance.

Using an IAM role to grant access to your S3 bucket (Recommended method)

  • Mode: Per user
  • Credential Provider: IAM Role


  1. Note the following two values from the screen. They must be applied in AWS:
    1. Account ID
    2. External ID
  2. Log into your AWS account and create a new IAM Role:
    1. When you create the role, the prompt "Select type of trusted entity" appears. Choose "Another AWS account".
      1. Enter the Account ID that you acquired from the Trifacta screen.
      2. Select the "Require external ID" checkbox. Enter the External ID provided to you from the Trifacta screen.

        NOTE: In per-user mode, this value is different for each user.

    2. Proceed to the Permissions page.
      1. Select the policy that you already created.
    3. Proceed to the Tags page. Enter tags, if desired.
    4. Proceed to the Review page. Select a name for your role.
    5. Finish creating the role.
  3. You must insert a trust relationship in this IAM role. For more information, see Insert Trust Relationship in AWS IAM Role.
  4. Select IAM>Roles. Select your new role, and copy the Role ARN.
  5. In the Trifacta screen:
    1. Paste this value into the "Available IAM Role ARNs" textbox and press ENTER.
    2. Enter the name of your default S3 bucket. 

      NOTE: This bucket should already be granted access through the policy that you created.

Using AWS Access Key and Secret Key to grant access to your S3 bucket

  • Mode: Per user
  • Credential Provider: AWS Key and Secret


  1. Log into your AWS account and create a new user.
    1. Select the "Programmatic access" checkbox.
  2. Proceed to the Permissions page.
    1. Select the "Attach existing policies directly" checkbox.
    2. Select the policy that you have already created.
  3. Proceed to the Tags page. Enter tags, if desired.
  4. Copy your Access Key and Secret Key.
  5. Paste these values into the appropriate textboxes in the Trifacta screen.

Step 9 - Access Documentation

At this point, you can access online documentation for the product.

NOTE: Content referenced in the PDF guide is not accessible through the PDF. You must log in to the online documentation to access the referenced pages.

Steps:

  1. From the left navigation bar, select Help menu > Documentation.
  2. You are automatically logged in. 
  3. PDF content is located in the following pages:
    1. Getting Started with Trifacta Wrangler Pro
    2. AWS Config Page
    3. Create Redshift Connections

Initial Configuration

Before you invite members to the workspace, you should review and modify the basic configuration for the workspace. See Workspace Settings Page.

Step 10 - Verify Operations

NOTE: Workspace administrators should complete the following steps to verify that the product is operational end-to-end.

Prepare Your Sample Dataset

To complete this test, you should locate or create a simple dataset. Your dataset should be created in the format that you wish to test.

Tip: The simplest way to test is to create a two-column CSV file with at least 25 non-empty rows of data. This data can be uploaded through the application.

Characteristics:

  • Two or more columns. 
  • If there are specific data types that you would like to test, please be sure to include them in the dataset.
  • A minimum of 25 rows is required for best results of type inference.
  • Ideally, your dataset is a single file or sheet. 
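
A dataset with these characteristics can be generated with a few lines of Python. The sketch below is one possible way to do it: the column names, the values, and the function name are arbitrary choices for illustration, not product requirements.

```python
import csv
import io


def make_sample_csv(rows=25):
    """Return CSV text with a header and `rows` non-empty two-column rows.

    Column names and values are arbitrary (hypothetical) test data:
    an integer id and a decimal amount.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "amount"])
    for i in range(1, rows + 1):
        writer.writerow([i, f"{i * 1.5:.2f}"])
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the CSV; redirect the output to a file to upload it.
    print(make_sample_csv(), end="")
```

Redirecting the output to a file, for example sample_dataset.csv, produces a file that can be uploaded through the application.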


Store Your Dataset

If you are testing an integration, you should store your dataset in the datastore with which the product is integrated.

Tip: Uploading datasets is always available as a means of importing datasets.

 

  • You may need to create a connection between the platform and the datastore.
  • Read and write permissions must be enabled for the connecting user to the datastore.

Verification Steps

Steps:

  1. Log in to the application. See Login.

  2. In the application menu bar, click Library.
  3. Click Import Data. See Import Data Page.
    1. Select the connection where the dataset is stored. For datasets stored on your local desktop, click Upload.
    2. Select the dataset.
    3. In the right panel, click the Add Dataset to a Flow checkbox. Enter a name for the new flow.
    4. Click Import and Add to Flow.

  4. In the left menu bar, click the Flows icon. In the Flows page, open the flow you just created. See Flows Page.
  5. In the Flows page, click the dataset you just imported. Click Add new Recipe.
  6. Select the recipe. Click Edit Recipe.
  7. The initial sample of the dataset is opened in the Transformer page, where you can edit your recipe to transform the dataset.
    1. In the Transformer page, some steps are automatically added to the recipe for you. So, you can run the job immediately.
    2. You can add additional steps if desired. See Transformer Page.
  8. Click Run Job.
    1. If options are presented, select the defaults.

    2. To generate results in other formats or output locations, click Add Publishing Destination. Configure the output formats and locations. 
    3. To test dataset profiling, click the Profile Results checkbox. Note that profiling runs as a separate job and may take considerably longer. 
    4. See Run Job Page.

  9. When the job completes, you should see a success message under the Jobs tab in the Flow View page. 
    1. Troubleshooting: Either the Transform job or the Profiling job may fail. To isolate the problem, try re-running the job with the failed job type deselected, or run the job in a different running environment (if available). You can also download the log files to try to identify the problem. See Job Details Page.
  10. Click View Results from the context menu for the job listing. In the Job Details page, you can see a visual profile of the generated results. See Job Details Page.
  11. In the Output Destinations tab, click a link to download the results to your local desktop. 
  12. Load these results into a local application to verify that the content looks correct.

Checkpoint: You have verified importing from the selected datastore and transforming a dataset. If your job was successfully executed, you have verified that the product is connected to the job running environment and can write results to the defined output location. Optionally, you may have tested profiling of job results. If all of the above tasks completed, the product is operational end-to-end.

Step 11 - Invite Members

  1. You can invite other people to join your workspace. 
    1. When members initially join your workspace, they are assigned a non-admin role. Through the Workspace Members page, you can assign roles.
    2. For more information, see Workspace Users Page.
  2. If you have enabled per-user authentication, credentials must be provided for each workspace member account:
    1. Administrators can apply per-user authentication for individual accounts. See Workspace Users Page.
    2. If individual members need to apply the credentials, the process is the same as for administrators. 
      1. Please share Step 8 (Common configurations for User mode) with them.
      2. Similar content is also located online: AWS Config Page.

Getting Started for Workspace Members

This section walks through the process of getting started as a new member of a Trifacta Wrangler Pro workspace. 

Steps:

  1. You should have received an email like the following:


    Figure: Welcome email

  2. Click the link. If you see a Missing Storage Settings error message, then you must provide your individual user storage credentials and default bucket. To do so, click the Here link.
  3. In your Storage Settings page, you may be required to enter your S3 credentials. For more information, see Common configurations for User mode above. After the credentials have been entered, you can begin using the product.
  4. Access documentation: To access the full customer documentation, from the left nav bar, select Help menu > Documentation.

The following resources can assist workspace members in getting started with wrangling.

  • If product walkthroughs have been enabled, each new member can step through an onboarding tour of the product after first login. 
