Trifacta SaaS



This section applies to getting started with Trifacta® SaaS, an AWS-native platform for data wrangling. The following product tiers are available:

  • Trifacta Premium
  • Trifacta Standard 

Trifacta® SaaS enables you to rapidly ingest, transform, and deliver clean, actionable data across your entire enterprise. Please review the following sections on how to prepare for and set up your  Trifacta SaaS workspace.

NOTE: This section applies to both the free version and the licensed version of Trifacta SaaS. For more information on the differences, see Product Limitations.

This section provides an overview of how to get started using the product. 

  1. Administrators should complete the first section to set up the product for use. 
  2. After setup is complete, individual users should complete the second section to get started using the product.

Setup Process

Having difficulties? To speak to a support representative, click the icon in the corner and submit your question.

Steps:

  1. Before you begin. If you are using your own AWS S3 buckets, you should prepare them and their access policies to ensure that  Trifacta SaaS can integrate with them. 

    NOTE: If you do not have these AWS resources, they can be created for you. Details are below.

    1. Technical setup: Please share the technical setup section with your S3 administrator.
  2. Register. Complete the simple online workflow to license and create your  Trifacta SaaS workspace.
  3. Workspace setup. Before you invite other users to your workspace, you should complete a few setup steps.
  4. Invite users. If you intend to share the workspace with other users, you can invite them from within it. 
  5. Wrangle away! 

Before You Begin

Hosted on Amazon Web Services,  Trifacta SaaS is designed to natively interact with AWS datasources, so that you can rapidly transform your data investments in AWS.

AWS Overview

Below are the AWS objects that are required for setup. 

Tip: If you do not have immediate access to these assets, some can be created as part of the workflow setup.


AWS account (Required)

To create these objects as part of the setup process, you must have an AWS account. For more information, see https://aws.amazon.com/.

Valid email address (Required)

To validate your registration for a new workspace, you must have a valid email address to which the product can deliver the registration email.

Choice: cross-account role access or key-secret access (Required)

To integrate with your existing S3 resources, you must choose a method of authentication. Choices:

  • cross-account role: This method uses IAM roles to define the permissions used by the product for S3 access.

    Tip: This method is recommended.

  • key-secret access: This method uses IAM access keys to provide S3 access.

IAM policy (Required)

An IAM (Identity and Access Management) policy is an AWS resource used to define the low-level permissions for access to a specific resource. During setup, you can use an existing IAM policy or create a new one for the product to use with either access method.

For more information, see "Create policy to grant access to S3 bucket" below.

cross-account role access: IAM role (Required)

An IAM role contains one or more IAM policies that define the set of available AWS services and the level of access to them for a user. In this case, the user is the Trifacta application.

key-secret access: AWS key-secret (Required)

An older AWS access method, the key-secret combination is essentially a username and password combination for one or more S3 buckets.

S3 bucket (Required)

S3 (Simple Storage Service) is a cloud-based file storage system hosted in AWS. An S3 bucket contains your data files and their organizing folders.

S3 bucket: encryption (Optional)

For better security, your S3 bucket may be encrypted, which means that the data is stored inside of S3 in a form that is not human-readable.

NOTE: The product can optionally integrate with encrypted S3 buckets. The following S3 encryption methods are supported: SSE-S3 and SSE-KMS.

NOTE: If your bucket is encrypted with SSE-KMS, additional configuration is required. See "Update policy to accommodate SSE-KMS if necessary" below.

For more information on your bucket's encryption, please contact your S3 administrator.

S3 bucket: storage location (Optional)

If needed, you can change the location where results are stored in S3.

NOTE: The product must have write permission to this location. If you are changing the location from the default, please verify with your S3 administrator that the preferred location is enabled for writing through your access method.

Workspace name (Required)

During setup, you must create a unique workspace identifier. This identifier cannot contain spaces or special characters.

IAM role: Account ID (Optional)

The account ID identifies in the trust policy the AWS account that can use your IAM role.

Tip: This identifier is provided to you during registration and setup.

IAM role: External ID (Optional)

The external ID in the trust policy ensures that Trifacta SaaS can use your IAM role only on your behalf.

Tip: This identifier is provided to you during registration and setup.
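For cross-account role access, the Account ID and External ID are embedded in the IAM role's trust policy. As a sketch only, such a trust policy follows the standard AWS pattern for cross-account access with an external ID; the placeholder values below stand in for the identifiers provided to you during registration and setup:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<account_id_from_setup>:root"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "<external_id_from_setup>"
                }
            }
        }
    ]
}
```

The external ID condition prevents the "confused deputy" problem: the role can be assumed only when the caller presents the external ID issued for your workspace.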

Technical Setup

The following sections should be provided to your AWS administrator for setting up access to these resources, if required.

Create policy to grant access to S3 bucket

To use your own S3 bucket(s) with Trifacta SaaS, create a policy and assign it to the IAM user or IAM role that is used to grant access to AWS resources. In this section, you create the policy. Later, it is applied.

Below is an example policy template. You should use this template to create the policy.

NOTE: You should not simply use one of the predefined AWS policies or an existing policy of your own, as it will likely grant access to more resources than required.

Template Notes:

  1. One of the statements grants access to the trifacta-public-datasets bucket, which contains resources used for the onboarding tour.  
  2. Replace <my_default_S3_bucket> with the name of your default S3 bucket.
  3. To grant access to multiple buckets within your account, you can extend the resources list to accommodate the additional buckets.
Policy Template
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:DeleteObject",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::<my_default_S3_bucket>",
                "arn:aws:s3:::<my_default_S3_bucket>/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::trifacta-public-datasets",
                "arn:aws:s3:::trifacta-public-datasets/*"
            ]
        }
    ]
}
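As described in the template notes, you can extend the resources list to cover additional buckets in your account. As a sketch, the first statement's Resource array would grow as follows; <my_second_S3_bucket> is a placeholder for your additional bucket name:

```json
"Resource": [
    "arn:aws:s3:::<my_default_S3_bucket>",
    "arn:aws:s3:::<my_default_S3_bucket>/*",
    "arn:aws:s3:::<my_second_S3_bucket>",
    "arn:aws:s3:::<my_second_S3_bucket>/*"
]
```

Note that each bucket needs two ARN entries: one for the bucket itself (for s3:ListBucket and s3:GetBucketLocation) and one with the /* suffix for the objects inside it.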

Update policy to accommodate SSE-KMS if necessary

If any accessible bucket is encrypted with SSE-KMS, another policy must be deployed. See https://docs.aws.amazon.com/kms/latest/developerguide/iam-policies.html.
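The linked AWS documentation covers IAM policies for KMS in detail. As a sketch under common assumptions, the additional statement grants the access role permission to use the KMS key that encrypts the bucket; the key ARN below is a placeholder that your S3 administrator must replace with the actual key:

```json
{
    "Sid": "AllowUseOfKmsKeyForS3",
    "Effect": "Allow",
    "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:GenerateDataKey",
        "kms:DescribeKey"
    ],
    "Resource": "arn:aws:kms:<region>:<account_id>:key/<key_id>"
}
```

Without kms:Decrypt and kms:GenerateDataKey on the relevant key, reads and writes to an SSE-KMS-encrypted bucket fail with access-denied errors even when the S3 permissions themselves are correct.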

Add policy for Redshift access

If you are connecting to Redshift databases through your workspace, you can enable access by creating a GetClusterCredentials policy. This policy is additive to the S3 access policies. All of these policies can be captured in a single IAM role. 

Example:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GetClusterCredsStatement",
            "Effect": "Allow",
            "Action": [
                "redshift:GetClusterCredentials"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecluster/${redshift:DbUser}",
                "arn:aws:redshift:us-west-2:123456789012:dbname:examplecluster/testdb",
                "arn:aws:redshift:us-west-2:123456789012:dbgroup:examplecluster/common_group"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:userid": "AIDIODR4TAW7CSEXAMPLE:${redshift:DbUser}@yourdomain.com"
                }
            }
        }
    ]
}

For more information on these permissions, see Required AWS Account Permissions.

Whitelist the IP address range of the Trifacta Service, if necessary

If you are enabling any relational source, including Redshift, you must whitelist the IP address range of the Trifacta service in the relevant security groups.  The IP address range of the Trifacta service is:

35.245.35.240/28

For Redshift, there are two ways to whitelist the IP range, depending on whether you are using EC2-VPC or EC2-Classic (not common).

For details on this process with RDS in general, see https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.RDSSecurityGroups.html.
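For EC2-VPC, whitelisting amounts to adding an inbound rule for the above CIDR range to the cluster's security group. As a sketch, the rule can be expressed as the ip-permissions JSON structure used by the AWS CLI; port 5439 is the Redshift default, so adjust it if your cluster listens on a different port:

```json
[
    {
        "IpProtocol": "tcp",
        "FromPort": 5439,
        "ToPort": 5439,
        "IpRanges": [
            {
                "CidrIp": "35.245.35.240/28",
                "Description": "Trifacta service"
            }
        ]
    }
]
```

Your AWS administrator can apply an equivalent rule through the VPC console; the JSON form above is simply one way to capture the required protocol, port, and source range.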

For more information, please contact  Trifacta Support.

Register for  Trifacta SaaS

To begin the registration process, please visit https://www.trifacta.com/start-wrangling.

Workspace Setup

After you have completed registration, please log in to the application.

NOTE: You can now access online documentation through the application. From the left menu bar, select Help menu > Documentation.

Review Workspace Settings

As the first registered user, you are assigned the workspace admin role, which provides control over workspace-level settings. Before you invite members to the workspace, you should review and modify the basic configuration for the workspace. See Workspace Settings Page.

Verify Operations

NOTE: Workspace administrators should complete the following steps to verify that the product is operational end-to-end.

Prepare Your Sample Dataset

To complete this test, you should locate or create a simple dataset. Your dataset should be created in the format that you wish to test.

Tip: The simplest way to test is to create a two-column CSV file with at least 25 non-empty rows of data. This data can be uploaded through the application.

Characteristics:

  • Two or more columns. 
  • If there are specific data types that you would like to test, please be sure to include them in the dataset.
  • A minimum of 25 rows is required for best type-inference results.
  • Ideally, your dataset is a single file or sheet. 



Verification Steps

Steps:

  1. Log in to the application. See Login.

  2. In the application menu bar, click Library.
  3. Click Import Data. See Import Data Page.
    1. Select the connection where the dataset is stored. For datasets stored on your local desktop, click Upload.
    2. Select the dataset.
    3. In the right panel, click the Add Dataset to a Flow checkbox. Enter a name for the new flow.
    4. Click Import and Add to Flow.

  4. In the left menu bar, click the Flows icon. In the Flows page, open the flow you just created. See Flows Page.
  5. In the Flows page, click the dataset you just imported. Click Add new Recipe.
  6. Select the recipe. Click Edit Recipe.
  7. The initial sample of the dataset is opened in the Transformer page, where you can edit your recipe to transform the dataset.
    1. In the Transformer page, some steps are automatically added to the recipe for you, so you can run the job immediately.
    2. You can add additional steps if desired. See Transformer Page.
  8. Click Run Job.
    1. If options are presented, select the defaults.

    2. To generate results in other formats or output locations, click Add Publishing Destination. Configure the output formats and locations. 
    3. To test dataset profiling, click the Profile Results checkbox. Note that profiling runs as a separate job and may take considerably longer. 
    4. See Run Job Page.

  9. When the job completes, you should see a success message under the Jobs tab in the Flow View page. 
    1. Troubleshooting: Either the Transform job or the Profiling job may fail. To isolate the problem, try re-running the job with the failed job type deselected, or run the job on a different running environment (if available). You can also download the log files to try to identify the problem. See Job Details Page.
  10. Click View Results from the context menu for the job listing. In the Job Details page, you can see a visual profile of the generated results. See Job Details Page.
  11. In the Output Destinations tab, click a link to download the results to your local desktop. 
  12. Load these results into a local application to verify that the content looks correct.

Checkpoint: You have verified importing from the selected datastore and transforming a dataset. If your job was successfully executed, you have verified that the product is connected to the job running environment and can write results to the defined output location. Optionally, you may have tested profiling of job results. If all of the above tasks completed, the product is operational end-to-end.

Invite Members

  1. You can invite other people to join your workspace. 
    1. When members initially join your workspace, they are assigned a non-admin role. Through the Workspace Members page, you can assign roles.
    2. For more information, see Workspace Users Page.
  2. The workspace administrators must provide credentials for each workspace member account. See Workspace Users Page.

Getting Started for Workspace Members

This section walks through the process of getting started as a new member of a Trifacta SaaS workspace. 

Steps:

  1. You should have received an email like the following:


    Figure: Welcome email

  2. Click the link. If you see a Missing Storage Settings error message, then you must provide your individual user storage credentials and default bucket. To do so, click the Here link.
  3. In your Storage Settings page, you may be required to enter your S3 credentials. After the credentials have been entered, you can begin using the product. 
  4. Access documentation: To access the full customer documentation, from the left nav bar, select Help menu > Documentation.

The following resources can assist workspace members in getting started with wrangling.


  • If product walkthroughs have been enabled, each new member can step through an onboarding tour of the product after first login. 
