When you sign up to use Trifacta SaaS, you are provided a default storage environment, which you can use to get started with the product immediately. A storage environment is used to store your data assets.

Trifacta SaaS supports multiple storage environments, one of which is the default. The default storage environment provides all of the storage capabilities of other storage environments, as well as storage for data assets generated by use of the product. The following table lists the types of data assets that are stored in each type of storage environment:

| Asset Type | Description | Default Storage | Non-Default Storage |
| --- | --- | --- | --- |
| imported datasets | When you upload data to the product, it is stored in the default storage environment. From these uploaded assets, you create imported datasets, which are sources of data for your transformation recipes. You can also import data that is stored in other storage environments. | Yes | Yes |
| job results | When you run a job to transform your data, the results of the job execution are stored in a storage environment. | Yes | Yes |
| samples | When you create transformation recipes, you are working on a sample of the source data. When you first open a recipe, an initial sample of the recipe is created for you and stored on the default storage environment. As needed, you can start a job to create a new sample of your data. Ad-hoc samples are also stored on the default storage environment. For more information, see Overview of Sampling. | Yes | No |
| temporary files | During ingestion of data and job execution, Trifacta SaaS requires storage space in the default storage environment to store temporary files. | Yes | No |

For more information on these assets, see Glossary.

Storage Environment Options

You can use either of the following storage environments as the default or secondary storage environment.

  • When the product is first launched, TFS is the default storage environment. To use S3, you must configure access to your S3 buckets and assets.
  • One of the following must be set as the default storage environment.

    Tip: You can switch between TFS and S3 as your default storage environment at any time without disruption to service or loss of data.

| Storage Environment | Description |
| --- | --- |
| TFS | Short for Trifacta File Storage, this S3-backed storage environment is managed by Trifacta and requires no additional configuration. It is available as soon as you launch the product for the first time. Details are below. |
| S3 | Your S3 buckets and their assets. NOTE: To access your S3 assets, you must provide authentication credentials, policies, and other configuration information to the Trifacta application. Additional information is provided below. |

TFS Storage Environment

When Trifacta SaaS is first launched, TFS is defined as the default storage environment. This storage environment provides storage for the above data asset types. Trifacta File Storage is backed by AWS S3 buckets hosted by Trifacta and secured by IAM policies.

Using TFS is very similar to navigating S3 buckets to find and select assets to import or to locate job results that you have published. For more information, see Using TFS.

AWS Configuration Information

If you plan to use S3 as the default storage environment, the following sections outline the AWS configuration prerequisites and requirements.

AWS Overview

Below are the AWS objects that are required for S3 setup. 

| AWS object | Required? | Description |
| --- | --- | --- |
| AWS account | Y | To create these objects as part of the setup process, you must have an AWS account. For more information, see https://aws.amazon.com/. |
| Valid email address | Y | To validate your registration for a new workspace, you must have a valid email address to which the product can deliver the registration email. |
| Choice: cross-account role access or key-secret access | Y | To integrate with your existing S3 resources, you must choose a method of authentication. Choices: (1) cross-account role: This method uses IAM roles to define the permissions used by the product for S3 access. Tip: This method is recommended. (2) key-secret access: This method uses IAM access keys to provide S3 access. |
| IAM policy | Y | An IAM (Identity and Access Management) policy is an AWS resource used to define the low-level permissions for access to a specific resource. You can create an IAM policy for the product to use with either access method. For more information, see "Create policy to grant access to S3 bucket" below. |
| cross-account role access: IAM role | Y | An IAM role contains one or more IAM policies that can be used to define the set of available AWS services and the level of access to them for a user. In this case, the user is the Trifacta application. |
| key-secret access: AWS key-secret | Y | An older AWS access method, the key-secret combination is essentially a username and password combination for one or more S3 buckets. |
| S3 bucket | Y | S3 (Simple Storage Service) is a cloud-based file storage system hosted in AWS. An S3 bucket contains your data files and their organizing folders. |
| S3 bucket: encryption | N | For better security, your S3 bucket may be encrypted, which means that the data is stored inside of S3 in a way that is not human-readable. NOTE: The product can optionally integrate with encrypted S3 buckets. The following S3 encryption methods are supported: sse-s3 and sse-kms. NOTE: If your bucket is encrypted with sse-kms, additional configuration is required. See "Update policy to accommodate SSE-KMS if necessary" below. For more information on your bucket's encryption, please contact your S3 administrator. |
| S3 bucket: storage location | N | If needed, you can change the location where results are stored in S3. NOTE: The product must have write permission to this location. If you are changing the location from the default, please verify with your S3 administrator that the preferred location is enabled for writing through your access method. |
| IAM role: Account ID | N | The account ID identifies, in the trust policy, the account that can use your IAM role. Tip: This identifier is provided to you during registration and setup. |
| IAM role: External ID | N | The external ID specifies, in the trust policy, that Trifacta SaaS can use your IAM role only on your behalf. Tip: This identifier is provided to you during registration and setup. |
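
To illustrate how the account ID and external ID typically fit together in a trust policy, the following minimal Python sketch builds a policy of the generic AWS cross-account form (sts:AssumeRole restricted by sts:ExternalId). The function name and the sample values are hypothetical, and the exact trust policy expected by the product is an assumption here. Use the identifiers provided to you during registration and setup.

```python
import json


def build_trust_policy(account_id: str, external_id: str) -> dict:
    """Return a trust policy allowing one AWS account to assume the role.

    Assumption: generic AWS cross-account pattern, not copied from the
    product's own setup instructions.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account that is allowed to assume the role.
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
                # The role can be assumed only when this external ID is passed.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values, for illustration only.
example_policy = build_trust_policy("123456789012", "example-external-id")
print(json.dumps(example_policy, indent=2))
```

The printed JSON could be pasted into the trust relationship of the IAM role after replacing both placeholder values.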

Technical Setup

The following sections should be provided to your AWS administrator for setting up access to these resources, if required.

Create policy to grant access to S3 bucket

To use your own S3 bucket(s) with Trifacta SaaS, create a policy and assign it to either the user or the IAM role selected to grant access to AWS resources. In this section, you create the policy. Later, it will be applied.

Below is an example policy template. You should use this template to create the policy.

NOTE: You should not simply use one of the predefined AWS policies or an existing policy you have as it will likely give access to more resources than required.

Template Notes:

  1. One of the statements grants access to the public demo asset buckets.  
  2. Replace <my_default_S3_bucket> with the name of your default S3 bucket.
  3. To grant access to multiple buckets within your account, you can extend the resources list to accommodate the additional buckets.
Policy Template
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:DeleteObject",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::<my_default_S3_bucket>",
                "arn:aws:s3:::<my_default_S3_bucket>/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::aws-saas-samples-prod",
                "arn:aws:s3:::aws-saas-samples-prod/*",
                "arn:aws:s3:::aws-saas-datasets",
                "arn:aws:s3:::aws-saas-datasets/*",
                "arn:aws:s3:::3fac-data-public",
                "arn:aws:s3:::3fac-data-public/*",
                "arn:aws:s3:::trifacta-public-datasets",
                "arn:aws:s3:::trifacta-public-datasets/*"
            ]
        }
    ]
}
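
As a sketch of template note 3, the following Python snippet generates the first statement of the template for one or more buckets, so that each bucket contributes both its bucket ARN and its object ARN. The function name and the bucket names are hypothetical. The statement for the public demo asset buckets is not generated here and would be added unchanged from the template.

```python
import json
from typing import Sequence

# Actions copied from the first statement of the policy template.
READ_WRITE_ACTIONS = [
    "s3:PutObject",
    "s3:GetObject",
    "s3:ListBucket",
    "s3:DeleteObject",
    "s3:GetBucketLocation",
]


def build_bucket_statement(bucket_names: Sequence[str]) -> dict:
    """Return the read/write statement covering every bucket in bucket_names."""
    if not bucket_names:
        raise ValueError("at least one bucket name is required")
    resources = []
    for name in bucket_names:
        resources.append(f"arn:aws:s3:::{name}")    # the bucket itself
        resources.append(f"arn:aws:s3:::{name}/*")  # all objects in the bucket
    return {
        "Sid": "VisualEditor0",
        "Effect": "Allow",
        "Action": READ_WRITE_ACTIONS,
        "Resource": resources,
    }


# Hypothetical bucket names, for illustration only.
statement = build_bucket_statement(["my-default-bucket", "my-second-bucket"])
print(json.dumps({"Version": "2012-10-17", "Statement": [statement]}, indent=4))
```

Generating the resource list this way avoids hand-editing the JSON, where a missing comma or a forgotten `/*` entry is easy to overlook.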

Update policy to accommodate SSE-KMS if necessary

If any accessible bucket is encrypted with SSE-KMS, another policy must be deployed. See https://docs.aws.amazon.com/kms/latest/developerguide/iam-policies.html.
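
As a rough sketch of what such an additional policy statement might contain, the Python snippet below grants kms:Decrypt and kms:GenerateDataKey on a single key, which are the KMS actions AWS generally requires for reading and writing SSE-KMS encrypted objects. The function name, the Sid, the sample key ARN, and the action list are assumptions for illustration. Confirm the required permissions against the AWS documentation linked above and with your AWS administrator.

```python
import json


def build_kms_statement(kms_key_arn: str) -> dict:
    """Return a policy statement allowing use of one KMS key.

    Assumption: the action list reflects general AWS guidance for SSE-KMS
    buckets, not a requirement published for this product.
    """
    if not (kms_key_arn.startswith("arn:") and ":kms:" in kms_key_arn):
        raise ValueError("expected a KMS key ARN")
    return {
        "Sid": "AllowUseOfBucketKmsKey",
        "Effect": "Allow",
        "Action": [
            "kms:Decrypt",          # read encrypted objects
            "kms:GenerateDataKey",  # write encrypted objects
        ],
        "Resource": [kms_key_arn],
    }


# Hypothetical key ARN, for illustration only.
example_key = "arn:aws:kms:us-west-2:123456789012:key/example-key-id"
print(json.dumps(build_kms_statement(example_key), indent=2))
```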

Add policy for Redshift access

If you are connecting to Redshift databases through your workspace, you can enable access by creating a GetClusterCredentials policy. This policy is additive to the S3 access policies. All of these policies can be captured in a single IAM role.

Example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GetClusterCredsStatement",
      "Effect": "Allow",
      "Action": [
        "redshift:GetClusterCredentials"
      ],
      "Resource": [
        "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecluster/${redshift:DbUser}",
        "arn:aws:redshift:us-west-2:123456789012:dbname:examplecluster/testdb",
        "arn:aws:redshift:us-west-2:123456789012:dbgroup:examplecluster/common_group"
      ],
      "Condition": {
        "StringEquals": {
          "aws:userid": "AIDIODR4TAW7CSEXAMPLE:${redshift:DbUser}@yourdomain.com"
        }
      }
    }
  ]
}

For more information on these permissions, see Required AWS Account Permissions.
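
The three resource ARNs in the example follow one pattern and differ only in resource type and trailing name. The following Python sketch, with a hypothetical helper name, assembles them from the region, account ID, cluster, database, and group, which may help when adapting the example to your own cluster. The values passed in below are the placeholders from the example policy, not real identifiers.

```python
def redshift_resource_arns(region: str, account_id: str, cluster: str,
                           db_name: str, db_group: str) -> list:
    """Return the dbuser, dbname, and dbgroup ARNs for one Redshift cluster."""
    prefix = f"arn:aws:redshift:{region}:{account_id}"
    return [
        # ${redshift:DbUser} is an IAM policy variable and is kept literally.
        f"{prefix}:dbuser:{cluster}/${{redshift:DbUser}}",
        f"{prefix}:dbname:{cluster}/{db_name}",
        f"{prefix}:dbgroup:{cluster}/{db_group}",
    ]


# Placeholder values taken from the example policy.
arns = redshift_resource_arns(
    "us-west-2", "123456789012", "examplecluster", "testdb", "common_group"
)
for arn in arns:
    print(arn)
```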

Whitelist the IP address range of the Trifacta Service, if necessary

If you are enabling any relational source, including Redshift, you must whitelist the IP address range of the Trifacta service in the relevant security groups.  

NOTE: The database to which you are connecting must be available from the Trifacta service over the public Internet.

The IP address range of the Trifacta service is:

35.245.35.240/28
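
To see which addresses this CIDR block covers, for example when reviewing a security group rule or a connection log, the range can be expanded with Python's standard ipaddress module. This is a minimal sketch, and the helper name is hypothetical.

```python
import ipaddress

# The IP address range of the Trifacta service, as stated above.
TRIFACTA_SERVICE_RANGE = ipaddress.ip_network("35.245.35.240/28")


def is_trifacta_service_address(address: str) -> bool:
    """Return True if the given IPv4 address falls inside the service range."""
    return ipaddress.ip_address(address) in TRIFACTA_SERVICE_RANGE


first = TRIFACTA_SERVICE_RANGE[0]
last = TRIFACTA_SERVICE_RANGE[-1]
print(f"{TRIFACTA_SERVICE_RANGE.num_addresses} addresses: {first} to {last}")
print(is_trifacta_service_address("35.245.35.250"))
```

A /28 block contains 16 addresses, so the inbound rule must allow the whole block and not a single host address.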

For Redshift, there are two ways to whitelist the IP range, depending on whether you are using EC2-VPC or EC2-Classic (not common).

For details on this process with RDS in general, see https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.RDSSecurityGroups.html

For more information, please contact Trifacta Support.

Add S3 as a Storage Environment

When TFS is the default storage environment, you can add S3 as a secondary storage environment.

NOTE: Before you begin, please verify that you have acquired the AWS Configuration Information above.


Please complete the following.

Steps:

  1. Log in as a workspace administrator.
  2. You apply this change through the Workspace Settings Page. For more information, see Platform Configuration Methods.
  3. Locate the following setting and set it to Enabled:

    Enable S3 connectivity
  4. Configure the workspace AWS authentication. In the Trifacta application, select User menu > Admin console > AWS Settings. See Configure Your Access to S3.
    1. If you have enabled per-user authentication to AWS, individual users must configure their own AWS authentication. See Storage Config Page.

Use S3 as Default Storage Environment

You can configure S3 to be the default storage environment, instead of TFS.

NOTE: If you are switching the default storage environment, any data that was written to another default storage environment is still available through the application, as long as you do not disable access to the storage environment. There should be no interruption of service or loss of access to your data.


NOTE: Before you begin, please verify that you have acquired the AWS Configuration Information above.

Please complete the following.

Steps:

  1. You apply this change through the Workspace Settings Page. For more information, see Platform Configuration Methods.
  2. If you have not done so already, enable access to S3: 

    1. Locate the following setting and verify that it is set to Enabled:

      Enable S3 connectivity
    2. Configure the workspace AWS authentication. In the Trifacta application, select User menu > Admin console > AWS Settings. See Configure Your Access to S3.
      1. If you have enabled per-user authentication to AWS, individual users must configure their own AWS authentication. See Storage Config Page.
  3. Locate the following setting and set it to S3:

    Default storage environment

Disable TFS

If you wish to disable all access to TFS, please do the following:

Steps:

  1. You apply this change through the Workspace Settings Page. For more information, see Platform Configuration Methods.
  2. Enable S3 as the default storage environment. See "Use S3 as Default Storage Environment."
  3. Locate the following setting and set it to Disabled:

    Trifacta File Storage

