This section applies to getting started with Trifacta® SaaS, an AWS-native platform for data wrangling. The following product tiers are available:
- Trifacta Premium
- Trifacta Standard
Trifacta® SaaS enables you to rapidly ingest, transform, and deliver clean, actionable data across your entire enterprise. Please review the following sections on how to prepare for and set up your Trifacta SaaS workspace.
NOTE: This section applies to both the free version and the licensed version of Trifacta SaaS. For more information on the differences, see Product Limitations.
This section provides an overview of how to get started using the product.
- Administrators should complete the first section to set up the product for use.
- After setup is complete, individual users should complete the second section to get started using the product.
Having difficulties? To speak to a support representative, click the icon in the corner and submit your question.
Before you begin. If you are using your own AWS S3 buckets, you should prepare them and their access policies to ensure that Trifacta SaaS can integrate with them.
NOTE: If you do not have these AWS resources, they can be created for you. Details are below.
- Technical setup: Please share the technical setup section with your S3 administrator.
- Register. Complete the simple online workflow to license and create your Trifacta SaaS workspace.
- Workspace setup. Before you invite other users to your workspace, you should complete a few setup steps.
- Invite users. If you intend to share the workspace with other users, you can invite them from within it.
- Wrangle away!
Before You Begin
Hosted on Amazon Web Services, Trifacta SaaS is designed to natively interact with AWS datasources, so that you can rapidly transform your data investments in AWS.
Below are the AWS objects that are required for setup.
Tip: If you do not have immediate access to these assets, some can be created as part of the workflow setup.
To create these objects as part of the setup process, you must have an AWS account. For more information, see https://aws.amazon.com/.
|Valid email address||Y||To validate your registration for a new workspace, you must have a valid email address to which the product can deliver the registration email.|
|Choice: cross-account role access or key-secret access||Y|
To integrate with your existing S3 resources, you must choose a method of authentication. Choices:
An IAM (Identity and Access Management) policy is an AWS resource used to define the low-level permissions for access to a specific resource. During setup, you can use or create a new IAM policy for the product to use for either access method.
For more information, see "Create policy to grant access to S3 bucket" below.
|cross-account role access: IAM role||Y|
An IAM role contains one or more IAM policies that can be used to define the set of available AWS services and the level of access to them for a user. In this case, the user is the Trifacta application.
|key-secret access: AWS key-secret||Y||An older AWS access method, the key-secret combination is essentially a username and password combination to one or more S3 buckets.|
|S3 bucket||Y||S3 (Simple Storage Service) is a cloud-based file storage system hosted in AWS. An S3 bucket contains your data files and their organizing folders.|
|S3 bucket: encryption||N|
For better security, your S3 bucket may be encrypted, which means that the data is stored inside of S3 in a way that is not human-readable.
NOTE: The product can optionally integrate with encrypted S3 buckets. The following S3 encryption methods are supported: sse-s3 and sse-kms.
NOTE: If your bucket is encrypted with sse-kms, additional configuration is required. See "Update policy to accommodate SSE-KMS if necessary" below.
For more information on your bucket's encryption, please contact your S3 administrator.
|S3 bucket: storage location||N|
If needed, you can change the location where results are stored in S3.
NOTE: The product must have write permission to this location. If you are changing the location from the default, please verify with your S3 administrator that the preferred location is enabled for writing through your access method.
|Workspace name||Y||During setup, you must create a unique workspace identifier. This identifier cannot contain spaces or special characters.|
|IAM role: Account ID||N|
The account ID specifies in the trust policy that the Trifacta AWS account can use your IAM role.
Tip: This identifier is provided to you during registration and setup.
|IAM role: External ID||N|
The external ID specifies in the trust policy that Trifacta SaaS can use your IAM role only on your behalf.
Tip: This identifier is provided to you during registration and setup.
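To illustrate how the account ID and external ID described above typically fit into an IAM role's trust policy, here is a minimal sketch in Python. It follows the general AWS trust-policy shape (sts:AssumeRole with an sts:ExternalId condition); the function name and the sample values are hypothetical, and the real identifiers are the ones provided to you during registration and setup.

```python
import json


def build_trust_policy(account_id: str, external_id: str) -> dict:
    """Build a trust policy document as a plain dict.

    account_id and external_id are placeholders here; use the values
    supplied during registration, not the sample values below.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account that is allowed to assume this role.
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
                # The role can be assumed only when this external ID is presented.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical sample values for illustration only.
    print(json.dumps(build_trust_policy("123456789012", "example-external-id"), indent=2))
```

The printed JSON can be reviewed with your AWS administrator before it is attached to the role.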
The following sections should be provided to your AWS administrator for setting up access to these resources, if required.
Create policy to grant access to S3 bucket
To use your own S3 bucket(s) with Trifacta SaaS, create a policy and assign it to either the user or the IAM role selected to grant access to AWS resources. In this section, you create the policy. It is applied in a later step.
- For more information on creating policies, see https://console.aws.amazon.com/iam/home#/policies.
Below is an example policy template. You should use this template to create the policy.
NOTE: Do not simply reuse one of the predefined AWS policies or one of your existing policies, as it is likely to grant access to more resources than required.
- One of the statements grants access to the public demo asset buckets.
- Replace <my_default_S3_bucket> with the name of your default S3 bucket.
- To grant access to multiple buckets within your account, you can extend the resources list to accommodate the additional buckets.
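As a rough illustration of extending the resources list to several buckets, the sketch below builds a bucket-scoped policy document in Python. It is not the official policy template: the set of S3 actions is an assumption chosen for a typical read/write integration, the statement for the public demo asset buckets is omitted because those bucket names are not given here, and the function and bucket names are hypothetical.

```python
import json


def build_s3_policy(bucket_names):
    """Build a bucket-scoped S3 policy document as a plain dict.

    The action lists below are assumptions for illustration; confirm the
    required permissions against the product's policy template.
    """
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arns,
            },
            {
                # Object-level actions apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": object_arns,
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket names; the first stands in for your default bucket.
    print(json.dumps(build_s3_policy(["my-default-bucket", "my-second-bucket"]), indent=2))
```

Each additional bucket adds one entry to both resource lists, which keeps the policy limited to the buckets you name.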
Update policy to accommodate SSE-KMS if necessary
If any accessible bucket is encrypted with SSE-KMS, another policy must be deployed. See https://docs.aws.amazon.com/kms/latest/developerguide/iam-policies.html.
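For orientation, a KMS statement for an SSE-KMS bucket might look like the sketch below. The action list is an assumption based on general AWS behavior (reading SSE-KMS objects needs kms:Decrypt, writing them needs kms:GenerateDataKey); the function name and the key ARN are hypothetical, and the AWS documentation linked above remains the authority.

```python
import json


def build_kms_statement(key_arn: str) -> dict:
    """Build one policy statement granting use of a single KMS key.

    key_arn is a placeholder; use the ARN of the key that encrypts your bucket.
    """
    return {
        "Effect": "Allow",
        "Action": ["kms:Decrypt", "kms:GenerateDataKey", "kms:DescribeKey"],
        "Resource": key_arn,
    }


if __name__ == "__main__":
    # Hypothetical key ARN for illustration only.
    sample_arn = "arn:aws:kms:us-east-1:123456789012:key/example-key-id"
    print(json.dumps(build_kms_statement(sample_arn), indent=2))
```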
Add policy for Redshift access
If you are connecting to Redshift databases through your workspace, you can enable access by creating a GetClusterCredentials policy. This policy is additive to the S3 access policies. All of these policies can be captured in a single IAM role.
For more information on these permissions, see Required AWS Account Permissions.
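A GetClusterCredentials policy generally scopes the redshift:GetClusterCredentials action to a database user and database name on a specific cluster. The sketch below builds such a document in Python; the function name and all sample values (region, account, cluster, user, database) are hypothetical, and the exact permissions required by the product should be taken from Required AWS Account Permissions.

```python
import json


def build_redshift_policy(region, account_id, cluster, db_user, db_name):
    """Build a policy document allowing temporary Redshift credentials.

    All arguments are placeholders for your own environment.
    """
    prefix = f"arn:aws:redshift:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "redshift:GetClusterCredentials",
                "Resource": [
                    # The database user the credentials are issued for.
                    f"{prefix}:dbuser:{cluster}/{db_user}",
                    # The database the user is allowed to log in to.
                    f"{prefix}:dbname:{cluster}/{db_name}",
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    policy = build_redshift_policy("us-east-1", "123456789012", "my-cluster", "wrangler", "analytics")
    print(json.dumps(policy, indent=2))
```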
Whitelist the IP address range of the Trifacta Service, if necessary
If you are enabling any relational source, including Redshift, you must whitelist the IP address range of the Trifacta service in the relevant security groups.
NOTE: The database to which you are connecting must be available from the Trifacta service over the public Internet.
The IP address range of the Trifacta service is:
For Redshift, there are two ways to whitelist the IP range, depending on whether you are using EC2-VPC or EC2-Classic (not common).
- EC2-VPC (Security group): Add the IP address range to the inbound rule for the security group associated with the cluster. For more information, see https://docs.aws.amazon.com/redshift/latest/gsg/rs-gsg-authorize-cluster-access.html#rs-gsg-how-to-authorize-access-vpc-security-group.
- EC2-Classic: Add the IP address range to the inbound rule for the security group associated with the EC2 instance. For more information, see https://docs.aws.amazon.com/redshift/latest/gsg/rs-gsg-authorize-cluster-access.html#rs-gsg-how-to-authorize-access-cluster-security-group.
For details on this process with RDS in general, see https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.RDSSecurityGroups.html
For more information, please contact Trifacta Support.
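For the EC2-VPC case, the inbound rule can be expressed as the IpPermissions structure accepted by the EC2 AuthorizeSecurityGroupIngress API. The sketch below only builds and validates that structure; it does not call AWS. The CIDR shown is a documentation-only placeholder (not the Trifacta service range), port 5439 is the Redshift default and may differ for your cluster, and the function name is hypothetical.

```python
import ipaddress


def build_ingress_rule(cidr: str, port: int = 5439) -> dict:
    """Build one IpPermissions entry allowing TCP traffic from cidr on port.

    Raises ValueError if cidr is not a valid network in CIDR notation.
    """
    network = ipaddress.ip_network(cidr)  # validates the range
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": str(network), "Description": "Trifacta service"}],
    }


if __name__ == "__main__":
    # 203.0.113.0/24 is a reserved documentation range used as a placeholder.
    rule = build_ingress_rule("203.0.113.0/24")
    print(rule)
    # With boto3 installed and credentials configured, the rule could be applied as:
    #   boto3.client("ec2").authorize_security_group_ingress(
    #       GroupId="<your security group ID>", IpPermissions=[rule])
```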
Register for Trifacta SaaS
To begin the registration process, please visit https://www.trifacta.com/start-wrangling.
After you have completed registration, please log in to the application. The Home page is displayed.
NOTE: You can now access online documentation through the application. From the left menu bar, select Help menu > Documentation.
Review Workspace Settings
As the first registered user, you are assigned the workspace admin role, which provides control over workspace-level settings. Before you invite users to the workspace, you should review and modify the basic configuration for the workspace. See Workspace Settings Page.
NOTE: Workspace administrators should complete the following steps to verify that the product is operational end-to-end.
Prepare Your Sample Dataset
To complete this test, you should locate or create a simple dataset. Your dataset should be created in the format that you wish to test.
Tip: The simplest way to test is to create a two-column CSV file with at least 25 non-empty rows of data. This data can be uploaded through the application.
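If you do not have a suitable file on hand, a two-column CSV with 25 data rows can be generated with a few lines of Python. This is only a convenience sketch; the column names, values, and file name are arbitrary choices.

```python
import csv


def write_sample_csv(stream, rows: int = 25) -> None:
    """Write a header plus `rows` non-empty two-column rows to stream."""
    writer = csv.writer(stream)
    writer.writerow(["id", "value"])
    for i in range(1, rows + 1):
        writer.writerow([i, f"item_{i}"])


if __name__ == "__main__":
    # newline="" lets the csv module control line endings.
    with open("sample.csv", "w", newline="") as f:
        write_sample_csv(f)
```

The resulting sample.csv can then be uploaded through the application.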
Log in to the application. See Login.
In the application menu bar, click Library.
Tip: When you log in for the first time, you can immediately import a dataset to begin transforming it.
If options are presented, select the defaults.
See Run Job Page.
Checkpoint: You have verified importing from the selected datastore and transforming a dataset. If your job was successfully executed, you have verified that the product is connected to the job running environment and can write results to the defined output location. Optionally, you may have tested profiling of job results. If all of the above tasks completed, the product is operational end-to-end.
NOTE: First-time users of the product should access Trifacta SaaS by invitation only. Do not provide direct URLs to first-time users.
You can invite other people to join your workspace.
- When users initially join your workspace, they are assigned a non-admin role. Through the Workspace Users page, you can assign roles.
- Select User menu > Admin Console > Users. Then, click Invite Users.
- For more information, see Workspace Users Page.
- The workspace administrators must provide credentials for each workspace member account. See Workspace Users Page.
When a new workspace is created, the first user is provided a set of example flows. These flows are intended to teach by example and illustrate many recommended practices for building your own flows. For more information on example flows, see Workflow Basics.
Getting Started for Workspace Users
This section walks through the process of getting started as a new member of a Trifacta SaaS workspace.
You should have received an email like the following:
- Click the link. If you see a Missing Storage Settings error message, then you must provide your individual user storage credentials and default bucket. To do so, click the Here link.
- In your Storage Settings page, you may be required to enter your S3 credentials. After the credentials have been entered, you can begin using the product.
- Access documentation: To access the full customer documentation, from the left nav bar, select Help menu > Documentation.
The following resources can assist workspace users in getting started with wrangling.
Tip: Check out the product walkthrough available through in-app chat! This tour steps through each phase of ingesting, transforming, and generating results for your data.