On January 27, 2021, Google is changing the required permissions for attaching IAM roles to service accounts. If you are using IAM roles for your Google service accounts, please see Changes to User Management.
Cloud Dataprep by TRIFACTA® INC. enables you to rapidly transform disparate datasets of any size into usable data for the entire enterprise. Ingest, explore, and transform your data through a leading-edge interface, reducing the time to prepare your data from weeks to minutes. Cloud Dataprep by TRIFACTA INC. is integrated with the Google Cloud Platform and operated by partner Trifacta.
Applicable Product Editions
These setup instructions apply to the following editions of the product:
NOTE: These product editions are licensed through the Google Marketplace from Trifacta. For more information on licensing or upgrading from Cloud Dataprep by TRIFACTA INC., please see the Google Marketplace listing.
- Cloud Dataprep Premium by TRIFACTA INC.
NOTE: If you are purchasing Cloud Dataprep Premium by TRIFACTA INC., you must contact Trifacta Support before you purchase the product.
- Cloud Dataprep Standard by TRIFACTA INC.
- Cloud Dataprep Legacy by TRIFACTA INC.
NOTE: If you are an existing Cloud Dataprep by TRIFACTA INC. customer, you can use the Marketplace to upgrade to one of the supported Marketplace editions or to enable your current product edition for a new project. You can also choose to continue using Cloud Dataprep by TRIFACTA INC.
For more information, see Product Editions.
For more information on available plans, see https://www.trifacta.com/pricing/cloud-dataprep/.
Support packages
For Cloud Dataprep Premium by TRIFACTA INC. and Cloud Dataprep Standard by TRIFACTA INC., Trifacta offers a range of support packages. For more information, please contact Trifacta Support.
Pre-requisites
Before you begin, please review the following pre-requisites.
NOTE: The name of the service account used by the product is provided by Google and cannot be modified.
NOTE: If domain restricted sharing has been enabled as a policy in your enterprise, you must add a trust policy for the Trifacta GSuite domain. If you do not have the ID for this domain, it can be provided by Trifacta Support.
Set up a project
To use either product edition, you must have the following already set up in the Google Cloud Platform.
NOTE: If you are upgrading from Cloud Dataprep by TRIFACTA INC., you should already have these services enabled.
Create or set up a Google Cloud project. In the Cloud Console, on the project selector page, select or create a Cloud project.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.
Enable billing on that project. Please verify that billing is enabled for your Google Cloud project. Learn how to confirm billing is enabled for your project.
- Enable services: In your project, enable the following services:
- Cloud Dataflow
- BigQuery
- Cloud Storage APIs. See Enable the APIs.
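As an alternative to the Cloud Console, the services listed above can be enabled from the command line with `gcloud services enable`. The following is a minimal Python sketch that only builds the command and does not run it. The service identifiers and the project ID `my-dataprep-project` are assumptions for illustration and are not taken from this page; confirm the identifiers in your project's API Library before using them.

```python
import shlex

# Assumed service identifiers for Cloud Dataflow, BigQuery, and Cloud Storage.
# These are illustrative; verify them in your project's API Library.
REQUIRED_SERVICES = [
    "dataflow.googleapis.com",
    "bigquery.googleapis.com",
    "storage-component.googleapis.com",
]


def enable_services_command(project_id, services=REQUIRED_SERVICES):
    """Return the argument list for a `gcloud services enable` call."""
    if not project_id:
        raise ValueError("project_id must not be empty")
    return ["gcloud", "services", "enable", *services, "--project", project_id]


if __name__ == "__main__":
    # "my-dataprep-project" is a hypothetical project ID.
    print(shlex.join(enable_services_command("my-dataprep-project")))
```

Printing the command instead of executing it lets you review it before running it in a shell where the Google Cloud CLI is installed and authenticated.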
Set up your storage bucket
On Google Cloud Storage, you must have a bucket set up for use with your project.
- In the Cloud Console, navigate to the Cloud Storage Browser page. See https://console.cloud.google.com/storage/browser.
- Click Create bucket.
- In the Create bucket dialog, specify the following attributes:
- A unique bucket name. For more information on bucket name requirements, see https://cloud.google.com/storage/docs/bucket-naming#requirements.
- A storage class. See https://cloud.google.com/storage/docs/storage-classes.
- A location where bucket data will be stored.
- Click Create.
Set up your staging bucket
By default, when you enable Cloud Dataprep by TRIFACTA INC. in a project, a Google Cloud Storage staging bucket for Cloud Dataflow use is automatically created for you in a U.S. region. This staging bucket is used for staging assets for Cloud Dataflow jobs and is required for use with the product. If you have permissions to create a storage bucket in the U.S., you do not need to create a storage bucket for staging and can skip to the next section.
NOTE: If you do not have permissions to create a Google Cloud Storage bucket in the U.S., you must create your own staging bucket before enabling Cloud Dataprep by TRIFACTA INC. in your project. The name of this bucket must begin with the following text string: dataprep-staging- followed by an identifying value.
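The naming rule for a self-created staging bucket can be sketched as a small Python helper. This is an illustrative check of the dataprep-staging- prefix only, not a full validation of Cloud Storage bucket naming requirements; the function names and the identifier `team-a` are hypothetical.

```python
# Required prefix for a self-created staging bucket, as stated in the note above.
STAGING_PREFIX = "dataprep-staging-"


def staging_bucket_name(identifier):
    """Build a staging bucket name from an identifying value.

    The identifier is lowercased because bucket names may not contain
    uppercase letters. No other naming rules are checked here.
    """
    identifier = identifier.strip().lower()
    if not identifier:
        raise ValueError("identifier must not be empty")
    return STAGING_PREFIX + identifier


def is_valid_staging_name(name):
    """Check the prefix rule only: the prefix plus at least one more character."""
    return name.startswith(STAGING_PREFIX) and len(name) > len(STAGING_PREFIX)


if __name__ == "__main__":
    # "team-a" is a hypothetical identifying value.
    print(staging_bucket_name("team-a"))
```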
A bucket can be created from:
- Google Console: https://cloud.google.com/storage/docs/creating-buckets
- Google CLI: https://cloud.google.com/storage/docs/gsutil
The staging bucket can be changed:
- During enablement of the product in a project. You can select a different staging bucket as needed.
- After the product has been enabled, individual users can configure the bucket to use for staging of their assets. See User Profile Page.
Premium-only requirements
Whitelist the IP address range of the Trifacta Service
If you are connecting to relational sources, you must whitelist the IP address range of the Trifacta service for your database instances. The IP address range of the Trifacta service is the following:
34.68.114.64/28
NOTE: On the database server for each relational source type (Oracle, SQL Server, etc.), you must whitelist this IP address range.
NOTE: Relational datasources must be available on a public IP address that is accessible from the deployment of Cloud Dataprep Premium by TRIFACTA INC.
Tip: To verify that you have whitelisted the IP address range appropriately, you can create a connection of the relational connection type from inside the Trifacta application. This step is described later.
For more information, please contact Trifacta Support.
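To see which individual addresses the 34.68.114.64/28 range covers, for example when a database firewall accepts only single addresses or when checking a source address in connection logs, the range can be expanded with Python's standard `ipaddress` module. This is a minimal sketch; the helper name `is_trifacta_address` is hypothetical.

```python
import ipaddress

# IP address range of the Trifacta service, as listed above.
TRIFACTA_RANGE = ipaddress.ip_network("34.68.114.64/28")


def is_trifacta_address(address):
    """Return True if the given IPv4 address falls inside the service range."""
    return ipaddress.ip_address(address) in TRIFACTA_RANGE


if __name__ == "__main__":
    first, last = TRIFACTA_RANGE[0], TRIFACTA_RANGE[-1]
    print(f"{TRIFACTA_RANGE} covers {TRIFACTA_RANGE.num_addresses} addresses: "
          f"{first} to {last}")
```

A /28 range contains 16 addresses, here 34.68.114.64 through 34.68.114.79.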
Purchase and enable through the Google Marketplace
After you have completed the above steps, please proceed through the Google Marketplace to complete your purchase. Your purchase covers:
- Basic entitlement
- Licensing for each Google Cloud project
NOTE: Changes to your billing information for a project have been known to cause forced cancellation of Cloud Dataprep by TRIFACTA INC. in a project. If possible, please confirm that the billing information is properly set for the project before you enable the product on it. If you encounter a forced cancellation, please contact Google Support. For more information, see Enable or Disable Dataprep.
For more information, see https://console.cloud.google.com/marketplace/details/trifacta/cloud-dataprep-editions.
Setup
After the product has been licensed for your project, please complete the following steps for your account.
Required additional permissions for Cloud Dataprep Premium by TRIFACTA INC.
Cloud Dataprep Premium by TRIFACTA INC. requires special permissions to use the project. For more information, see Create IAM Role for Dataprep.
Enable in the project
NOTE: Cloud Dataprep by TRIFACTA INC. must be enabled in individual projects by the project owner.
- In the Google Cloud Console, select the project in which you wish to enable Cloud Dataprep by TRIFACTA INC.
- Open the product. See https://console.cloud.google.com/dataprep.
- As the project owner, you must enable access to project data for Google and Trifacta.
Login
Each user of the project must do the following:
- In the Google Cloud Console, select the project in which you wish to enable Cloud Dataprep by TRIFACTA INC.
- Open the product. See https://console.cloud.google.com/dataprep.
- Accept the terms of service.
- Select a Google Cloud Storage bucket to use with the product. For more information, see Enable or Disable Dataprep.
- The Trifacta application is displayed.
- The first time you log in, you can immediately upload a dataset and begin transforming it. For more information, see Import Basics.
- On subsequent logins, the Home page is displayed:
Figure: Home page
Project settings
The project owner should review the settings for your project. See Project Settings Page.
Set up directories
Each user must configure the directories on Google Cloud Storage for use with the product. You can change the directories that are used for uploads, job runs, and temp storage.
- In the left nav bar, select the User icon.
- In the User menu, select Preferences.
- The User Profile page is displayed.
- As needed, you can change the Upload, Job Run, and Temp directories in your bucket. To save your changes, click Done.
For more information, see User Profile Page.
Test
If you have completed the above steps, you should verify operations.
Verify operations
Before inviting other users, you should run a simple job through the product.
Prepare Your Sample Dataset
To complete this test, you should locate or create a simple dataset. Your dataset should be created in the format that you wish to test.
Tip: The simplest way to test is to create a two-column CSV file with at least 25 non-empty rows of data. This data can be uploaded through the application.
Verification Steps
- Log in to the application. For Cloud Dataprep by TRIFACTA INC. editions, your login is your Gmail address.
- In the application menu bar, click Library.
Tip: When you log in for the first time, you can immediately import a dataset to begin transforming it.
- If options are presented, select the defaults. See Run Job Page.
Checkpoint: You have verified importing from the selected datastore and transforming a dataset. If your job was successfully executed, you have verified that the product is connected to the job running environment and can write results to the defined output location. Optionally, you may have tested profiling of job results. If all of the above tasks completed, the product is operational end-to-end.
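The two-column CSV file suggested for this test can be generated with a few lines of Python. This is a minimal sketch using only the standard library; the column names `id` and `name`, the generated values, and the output file name `sample.csv` are arbitrary choices for illustration.

```python
import csv
import io


def build_sample_csv(rows=25):
    """Return CSV text with a header line and `rows` non-empty data rows."""
    if rows < 25:
        raise ValueError("use at least 25 data rows for the verification test")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "name"])  # hypothetical column names
    for i in range(1, rows + 1):
        writer.writerow([i, f"item-{i}"])
    return buffer.getvalue()


if __name__ == "__main__":
    # Write the file locally; it can then be uploaded through the application.
    with open("sample.csv", "w", newline="") as handle:
        handle.write(build_sample_csv())
```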
Verify IP address whitelisting
If you have whitelisted the Trifacta service IP addresses for your database server, you can create a connection to the database from inside the Trifacta application. If you are able to successfully read data into the application from your database, then the whitelist has been specified correctly.
NOTE: The database to which you are connecting must be available from the Trifacta service over the public Internet.
For more information, see Connection Types.
Invite Users
You can invite other people to join your project at this time.
NOTE: First-time users of the product should access Cloud Dataprep by TRIFACTA INC. by invitation only. Do not provide direct URLs to first-time users.
For more information, see https://cloud.google.com/iam/docs/quickstart.
Example Flows
When a new workspace is created, the first user is provided a set of example flows. These flows are intended to teach by example and illustrate many recommended practices for building your own flows. For more information on example flows, see Workflow Basics.
Resources
The following resources can assist users in getting started with wrangling.
Tip: Check out the product walkthrough available through in-app chat! This tour steps through each phase of ingesting, transforming, and generating results for your data.
- For a quick start of Cloud Dataprep by TRIFACTA INC. products, see Quickstart for Dataprep.
- Check out the Trifacta Community: https://community.trifacta.com
- Try the free Wrangler certification course. See https://community.trifacta.com/s/certification.
- For a basic summary of each step of the wrangling process, see Workflow Basics.
Access documentation: To access the full customer documentation, from the left nav bar, select Help menu > Documentation.
Additional Setup
Depending on your environment, the following additional configuration steps may be required.
- To access BigQuery datasets in other projects, see Access Cross-Project BigQuery Datasets.
- To access Google Cloud Storage datasets in other projects, see Access Cross-Project Cloud Storage Buckets.