Trifacta Dataprep



By default, Dataprep by Trifacta can access data within the Google Cloud project from which the product is run. To enable access for your project to a Cloud Storage bucket owned by a different project, you must make the bucket accessible to the service accounts in your Dataprep by Trifacta project. Then, you must enter that storage location in the Trifacta application.


NOTE: If you grant Dataprep by Trifacta access to a bucket in another project, disabling Dataprep by Trifacta does not remove these permissions. The permissions must be manually removed to fully revoke product access to buckets in other projects. For more information, see https://cloud.google.com/dataprep/docs/concepts/gcs-buckets#removing_service_account_access_to_a_bucket.

To visit your current project on the Google Cloud console, see https://console.cloud.google.com/dataprep/.

Project Service Accounts

In the Google Cloud Console, select IAM > Service Accounts. The following service accounts are used by the product:

Service Account        Owner          Service Account Name
Compute Engine         Google, Inc.   <project-number>-compute@developer.gserviceaccount.com
Dataprep by Trifacta   Trifacta       service-<project-number>@trifacta-gcloud-prod.iam.gserviceaccount.com

where:

  • <project-number> is the numeric project identifier.
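As a minimal sketch, the two service account names can be built from the project number with small shell helpers. The function names `compute_sa` and `dataprep_sa` and the project number shown are hypothetical placeholders for illustration; only the name patterns come from the table above.

```shell
#!/bin/sh
# Build the two service account names from a numeric project number.
# compute_sa and dataprep_sa are hypothetical helper names, not part of
# any Google or Trifacta tool.

compute_sa() {
    printf '%s-compute@developer.gserviceaccount.com\n' "$1"
}

dataprep_sa() {
    printf 'service-%s@trifacta-gcloud-prod.iam.gserviceaccount.com\n' "$1"
}

# Placeholder project number. To look up the real value, you could run:
#   gcloud projects describe <project-id> --format='value(projectNumber)'
PROJECT_NUMBER=123456789012
compute_sa "$PROJECT_NUMBER"
dataprep_sa "$PROJECT_NUMBER"
```

Keeping the names in helpers like these avoids retyping the long addresses in the commands later on this page.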

Methods for Granting Access

You can provide access to Cloud Storage buckets in other projects through one of the following methods:

NOTE: When using a named service account to access data or run jobs in other projects, each user requesting access must be granted the roles/iam.serviceAccountUser role on the service account.
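One way to grant this role is with the gcloud CLI. The sketch below only prints the command so that it can be reviewed first. The helper name `grant_sa_user_cmd` and both email addresses are hypothetical placeholders, and it assumes the requesting user is identified by a Google account email.

```shell
#!/bin/sh
# Print (do not run) the gcloud command that grants one user the
# roles/iam.serviceAccountUser role on a named service account.
# grant_sa_user_cmd is a hypothetical helper name.
grant_sa_user_cmd() {
    sa_email=$1
    user_email=$2
    printf 'gcloud iam service-accounts add-iam-policy-binding %s --member=user:%s --role=roles/iam.serviceAccountUser\n' \
        "$sa_email" "$user_email"
}

# Placeholder addresses for illustration only.
grant_sa_user_cmd "my-sa@my-project.iam.gserviceaccount.com" "analyst@example.com"
```

Run the printed command once per user who needs to use the service account.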


NOTE: OAuth users of the product require the following roles and permissions, too.

Grant access through IAM role

You must add the following service accounts to the IAM role used to access the Cloud Storage datasets:

  • Dataprep by Trifacta Service Account for the Dataprep by Trifacta project. This service account is required for reading the data.
  • Compute Engine Service Account for the Dataprep by Trifacta project. This service account is required for running your Dataprep by Trifacta job on Dataflow using the Cloud Storage datasets.

This method of access enables all users of the Dataprep by Trifacta project to access all datasets governed by the IAM role. For more information, see https://console.cloud.google.com/iam-admin/roles.
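As an illustration, bucket-level IAM bindings for both service accounts could be added with `gsutil iam ch`. The sketch below prints the commands instead of running them. The role `objectAdmin`, the bucket name, the project number, and the helper name `iam_bind_cmds` are all assumptions for the example; substitute the role that actually governs your datasets.

```shell
#!/bin/sh
# Sketch: print the gsutil iam ch commands that would bind both service
# accounts to a Cloud Storage role on a bucket.
# All three values below are placeholders; ROLE in particular is an
# assumption, not a documented requirement.
PROJECT_NUMBER=${PROJECT_NUMBER:-123456789012}
BUCKET=${BUCKET:-example-bucket}
ROLE=${ROLE:-objectAdmin}

# iam_bind_cmds <project-number> <bucket> <role>  (hypothetical helper)
iam_bind_cmds() {
    for sa in \
        "$1-compute@developer.gserviceaccount.com" \
        "service-$1@trifacta-gcloud-prod.iam.gserviceaccount.com"
    do
        printf 'gsutil iam ch serviceAccount:%s:%s gs://%s\n' "$sa" "$3" "$2"
    done
}

iam_bind_cmds "$PROJECT_NUMBER" "$BUCKET" "$ROLE"
```

Printing first makes it easy to confirm the member and role strings before anything changes on the bucket.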

Grant service account access to a bucket

Use Google Cloud SDK gsutil commands to grant your project's service accounts ownership (read/write permission) of both the bucket and its contents. For more information on gsutil, see https://cloud.google.com/storage/docs/gsutil_install#sdk-install.

NOTE: When using a named service account to access data or run jobs in other projects, each user requesting access must be granted the roles/iam.serviceAccountUser role on the service account.

To grant your project's service accounts access to both current and new objects in a Cloud Storage bucket in another project, run both sets of the following commands.

Grant access to new bucket objects

To grant your project's service accounts access to new objects created in a Cloud Storage bucket in another project, use the following gsutil defacl commands in your shell or terminal window:

$ gsutil defacl ch -u \
    <project-number>-compute@developer.gserviceaccount.com:OWNER \
    gs://<bucket-name>
$ gsutil defacl ch -u \
    service-<project-number>@trifacta-gcloud-prod.iam.gserviceaccount.com:OWNER \
    gs://<bucket-name>

where:

  • <project-number> is the numeric identifier for your project.
  • <bucket-name> is the name of the bucket to which you wish to grant access.

Grant access to bucket and existing objects

To grant your project's service accounts access to a Cloud Storage bucket and its current contents in another project, use the following gsutil acl commands in your shell or terminal window:

$ gsutil acl ch -u \
    <project-number>-compute@developer.gserviceaccount.com:OWNER \
    gs://<bucket>
$ gsutil -m acl ch -r -u \
    <project-number>-compute@developer.gserviceaccount.com:OWNER \
    gs://<bucket>
$ gsutil acl ch -u \
    service-<project-number>@trifacta-gcloud-prod.iam.gserviceaccount.com:OWNER \
    gs://<bucket>
$ gsutil -m acl ch -r -u \
    service-<project-number>@trifacta-gcloud-prod.iam.gserviceaccount.com:OWNER \
    gs://<bucket>

where:

  • <project-number> is the numeric identifier for your project.
  • <bucket> is the name of the bucket to which you wish to grant access.

Tip: The -m option runs the command in parallel for quicker processing. The -r option runs the command recursively on resources within the bucket.
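Both sets of commands above can be combined into one loop over the two service accounts. This is a sketch rather than a supported script: the `DRY_RUN` switch, the `run` wrapper, and the `grant_bucket_access` function are hypothetical names, and the project number and bucket are placeholders. With `DRY_RUN=1` (the default here), the commands are only printed.

```shell
#!/bin/sh
# Sketch: apply the default-ACL grant (new objects) and the ACL grants
# (bucket and existing objects) to both service accounts.
# DRY_RUN=1 prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# grant_bucket_access <project-number> <bucket>  (hypothetical helper)
grant_bucket_access() {
    project_number=$1
    bucket=$2
    for sa in \
        "${project_number}-compute@developer.gserviceaccount.com" \
        "service-${project_number}@trifacta-gcloud-prod.iam.gserviceaccount.com"
    do
        # New objects created in the bucket.
        run gsutil defacl ch -u "${sa}:OWNER" "gs://${bucket}"
        # The bucket itself.
        run gsutil acl ch -u "${sa}:OWNER" "gs://${bucket}"
        # Existing objects, recursively and in parallel.
        run gsutil -m acl ch -r -u "${sa}:OWNER" "gs://${bucket}"
    done
}

# Placeholder values for illustration only.
grant_bucket_access 123456789012 example-bucket
```

Reviewing the dry-run output before setting `DRY_RUN=0` helps catch a mistyped project number or bucket name.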

Use bucket in Trifacta application

For import

Steps:

  1. Log in to the Trifacta application.
  2. In the left nav bar, click Library.
  3. Click Import Data.
  4. Click the GCS icon in the left nav bar. 
  5. Under Choose a file or folder, click the Pencil icon. 
  6. Enter the URL of the bucket:

    gs://<bucket>
  7. Navigate to select the datasets to import.

For publishing

Steps:

  1. In Flow View, create or select the output object that you wish to use for publishing to the GCS bucket. 
  2. In the right context panel, click Edit for either manual or scheduled destinations.
  3. Add or edit a publishing action. 
  4. Under Choose a file or folder, click the Pencil icon. 
  5. Enter the URL of the bucket:

    gs://<bucket>
  6. Navigate to specify the location in the bucket where you wish to publish the output.

Remove service account access to a bucket

If you have granted service account access to a bucket, you can run the following Google Cloud SDK gsutil defacl and acl commands to remove your project's service accounts' ownership (read/write permission) of the bucket and its contents.

$ gsutil defacl ch -d \
    <project-number>-compute@developer.gserviceaccount.com \
    gs://<bucket>
$ gsutil defacl ch -d \
    service-<project-number>@trifacta-gcloud-prod.iam.gserviceaccount.com \
    gs://<bucket>
$ gsutil acl ch -d \
    <project-number>-compute@developer.gserviceaccount.com \
    gs://<bucket>
$ gsutil -m acl ch -r -d \
    <project-number>-compute@developer.gserviceaccount.com \
    gs://<bucket>
$ gsutil acl ch -d \
    service-<project-number>@trifacta-gcloud-prod.iam.gserviceaccount.com \
    gs://<bucket>
$ gsutil -m acl ch -r -d \
    service-<project-number>@trifacta-gcloud-prod.iam.gserviceaccount.com \
    gs://<bucket>

where:

  • <project-number> is the numeric identifier for your project.
  • <bucket> is the name of the bucket from which you wish to remove access.

Tip: The -m option runs the command in parallel for quicker processing. The -r option runs the command recursively on resources within the bucket.
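After removal, you may want to confirm that a service account no longer appears in the bucket's ACLs. The sketch below assumes that piping the output of `gsutil acl get` or `gsutil defacl get` through a text match is sufficient for a quick check. The helper name `acl_mentions` and the sample ACL text are made up for illustration.

```shell
#!/bin/sh
# Sketch: check whether a service account still appears in ACL output.
# acl_mentions is a hypothetical helper; it reads ACL text on stdin and
# succeeds if the given email appears anywhere in it.
acl_mentions() {
    grep -F -q -- "$1"
}

# Usage against a real bucket (not run here):
#   gsutil acl get gs://<bucket> | acl_mentions "<service-account-email>" \
#       && echo "still in bucket ACL"
#   gsutil defacl get gs://<bucket> | acl_mentions "<service-account-email>" \
#       && echo "still in default object ACL"

# Self-contained demonstration with made-up ACL text:
sample='[{"email": "service-42@trifacta-gcloud-prod.iam.gserviceaccount.com", "role": "OWNER"}]'
if printf '%s\n' "$sample" | acl_mentions "service-42@trifacta-gcloud-prod.iam.gserviceaccount.com"; then
    echo "entry found"
fi
```

This check covers only the bucket ACL and the default object ACL. ACLs on individual objects would need to be inspected separately.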
