This section describes how to configure Dataprep to operate within your enterprise's virtual private cloud (VPC).

The Dataprep application runs in your VPC. No additional configuration is required for this.

Optionally, you can configure jobs to be executed within your VPC. When this option is enabled, your data remains in your VPC for the full execution of the job.

NOTE: Previewing and sampling use the default network settings.

To enable in-VPC execution, the VPC network mode must be set to custom, and additional VPC properties must be provided. In-VPC job execution can be configured per-user or per-output:

Photon and Connectivity Jobs

By default, Photon and connectivity jobs execute outside of your VPC. As needed, you can configure these jobs to run in your VPC.

Job Type - Description

Photon

These jobs are transformation and quick scan sampling jobs that execute in memory. This type of job execution is suitable for small- to medium-sized jobs.

Connectivity

If your data source or publishing target is a relational or API-based source, some or all of the job occurs through the connectivity framework.

For these two job types, there are two types of configuration:

Configuration Type - Description

Basic

Uses the GKE default namespace and default node pool.

Advanced

Uses a user-configured GKE namespace and a user-specified node pool.

Details on these configuration methods are provided below. 

Limitations

The following limitations apply to this release. These limitations may change in the future:

Prerequisites

Before you begin, please verify that your VPC environment has the following:

Acquire the following from Trifacta:

Enable

In-VPC execution must be enabled by an administrator. See Dataprep Project Settings Page.

Basic configuration

Please complete the following steps for the Basic configuration.

Google Cloud IAM Service Account

This Service Account is assigned to the nodes in the GKE node pool and is configured to have minimal privileges.

The following variables are used in the configuration steps. You can modify them based on your requirements and supported values:

Variable - Description

trifacta-service-account: Default service account name
myproject: Name of your Google Cloud project
myregion: Your Google Cloud region

Please execute the following commands from the gcloud CLI: 

gcloud iam service-accounts create trifacta-service-account \
--display-name="Service Account for running Trifacta Remote jobs"

gcloud projects add-iam-policy-binding myproject \
--member "serviceAccount:trifacta-service-account@myproject.iam.gserviceaccount.com" \
--role roles/logging.logWriter


gcloud projects add-iam-policy-binding myproject \
--member "serviceAccount:trifacta-service-account@myproject.iam.gserviceaccount.com" \
--role roles/monitoring.metricWriter


gcloud projects add-iam-policy-binding myproject \
--member "serviceAccount:trifacta-service-account@myproject.iam.gserviceaccount.com" \
--role roles/monitoring.viewer


gcloud projects add-iam-policy-binding myproject \
--member "serviceAccount:trifacta-service-account@myproject.iam.gserviceaccount.com" \
--role roles/stackdriver.resourceMetadata.writer

Verification steps:

Command:

gcloud projects get-iam-policy myproject --flatten="bindings[].members" --format="table(bindings.role)" --filter="bindings.members:serviceAccount:trifacta-service-account@myproject.iam.gserviceaccount.com"

The output should look like the following:

ROLE
roles/artifactregistry.reader
roles/logging.logWriter
roles/monitoring.metricWriter
roles/monitoring.viewer
roles/stackdriver.resourceMetadata.writer
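If you prefer a scripted check, you can verify the required roles against a saved copy of this output. The following is a sketch; roles.txt is a hypothetical file holding the output of the get-iam-policy command above (seeded here with the sample output):

```shell
# roles.txt stands in for the captured output of the get-iam-policy command.
cat > roles.txt <<'EOF'
ROLE
roles/artifactregistry.reader
roles/logging.logWriter
roles/monitoring.metricWriter
roles/monitoring.viewer
roles/stackdriver.resourceMetadata.writer
EOF

# Flag any required role that is absent from the output.
missing=0
for role in roles/logging.logWriter roles/monitoring.metricWriter \
            roles/monitoring.viewer roles/stackdriver.resourceMetadata.writer; do
  grep -q "^$role\$" roles.txt || { echo "MISSING: $role"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all required roles present"
```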

Router and NAT

The following configuration is required for Internet access to acquire assets from Trifacta if the GKE cluster has private nodes.

gcloud compute routers create myproject-myregion \
--network myproject-network \
--region=myregion

gcloud compute routers nats create myproject-myregion \
--router=myproject-myregion \
--auto-allocate-nat-external-ips \
--nat-all-subnet-ip-ranges \
--enable-logging

Verification Steps:

You can verify that the router NAT was created in the Console: https://console.cloud.google.com/net-services/nat/list.

GKE cluster 

This configuration creates the GKE cluster used for executing Dataprep jobs. This cluster must be created in the VPC/sub-network that has access to your datasources, such as your databases.

In the following, please replace w.x.y.z with the IP address provided to you by Trifacta for authorized control plane access.

gcloud container clusters create "trifacta-cluster" \
--project "myproject" \
--region "myregion" \
--no-enable-basic-auth \
--cluster-version "1.20.8-gke.900" \
--release-channel "None" \
--machine-type "n1-standard-16" \
--image-type "COS_CONTAINERD" \
--disk-type "pd-standard" \
--disk-size "100" \
--metadata disable-legacy-endpoints=true \
--service-account "trifacta-service-account@myproject.iam.gserviceaccount.com" \
--max-pods-per-node "110" \
--num-nodes "1" \
--logging=SYSTEM,WORKLOAD \
--monitoring=SYSTEM \
--enable-ip-alias \
--network "projects/myproject/global/networks/myproject-network" \
--subnetwork "projects/myproject/regions/myregion/subnetworks/myproject-subnet-myregion" \
--no-enable-intra-node-visibility \
--default-max-pods-per-node "110" \
--enable-autoscaling \
--min-nodes "0" \
--max-nodes "3" \
--enable-master-authorized-networks \
--master-authorized-networks w.x.y.z/32 \
--addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
--no-enable-autoupgrade \
--enable-autorepair \
--max-surge-upgrade 1 \
--max-unavailable-upgrade 0 \
--workload-pool "myproject.svc.id.goog" \
--enable-private-nodes \
--enable-shielded-nodes \
--shielded-secure-boot \
--node-locations "myregion-a","myregion-b","myregion-c" \
--master-ipv4-cidr=10.1.0.0/28 \
--enable-binauthz 

Verification Steps:

You can verify that the cluster was created through the Console: https://console.cloud.google.com/kubernetes/list/overview.

Switch to new cluster

Use the following command to set up configuration to connect to the new cluster:

gcloud container clusters get-credentials trifacta-cluster --region myregion --project myproject

The following commands whitelist the Cloud Shell for use on the cluster:

  1. Get the IP for the shell instance:

    dig +short myip.opendns.com @resolver1.opendns.com
  2. Modify the authorized networks to include the IP. You must re-add your current IP each time, since Cloud Shell IP addresses are not static.

    gcloud container clusters update trifacta-cluster \
     --region myregion \
     --enable-master-authorized-networks \
     --master-authorized-networks 34.68.114.64/28,192.77.238.35/32,34.75.7.151/32
  3. After you have acquired access, create the service accounts, roles, and rolebindings listed in the Kubernetes Service Accounts section below.

Node pool

For basic configuration, Dataprep uses the default node pool. No additional configuration is required.

Kubernetes namespace

For basic configuration, Dataprep uses the default namespace. No additional configuration is required.

Kubernetes Service Accounts

Variable - Description

trifacta-job-runner: Service Account used externally by Dataprep to launch jobs into the GKE cluster.
trifacta-pod-sa: Service Account assigned to the job pod running in the GKE cluster.

Please execute the following commands:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
automountServiceAccountToken: false
metadata:
  namespace: default
  name: trifacta-job-runner
EOF
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: trifacta-job-runner-role
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["create", "delete"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list"]
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "create", "delete", "watch"]
- apiGroups: [""]
  resources: ["serviceaccounts"]
  verbs: ["list", "get"]
EOF
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: trifacta-job-runner-rb
subjects:
- kind: ServiceAccount
  name: trifacta-job-runner
  namespace: default
roleRef:
  kind: Role
  name: trifacta-job-runner-role
  apiGroup: rbac.authorization.k8s.io
EOF
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-list-role
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list"]
EOF
cat <<EOF | kubectl apply -f -
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: node-list-rb
subjects:
- kind: ServiceAccount
  name: trifacta-job-runner
  namespace: default
roleRef:
  kind: ClusterRole
  name: node-list-role
  apiGroup: rbac.authorization.k8s.io
EOF
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
automountServiceAccountToken: false
metadata:
  name: trifacta-pod-sa
EOF

Credential encryption keys

The following commands create the encryption keys for credentials:

openssl genrsa -out private_key.pem 2048


openssl pkcs8 -topk8 -inform PEM -outform DER -in private_key.pem -out private_key.der -nocrypt


openssl rsa -in private_key.pem -pubout -outform DER -out public_key.der

base64 -i public_key.der > public_key.der.base64

base64 -i private_key.der > private_key.der.base64


kubectl create secret generic trifacta-credential-encryption -n default \
--from-file=privateKey=private_key.der.base64
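Before creating the secret, you can sanity-check the key material locally. This sketch re-runs the same openssl and base64 steps and then asserts that the public and private DER files describe the same RSA key pair (run it in a scratch directory so it does not clobber your real keys):

```shell
# Regenerate the key material with the same commands as above.
openssl genrsa -out private_key.pem 2048
openssl pkcs8 -topk8 -inform PEM -outform DER -in private_key.pem \
  -out private_key.der -nocrypt
openssl rsa -in private_key.pem -pubout -outform DER -out public_key.der
base64 -i public_key.der > public_key.der.base64
base64 -i private_key.der > private_key.der.base64

# The modulus of the private key must equal the modulus of the public key.
priv_mod=$(openssl rsa -in private_key.pem -noout -modulus)
pub_mod=$(openssl rsa -pubin -inform DER -in public_key.der -noout -modulus)
[ "$priv_mod" = "$pub_mod" ] && echo "key pair matches"
```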

Dataprep configuration

After you have completed the above configuration, you must configure the Dataprep application based on the commands that you have executed.

Steps:

  1. Log in to the Dataprep application as a project owner.
  2. Select Admin console > VPC runtime settings.
  3. Complete the following configuration.

Kubernetes cluster tab:

Setting - Command or Value

Master URL

Command:

gcloud container clusters describe trifacta-cluster --region=myregion --format="value(endpoint)"

This command returns a URL similar to the following:

https://34.0.0.0
OAuth token

Command:

kubectl get secret `kubectl get sa trifacta-job-runner -o json | jq -r '.secrets[0].name'` -o json | jq -r '.data.token' | base64 --decode
Cluster CA certificate

Command:

gcloud container clusters describe trifacta-cluster --region=myregion --format="value(masterAuth.clusterCaCertificate)"
Service account name

Value: trifacta-pod-sa

Public key (optional)

Insert the contents of: public_key.der.base64.

To acquire this value:

cat public_key.der.base64
Private key secret name (optional)

Value: trifacta-credential-encryption
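The Cluster CA certificate value is base64-encoded PEM, so you can decode it locally to confirm it is well-formed before pasting it into the settings page. In this sketch, ca.b64 is a stand-in for the gcloud output, fabricated here from a throwaway self-signed certificate:

```shell
# Fabricate a sample base64-encoded certificate; in practice, ca.b64 would
# hold the masterAuth.clusterCaCertificate value returned by gcloud.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=example" \
  -keyout scratch_key.pem -out scratch_cert.pem 2>/dev/null
base64 -i scratch_cert.pem > ca.b64

# A valid value decodes to a certificate whose subject openssl can print.
base64 --decode < ca.b64 | openssl x509 -noout -subject
```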

Photon tab:

Setting - Command or Value
Namespace

Value: default

To acquire the namespace value:

kubectl get namespace
CPU, memory - request, limits

Adjust as needed.

NOTE: CPU and memory requests and limits should be lower than the CPU and memory that can be allocated on the GKE node.

Node selector, tolerations


Values:

Node selector = ""
Node tolerations = ""

Connectivity/DataSystem tab:

Setting - Command or Value

Namespace

Value: data-system-job-namespace

CPU, memory - request, limits

Adjust defaults, if necessary.

Node selector, tolerations

Values:

Node selector = "{\"cloud.google.com/gke-nodepool\": \"data-system-job-pool\"}"
Node tolerations = "[{\"effect\":\"NoSchedule\",\"key\":\"jobType\",\"operator\":\"Equal\",\"value\":\"dataSystem\"}]"
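The escaped strings above decode to plain JSON. As a quick local sanity check (a sketch; assumes jq, which this page already uses elsewhere, is installed), you can confirm the values parse before saving them:

```shell
# The unescaped forms of the node selector and tolerations values above.
node_selector='{"cloud.google.com/gke-nodepool": "data-system-job-pool"}'
node_tolerations='[{"effect":"NoSchedule","key":"jobType","operator":"Equal","value":"dataSystem"}]'

# jq -e exits non-zero if the input is not valid JSON or the test is false.
echo "$node_selector"    | jq -e 'type == "object"' > /dev/null && echo "selector parses"
echo "$node_tolerations" | jq -e '.[0].effect == "NoSchedule"' > /dev/null && echo "tolerations parse"
```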

After you have tested and saved your configuration, you should be able to run a job in your VPC. See "Testing" below.

Configure Workload Identity for connectivity jobs

Google access tokens are valid for one hour. Jobs that read from or write to relational systems can be long-running. To protect against token expiration during these jobs and to support recommended security practices, Dataprep supports the use of Workload Identity, which is Google's recommended approach for accessing Google APIs.

NOTE: Workload Identity requires the use of Companion Service Accounts. Each user in your project must be assigned a Companion Service Account. For more information, see Google Service Account Management.

This section describes how to bind a Companion ServiceAccount to a Kubernetes ServiceAccount on the GKE cluster using Workload Identity.

For each Companion Service Account assigned to a user in Dataprep:

  1. A new Kubernetes ServiceAccount must be created on the GKE cluster.

    NOTE: This step must be completed by your administrator.


  2. Using Workload Identity, the Companion ServiceAccount must be bound to the newly created Kubernetes ServiceAccount.

The following assumes that a Companion ServiceAccount named allAccess@myproject.iam.gserviceaccount.com  already exists:

# Create a new Kubernetes ServiceAccount on the GKE cluster with an annotation to bind it to the allAccess@myproject.iam.gserviceaccount.com Companion ServiceAccount.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
automountServiceAccountToken: false
metadata:
  annotations:
    iam.gke.io/gcp-service-account: allAccess@myproject.iam.gserviceaccount.com
  name: trifacta-pod-sa-allaccess
EOF



# Allow the Kubernetes ServiceAccount to impersonate the Google IAM ServiceAccount by adding an IAM policy binding between the two service accounts. This binding allows the Kubernetes ServiceAccount to act as the IAM ServiceAccount.
gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:myproject.svc.id.goog[default/trifacta-pod-sa-allaccess]" \
  allAccess@myproject.iam.gserviceaccount.com

Wait a couple of minutes for the binding to take effect.
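The --member value in the binding above follows a fixed Workload Identity pattern: serviceAccount:&lt;project&gt;.svc.id.goog[&lt;namespace&gt;/&lt;k8s-service-account&gt;]. A small helper (hypothetical, for illustration only) makes the construction explicit:

```shell
# Build a Workload Identity member string from its three parts:
# the Google project, the Kubernetes namespace, and the Kubernetes
# ServiceAccount name. wi_member is a hypothetical helper, not a gcloud command.
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

wi_member myproject default trifacta-pod-sa-allaccess
# -> serviceAccount:myproject.svc.id.goog[default/trifacta-pod-sa-allaccess]
```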

NOTE: For relational connectivity, additional configuration is required. Search for data-system under Advanced configuration.

Advanced configuration

Please complete the following steps for the Advanced setup. These steps allow you to specify a user-configured GKE namespace and user-specified node pools for your jobs.

Google Cloud IAM Service Account - advanced

This Service Account is assigned to the nodes in the GKE node pool and is configured to have minimal privileges.

The following variables are used in the configuration steps. You can modify them based on your requirements and supported values:

Variable - Description

trifacta-service-account: Default service account name
myproject: Name of your Google Cloud project
myregion: Your Google Cloud region

Please execute the following commands from the gcloud CLI: 

gcloud iam service-accounts create trifacta-service-account \
--display-name="Service Account for running Trifacta Remote jobs"

gcloud projects add-iam-policy-binding myproject \
--member "serviceAccount:trifacta-service-account@myproject.iam.gserviceaccount.com" \
--role roles/logging.logWriter

gcloud projects add-iam-policy-binding myproject \
--member "serviceAccount:trifacta-service-account@myproject.iam.gserviceaccount.com" \
--role roles/monitoring.metricWriter


gcloud projects add-iam-policy-binding myproject \
--member "serviceAccount:trifacta-service-account@myproject.iam.gserviceaccount.com" \
--role roles/monitoring.viewer


gcloud projects add-iam-policy-binding myproject \
--member "serviceAccount:trifacta-service-account@myproject.iam.gserviceaccount.com" \
--role roles/stackdriver.resourceMetadata.writer

Verification steps:

Command:

gcloud projects get-iam-policy myproject --flatten="bindings[].members" --format="table(bindings.role)" --filter="bindings.members:serviceAccount:trifacta-service-account@myproject.iam.gserviceaccount.com"

The output should look like the following:

ROLE
roles/artifactregistry.reader
roles/logging.logWriter
roles/monitoring.metricWriter
roles/monitoring.viewer
roles/stackdriver.resourceMetadata.writer

Router and NAT - advanced

The following configuration is required for Internet access to acquire assets from Trifacta if the GKE cluster has private nodes.

gcloud compute routers create myproject-myregion \
--network myproject-network \
--region=myregion

gcloud compute routers nats create myproject-myregion \
--router=myproject-myregion \
--auto-allocate-nat-external-ips \
--nat-all-subnet-ip-ranges \
--enable-logging

Verification Steps:

You can verify that the router NAT was created in the Console: https://console.cloud.google.com/net-services/nat/list.

GKE cluster - advanced

This configuration creates the GKE cluster used for executing Dataprep jobs. This cluster must be created in the VPC/sub-network that has access to your datasources, such as your databases.

In the following, please replace w.x.y.z with the IP address provided to you by Trifacta for authorized control plane access.

gcloud container clusters create "trifacta-cluster" \
--project "myproject" \
--region "myregion" \
--no-enable-basic-auth \
--cluster-version "1.20.8-gke.900" \
--release-channel "None" \
--machine-type "n1-standard-16" \
--image-type "COS_CONTAINERD" \
--disk-type "pd-standard" \
--disk-size "100" \
--metadata disable-legacy-endpoints=true \
--service-account "trifacta-service-account@myproject.iam.gserviceaccount.com" \
--max-pods-per-node "110" \
--num-nodes "1" \
--logging=SYSTEM,WORKLOAD \
--monitoring=SYSTEM \
--enable-ip-alias \
--network "projects/myproject/global/networks/myproject-network" \
--subnetwork "projects/myproject/regions/myregion/subnetworks/myproject-subnet-myregion" \
--no-enable-intra-node-visibility \
--default-max-pods-per-node "110" \
--enable-autoscaling \
--min-nodes "0" \
--max-nodes "3" \
--enable-master-authorized-networks \
--master-authorized-networks w.x.y.z/32 \
--addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
--no-enable-autoupgrade \
--enable-autorepair \
--max-surge-upgrade 1 \
--max-unavailable-upgrade 0 \
--workload-pool "myproject.svc.id.goog" \
--enable-private-nodes \
--enable-shielded-nodes \
--shielded-secure-boot \
--node-locations "myregion-a","myregion-b","myregion-c" \
--master-ipv4-cidr=10.1.0.0/28 \
--enable-binauthz 

Verification Steps:

You can verify that the cluster was created through the Console: https://console.cloud.google.com/kubernetes/list/overview.

Switch to new cluster - advanced

Use the following command to switch to the new GKE cluster that you just created:

gcloud container clusters get-credentials trifacta-cluster --region myregion --project myproject

Node pool - advanced

Please complete the following configuration to specify a non-default node pool. In this example, the value is photon-job-pool:

gcloud container node-pools create photon-job-pool \
--cluster trifacta-cluster \
--enable-autorepair \
--no-enable-autoupgrade \
--image-type=COS_CONTAINERD \
--machine-type=n1-standard-16 \
--max-surge-upgrade 1 \
--max-unavailable-upgrade=0 \
--node-locations=myregion-a,myregion-b,myregion-c \
--node-taints=jobType=photon:NoSchedule \
--node-version=1.20.8-gke.900 \
--num-nodes=1 \
--shielded-integrity-monitoring \
--shielded-secure-boot \
--workload-metadata=GKE_METADATA  \
--enable-autoscaling \
--max-nodes=10 \
--min-nodes=1 \
--region=myregion \
--service-account=trifacta-service-account@myproject.iam.gserviceaccount.com

You can use the following command to get the list of available node pools for your cluster:

gcloud container node-pools list --cluster trifacta-cluster --region=myregion


Kubernetes namespace - advanced

Please complete the following configuration to specify a non-default namespace. In this example, the value is photon-job-namespace:

kubectl create namespace photon-job-namespace

Kubernetes Service Accounts - advanced

Variable - Description

trifacta-job-runner: Service Account used externally by Dataprep to launch jobs into the GKE cluster.
trifacta-pod-sa: Service Account assigned to the job pod running in the GKE cluster.

Please execute the following commands:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
automountServiceAccountToken: false
metadata:
  namespace: default
  name: trifacta-job-runner
EOF
cat <<EOF | kubectl apply -n photon-job-namespace -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: trifacta-job-runner-role
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["create", "delete"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list"]
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "create", "delete", "watch"]
- apiGroups: [""]
  resources: ["serviceaccounts"]
  verbs: ["list", "get"]
EOF
cat <<EOF | kubectl apply -n photon-job-namespace -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: trifacta-job-runner-rb
subjects:
- kind: ServiceAccount
  name: trifacta-job-runner
  namespace: default
roleRef:
  kind: Role
  name: trifacta-job-runner-role
  apiGroup: rbac.authorization.k8s.io
EOF
cat <<EOF | kubectl apply -n photon-job-namespace -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-list-role
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list"]
EOF
cat <<EOF | kubectl apply -n photon-job-namespace -f -
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: node-list-rb
subjects:
- kind: ServiceAccount
  name: trifacta-job-runner
  namespace: default
roleRef:
  kind: ClusterRole
  name: node-list-role
  apiGroup: rbac.authorization.k8s.io
EOF
cat <<EOF | kubectl apply -n photon-job-namespace -f -
apiVersion: v1
kind: ServiceAccount
automountServiceAccountToken: false
metadata:
  name: trifacta-pod-sa
EOF

Credential encryption keys - advanced

The following commands create the encryption keys for credentials:

openssl genrsa -out private_key.pem 2048

openssl pkcs8 -topk8 -inform PEM -outform DER -in private_key.pem -out private_key.der -nocrypt


openssl rsa -in private_key.pem -pubout -outform DER -out public_key.der

base64 -i public_key.der > public_key.der.base64

base64 -i private_key.der > private_key.der.base64


kubectl create secret generic trifacta-credential-encryption -n photon-job-namespace \
--from-file=privateKey=private_key.der.base64

Connectivity/data-system node pool - advanced

gcloud container node-pools create data-system-job-pool \
--cluster=trifacta-cluster \
--enable-autorepair \
--no-enable-autoupgrade \
--image-type=COS_CONTAINERD \
--machine-type=n1-standard-16 \
--max-surge-upgrade=1 \
--max-unavailable-upgrade=0 \
--node-locations=myregion-a,myregion-b,myregion-c \
--node-taints=jobType=dataSystem:NoSchedule \
--node-version=1.22.7-gke.1300 \
--num-nodes=1 \
--shielded-integrity-monitoring \
--shielded-secure-boot \
--workload-metadata=GKE_METADATA  \
--enable-autoscaling \
--max-nodes=10 \
--min-nodes=1 \
--region=myregion \
--service-account=trifacta-service-account@myproject.iam.gserviceaccount.com

Connectivity/data-system - Kubernetes namespace - advanced

kubectl create namespace data-system-job-namespace

Connectivity/data-system - Kubernetes roles and rolebindings - advanced

cat <<EOF | kubectl apply -n data-system-job-namespace -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: trifacta-job-runner-role
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["create", "delete"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list"]
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "create", "delete", "watch"]
- apiGroups: [""]
  resources: ["serviceaccounts"]
  verbs: ["list", "get"]
EOF

cat <<EOF | kubectl apply -n data-system-job-namespace -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: trifacta-job-runner-rb
subjects:
- kind: ServiceAccount
  name: trifacta-job-runner
  namespace: default
roleRef:
  kind: Role
  name: trifacta-job-runner-role
  apiGroup: rbac.authorization.k8s.io
EOF

cat <<EOF | kubectl apply -n data-system-job-namespace -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-list-role
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list"]
EOF

cat <<EOF | kubectl apply -n data-system-job-namespace -f -
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: node-list-rb
subjects:
- kind: ServiceAccount
  name: trifacta-job-runner
  namespace: default
roleRef:
  kind: ClusterRole
  name: node-list-role
  apiGroup: rbac.authorization.k8s.io
EOF

cat <<EOF | kubectl apply -n data-system-job-namespace -f -
apiVersion: v1
kind: ServiceAccount
automountServiceAccountToken: false
metadata:
  name: trifacta-pod-sa
EOF

Connectivity/data-system - create secret - advanced

Create a secret to store the private key in the Connectivity/DataSystem job namespace.

kubectl create secret generic trifacta-credential-encryption -n data-system-job-namespace \
    --from-file=privateKey=private_key.der.base64

Dataprep configuration - advanced

After you have completed the above configuration, you must populate the following values in the Dataprep application based on the commands that you have executed.

Steps:

  1. Log in to the Dataprep application as a project owner.
  2. Select Admin console > VPC runtime settings.
  3. Complete the following configuration.

Kubernetes cluster tab:

Setting - Command or Value

Master URL

Command:

gcloud container clusters describe trifacta-cluster --region=myregion --format="value(endpoint)"
OAuth token

Command:

kubectl get secret `kubectl get sa trifacta-job-runner -o json | jq -r '.secrets[0].name'` -o json | jq -r '.data.token' | base64 --decode
Cluster CA certificate

Command:

gcloud container clusters describe trifacta-cluster --region=myregion --format="value(masterAuth.clusterCaCertificate)"
Service account name

Value: trifacta-pod-sa

Public key (optional)

Insert the contents of: public_key.der.base64.

To acquire this value:

cat public_key.der.base64
Private key secret name (optional)

Value: trifacta-credential-encryption

Photon tab:

Setting - Command or Value
Namespace

Value: photon-job-namespace

To acquire the namespace value:

kubectl get namespace
CPU, memory - request, limits

Adjust as needed.
Node selector, tolerations

Values:

Node selector = "{\"cloud.google.com/gke-nodepool\": \"photon-job-pool\"}"
Node tolerations = "[{\"effect\":\"NoSchedule\",\"key\":\"jobType\",\"operator\":\"Equal\",\"value\":\"photon\"}]"

After you have tested and saved your configuration, you should be able to run a job in your VPC. See "Testing" below.

Testing

You can use the following command to watch the Kubernetes cluster during job execution:

kubectl get pods -n photon-job-namespace -w

To check active pods:

kubectl get pods -n default -w

To get details on a specific pod:

kubectl describe pod <podId>

Then, run a job through the Dataprep application. If the job runs successfully, then the configuration has been properly applied. See Run Job Page.