Machine Learning in GCP

Follow this guide to deploy the Machine Learning module for Google Cloud Platform (GCP) private data processing.

Prerequisites

Before you deploy the Machine Learning module, you must complete these steps on the Set Up GCP Project and VPC for Private Data page:

Configure a VPC dedicated to AAC, as described in the Configure Virtual Private Network section.
Create the service account and attach the base IAM roles to it, as described in the Configure IAM section.
Successfully trigger private data processing provisioning, as described in the Trigger Private Data Handling Provisioning section.

Project Setup

Step 1: Configure IAM

Step 1a: IAM Binding to the Service Account

Assign these additional roles to the aac-automation-sa service account that you created during Set Up GCP Project and VPC for Private Data:

Compute Load Balancer Admin: roles/compute.loadBalancerAdmin
Compute Instance Admin (v1): roles/compute.instanceAdmin.v1
Compute Storage Admin: roles/compute.storageAdmin
Kubernetes Engine Cluster Admin: roles/container.clusterAdmin
Storage Admin: roles/storage.admin
Cloud Memorystore Redis Admin: roles/redis.admin
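If you prefer the command line to the console, the role assignments above can be scripted. The sketch below only builds (and prints) one `gcloud projects add-iam-policy-binding` command per role; it does not run anything. It assumes aac-automation-sa is a user-managed service account in the same project, so its email follows the standard `<name>@<project-id>.iam.gserviceaccount.com` format. The project ID `my-project` is a placeholder.

```python
import shlex

# The additional roles listed in Step 1a.
ADDITIONAL_ROLES = [
    "roles/compute.loadBalancerAdmin",
    "roles/compute.instanceAdmin.v1",
    "roles/compute.storageAdmin",
    "roles/container.clusterAdmin",
    "roles/storage.admin",
    "roles/redis.admin",
]


def binding_commands(project_id: str, sa_name: str = "aac-automation-sa") -> list[str]:
    """Build one gcloud IAM binding command per role (printed, not executed)."""
    # Assumption: a user-managed service account in the same project, which
    # uses the standard <name>@<project-id>.iam.gserviceaccount.com email.
    member = f"serviceAccount:{sa_name}@{project_id}.iam.gserviceaccount.com"
    return [
        shlex.join([
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}", f"--role={role}",
        ])
        for role in ADDITIONAL_ROLES
    ]


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    for cmd in binding_commands("my-project"):
        print(cmd)
```

Review the printed commands before running them in Cloud Shell or a terminal authenticated against your project.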

Step 2: Configure Subnet

Note

Designer Cloud shares a subnet configuration with Machine Learning, Auto Insights, and App Builder. If you are deploying more than one of those applications, you only need to configure the subnets once.

Machine Learning in a private data processing environment requires 3 subnets. You created the aac-private subnet earlier when creating the VPC. You do not need to create it again, but it is included here for completeness.

aac-gke-node (required): The GKE cluster uses this subnet to run Alteryx software jobs (connectivity, conversion, processing, publishing).
aac-public (required): This subnet doesn't run any services, but the GKE nodes in aac-gke-node use it for egress out of the cluster.
aac-private (required): This subnet runs services that are private to the private data processing (PDP) environment.

Step 2a: Create Subnets in the VPC

Configure subnets in the aac-vpc VPC.

Create subnets following the example below. You can adjust the subnet size and secondary subnet size to match your network architecture.

The address spaces are designed to accommodate a fully scaled-out data processing environment. You can choose a smaller address space if required, but you could run into scaling issues under heavy processing loads.

Important

The subnet name is not flexible; it must match the table below exactly.

You may select any region from the Supported Regions list. However, you must use the same region for the Subnet Region now and when you reach the Trigger Provisioning step later.

Subnet Name	Primary Range	Secondary Range Name	Secondary Range	Notes
aac-gke-node	10.0.0.0/22	aac-gke-pod	10.4.0.0/14	GKE cluster, GKE pod, and GKE service subnets.
		aac-gke-service	10.64.0.0/20
aac-public	10.10.1.0/25	N/A	N/A	Public egress.
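Before creating the subnets, it can help to confirm that the ranges you picked do not overlap and are large enough. The sketch below uses Python's standard `ipaddress` module to check the example ranges from the table, print how many addresses each provides, and build (but not run) matching `gcloud compute networks subnets create` commands. aac-private is not in the table because you created it earlier, so it is not checked here. The region `us-central1` is only a placeholder; use the region you selected from the Supported Regions list.

```python
import ipaddress
import shlex
from itertools import combinations

# Example ranges from the table: name -> (primary range, {secondary name: range}).
SUBNETS = {
    "aac-gke-node": ("10.0.0.0/22", {"aac-gke-pod": "10.4.0.0/14",
                                     "aac-gke-service": "10.64.0.0/20"}),
    "aac-public": ("10.10.1.0/25", {}),
}


def all_ranges() -> dict[str, ipaddress.IPv4Network]:
    """Flatten primary and secondary ranges into a name -> network mapping."""
    ranges = {}
    for name, (primary, secondary) in SUBNETS.items():
        ranges[name] = ipaddress.IPv4Network(primary)
        for sec_name, cidr in secondary.items():
            ranges[sec_name] = ipaddress.IPv4Network(cidr)
    return ranges


def overlapping_pairs() -> list[tuple[str, str]]:
    """Return every pair of ranges that overlap; this should be empty."""
    ranges = all_ranges()
    return [(a, b) for a, b in combinations(ranges, 2)
            if ranges[a].overlaps(ranges[b])]


def create_commands(region: str) -> list[str]:
    """Build gcloud subnet-creation commands in aac-vpc (printed, not executed)."""
    cmds = []
    for name, (primary, secondary) in SUBNETS.items():
        args = ["gcloud", "compute", "networks", "subnets", "create", name,
                "--network=aac-vpc", f"--region={region}", f"--range={primary}"]
        if secondary:
            pairs = ",".join(f"{k}={v}" for k, v in secondary.items())
            args.append(f"--secondary-range={pairs}")
        cmds.append(shlex.join(args))
    return cmds


if __name__ == "__main__":
    for name, net in all_ranges().items():
        print(f"{name}: {net} ({net.num_addresses} addresses)")
    print("overlapping ranges:", overlapping_pairs())
    # "us-central1" is a placeholder region.
    for cmd in create_commands("us-central1"):
        print(cmd)
```

If you shrink any range to fit your network architecture, rerun the overlap check; note the scaling caveat above about smaller address spaces.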

Step 2b: Subnet Route Table

Create the route table for your subnets.

Important

You must configure the VPC with a network connection to the internet in your project.

Note

This route table is an example.

Address Prefix	Next Hop Type
/22 CIDR Block (aac-gke-node)	aac-vpc
/24 CIDR Block (aac-private)	aac-vpc
/25 CIDR Block (aac-public)	aac-vpc
0.0.0.0/0	<gateway_ID>

Note

Your <gateway_ID> can be either a NAT gateway or an internet gateway, depending on your network architecture.
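To see how the example route table directs traffic, the sketch below resolves a destination address by choosing the matching route with the longest (most specific) prefix, which is how VPC routing generally prefers routes. Traffic to the three subnets stays in aac-vpc, and everything else falls through to the 0.0.0.0/0 default route and the gateway. The table gives only a prefix length for aac-private, so the `10.10.2.0/24` used here is a hypothetical placeholder; substitute the range you assigned when you created that subnet.

```python
import ipaddress

# Example route table from above. The aac-private CIDR is HYPOTHETICAL;
# replace it with the /24 you chose when you created that subnet.
ROUTES = [
    ("10.0.0.0/22", "aac-vpc"),     # aac-gke-node
    ("10.10.2.0/24", "aac-vpc"),    # aac-private (hypothetical range)
    ("10.10.1.0/25", "aac-vpc"),    # aac-public
    ("0.0.0.0/0", "<gateway_ID>"),  # default route: NAT or internet gateway
]


def next_hop(destination: str) -> str:
    """Return the next hop of the most specific route that matches."""
    addr = ipaddress.ip_address(destination)
    matches = [(ipaddress.ip_network(cidr), hop) for cidr, hop in ROUTES
               if addr in ipaddress.ip_network(cidr)]
    # 0.0.0.0/0 matches every IPv4 address, so matches is never empty.
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop


if __name__ == "__main__":
    for dest in ["10.0.1.5", "10.10.1.20", "8.8.8.8"]:
        print(dest, "->", next_hop(dest))
```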

Step 3: Update the IAM Role for the Kubernetes Service Account

Once Private Data Processing is successfully set up, a Kubernetes service account called credential-pod-sa is created. This account allows the Kubernetes credential service to access private data credentials stored in the key vault.

Note

Replace <project-number> and <project-id> with your project's project number and project ID.

Go to Key Management and select the key ring that contains the key you created in Step 5: Create Key Ring and Key.
Select PERMISSIONS, then select GRANT ACCESS.

In the New Principal field, enter:

principal://iam.googleapis.com/projects/<project-number>/locations/global/workloadIdentityPools/<project-id>.svc.id.goog/subject/ns/credential/sa/credential-pod-sa

Assign the Cloud KMS CryptoKey Encrypter/Decrypter and Secret Manager Admin roles.
Select Save.
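The principal string is easy to mistype, so it can help to generate it from your project number and project ID. The sketch below builds the principal exactly as shown above and, as an assumed command-line alternative to the console, prints a `gcloud kms keyrings add-iam-policy-binding` command for the Cloud KMS CryptoKey Encrypter/Decrypter role. The key ring name `my-keyring`, the location `global`, and the project values are hypothetical placeholders. The sketch leaves the Secret Manager Admin grant to the console steps above.

```python
import shlex

# Namespace and Kubernetes service account from the principal above.
KSA_NAMESPACE = "credential"
KSA_NAME = "credential-pod-sa"


def credential_principal(project_number: str, project_id: str) -> str:
    """Build the Workload Identity principal for credential-pod-sa."""
    if not project_number.isdigit():
        # A common mistake is passing the project ID where the number belongs.
        raise ValueError("project number must be numeric")
    return (
        f"principal://iam.googleapis.com/projects/{project_number}"
        f"/locations/global/workloadIdentityPools/{project_id}.svc.id.goog"
        f"/subject/ns/{KSA_NAMESPACE}/sa/{KSA_NAME}"
    )


def kms_binding_command(project_number: str, project_id: str,
                        keyring: str, location: str) -> str:
    """Build (not run) the key ring binding for the KMS encrypt/decrypt role."""
    member = credential_principal(project_number, project_id)
    return shlex.join([
        "gcloud", "kms", "keyrings", "add-iam-policy-binding", keyring,
        f"--location={location}", f"--project={project_id}",
        f"--member={member}",
        "--role=roles/cloudkms.cryptoKeyEncrypterDecrypter",
    ])


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    print(credential_principal("123456789012", "my-project"))
    print(kms_binding_command("123456789012", "my-project", "my-keyring", "global"))
```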

Private Data Processing

Caution

Modifying or removing any public cloud resources that AAC provisioned after private data handling is configured can cause inconsistencies. These inconsistencies can lead to errors during job execution or when deprovisioning the private data handling configuration.

Step 1: Trigger Machine Learning Deployment

You trigger data processing provisioning from the Admin Console in AAC. You need Workspace Admin privileges within a workspace to see it.

From the AAC landing page, select the Profile menu and then select Workspace Admin.
From the Admin Console, select Private Data Handling and then select Processing.
Select the Machine Learning checkbox and then select Update.

Selecting Update triggers the deployment of the cluster and resources in the GCP project. This runs a set of validation checks to verify the correct configuration of the GCP project.

Note

The provisioning process takes approximately 35–40 minutes to complete.

After provisioning completes, you can view the created resources (for example, VM instances and node groups) in the GCP console. Do not modify these resources yourself; manual changes might disrupt the private data processing environment.
