Private Data Processing
Private data processing involves running Alteryx Analytics Cloud (AAC) on a data processing cluster inside your AWS account and VPC. This combination of your infrastructure, together with AWS resources and software managed by Alteryx, is commonly referred to as a private data plane.
This page focuses on the private data processing cluster itself. We first describe what’s going on inside the cluster, then walk through the setup.
AWS services
Private Data Handling uses a number of AWS services inside the customer VPC to handle data processing tasks. These are the services used:
Service | Usage |
---|---|
S3 | Base storage layer. |
EC2 | Compute resources required to run Alteryx Analytics Cloud services. |
EKS | Runs the EC2 instances for platform services and jobs in the data plane. |
Secrets Manager | Storage of infrastructure secrets. |
IAM Roles | Provide permissions needed by Alteryx Analytics Cloud to manage the necessary AWS resources. |
IAM Policies | Permissions underlying the IAM roles. |
VPC and Subnets | Define networking paths between different services. |
Supported Regions
Private data handling is currently available in a variety of regions. To provide private data handling in a region, the region must meet these requirements (a quick verification sketch follows the list):
Support EMR serverless (available AWS regions).
Have 3+ availability zones.
Support the specific EKS node types needed.
Provide EKS 1.24.
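If you want to sanity-check a candidate region against the first two requirements from your own account, a minimal boto3 sketch might look like this. The region name is only an example; EKS version and node-type availability are easier to confirm in the AWS console or documentation.

```python
# Minimal sketch: check a candidate region for 3+ availability zones and a
# reachable EMR Serverless endpoint. Standard boto3 calls; region is an example.
import boto3

region = "eu-west-2"

# Requirement: 3 or more availability zones.
ec2 = boto3.client("ec2", region_name=region)
azs = ec2.describe_availability_zones(
    Filters=[{"Name": "state", "Values": ["available"]}]
)["AvailabilityZones"]
print(f"{region}: {len(azs)} availability zones (need 3+)")

# Requirement: EMR Serverless available (the call fails if the regional
# endpoint does not exist or is unreachable).
try:
    boto3.client("emr-serverless", region_name=region).list_applications(maxResults=1)
    print(f"{region}: EMR Serverless endpoint reachable")
except Exception as exc:
    print(f"{region}: EMR Serverless endpoint not reachable ({exc})")
```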
In addition to these regional requirements, Alteryx also needs to replicate the container images to local repositories to improve design-time and runtime performance.
These are the regions where private data handling is available:
Cloud Provider | Region Code | Region Name |
---|---|---|
AWS | ap-east-1 | Asia Pacific (Hong Kong) |
AWS | ap-northeast-1 | Asia Pacific (Tokyo) |
AWS | ap-northeast-2 | Asia Pacific (Seoul) |
AWS | ap-south-1 | Asia Pacific (Mumbai) |
AWS | ap-southeast-1 | Asia Pacific (Singapore) |
AWS | ap-southeast-2 | Asia Pacific (Sydney) |
AWS | ca-central-1 | Canada (Central) |
AWS | eu-north-1 | Europe (Stockholm) |
AWS | eu-west-1 | Europe (Ireland) |
AWS | eu-west-2 | Europe (London) |
AWS | eu-west-3 | Europe (Paris) |
AWS | eu-central-1 | Europe (Frankfurt) |
AWS | sa-east-1 | South America (São Paulo) |
AWS | us-east-1 | US East (N. Virginia) |
AWS | us-east-2 | US East (Ohio) |
AWS | us-west-2 | US West (Oregon) |
Software
Alteryx Analytics Cloud runs a number of jobs and services inside the private data plane.
Kubernetes On-demand Jobs
For Kubernetes on-demand jobs, Alteryx Analytics Cloud retrieves a container image (from cache or from a central store) and deploys it in an ephemeral pod that lasts for the duration of the job. All executables are in Java or Python. A generic sketch of this ephemeral-job pattern follows the list below.
conversion-jobs: Convert datasets from one format to another as needed within a workflow.
connectivity-jobs: Connect to external data systems at runtime.
photon-jobs: Run Photon, the in-memory prep and blend engine used at runtime for smaller datasets.
file-writer-jobs: Write processed data to the output destination specified within the workflow.
automl-jobs: In-memory jobs for Machine Learning used at runtime.
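The following is a generic, illustrative sketch of the ephemeral-job pattern described above, written with the Kubernetes Python client. It is not the Alteryx implementation; the image name, namespace, and Job name are hypothetical.

```python
# Illustrative sketch only: launch a short-lived Kubernetes Job whose pod is
# cleaned up after it finishes. Image, namespace, and names are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a cluster
batch = client.BatchV1Api()

job = client.V1Job(
    metadata=client.V1ObjectMeta(generate_name="conversion-job-"),
    spec=client.V1JobSpec(
        backoff_limit=0,                 # run once; do not retry on failure
        ttl_seconds_after_finished=300,  # garbage-collect the Job and its pod
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="worker",
                        image="example-registry.local/conversion-job:latest",
                    )
                ],
            )
        ),
    ),
)

batch.create_namespaced_job(namespace="jobs", body=job)
```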
Kubernetes Long-running Services
data-service: Connects to external data systems at design-time via the JDBC API. Alteryx developed this service. Snyk scans the image for vulnerabilities.
teleport-agent: Sets up a secure way for Alteryx SRE to connect to the cluster for troubleshooting. Alteryx Analytics Cloud pulls the helm chart from the https://charts.releases.teleport.dev repository. Alteryx doesn't scan this third-party image.
datadog-agent: Collects logs and metrics from the cluster. Alteryx Analytics Cloud pulls the helm chart from the https://helm.datadoghq.com repository. Alteryx doesn't scan this third-party image.
keda: Auto-scales long-running services based on custom metrics, with Kafka support. Alteryx doesn't scan this third-party image.
external-secrets: Syncs secrets between AWS Secrets Manager and the Kubernetes secrets store. Alteryx doesn't scan this third-party image.
cluster-autoscaler: Scale EKS nodes based on pod demand. Alteryx doesn't scan this third-party image.
metrics-server: Allow EKS to use the metrics API. Alteryx doesn't scan this third-party image.
kubernetes-reflector: Replication of the dockerConfigJson secret across all namespaces. Alteryx doesn't scan this third-party image.
VM Long-running Services
Cloud execution for desktop (Optional): The cloud-execution-host container service listens on a message bus for YXZP files uploaded from Designer Desktop to be processed in the data plane. Alteryx developed this service. Snyk scans the image for vulnerabilities.
Provisioning Pipeline
Provisioning a private data plane consists of 2 primary steps:
Creating Cloud Resources
Deploying Software
Private data planes use Infrastructure as Code (IaC), which Alteryx Analytics Cloud manages with Terraform Cloud. Terraform is an IaC tool that lets you define and manage infrastructure resources through human-readable configuration files; Terraform Cloud is a SaaS product from HashiCorp. To create and manage private data handling resources, Alteryx Analytics Cloud uses a set of Terraform files, Terraform Cloud APIs, and private Terraform Cloud agents running on Alteryx infrastructure.
When you enable and provision private data handling through the Cloud Portal, Alteryx Analytics Cloud creates these resources in the AWS infrastructure:
AWS Services | Purpose of Use | Size/Type | Desired Size [Min–Max] |
---|---|---|---|
S3 | Storage for logs and staged temporary files. | < 50 TB | |
EKS Cluster | Alteryx in-memory processing engine. | Photon NodeGroup m6i.4xlarge | [1–30] |
EKS Cluster | Convert datasets from one format to another. | Convert NodeGroup m6i.4xlarge | [1–30] |
EKS Cluster | Connect to data sources. | data-system NodeGroup m6i.4xlarge | [1–30] |
EKS Cluster | Publish job outputs to their destination. | file-system NodeGroup m6i.xlarge | [1–30] |
EKS Cluster | Execute ML jobs. | AutoML NodeGroup m6i.4xlarge | [1–30] |
EKS Cluster | Additional common tooling (for example, Datadog and Teleport). | t3a.medium | [3–8] |
VPC | Dedicated VPC to deploy EKS, EC2, and EMR. | | |
IAM Roles and Policies | Least-privileged permissions to run the software. | | |
Secrets Manager | Store infrastructure encryption keys. | ≈3–6 secrets | |
You can find container images for on-demand jobs in a private container image repository that the private data plane has access to.
Alteryx deploys and maintains long-running services in your EKS cluster using Argo CD. Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes.
Setup Steps
You trigger data plane provisioning from the Admin Console inside Alteryx Analytics Cloud. You need Admin privileges within a workspace to see it.
From the Alteryx Analytics Cloud homepage, select the profile icon. Select Admin Console from the menu.
From the left navigation panel, select Private Data Handling.
Caution
Modifying or removing any of the AAC-provisioned public cloud resources after Private Data Handling has been provisioned leads to an inconsistent state. This inconsistency triggers errors during job execution or deprovisioning of the Private Data Handling setup.
Make sure that Private Data Storage shows “Successfully Configured” before proceeding. If the status is “Not Configured,” go to Private Data Storage first, then return to this step.
Step 1: Provision the Data Plane
Under the Private Data Processing section, there are 5 fields to fill out. These values come from completing the steps in Setup AWS Account and VPC.

Select Create to trigger the deployment of the cluster and resources in the AWS account. This runs a set of validation checks to verify that the AWS account is configured correctly. If permissions are configured incorrectly, or VPC resources were created or tagged incorrectly, you'll receive an error message with a description that should point you in the right direction.
Once the initial validation checks complete, provisioning will commence. A message box on the screen periodically refreshes with status updates.
Note
The provisioning process takes approximately 35–40 minutes to complete.
Step 2: Update the Custom Role Trust Relationship
Note
This step is only necessary if you used a cross-account role for permissions when you configured private data storage. If you used an access key for that step, you can skip this one.
Note
You must wait for the successful completion of Step 1 before you proceed with this step.
If your private data storage uses a cross-account role, then for your new private data plane to read from and write to that storage, you need to update the role to append a trust relationship with your new Kubernetes cluster role as follows:
{ "Sid": "", "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::<accountid>:role/aac-<xxxx-xxxxxxxxxxxx>-cluster-role" }, "Action": "sts:AssumeRole" }
Note
Replace AWS Principal with the ARN of the IAM role created by the private data handling provisioning process.
<accountid>: AWS account number where the private data plane has been provisioned.
<xxxx-xxxxxxxxxxxx>: Last 2 segments of the Private Data Processing Environment ID. You can find this ID in the Admin UI after the private data plane has been successfully provisioned.
Example Scenario:
Account ID: 123456789012
Private Data Processing Environment ID: b2a65fbd-95dc-490a-b69b-a1dc92df224e
Role ARN: arn:aws:iam::123456789012:role/aac-b69b-a1dc92df224e-cluster-role
For more information, go to https://docs.aws.amazon.com/directoryservice/latest/admin-guide/edit_trust.html.
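If you prefer to script the change instead of editing the role in the AWS console, a minimal boto3 sketch, assuming the example values above and a hypothetical name for your cross-account storage role, might look like this:

```python
# Hedged sketch: append the cluster-role trust statement to an existing
# cross-account role's trust policy. Account ID and environment ID are the
# example values from this page; the storage role name is hypothetical.
import json
import boto3

account_id = "123456789012"
env_id = "b2a65fbd-95dc-490a-b69b-a1dc92df224e"
suffix = "-".join(env_id.split("-")[-2:])          # -> "b69b-a1dc92df224e"
cluster_role_arn = f"arn:aws:iam::{account_id}:role/aac-{suffix}-cluster-role"

storage_role_name = "my-private-data-storage-role"  # hypothetical role name

iam = boto3.client("iam")
trust = iam.get_role(RoleName=storage_role_name)["Role"]["AssumeRolePolicyDocument"]
trust["Statement"].append({
    "Sid": "",
    "Effect": "Allow",
    "Principal": {"AWS": cluster_role_arn},
    "Action": "sts:AssumeRole",
})
iam.update_assume_role_policy(
    RoleName=storage_role_name,
    PolicyDocument=json.dumps(trust),
)
```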
Step 3: Configure EMR Serverless (Optional)
Configure EMR Serverless if you use Spark/EMR processing.
Enable EMR
Log in to Alteryx Analytics Cloud.
From the Profile menu, select Admin Console.
From the leftmost navigation panel, select Private Data Handling.
Select Enable EMR and then select Update.
Update Custom Role Created for S3 Connection
Append these EMR Serverless permissions and trust relationships to the custom policy and custom role from Step 2:
Append Custom Policy Document
{ "Version": "2012-10-17", "Statement": [ { "Sid": "EMRServerlessAccess", "Effect": "Allow", "Action": [ "emr-serverless:CreateApplication", "emr-serverless:UpdateApplication", "emr-serverless:DeleteApplication", "emr-serverless:ListApplications", "emr-serverless:GetApplication", "emr-serverless:StartApplication", "emr-serverless:StopApplication", "emr-serverless:StartJobRun", "emr-serverless:CancelJobRun", "emr-serverless:ListJobRuns", "emr-serverless:GetJobRun" ], "Resource": "*" }, { "Sid": "AllowNetworkInterfaceCreationViaEMRServerless", "Effect": "Allow", "Action": "ec2:CreateNetworkInterface", "Resource": [ "arn:aws:ec2:*:*:network-interface/*", "arn:aws:ec2:*:*:security-group/*", "arn:aws:ec2:*:*:subnet/*" ], "Condition": { "StringEquals": { "aws:CalledViaLast": "ops.emr-serverless.amazonaws.com" } } }, { "Sid":"AllowEMRServerlessServiceLinkedRoleCreation", "Effect":"Allow", "Action":"iam:CreateServiceLinkedRole", "Resource":"arn:aws:iam::<accountid>:role/aws-service-role/ops.emr-serverless.amazonaws.com/AWSServiceRoleForAmazonEMRServerless" }, { "Sid": "AllowPassingRuntimeRole", "Effect": "Allow", "Action": "iam:PassRole", "Resource": "arn:aws:iam::<accountid>:role/aac-<xxxx-xxxxxxxxxxxx>-emr-serverless-spark-execution", "Condition": { "StringLike": { "iam:PassedToService": "emr-serverless.amazonaws.com" } } }, { "Sid": "S3ResourceBucketAccess", "Effect": "Allow", "Action": [ "s3:PutObject", "s3:GetObject", "s3:ListBucket", "s3:DeleteObject" ], "Resource": [ "arn:aws:s3:::aac-<xxxx-xxxxxxxxxxxx>-emr-logs", "arn:aws:s3:::aac-<xxxx-xxxxxxxxxxxx>-emr-logs/*" ] } ] }
Append Custom Role's Trust Relationship
{ "Sid": "", "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::<accountid>:role/aac-<xxxx-xxxxxxxxxxxx>-emr-serverless-spark-execution" }, "Action": "sts:AssumeRole" }, { "Sid": "", "Effect": "Allow", "Principal": { "Service": "emr-serverless.amazonaws.com" }, "Action": "sts:AssumeRole" }
Note
When you delete Private Data Handling, AWS replaces the trust relationship of the aac-<xxxx-xxxxxxxxxxxx>-cluster-role ARN with an access key. You must also delete the trust relationship from the UI.
Note
Replace AWS Principal with the ARN of the IAM role created by the private data handling provisioning process.
<accountid>: AWS account number where the private data plane has been provisioned.
<xxxx-xxxxxxxxxxxx>: Last 2 segments of the Private Data Processing Environment ID. You can find this ID in the Admin UI after the private data plane has been successfully provisioned.
Example Scenario:
Account ID: 123456789012
Private Data Processing Environment ID: b2a65fbd-95dc-490a-b69b-a1dc92df224e
Role ARN: arn:aws:iam::123456789012:role/aac-b69b-a1dc92df224e-emr-serverless-spark-execution
S3 ARN: arn:aws:s3:::aac-b69b-a1dc92df224e-emr-logs
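If you script this step instead of using the console, a minimal boto3 sketch might look like the following. The policy ARN is hypothetical, and only one of the statements from the document above is shown being appended; add the others the same way.

```python
# Hedged sketch: publish a new default version of the custom policy that
# includes (part of) the EMR Serverless statements shown above.
import json
import boto3

iam = boto3.client("iam")
policy_arn = "arn:aws:iam::123456789012:policy/my-aac-custom-policy"  # hypothetical

# Fetch the current default version of the policy document.
policy = iam.get_policy(PolicyArn=policy_arn)["Policy"]
document = iam.get_policy_version(
    PolicyArn=policy_arn, VersionId=policy["DefaultVersionId"]
)["PolicyVersion"]["Document"]

# Append one statement from the document above (abbreviated for brevity).
document["Statement"].append({
    "Sid": "EMRServerlessAccess",
    "Effect": "Allow",
    "Action": ["emr-serverless:StartJobRun", "emr-serverless:GetJobRun"],
    "Resource": "*",
})

# IAM keeps at most five versions per policy; delete an old version first if needed.
iam.create_policy_version(
    PolicyArn=policy_arn,
    PolicyDocument=json.dumps(document),
    SetAsDefault=True,
)
```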
Step 4: Cloud Execution for Desktop (Optional)
Select the Cloud Execution for Desktop option to run Designer Desktop workflows in the cloud. Go to Enable Cloud Execution for Desktop for more information on how to enable this feature.
Common Issues
These are some common trouble spots we see when provisioning a private data plane, organized by when you'd run across them.
When: Occurs when performing the initial validation before kicking off the private data plane provisioning pipeline.
Examples:
`Error insufficient subnets tagged AACSubnet with value eks_node, 3 required`
Causes: Resources (IAM accounts, policies, VPCs, subnets) not tagged correctly.
Fix: Address each error based on its message, and/or reread the AWS Account and VPC setup instructions to make sure you've correctly tagged all resources.
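For the subnet-tagging example above, a quick boto3 check is one way to confirm the tags before retrying. The region and VPC ID are placeholders for your own values.

```python
# Hedged sketch: confirm that at least three subnets carry the
# AACSubnet=eks_node tag that the validation step expects.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
subnets = ec2.describe_subnets(
    Filters=[
        {"Name": "vpc-id", "Values": ["vpc-0123456789abcdef0"]},  # your data plane VPC
        {"Name": "tag:AACSubnet", "Values": ["eks_node"]},
    ]
)["Subnets"]
print(f"{len(subnets)} subnets tagged AACSubnet=eks_node (3 required)")
```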
On the Private Data Handling page in Alteryx Analytics Cloud, you'll see a note that provisioning failed, but without a descriptive error message explaining what went wrong.
In the AWS console, you'll see a NodeCreationFailure issue type with the description "Instances failed to join the kubernetes cluster." This typically indicates that there's no route allowing the EKS subnets egress to the internet. The new EKS nodes need access to an AWS EC2 service endpoint to attach to the EKS cluster.
When: Occurs several minutes after the provisioning process has started.
Potential causes:
Firewall rules preventing egress from the cluster.
Failed DNS resolution within the cluster.
No internet gateway between the VPC and the internet.
Overlapping subnets.
NAT gateway configured as private instead of public.
Fix: You need to investigate and test the networking setup. AWS provides suggested troubleshooting steps here.
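One way to check the most common cause, missing egress routes on the EKS subnets, is a short boto3 sketch like this. The region and subnet IDs are placeholders for your own values.

```python
# Hedged sketch: list the default (0.0.0.0/0) routes for the route tables
# associated with the EKS subnets to see whether an egress path exists.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
subnet_ids = ["subnet-aaa111", "subnet-bbb222", "subnet-ccc333"]  # your EKS subnets

tables = ec2.describe_route_tables(
    Filters=[{"Name": "association.subnet-id", "Values": subnet_ids}]
)["RouteTables"]
for table in tables:
    default_routes = [
        r for r in table["Routes"] if r.get("DestinationCidrBlock") == "0.0.0.0/0"
    ]
    targets = [r.get("NatGatewayId") or r.get("GatewayId") for r in default_routes]
    print(table["RouteTableId"], "default route via", targets or "NONE - no egress path")
```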
Your job will fail and you’ll see a red “x” on the wrangle step of the workflow (rather than a green check mark).
When: Occurs when running a workflow.
Cause: The workflow can't access the private data store. This is usually because private data storage uses a cross-account role and the Kubernetes cluster can't assume that role to access the store.
Fix: Update the trust relationship on the cross-account role for your private data storage (Step 2 above on this page).
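To confirm the fix took effect, you can list the trusted principals on the storage role and check that the cluster role ARN from Step 2 appears. The role name below is a placeholder for your own cross-account role.

```python
# Hedged sketch: read back the trust policy and print the AWS principals that
# are allowed to assume the storage role.
import boto3

iam = boto3.client("iam")
trust = iam.get_role(RoleName="my-private-data-storage-role")["Role"]["AssumeRolePolicyDocument"]
principals = [
    stmt.get("Principal", {}).get("AWS")
    for stmt in trust["Statement"]
    if stmt.get("Action") == "sts:AssumeRole"
]
print("Trusted principals:", principals)  # should include the ...-cluster-role ARN
```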