Dataproc Engine

Dataproc is a distributed Spark engine that can run your Designer Cloud workflows if your workspace is set up with GCS as Private Data Storage. The benefits of the Dataproc engine include:

  • Dataproc is better suited than the AMP engine to larger datasets and complex workflows.

  • Your data doesn’t leave your Google Cloud Project (GCP) environment.

  • You might reduce operating costs by running the Dataproc engine in your GCP environment.

  • You have more control over the Dataproc engine configuration than you do with the AMP engine.

Important

A Workspace Admin must configure and enable the Dataproc engine in your workspace. For more information, go to the Dataproc Engine Setup Guide.

How to Use the Dataproc Engine

  1. Open your Designer Cloud workflow.

  2. Select Dataproc from the engine dropdown next to Run Job.

    Note

    The engine dropdown is greyed out until you add an Output Data tool to your workflow and configure it.

  3. Select Run Job.