...
By default, most jobs to transform your data are executed on Dataflow, a managed service for executing data pipelines within the Google Cloud Platform. Dataprep has been designed to integrate with Dataflow and to take advantage of multiple features available in the service. This section describes how to execute a job on Dataflow, as well as the available options.
Project owners can choose to enable an in-memory running environment hosted on the platform itself. This running environment yields faster performance on small- to medium-sized jobs. For more information, see the Dataprep Project Settings Page.

Default Jobs
...
- To run a job, open the flow containing the recipe whose output you wish to generate.
- Locate the recipe. Click the recipe's output object icon.
- On the right side of the screen, information about the output object is displayed. The output object defines:
- The type and location of the outputs, including filenames and method of updating.
- Profiling options
- Execution options
- For now, you can ignore the options for the output object. Click Run Job.
- In the Run Job page, you can review the job as it is currently specified.
- To run the job on Dataflow, select Dataflow.
- Click Run Job.
- The job is queued with the default settings for execution on Dataflow.
...
Run job in shared VPC network (internal IP addresses)
Editions: gdpent, gdppro, gdppr
You can run your job in a VPC network that is shared with multiple projects. Configure the following Dataflow Execution Settings:
- VPC Network Mode: Custom
- Network: Do not modify. When a Subnetwork value is specified, the Network value is ignored.
- Subnetwork: Specify the subnetwork using a full URL. See below.
NOTE: If the Subnetwork is located within a shared VPC network, you must specify the complete URL.

NOTE: Additional subnet-level permissions may be required.
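A full subnetwork URL follows the standard Compute Engine resource format. The project, region, and subnetwork names below are placeholders for illustration:

```
https://www.googleapis.com/compute/v1/projects/HOST_PROJECT_ID/regions/REGION/subnetworks/SUBNETWORK_NAME
```

For a shared VPC, HOST_PROJECT_ID is the ID of the host project that owns the network, not the service project in which the job runs.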
...
For more information on subnet-level permissions, see https://cloud.google.com/vpc/docs/provisioning-shared-vpc#networkuseratsubnet.
Run job API
You can also run jobs using the REST APIs.
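As a minimal sketch, a job can be triggered by POSTing a jobGroups request that references the recipe's ID. The base URL, token, and recipe ID below are placeholder assumptions; consult the platform's REST API reference for the exact endpoint, version, and authentication scheme:

```python
import json

# Assumptions for illustration: replace with your instance's API base URL,
# an access token, and the ID of the recipe whose output you want to generate.
API_BASE = "https://www.example.com/v4"
ACCESS_TOKEN = "YOUR_TOKEN"
RECIPE_ID = 7

# Request body: run a job for the specified recipe (wrangled dataset).
payload = {"wrangledDataset": {"id": RECIPE_ID}}

# The HTTP request that would be sent (shown here without executing it):
request_line = f"POST {API_BASE}/jobGroups"
headers = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Content-Type": "application/json",
}
body = json.dumps(payload)
print(request_line)
print(body)
```

The response to such a request typically includes a job group ID, which can then be polled for status until the job completes.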
...
Machine scaling algorithms
Editions: gdpent, gdppro, gdppr
By default, Dataflow utilizes a throughput-based scaling algorithm to scale up or down the Google Compute Engine instances that are deployed to execute your job.

NOTE: Auto-scaling can increase the costs of job execution. If you use auto-scaling, you should specify a reasonable maximum limit.
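For reference, when a Dataflow pipeline is launched directly, the same behavior is controlled through Dataflow's documented pipeline options; the worker count of 10 below is an arbitrary example of a maximum limit:

```
--autoscalingAlgorithm=THROUGHPUT_BASED
--maxNumWorkers=10
```

Setting a maximum number of workers caps how far auto-scaling can grow the worker pool, which bounds the cost of a single job run.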
...