
Review and modify the settings that control how your project-specific jobs are executed on Dataflow for this project.

...

(read only) Your current project.

Use the Project menu to select a different project. For more information, see Projects Menu.

...

To disable the product for this project, click the link.

Info

NOTE: To remove a user and their assets from a project, please contact support.

...

Info

NOTE: These settings apply to individual users per project. To apply changes across all of your projects, you must modify these settings within your preferences for each individual project.

Dataflow Execution Settings

Info

NOTE: Changes made to your execution settings set at the project level do not affect any overrides that have been previously applied at the individual job level. Job-level overrides remain as configured.


Tip

Tip: For more information on how the following settings affect your project's jobs, see Dataflow Execution Settings.

Setting | Description
Regional endpoint

A region is a specific geographical location where you can run your resources.

Zone

A sub-section of a region, a zone contains specific resources.

Select Auto Zone to allow the platform to choose the zone for you.

Machine type

Choose the type of machine on which to run your job. The default is n1-standard-1.

...

For more information on machine types, see https://cloud.google.com/compute/docs/machine-types.

Advanced Settings

Info

NOTE: Changes made to your advanced settings set at the project level do not affect any overrides that have been previously applied at the individual job level. Job-level overrides remain as configured.

Setting | Description

VPC network mode

Select the network mode to use for this project.

If the network mode is set to Auto (default), the job is executed over publicly available IP addresses. Do not set values for Network, Subnetwork, and Worker IP address configuration.

Info

NOTE: Unless you have specific reasons to modify these settings, you should leave them as the default values. These network settings apply to job execution. Preview and sampling use the default network settings.

For Custom VPC networks:

  1. Specify the name of the VPC network in your region.
  2. Specify the short or full URL of the Subnetwork. If both Network and Subnetwork are specified, Subnetwork is used. See https://cloud.google.com/dataflow/docs/guides/specifying-networks.
  3. Review and specify the Worker IP address configuration setting. See below.


Network

To use a different VPC network, enter the name of the VPC network to use as an override for this job. Click Save to apply the override.

Subnetwork

To specify a different subnetwork, enter the URL of the subnetwork. The URL should be in the following format:

Code Block
regions/<REGION>/subnetworks/<SUBNETWORK>

where:

  • <REGION> is the region identifier specified under Region. These values must match.
  • <SUBNETWORK> is the subnetwork identifier.
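Putting the two identifiers together, the short-form value can be assembled as shown below. The helper function and example values are illustrative only, not part of the product:

```python
# Assemble the short-form Subnetwork URL from its two identifiers.
# The region must match the value selected under the Region setting.
def subnetwork_short_url(region: str, subnetwork: str) -> str:
    return f"regions/{region}/subnetworks/{subnetwork}"

# Illustrative identifiers only:
print(subnetwork_short_url("us-central1", "my-subnet"))
```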

If you have access to another project within your organization, you can execute your Dataflow job through it by specifying a full URL in the following form:

Code Block
https://www.googleapis.com/compute/v1/projects/<HOST_PROJECT_ID>/regions/<REGION>/subnetworks/<SUBNETWORK>

where:

  • <HOST_PROJECT_ID> corresponds to the project identifier. This value must be between 6 and 30 characters. The value can contain only lowercase letters, digits, or hyphens. It must start with a letter. Trailing hyphens are prohibited.

Click Save to apply the override.
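The constraints listed above for <HOST_PROJECT_ID> can be captured in a single pattern. The validator below is a sketch for illustration, not part of the product:

```python
import re

# 6-30 characters; lowercase letters, digits, or hyphens only;
# must start with a letter and must not end with a hyphen.
PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")

def is_valid_host_project_id(project_id: str) -> bool:
    return PROJECT_ID_RE.fullmatch(project_id) is not None

print(is_valid_host_project_id("my-host-project"))   # valid
print(is_valid_host_project_id("my-host-project-"))  # trailing hyphen, invalid
```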


Setting | Description

Worker IP address configuration

If VPC network mode is set to Custom, then choose one of the following for your Dataflow jobs in this project:

  • Allow public IP addresses - Use Dataflow workers that are available through public IP addresses. No further configuration is required.
  • Use internal IP addresses only - Dataflow workers use private IP addresses for all communication.
    • If a Subnetwork is specified, then the Network value is ignored.
    • The specified Network or Subnetwork must have Private Google Access enabled.

Autoscaling algorithm

The type of algorithm to use to scale the number of Google Compute Engine instances to accommodate the size of your job. Possible values:

  • Throughput based - Scaling is determined by the volume of data expected to be passed through Dataflow.
  • None - No autoscaling algorithm is applied.
    • If None is selected, use Initial number of workers to specify a fixed number of Google Compute Engine instances.

Initial number of workers

Number of Google Compute Engine instances with which to launch the job. This number may be adjusted as part of job execution. This number must be an integer between 1 and 1000, inclusive.

Maximum number of workers

Maximum number of Google Compute Engine instances to use during execution. This number must be an integer between 1 and 1000, inclusive, and must be greater than the initial number of workers. 
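The two worker-count settings above interact: both must fall within the documented range, and the maximum must exceed the initial value. A quick sanity check, illustrative only:

```python
# Check the documented constraints: both counts are integers in
# [1, 1000], and the maximum must be greater than the initial count.
def worker_counts_valid(initial: int, maximum: int) -> bool:
    in_range = 1 <= initial <= 1000 and 1 <= maximum <= 1000
    return in_range and maximum > initial
```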

Service account

Email address of the service account under which to run the job.

Every job executed in Dataflow must be submitted through a service account. By default, the product uses a single Compute Engine service account under which jobs from all project users are run.

Optionally, you can specify a different service account under which to run your jobs for the project.

Info

NOTE: When using a named service account to access data and run jobs in other projects, each user running a job must be granted the roles/iam.serviceAccountUser role on the service account in order to use it.

Info

NOTE: Individual users can specify service accounts under which their jobs are run. If companion service accounts are enabled, each user must have a service account specified for use.

For more information on service accounts, see Google Service Account Management.

Labels

Create or assign labels to apply to the billing for the jobs run in your project. You may reference up to 64 labels.

Info

NOTE: Each label must have a unique key name.

For more information, see https://cloud.google.com/resource-manager/docs/creating-managing-labels.
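The documented limits on labels, at most 64 of them, each with a unique key name, can be checked before submitting a job. This is a sketch for illustration, not product behavior:

```python
# Validate a set of billing labels: no more than 64 entries,
# and every key name must be unique.
def labels_valid(labels: list[tuple[str, str]]) -> bool:
    keys = [key for key, _ in labels]
    return len(labels) <= 64 and len(keys) == len(set(keys))
```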

Notes on behavior:

  • Values specified here are applied to all jobs executed within the current project. To apply these changes globally, you must edit these settings in each project of which you are a member.
  • If property values are not specified here, then the properties are not passed in with any job execution, and the default product property values are used.
  • The property values specified here can be overridden by property values specified for individual jobs. For more information, see Dataflow Execution Settings.
