In Dataprep, most jobs to transform your data are executed by default on Dataflow. Project owners can choose to enable Photon, an in-memory running environment hosted on the Trifacta node, for execution of smaller jobs. The default running environment is Dataflow.
...
- To run a job, open the flow containing the recipe whose output you wish to generate.
- Locate the recipe. Click the recipe's output object icon.
- On the right side of the screen, information about the output object is displayed. The output object defines:
- The type and location of the outputs, including filenames and method of updating.
- Profiling options
- Execution options
- For now, you can ignore the options for the output object. Click Run Job.
- In the Run Job page, you can review the job as it is currently specified.
- To run the job on Dataflow, select Dataflow.
- Click Run Job.
- The job is queued with default settings for execution on Dataflow.
For more information, see Run Job Page.
Tracking progress
You can track progress of your job through the following areas:
- Flow View: select the output object. On the right side of the screen, click the Jobs tab. Your job in progress is listed. See Flow View Page.
- Job Details Page: Click the link in the Jobs tab. You can review progress and individual details related to your job. See Job Details Page.
Download results
When your job has finished successfully, a Completed message is displayed in the Job Details page.
...
NOTE: You must have a connection configured to publish to an external datastore. For more information, see Connections Page.

You can publish results to external datastores available through the Connections page. For more information, see Publishing Dialog.
Output Options
When specifying the job you wish to run, you can define the following types of output options.
...
Tip: You can download PDF and JSON versions of the visual profile for offline analysis.
...
in the Job Details page.
For more information, see Overview of Visual Profiling.
...
- Parameter values can be defined at the flow level through Flow View. For more information, see Manage Parameters Dialog.
- These parameter values can be passed into the running environment and inserted into the output filename or table name.
- For more information, see Overview of Parameterization.
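The substitution described above can be sketched in a few lines. This is an illustrative model only: the `{name}` template syntax, the parameter name, and the filename below are assumptions, not the product's actual syntax.

```python
# Hypothetical sketch: how a flow parameter value could be inserted
# into an output filename at run time. Template syntax and names are
# illustrative, not the product's actual conventions.
def resolve_output_name(template: str, params: dict) -> str:
    """Replace each {name} placeholder with its parameter value."""
    for key, value in params.items():
        template = template.replace("{" + key + "}", value)
    return template

print(resolve_output_name("sales_{region}.csv", {"region": "us-west"}))
# → sales_us-west.csv
```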
...
These settings can be specified at the project level or at the individual output object level:
- Project Execution settings: Within your preferences, you can define execution options for jobs at the project level. By default, all of your jobs executed from flows within the project use these settings. For more information, see Project Execution Settings Page.
- Output object settings: The execution settings in the Project Execution Settings page can be overridden at the output object level. When you define an individual output object for a recipe, the execution settings that you specify in the Run Job page apply whenever the outputs are generated for this flow. See Dataflow Execution Settings.
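The precedence between the two levels can be sketched as a simple settings merge. The keys and values below are illustrative assumptions, not actual setting names.

```python
# Sketch of the precedence described above: settings defined on an
# individual output object override the project-level defaults.
# Keys and values are illustrative assumptions.
project_defaults = {"machineType": "n1-standard-1", "region": "us-central1"}
output_overrides = {"machineType": "n1-highmem-4"}

# Output-level settings win where keys collide; other defaults remain.
effective = {**project_defaults, **output_overrides}
print(effective)
```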
...
Run job in shared VPC network (internal IP addresses)
You can run your job in a VPC network that is shared with multiple projects. Configure the following in your Dataflow Execution Settings:
- VPC Network Mode: Custom
- Network: Do not modify. When a Subnetwork value is specified, Dataflow ignores the Network value.
- Subnetwork: Specify the subnetwork using a full URL. See below.

NOTE: If the Subnetwork is located within a shared VPC network, you must specify the complete URL.

NOTE: Additional subnet-level permissions may be required.
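As an illustrative sketch (the host project ID, region, and subnetwork name are placeholders), a complete subnetwork URL has this form:

```
https://www.googleapis.com/compute/v1/projects/HOST_PROJECT_ID/regions/REGION/subnetworks/SUBNETWORK_NAME
```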
...
For more information on subnet-level permissions, see https://cloud.google.com/vpc/docs/provisioning-shared-vpc#networkuseratsubnet.
Run job API
You can also run jobs using the REST APIs.
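As a hedged sketch (the endpoint path, host, and dataset identifier below are assumptions; consult the API reference for your deployment), a request to queue a job for an output might look like:

```
POST /v4/jobGroups HTTP/1.1
Host: api.clouddataprep.com
Authorization: Bearer <api-token>
Content-Type: application/json

{"wrangledDataset": {"id": 28}}
```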
...
Tip: Unless performance issues related to your resource selections apply to all jobs in the project, you should make changes to your resources for individual output objects. If those changes improve performance and you are comfortable with the higher costs associated with the change, you can consider applying them through the Project Execution Settings page for all jobs in the project.
Choose machine type
A machine type is a set of virtualized hardware resources, including memory size, CPU, and persistent disk storage, which are assigned to a virtual machine (VM) responsible for executing your job.
...
- Billing for your job depends on the machine type (resources) that have been assigned to the job. If you select a more powerful machine type, you should expect higher costs for each job execution.
Dataprep provides a subset of available machine types from which you can select to execute your jobs. By default, Dataprep uses the machine type that you define in the Project Execution Settings page. If you are experiencing long execution times and are willing to incur additional costs, you can select a more powerful machine type.
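For reference, machine types follow the standard Compute Engine naming scheme. The examples below are standard Compute Engine types listed for illustration only; availability and pricing vary by project and region:

```
n1-standard-1   1 vCPU,  3.75 GB memory
n1-standard-4   4 vCPUs, 15 GB memory
n1-highmem-8    8 vCPUs, 52 GB memory
```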
...
Machine scaling algorithms
By default, Dataprep enables auto-scaling of workers during job execution.

NOTE: Auto-scaling can increase the costs of job execution. If you use auto-scaling, you should specify a reasonable maximum limit.
...
By default, Dataprep runs jobs using a service account associated with your project.

NOTE: Under the Permissions tab, verify that Include Google-provided role grants is selected.
To see the current service account for your project:
...
- Each label must have a unique key within your project.
- You can create up to 64 labels per project.
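For example (the keys and values here are hypothetical), labels are simple key-value pairs that you might use to group jobs for cost reporting:

```
cost-center : analytics
environment : development
```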