This section provides an overview of how to configure the running environments accessible from your deployment of the web application.

A running environment is the set of services that are used to execute a job.

  • A job can include tasks to do the following:
    • Ingest data
    • Transform data
    • Profile data
    • Sample data
    • Generate results
  • A running environment can be hosted on the product node or across a cluster that is connected to the product.

Photon

Hosted on the product node, Photon is an in-memory running environment designed for high performance on small- to medium-sized jobs.

Not available in edition: gdple
NOTE: Photon lives in the product VPC. These jobs are not executed in a customer VPC. Data is streamed to the product VPC for transformation and is not stored within the VPC.

NOTE: You cannot cancel jobs that have been launched on Photon.

Configuration:

Photon may require enablement in your project or workspace.

Dataflow

Dataflow is a fully managed, serverless data processing service that is hosted in the Google Cloud Platform. Managed by Google, this service is enabled by default when you enable Dataprep in any of your Google Cloud Platform projects.

Configuration:

  • Access to the Dataflow service is governed through permissions in the IAM roles for users. Access is enabled by default. For more information, see Required Dataprep User Permissions.
  • Jobs are run on Dataflow using service accounts. The default Compute Engine service account deployed to your project has sufficient permissions to run Dataflow jobs. For more information, see Google Service Account Management.

BigQuery

Available in editions: gdpent, gdppro, gdppr, gdpst

BigQuery is a cloud-based data warehouse platform that is fully integrated into the Google Cloud Platform. BigQuery supports a standard SQL dialect for querying datasets and tables and enables the writing of results back from the product. For more information, see https://cloud.google.com/products/bigquery/.

For datasets and outputs that are hosted in BigQuery, you can configure the web application to perform the transformation steps of your job in BigQuery. Because the work happens inside the data warehouse, no data needs to be transferred to and from it, and performance should be significantly better.
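
The benefit of running transformation steps inside the data warehouse can be illustrated with a small, self-contained sketch. It is not the product's implementation and does not use BigQuery: Python's built-in sqlite3 module stands in for the warehouse, and the orders table and both function names are hypothetical. The first function pulls every row out and aggregates locally; the second lets the database do the aggregation and returns only the results.

```python
import sqlite3

# sqlite3 is only a stand-in for a data warehouse in this sketch.
# The table name, columns, and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 10.0), ("west", 5.0), ("east", 2.5), ("west", 7.5)],
)


def totals_client_side(connection):
    """Fetch every row from the warehouse, then aggregate locally."""
    rows = connection.execute("SELECT region, amount FROM orders").fetchall()
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    # Second value: number of rows that left the warehouse.
    return totals, len(rows)


def totals_pushed_down(connection):
    """Aggregate inside the warehouse; only the result rows come back."""
    rows = connection.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)
```

Both functions return the same totals, but the second moves one row per region instead of one row per order, which is the effect the paragraph above describes.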

Tip: Jobs must be enabled for execution in BigQuery for each flow. For more information, see Flow Optimization Settings Dialog.

Limitations:

NOTE: BigQuery is not a running environment that you explicitly select or specify as part of a job. If all of the requirements are met, then the job is executed in BigQuery when you select Dataflow. For more information on limitations, see Overview of Job Execution.

Configuration:

  • A project owner must enable the following features in the project:
  • For individual flows, all general and BigQuery optimizations must be enabled. For more information, see Flow Optimization Settings Dialog.
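
The selection rule described in this section can be summarized in a short sketch. It is an illustration only, not the product's logic: the JobContext class, its field names, and the effective_engine function are hypothetical, and the requirement flags are a simplified reading of the conditions listed above (data hosted in BigQuery, project features enabled, flow optimizations enabled).

```python
from dataclasses import dataclass


@dataclass
class JobContext:
    # All names here are hypothetical and exist only for this sketch.
    selected_environment: str        # what the user picks, e.g. "photon" or "dataflow"
    datasets_in_bigquery: bool       # source datasets are hosted in BigQuery
    outputs_in_bigquery: bool        # outputs are written to BigQuery
    project_features_enabled: bool   # enabled by a project owner
    flow_optimizations_enabled: bool # general and BigQuery optimizations on the flow


def effective_engine(job: JobContext) -> str:
    """Return where the job would run under the simplified rule.

    BigQuery is never selected directly. It is used only when the user
    selects Dataflow and every requirement is met.
    """
    if job.selected_environment != "dataflow":
        return job.selected_environment
    requirements_met = all(
        [
            job.datasets_in_bigquery,
            job.outputs_in_bigquery,
            job.project_features_enabled,
            job.flow_optimizations_enabled,
        ]
    )
    return "bigquery" if requirements_met else "dataflow"
```

For example, a job that selects Dataflow with every flag set resolves to "bigquery", while the same job with flow optimizations disabled stays on "dataflow". The actual set of requirements is documented in Overview of Job Execution.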
