
...

Parameter | Description | Applicable CLI Commands
host

(Required) The server and port number of the D s item instance. Replace this value with the host and port of the running D s item instance. If it is not provided, localhost:3005 is assumed.

Info

NOTE: In some environments, the http:// or https:// protocol identifier may be required as part of the host value.

All
conn_name

Internal name of the connection. This name is referenced in your CLI scripts. It should be a single value without spaces.

Info

NOTE: This value must be unique among your connection names.

load_data, publish, truncate_and_load
conn_id

The internal identifier for the connection. When a connection is created, it is assigned an internal numeric identifier. This ID or the conn_name value can be used to reference the connection in future commands.

Tip

Tip: This value is available when you hover over a connection in the application. See Flows Page.

publish, load_data, truncate_and_load
job_type

The execution environment in which to run the job:

  • photon - Run on the Photon running environment on the D s server.

    Info

    NOTE: If the job_type parameter is not specified, CLI jobs are run on the Photon running environment.

  • hadoop - Run in the default running environment for your Hadoop cluster.

    Info

    NOTE: When this job type is applied, your CLI scripts automatically transition to running jobs in Spark.

  • spark - Run on the Spark running environment in Hadoop.

  • databricksSpark - Run the job on the Azure Databricks running environment in Azure.

For more information on these running environments, see Running Environment Options. Example invocations that combine these parameters appear after this table.

run_job
job_id

The internal identifier for the job. This value can be retrieved from the output of a completed run_job command.

get_job_status, publish, get_publications, load_data
profiler

When on, profiling of your job is enabled. Default is off.

run_job
data

Full UNIX path to the source TSV file. This file contains a URL pointing to the actual Hive or HDFS source: one TSV file for each job run. Executing user must have access to this file.

run_job
script
Full UNIX path from the D s item root directory to the CLI script file. Executing user must have access.
run_job
publish_action

(Optional) Defines the action taken on second and subsequent publish operations:

  • create - (default) A new file is created with each publication. The filename is the numeric job ID.
  • append - Each publication appends to the existing output file. Filename is consistent across publications.

    Info

    NOTE: Compression of published files is not supported through the command line interface.

    Info

    NOTE: When publishing single files to S3, the append operation is not supported.

  • replace - Subsequent publications replace the same file with each execution.
run_job
header

(Optional) When true, the output for a CSV job with the append or create publishing action includes the column headers as the first row. Default is false.

Info

NOTE: If you use the header option, you must also include the single_file option, or this setting is ignored.

run_job
single_file

(Optional) When true, CSV or JSON outputs are written to a single file. Default is false.

run_job
output_path

(Required) Defines the fully qualified URI to where the job results are written, as in the following examples:

Code Block
hdfs://host:port/path/filename.csv
s3://bucketName/path/filename.csv
Info

NOTE: The output_path must include the protocol identifier or host and port number (if applicable).

This parameter specifies the base filename. If you are publishing files, the publish_action parameter value may change the exact filename that is written.

The protocol is set in webapp.storageProtocol in D s triconf.

run_job
output_format

Accepted values: csv, json, pqt (Parquet), and avro (Avro).

Info

NOTE: For the pqt format, job_type=spark is required.

For job_type=photon, you may generate csv, json, and avro results.

run_job
database
Name of the Redshift or Hive database to which you are publishing or loading.
publish, load_data
table

The table in the database to which you are publishing or loading.

publish, load_data
publish_format

The format of the output file from which to publish to Hive or Redshift tables. Accepted values: csv, json, pqt (Parquet), or avro (Avro).

publish, get_publications
publish_opt_file

Path to file containing definitions for multiple file or table targets to which to write the job's results. For more information, see CLI Publishing Options File.

run_job
skip_publish_validation

By default, the CLI automatically performs schema validation when generating results to a pre-existing source.

If this flag is set, schema validation is skipped on results output.

run_job
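
The following is a minimal sketch of how the job-execution parameters above might be combined in a run_job invocation. The script name (trifacta_cli.py), the sample host, paths, and values, and the exact flag spellings are assumptions inferred from the parameter names in this table, not confirmed syntax; consult the CLI reference for your release for the authoritative form.

Code Block
# Hypothetical run_job invocation; flag names mirror the parameters above.
# Host, script, data, and output paths are placeholder values.
./trifacta_cli.py run_job \
  --host example.com:3005 \
  --job_type spark \
  --script /home/users/joe/cleanup.cli \
  --data /home/users/joe/job1/datasources.tsv \
  --output_path hdfs://hadoop:50070/queryResults/joe/cleaned.csv \
  --output_format csv \
  --header true \
  --single_file true \
  --publish_action append \
  --profiler on

Because the header option is used, the single_file option is also included, as required by the note above.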
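Similarly, a sketch of publishing a completed job's results to a database table, referencing the job by job_id and the connection by conn_name. The command-line spellings and values are again assumptions for illustration only; the same connection, database, and table parameters apply to load_data and truncate_and_load.

Code Block
# Hypothetical publish invocation; job_id, connection, database, and
# table values are placeholders.
./trifacta_cli.py publish \
  --host example.com:3005 \
  --job_id 42 \
  --conn_name my_hive_connection \
  --database dev \
  --table job_results \
  --publish_format avro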

...