If needed, you can specify multiple file or table targets as part of a single CLI job. In your CLI command, specify the path on the Trifacta node to this JSON file as the value of the publish_opt_file parameter, as in the following:

./trifacta_cli.py run_job --user_name <trifacta_user> --password <trifacta_password> --job_type spark --data redshift-test/datasources.tsv --script redshift-test/script.cli --cli_output_path ./job_info.out --profiler on --publish_opt_file /json/publish/file/publishopts.json

The file publishopts.json contains the specification of the targets.

Tip: To generate this file, you can run the job through the application. After the job has completed, download the CLI script from the Recipe panel in the Transformer page. The downloaded publishopts.json file contains the specification for the targets that you just executed. See Recipe Panel.

Example publishopts.json file:

{
  "file": [
    {
      "path": "hdfs://hadoop:50070/trifacta/queryResults/admin@trifacta.local/POS-r01.csv",
      "action": "create",
      "format": "csv",
      "header": true,
      "asSingleFile": true,
      "compression": "none"
    },
    {
      "path": "hdfs://hadoop:50070/trifacta/queryResults/admin@trifacta.local/POS-r01.json",
      "action": "create",
      "format": "json",
      "header": false,
      "asSingleFile": false,
      "compression": "none"
    }
  ],
  "hive": [
    {
      "databaseName":"default",
      "tableName":"POS-r01",
      "action":"overwrite"
    }
  ]
}
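Before submitting a job, it can be useful to verify that each target in the file contains the required properties. The following sketch is not part of the Trifacta CLI; it is an illustrative validator based only on the property tables on this page (compression is optional and therefore not required):

```python
import json

# Required keys per target type, per the property tables on this page.
# "compression" is optional for file targets, so it is not checked here.
FILE_KEYS = {"path", "action", "format", "header", "asSingleFile"}
HIVE_KEYS = {"databaseName", "tableName", "action"}

def check_publish_opts(text):
    """Return a list of problems found in a publishopts.json string."""
    opts = json.loads(text)
    problems = []
    for target in opts.get("file", []):
        missing = FILE_KEYS - target.keys()
        if missing:
            problems.append("file target missing: %s" % sorted(missing))
    for target in opts.get("hive", []):
        missing = HIVE_KEYS - target.keys()
        if missing:
            problems.append("hive target missing: %s" % sorted(missing))
    return problems

sample = """{
  "file": [{"path": "hdfs://hadoop:50070/out.csv", "action": "create",
            "format": "csv", "header": true, "asSingleFile": true}],
  "hive": [{"databaseName": "default", "tableName": "POS-r01",
            "action": "overwrite"}]
}"""
print(check_publish_opts(sample))
```

An empty list means every target declares the documented required properties.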

NOTE: All of the following properties require valid values, unless otherwise noted.

File targets:

path

Full path to the target file. The path must include the protocol identifier, such as hdfs://, and the port number.
action

The action to take on the file. Supported actions:

  • create - Create a new file with each subsequent publication. Filenames for subsequent job runs are appended with the job number identifier.
  • append - The results of each subsequent job run are appended to the existing file contents.
  • replace - The results of each subsequent job run replace the same file. Previous job run results are lost unless moved out of the location.

Some limitations apply to these options. See Run Job Page.

format

Output format for the file. Supported formats:

  • csv
  • json
  • avro
  • pqt
header

If set to true, then output files in CSV format include a header row. Headers cannot be applied when compression is enabled.

asSingleFile

If set to true, the output is written to a single file.

If set to false, the output is written to multiple files as needed.

compression

(optional) Compression to apply to a text-based output file. Supported compression formats:

  • gzip
  • bzip2
  • snappy

If this is not specified, then no compression is applied to the output file.
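The file-target properties above can be assembled programmatically. The helper below is a sketch, not part of the Trifacta CLI; the function name is ours, but the key names and the allowed values mirror the property table, including the constraint that headers cannot be combined with compression:

```python
# Allowed values taken from the file-target property table above.
VALID_ACTIONS = {"create", "append", "replace"}
VALID_FORMATS = {"csv", "json", "avro", "pqt"}
VALID_COMPRESSION = {"none", "gzip", "bzip2", "snappy"}

def file_target(path, action, fmt, header=False, as_single_file=False,
                compression="none"):
    """Build one entry for the "file" list of publishopts.json."""
    if action not in VALID_ACTIONS:
        raise ValueError("unsupported action: %s" % action)
    if fmt not in VALID_FORMATS:
        raise ValueError("unsupported format: %s" % fmt)
    if compression not in VALID_COMPRESSION:
        raise ValueError("unsupported compression: %s" % compression)
    if header and compression != "none":
        # Per the header property: headers cannot be applied with compression.
        raise ValueError("headers cannot be combined with compression")
    return {
        "path": path,
        "action": action,
        "format": fmt,
        "header": header,
        "asSingleFile": as_single_file,
        "compression": compression,
    }

target = file_target("hdfs://hadoop:50070/trifacta/queryResults/out.csv",
                     "create", "csv", header=True, as_single_file=True)
```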

Hive targets:

databaseName

Name of the database.

NOTE: The database must contain at least one table.

tableName

Name of the table in the database to which to write.
action

The write action to apply to the table. Supported actions:

  • create - Create a new table with each subsequent publication. Table names for subsequent job runs are appended with a timestamp.
  • append - The results of each subsequent job run are appended to the existing table contents.
  • replace - The results of each subsequent job run are written to the same table, which is emptied first (truncateAndLoad). Previous job run results are lost unless moved out of the location.
  • overwrite - The results of each subsequent job run are written to a newly created table with the same name as the output table from the previous job run (dropAndLoad).

Some limitations apply to these options. See Run Job Page.
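A Hive target can be combined with file targets into a complete publishopts.json payload. The helper below is illustrative and not part of the Trifacta CLI; the key names come from the Hive-target table above, and the resulting dictionary serializes to the same shape as the example file earlier on this page:

```python
import json

def hive_target(database_name, table_name, action):
    """Build one entry for the "hive" list of publishopts.json."""
    # Allowed actions per the Hive-target property table above.
    if action not in {"create", "append", "replace", "overwrite"}:
        raise ValueError("unsupported Hive action: %s" % action)
    return {
        "databaseName": database_name,
        "tableName": table_name,
        "action": action,
    }

# Matches the Hive portion of the example publishopts.json above.
opts = {"hive": [hive_target("default", "POS-r01", "overwrite")]}
print(json.dumps(opts, indent=2))
```

The printed JSON can be saved to a file and passed to the CLI via the publish_opt_file parameter.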
