Page tree

 

Contents:


NOTE: This feature requires developer-level skills to enable and use.

The command line references lets you manage connections between the Trifacta® platform and various types of datastores. You can also use this CLI for the following:

  • Create, edit, or delete connections

    NOTE: In this release, you cannot create SQL DW connections via the CLI. This known issue will be fixed in a future release.

  • Get information on all connections

NOTE: Sharing of connections is not supported through the command line interface.

Requirements

NOTE: Some types of connections available through the UI cannot be created through the CLI. For more information, see Connection Types.

 

  • The CLI must have access to a running Trifacta instance. You can specify the host and port of this instance.
  • For each connection that you create, the Trifacta node must be able to access it through the listed host and port.

Command Reference

The CLI tools are stored in the following directory:

/opt/trifacta/bin/

For creating or modifying connections, execute the following command: 

./trifacta_cli.py (parameters)

Parameters are specified below.

Parameters

Common

These parameters are common to job or connection actions.

ParameterDescriptionApplicable CLI Commands
command_type(Required) The type of CLI command to execute. Accepted values:
  • create_connection - Create a new connection object.

  • edit_connection - Edit an existing connection object.
  • list_connections - List all connection objects for the specified user.
  • delete_connection - Delete a connection object.

See  Examples below.

For more information on the following commands, see CLI for Jobs.

  • run_job - Execute a specified job on the specified running environment.
  • get_job_status - Get job status information.
  • get_publications - Acquire publication information for a specified job.
  • publish - Publish a completed job to the specified database table, which has not been created yet.
  • load_data  - Load data into the database table, to which a schema has already been applied. Use to append to existing table.
  • truncate_and_load - Overwrite data in specified table.
All
user_name

(Required) Trifacta username of the user to execute the job. Please specify the full username.

NOTE: In the response, this value is listed as  user.

All
password

(Required) Trifacta password for the username
If no password is specified, you are prompted to enter one.

NOTE: If you have enabled Kerberos-based access to the Trifacta platform, you do not need to provide a password. To enable, additional configuration is required. See Set up for a Kerberos-enabled Hadoop cluster.

NOTE: Passwords can be stored in an external file. See CLI Config File.

All
cli_output_path

(Optional) Defines the client-side path where the JSON output is stored for all commands. Default value is cli_results.out.

NOTE: The user issuing the command must also have execute permissions on all parent folders in the specified cli_output_path.


All
disable_ssl_certification

(Optional) When communicating over HTTPS, this setting can be used to override the default behavior of validating the server certificate before executing the command.

NOTE: If you have stored a self-signed certificate on the Trifacta node, please set the REQUESTS_CA_BUNDLE environment variable to point to the directory that contains the trusted server's certificate(s). The CLI will verify against these certs. In this case, the disable_ssl_certificate parameter is not needed.

All
conn_ssl

(Optional) Connect to the datastore over SSL.

NOTE: You must modify the  host  parameter value to include the appropriate port number for the SSL connection.

NOTE: SSL connections are not supported for Hive, Redshift, or SQL Server.

 

 

All

Params for managing connections

The following parameters apply to managing connection objects only. Some of the preceding parameters may be required for connection actions.

ParameterDescriptionApplicable CLI Command
conn_type

The type of connection.

NOTE: After the connection has been created, you cannot change its type.

Tip: For a list of supported connection types, enter the following at the command line:

./trifacta_cli.py create_connection -h

 

These connection types can be created by Trifacta admin users only:

NOTE: These connections must be created through the CLI and must be created as public connections. Include the conn_is_global flag in the command.

NOTE: Only 1 Hive and connection is permitted per Trifacta deloyment.

  • Hadoop Hive
  • Amazon Redshift

    NOTE: A Redshift connection requires S3 as your base storage layer. See Set Base Storage Layer.

These connection types can be created by any user with appropriate permissions:

NOTE: Jobs using sources from these connections cannot be executed on Spark.

 

  • Microsoft SQL Server
  • PostgreSQL Database
  • Oracle Database
  • Teradata Database

 

For more information on the supported connection types and the tokens to insert for this parameter, see Connection Types.

create_connection
conn_name

Internal name of the connection. This name is referenced in your CLI scripts. It should be a single value without spaces.

NOTE: This value must be unique among your connection names.

create_connection, edit_connection , list_connection, delete_connection
conn_id

The internal identifier for the connection. When a connection is created, it is assigned an internal numeric identifier. This ID or the connection_name can be used to reference the connection in future commands.

Tip: This value is available when you hover over a connection in the application. See Flows Page.

edit_connection , list_connection , delete_connection
conn_hostHost of the datastore to which you are connecting.create_connection, edit_connection
conn_portPort number to access the datastore.create_connection, edit_connection
conn_description

This text value is displayed to users when they create or edit connections of this type through the Trifacta application.

create_connection, edit_connection
conn_credential_type

The type of credentials to create. Supported values:

  • basic - Simple username/password to be provided in conn_credential_location. Used for JDBC database connections.
  • aws - AWS-specific credentials to be provided in conn_credential_location. Used for Redshift connections.
  • trifacta_service - Uses the Trifacta credentials specified in trifacta-conf.json. Used for Hive connections.
create_connection, edit_connection
conn_credential_location

The path to a JSON file containing the credentials for your connection, as consistent with the conn_credential_type. For more information on the expected format, see Credentials file below.

NOTE: A credential file is not needed if the credential type is trifacta_service.

create_connection, edit_connection
conn_params_locationWhen you create a connection, you can reference a JSON file containing parameters to apply during the creation of any connection of this type. See Params file below.create_connection, edit_connection
conn_skip_test

If this parameter is added to the command, the connection is not tested. The default is to test the connection. This flag requires no value.

Tip: After creation, you can test and modify the connection through the application. See Flows Page.

create_connection, edit_connection
conn_is_global

If this parameter is added, the connection is public and is available to all Trifacta users after it has been created. This flag requires no value.

NOTE: To use this option, the executing user must be a Trifacta admin. Hive and Redshift connections require this parameter.

NOTE: After a connection has been made public, it cannot be made private again. It must be deleted and recreated.

create_connection, edit_connection

 

For documentation on the CLI parameters, run:

./trifacta_cli.py --help

Additional documentation might be available for individual commands using the following:

./trifacta_cli.py <commmand> --help

Credentials file

You can store connection login credentials in a file on the Trifacta node. When managing connections, you can reference this JSON credentials file in the command, which forces the use of encrypted versions of the credentials stored in the file. Examples are provided below.

Example - Basic credentials:

This example applies for relational connection types: Oracle, PostGreSQL, SQL Server, and Teradata. 

{
  "username": "<your_username>",
  "password": "<your_password>"
}

Example - AWS credentials:

This example applies to connections of AWS type (Redshift).

{
 "username": "<your_user>",
 "password": "<your_password>"
  "iamRoleArn": "<your_IAM_role_ARN>"
}

NOTE: iamRoleArn is optional. For more information, see Configure for EC2 Role-Based Authentication.

Params file

In an external file, you can create a set of parameters to pass to any object for which you are creating a connection. For example, when you create a connection to a database, you may need to reference a default database to which any instance of the connection connects.

The following parameters are supported for each vendor. 

VendorJSON ParameterDescriptionRequired
HivedefaultDatabaseName of the default databaseNo
RedshiftdefaultDatabaseName of the default databaseYes
PostgreSQLdatabaseName of the database.Yes
OracleserviceService to use for the connectionYes
SQL ServerNone.  
TeradataNone.  

Additional parameters:

Except for Redshift connections, you can submit additional configuration parameters using the ConnectStrOpts key-value pair in the parameters file. Example:

"connectStrOpts": ";transportMode=http;httpPath=cliservice"

NOTE: Each vendor uses a specific separator between the connection URL and the connection string options. For example, if you are creating or editing a Teradata connection and are submitting ConnectStrOpts parameters, the string value must begin with a comma:

"connectStrOpts": ",Key1=Value1,Key2=Value2"

For more information, see the documentation provided with your database product.

Redshift uses the extraLoadParams method, which is described below.

Arbitrary parameters for JDBC connections:

For any supported JDBC connection type, you can include arbitrary parameters specific to the JDBC database as part of your connection string options.

With the exception of Oracle, all JDBC vendor support the following example. The database to which you are connecting supports a parameter (myView) which for the database has a value of custom1. To extend the preceding Teradata example, your connectStrOpts value would be the following, which begins with a comma (,): 

"connectStrOpts": ",Key1=Value1,Key2=Value2?myView=custom1"

When the connection is created and used, the connection string might look like the following:

jdbc:teradata://example.com:1025/DatabaseServerName?myView=custom1

For submitting arbitrary parameters to Oracle, please see the example below.

Example - Hive params:

NOTE: By default, the Hive connection is defined to use TCP. If you are using HTTP to connect to Hive, additional configuration is required, including insertion of additional parameters in your params file. See Configure for Hive.

NOTE: If you are connecting to a Kerberos-enabled cluster, you must include the Kerberos principal for Hive as part of the connectStrOpts value. See Configure for Hive.

{
  "connectStrOpts": ";<depends_on_deployment>",
  "defaultDatabase": "default",
  "jdbc": "hive2"
}

For more information on connection string options for Hive, see Configure for Hive.

Example - Redshift params:

{
 "defaultDatabase":"<your_database>",
 "extraLoadParams": "BLANKSASNULL EMPTYASNULL TRIMBLANKS TRUNCATECOLUMNS"
}

The first parameter defines the default database.

The second parameter is used when you publish results to Redshift. For more information on these values, see http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-conversion.html.

Example - PostgreSQL params:

{
  "database":"<your_database>"
}

Example - Oracle params:

{
  "service":"orcl"
}

For submitting arbitrary parameters to Oracle, the arbitrary string must follow the ORA format, in which most of the connection string is replaced by parameters. For example:

{
  "connectStrOpts": "(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=orcl)))",
  "service":"orcl"
}

In this case, the generated connection string might look like the following:

jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=1521))(CONNECT_DATA(SERVICE_NAME=orcl)))

The original host, port, and service name values specified in the connection are ignored and replaced by these values.

Examples

At the command line, all jobs must be executed through connection objects. For each datastore to which the Trifacta platform is connected, you must create at least one connection object and then reference it in any job execution tasks.

Create connection

NOTE: For Hive, connections must be created as public connections (include the --conn_is_global flag). You can only create one connection of each of these types.

For more information on creating a Hive connection through the CLI, see Configure for Hive.

Command

Example (all one command):

./trifacta_cli.py create_connection --user_name <trifacta_user> --password <trifacta_password>
--conn_type microsoft_sqlserver --conn_name aSQLServerConnection
--conn_description "This is my connection."
--conn_host example.com --conn_port 1234
--conn_credential_type basic
--conn_credential_location ~/.trifacta/config_conn.json
--conn_params_location ~/.trifacta/p.json
--cli_output_path ./conn_create.out

Output

Success: Connection aSQLServerConnection created
JSON results written to conn_create.out.

JSON Response

Output is written to ./conn_create.out.

{
    "conn_credential_location": "~/.trifacta/config_conn.json", 
    "conn_credential_type": "basic", 
    "conn_host": "example.com", 
    "conn_id": 9, 
    "conn_name": "aSQLServerConnection", 
    "conn_params_location": "~/.trifacta/p.json", 
    "conn_port": "1234", 
    "conn_type": "microsoft_sqlserver", 
    "host": "http://example.com:3005", 
    "results": {
        "createdAt": "2016-06-30T21:53:58.977Z", 
        "createdBy": 3, 
        "credential_type": "basic", 
        "credentials": [
            {
                "username": "<trifacta_user>"
            }
        ], 
        "deleted_at": null, 
        "description": null, 
        "host": "example.com", 
        "id": 9, 
        "is_global": false, 
        "name": "aSQLServerConnection", 
        "port": 1234, 
        "type": "microsoft_sqlserver", 
        "updatedAt": "2016-06-30T21:53:58.977Z", 
        "updatedBy": 3
    }, 
    "status": "success", 
    "user_name": "<trifacta_user>"
}

Edit connection

Command

In the following command, all parameters specified within angled brackets are optional settings that can be changed. The other ones are required to perform any edit. 

You must specify the conn_name or the conn_id.

NOTE: If you are editing the connection's credentials, you must specify the conn_credential_type in the command, which is required if you are changing any credential parameter. This step completely replaces the old credentials, so you must specify all connection parameters in the command.

Example (all one command):

./trifacta_cli.py edit_connection --user_name <trifacta_user> --password <trifacta_password>
--conn_name aSQLServerConnection <--conn_type microsoft_sqlserver>
<--conn_description "This is my connection.">
<--conn_host mynewhost.com> <--conn_port 1234>
<--conn_credential_type basic> <--conn_credential_location ~/.trifacta/config_conn.json>
<--cli_output_path ./conn_edit.out>

Output

Following assumes that only the above values for host and cli_output_path contain new values:

Success: Updated connection aSQLServerConnection
JSON results written to conn_edit.out.

JSON Response

Output is written to ./conn_edit.out.

{
    "conn_description": "This is my connection.", 
    "conn_id": 9, 
    "conn_name": "aSQLServerConnection", 
    "conn_params_location": "~/.trifacta/p.json", 
    "host": "http://nynewhost.com:3005", 
    "results": {
        "createdAt": "2016-06-30T22:08:47.016Z", 
        "createdBy": 3, 
        "credential_type": "basic", 
        "credentials": [
            {
                "username": "<trifacta_user>"
            }
        ], 
        "deleted_at": null, 
        "description": "This is my connection.", 
        "host": "mynewhost.com", 
        "id": 9, 
        "is_global": false, 
        "name": "aSQLServerConnection", 
        "port": 1234, 
        "type": "microsoft_sqlserver", 
        "updatedAt": "2016-06-30T22:09:03.670Z", 
        "updatedBy": 3
    }, 
    "status": "success", 
    "user_name": "<trifacta_user>"
}

List connections

Command

Example (all one command):

./trifacta_cli.py list_connections --host dev.redshift.example.com
--user_name <trifacta_user> --password <trifacta_password>
--cli_output_path ./conn_list.out

Tip: You can specify a conn_name or conn_id to return the information about a connection.

Output

Listing connections
Found 2 connections for params {'noLimit': 'true'}.
Redshift:
  description: None
  host: dev.redshift.example.com
  credentials: ["{u'username': u'<trifacta_user>'}"]
  port: 5439
  is_global: True
  name: Redshift
  id: 2
  credential_type: custom
  params:
     extraLoadParams: BLANKSASNULL EMPTYASNULL TRIMBLANKS TRUNCATECOLUMNS
    defaultDatabase: dev
  type: amazon_redshift
Hive:
  description: None
  host: dev.hive.example.com
  credentials: ["{u'username': u'<trifacta_user>'}"]
  port: 10000
  is_global: True
  name: Hive
  id: 1
  credential_type: conf
  params: 
     jdbc: hive2
    connectStrOpts:
    defaultDatabase: default
  type: hadoop_hive
JSON results written to conn_list.out.

JSON Response

Output is written to ./conn_list.out.

{
    "connections": [
        {
            "conn_createdAt": "2016-06-01T21:12:59.383Z", 
            "conn_createdBy": 2, 
            "conn_credential_type": "custom", 
            "conn_credentials": [
                {
                    "username": "<trifacta_user>"
                }
            ], 
            "conn_deleted_at": null, 
            "conn_description": null, 
            "conn_host": "dev.redshift.example.com", 
            "conn_id": 2, 
            "conn_is_global": true, 
            "conn_name": "Redshift", 
            "conn_params": {
                "extraLoadParams": "BLANKSASNULL EMPTYASNULL TRIMBLANKS TRUNCATECOLUMNS",
                "defaultDatabase": "dev"
            }, 
            "conn_port": 5439, 
            "conn_type": "amazon_redshift", 
            "conn_updatedAt": "2016-06-01T21:33:38.672Z", 
            "conn_updatedBy": 2
        }, 
        {
            "conn_createdAt": "2016-06-01T21:11:41.222Z", 
            "conn_createdBy": 2, 
            "conn_credential_type": "conf", 
            "conn_credentials": [
                {
                    "username": "<trifacta_user>"
                }
            ], 
            "conn_deleted_at": null, 
            "conn_description": null, 
            "conn_host": "dev.hive.example.com", 
            "conn_id": 2, 
            "conn_is_global": true, 
            "conn_name": "Hive", 
            "conn_params": {
                "jdbc": "hive2",
                "connectStrOpts": "",
                "defaultDatabase": "default"
            }, 
            "conn_port": 10000, 
            "conn_type": "hadoop_hive", 
            "conn_updatedAt": "2016-06-01T21:39:58.090Z", 
            "conn_updatedBy": 2
        }
    ], 
    "host": "http://localhost:3005", 
    "status": "success", 
    "user_name": "<trifacta_user>"
}

Delete connection

NOTE: You cannot delete a connection that is in use by a dataset.

Tip: You can delete a connection by using its internal connection identifier (conn_id), instead of its connection name.

Command

Example (all one command):

./trifacta_cli.py delete_connection --user_name <trifacta_user> --password <trifacta_password>
--conn_name aSQLServerConnection --cli_output_path ./conn_delete.out

Output

Success. Deleted connection with id: 9
JSON results written to conn_delete.out.

JSON Response

Output is written to ./conn_delete.out.

{
    "conn_name": "aSQLServerConnection", 
    "host": "http://localhost:3005", 
    "status": "success", 
    "user_name": "<trifacta_user>"
}

This page has no comments.