SFTP Connections

You can create connections to SFTP servers to upload your datasets to the Trifacta Application.

Linux- and Windows-based SFTP servers are supported.

Jobs can be executed from SFTP sources on the following running environments:

  • HDFS-based Spark

  • Databricks

  • Trifacta Photon

  • Spark on EMR

Supported Environments:

Operation  Designer Cloud Powered by Trifacta Enterprise Edition  Amazon     Microsoft Azure
Read       Supported                                              Supported  Supported
Write      Supported                                              Supported  Supported

Limitations

  • Files and folders with spaces or special characters in their names cannot be used. For example, a file or folder on the SFTP server with a hash sign (#) in its name cannot be used for data.

  • Files and folders whose names begin with an underscore (_) are not visible.

  • Ingest of over 500 files through SFTP at one time is not supported.

  • Through SFTP connections, you cannot run jobs on Avro or Parquet files or on files that require conversion such as JSON, PDF, and Excel.

  • You cannot publish compressed Snappy files to SFTP destinations.

  • You cannot publish Hyper format to SFTP destinations.
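The limitations above can be checked before you point a job at an SFTP path. The following is a minimal sketch, not part of the product: the function names are hypothetical, the definition of "special characters" (anything other than letters, digits, dot, hyphen, and underscore) is an assumption, and so are the file extensions used to recognize the unsupported formats.

```python
import re
from pathlib import PurePosixPath

# Assumption: a "safe" name uses only letters, digits, dot, hyphen, underscore.
# The source names only spaces and '#' explicitly.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")

# Formats the source lists as unusable for jobs over SFTP.
# The extension spellings are assumptions.
UNSUPPORTED_EXTENSIONS = {".avro", ".parquet", ".json", ".pdf", ".xls", ".xlsx"}

MAX_FILES_PER_INGEST = 500  # limit stated in the list above


def check_path(path: str) -> list[str]:
    """Return the documented limitations that a remote path would hit."""
    problems = []
    pure = PurePosixPath(path)
    for part in pure.parts:
        if part == "/":
            continue
        if part.startswith("_"):
            problems.append(f"'{part}' begins with an underscore and is not visible")
        if not SAFE_NAME.fullmatch(part):
            problems.append(f"'{part}' contains a space or special character")
    if pure.suffix.lower() in UNSUPPORTED_EXTENSIONS:
        problems.append("format is not supported for jobs over SFTP")
    return problems


def check_batch(paths: list[str]) -> dict[str, list[str]]:
    """Check a set of paths planned for a single ingest."""
    problems = {}
    if len(paths) > MAX_FILES_PER_INGEST:
        problems["<batch>"] = [
            f"{len(paths)} files exceeds the limit of {MAX_FILES_PER_INGEST}"
        ]
    for path in paths:
        found = check_path(path)
        if found:
            problems[path] = found
    return problems
```

For example, `check_path("/data/my file#1.csv")` reports one problem for the file name, while `check_path("/data/sales.csv")` returns an empty list.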

Prerequisites

  • Acquire user credentials to access the SFTP server. You can use username/password credentials or SSH keys. See below.

  • Verify that the credentials can access the locations on the server where your data is stored. The initial directory of the user account must be accessible.

SSH Keys

If preferred, you can use SSH keys for authentication to the SFTP server.

Note

SSH keys must be private RSA keys. If you have OpenSSH keys, you can use the ssh-keygen utility to convert them to private RSA keys.
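As a hedged illustration of the note above, the sketch below detects whether a private key file is in OpenSSH format or in PEM RSA format by its header line, and builds an ssh-keygen command that rewrites an RSA key in PEM format. The function names are hypothetical. The command overwrites the key file in place, so it is safer to run it on a copy.

```python
OPENSSH_HEADER = "-----BEGIN OPENSSH PRIVATE KEY-----"
RSA_PEM_HEADER = "-----BEGIN RSA PRIVATE KEY-----"


def key_format(key_text: str) -> str:
    """Classify a private key by its first line."""
    lines = key_text.strip().splitlines()
    first_line = lines[0].strip() if lines else ""
    if first_line == RSA_PEM_HEADER:
        return "rsa-pem"
    if first_line == OPENSSH_HEADER:
        return "openssh"
    return "unknown"


def conversion_command(key_path: str) -> list[str]:
    """Build the ssh-keygen call that rewrites the key in PEM format.

    -p rewrites the existing private key file (ssh-keygen prompts for the
    passphrase), and -m PEM selects the output format. For an RSA key, the
    result starts with the RSA PEM header.
    """
    return ["ssh-keygen", "-p", "-m", "PEM", "-f", key_path]
```

If `key_format` returns `"openssh"`, you can pass the result of `conversion_command` to `subprocess.run` on a copy of the key file, then confirm that the converted file is reported as `"rsa-pem"`.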

Note

The SFTP server may be configured to allow SSH logins from only certain IP addresses. For other addresses, an additional prompt for the SFTP user password will be generated, which interferes with the Designer Cloud Powered by Trifacta Enterprise Edition connection. Ask the SFTP server admin to allow-list the public IP address of the Designer Cloud Powered by Trifacta Enterprise Edition server.

Allow-list SFTP server

If you are running jobs on EMR or Azure Databricks, you must add the SFTP server to the allow-list of IPs that are permitted to communicate with the cluster. For more information, please see the documentation that is provided with your software distribution.

You must also add the SFTP server to the allow-list of file storage systems. Details are below.

Enable

By default, this connection type is automatically enabled for use.

Note

You must provide the protocol identifier and storage locations for the SFTP server. See below.

Configure file storage protocols and locations

The Designer Cloud Powered by Trifacta platform must be provided the list of protocols and locations for accessing SFTP.

Steps:

  1. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

  2. Locate the following parameters and set their values according to the table below:

    "fileStorage.whitelist": ["sftp"],
    "fileStorage.defaultBaseUris": ["sftp:///"],

    Parameter

    Description

    fileStorage.whitelist

    A comma-separated list of protocols that are permitted to access SFTP.

    Note

    The protocol identifier "sftp" must be included in this list.

    fileStorage.defaultBaseUris

    For each supported protocol, this parameter must contain a top-level path to the location where platform files can be stored. These files include uploads, samples, and temporary storage used during job execution.

    Note

    A separate base URI is required for each supported protocol. You may only have one base URI for each protocol.

    Note

    For SFTP, three slashes at the end are required, as the third one is the end of the path value. This value is used as the base URI for all SFTP connections created in Designer Cloud Powered by Trifacta Enterprise Edition.

    Example:

    sftp:///

    This is the most common value, because it serves as the base URI for all SFTP connections that you create. If you add a server value to this URI, all SFTP connections that you create are limited to that server.

  3. Save your changes and restart the platform.
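The rules in the steps above can be sketched as a small check. This is an illustration only: the function name is hypothetical, and it assumes the two parameters are available as a dictionary keyed by their dotted names, which may not match how trifacta-conf.json nests them on disk.

```python
def check_file_storage(settings: dict) -> list[str]:
    """Return problems found in the SFTP-related file storage settings."""
    problems = []

    # The protocol identifier "sftp" must be included in the list.
    if "sftp" not in settings.get("fileStorage.whitelist", []):
        problems.append('"sftp" is missing from fileStorage.whitelist')

    # Only one base URI is permitted per protocol.
    uris = settings.get("fileStorage.defaultBaseUris", [])
    sftp_uris = [uri for uri in uris if uri.startswith("sftp:")]
    if len(sftp_uris) != 1:
        problems.append(f"expected exactly one sftp base URI, found {len(sftp_uris)}")

    # In "sftp:///", the third slash ends the path. A URI with a server
    # value, such as "sftp://host/", is assumed to need the same final slash.
    prefix = "sftp://"
    for uri in sftp_uris:
        if not uri.startswith(prefix) or "/" not in uri[len(prefix):]:
            problems.append(f"{uri!r} lacks the slash that ends the path")

    return problems
```

With the values shown in step 2, `check_file_storage` returns an empty list. A base URI of `sftp://` (two slashes) is reported as a problem.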

Enforce authentication methods

By default, the Trifacta Application enables use of two different authentication mechanisms:

  • Basic - use a password to access the SFTP server

  • SSHKey - use a public SSHKey and password to access the SFTP server

Along with basic and SSH key, the SFTP servers in your environment may be configured with other authentication methods, and those methods sometimes take precedence. As a result, when using default authentication methods, SFTP connections from the Designer Cloud Powered by Trifacta platform can fail to connect to the SFTP server.

To eliminate these issues, you can configure the Trifacta Application to enforce usage of one of the following authentication schemes. These schemes are passed to the SFTP server during connection time, which forces the server to use the appropriate method of authentication. When the following parameter is specified, SFTP connections can be configured using the listed methods and should work for connecting to the server.

Note

Enforcement applies to connections created via the APIs as well. After configuration, please be sure to use one of the enforced authentication methods when configuring your SFTP connections through the application or the APIs.

Steps:

  1. To apply this configuration change, log in as an administrator to the Trifacta node. Then, edit trifacta-conf.json. For more information, see Platform Configuration Methods.

  2. Locate the following parameter in the configuration file:

    "batchserver.workers.filewriter.hadoopConfig.sftp.PreferredAuthentications"

  3. Set the parameter value according to the following:

    Preferred authentication method

    Parameter value

    Description

    Basic

    "password"

    Basic password authentication method is used to connect to the SFTP server.

    Note

    You must configure your SFTP server connection in the platform to use the Basic method.

    SSHKey

    "publickey"

    SSH Key authentication method is used.

    Note

    You must configure your SFTP server connection in the platform to use the SSHKey method.

    Both

    "publickey,password"

    Both methods of authentication are supported.

  4. Save your changes and restart the platform.
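The table in step 3 amounts to a lookup from the parameter value to the credential types that a connection may use. The sketch below is a hypothetical helper for checking a planned connection against the enforced value. The labels "Basic" and "SSHKey" follow the table; the function names are assumptions.

```python
# Parameter value -> credential types that remain usable, per the table above.
ENFORCED_VALUES = {
    "password": {"Basic"},
    "publickey": {"SSHKey"},
    "publickey,password": {"Basic", "SSHKey"},
}


def allowed_credential_types(preferred_authentications: str) -> set[str]:
    """Return the credential types permitted by a PreferredAuthentications value."""
    try:
        return ENFORCED_VALUES[preferred_authentications]
    except KeyError:
        raise ValueError(
            f"unrecognized value {preferred_authentications!r}; "
            f"expected one of {sorted(ENFORCED_VALUES)}"
        ) from None


def connection_is_compatible(preferred_authentications: str, credential_type: str) -> bool:
    """Check whether a connection's credential type matches the enforced methods."""
    return credential_type in allowed_credential_types(preferred_authentications)
```

For example, with the parameter set to `"publickey"`, a connection that uses the Basic credential type is not compatible and should be switched to SSHKey.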

Java VFS service

Use of SFTP connections requires the Java VFS service.

Note

This service is enabled by default.

For more information on configuring this service, see Configure Java VFS Service.

Create Connection

Create through application

You can create an SFTP connection through the Trifacta Application.

Steps:

  1. In the left nav bar, select the Connections icon. See Connections Page.

  2. In the Connections page, click Create Connection. See Create Connection Window.

  3. In the Create Connection window, click the SFTP connection card.

  4. Specify the properties for your SFTP server.

    Property

    Description

    Host

    The hostname of the SFTP server to which you are connecting. Do not include any protocol identifier (sftp://).

    Port

    The port number to use to connect to the server. Default port number is 22.

    Credential Type

    Select one of the following:

    Basic - authenticate via username and password

    SSH Key - authenticate via username and SSH key

    User Name

    The username to use to connect.

    Password

    (Basic credential type) The password associated with the username.

    SSH Key

    (SSH Key credential type) The SSH key that applies to the username.

    Test Connection

    Click this button to test the connection that you have specified.

    Default Directory

    Absolute path on the SFTP server where users of the connection can begin browsing.

    Block Size (Bytes)

    Fetch size in bytes for each read from the SFTP server.

    Note

    Raising this value may increase speed of read operations. However, if it is raised too high, resources can become overwhelmed, and the read can fail.

    Connection Name

    The name of the connection as you want it to appear in the application.

    Description

    This description is displayed in the application.

    For more information, see Create Connection Window.

  5. Click Save.

Create through APIs

  • Type: jdbc

  • Vendor: sftp

For more information, see https://api.trifacta.com/ee/9.7/index.html#operation/createConnection
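As a rough sketch of how the two values above might appear in a request body, the following builds a payload for a Basic-credential SFTP connection. Only `type: jdbc`, `vendor: sftp`, and the default port 22 come from this page. Every other field name (`vendorName`, `credentialType`, `name`, `host`, `port`, `credentials`) and the function name are assumptions that should be verified against the createConnection reference linked above before use.

```python
import json


def build_sftp_connection_payload(
    name: str,
    host: str,
    username: str,
    password: str,
    port: int = 22,  # default SFTP port, per the property table
) -> dict:
    """Assemble a hypothetical createConnection request body."""
    return {
        "type": "jdbc",             # from this page
        "vendor": "sftp",           # from this page
        "vendorName": "sftp",       # assumed field
        "credentialType": "basic",  # assumed field and value
        "name": name,               # assumed field
        "host": host,               # assumed field; no sftp:// prefix
        "port": port,               # assumed field
        "credentials": [            # assumed structure
            {"username": username, "password": password}
        ],
    }


if __name__ == "__main__":
    payload = build_sftp_connection_payload(
        name="example_sftp", host="sftp.example.com",
        username="etl_user", password="change-me",
    )
    print(json.dumps(payload, indent=2))
```

The sketch stops at building the body. Sending it requires the endpoint and authentication details from the API reference, which are not covered on this page.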