
Feature Availability: This feature is available in the following editions:

  • Trifacta® Enterprise Edition
  • Trifacta Professional Edition
  • Trifacta Premium



You can create connections to MongoDB and MongoDB Atlas through the Trifacta application. These connections enable you to read data from MongoDB.


If you are connecting Trifacta SaaS to any relational source of data, you must add the Trifacta Service to your whitelist for those resources. For more information, see Getting Started with Trifacta SaaS.

Prerequisites

  • This connection type supports basic (username/password) authentication.

Limitations

NOTE: When you select or import an entire table, you may encounter an error indicating a problem with a specific column. Some tables require filtering on a particular column, so their data can be ingested only through a custom SQL statement. In this case, use the column named in the error as a filter in the WHERE clause of a custom SQL statement to ingest the table.

  • For more information, please consult the CData driver documentation for the specific table.
  • For more information on using custom SQL, see Create Dataset with SQL.

  • This connection is read-only.

Create Connection

MongoDB

To create a MongoDB connection, please specify the following properties:

  • Host: Name of the host.
  • Port: Set this value to the port number through which to access MongoDB. The default value is 27017.
  • Database: The database that you want to read.
  • Auth Database: Name of the MongoDB database used for authentication.
  • Replica Set: (Optional) Comma-separated list of secondary servers in the replica set, each specified by address and port.
    A replica set is a group of MongoDB processes that maintain the same data set. Replica sets provide redundancy and high availability and are the basis for all production deployments. For more information, see https://docs.mongodb.com/manual/replication/.
  • Secondary Reads: Enable this checkbox if you want to read from secondary (slave) servers.
  • Use SSL: Enable this checkbox if you want to connect using SSL.
  • Connect String Options: (Optional) Additional connection options, specified as a string value. The following option sets the connection timeout in seconds:
      Timeout=0;
    The default value is 0, which disables connection timeouts. See below for more information.
  • Test Connection: After you have defined the connection credentials type, credentials, and connection string, you can verify that the Trifacta application can use them to connect to the database.
  • Default Column Data Type Inference: Set to disabled to prevent the product from applying its own type inference to each column on import. The default value is enabled.
  • Connection Name: Display name of the connection.
  • Connection Description: (Optional) Description of the connection, which appears in the application.
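The Replica Set property expects a comma-separated list of servers, each given by address and port. The sketch below assembles that value from (host, port) pairs. It assumes the "host:port" form joined by commas, which is how the CData driver's ReplicaSet property is commonly written; the helper name and the example hostnames are hypothetical.

```python
def replica_set_value(servers):
    """Join (host, port) pairs into a comma-separated Replica Set value.

    Assumes the "host:port,host:port" form; confirm it against the CData
    driver documentation for your version.
    """
    parts = []
    for host, port in servers:
        # Reject obviously invalid entries before they reach the connection form
        if not host:
            raise ValueError("host must be non-empty")
        if not 0 < port < 65536:
            raise ValueError(f"invalid port: {port}")
        parts.append(f"{host}:{port}")
    return ",".join(parts)


# Hypothetical secondary servers; replace them with your own replica set members.
secondaries = [
    ("mongo-secondary-1.example.com", 27017),
    ("mongo-secondary-2.example.com", 27018),
]
print(replica_set_value(secondaries))
# mongo-secondary-1.example.com:27017,mongo-secondary-2.example.com:27018
```

The primary server still goes in the Host and Port fields; only the additional members belong in Replica Set.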

MongoDB Atlas

To create a MongoDB Atlas connection, please specify the following properties:

  • Host: Name of the host.
  • Port: Set this value to the port number through which to access MongoDB. The default value is 27017.
  • Database: The database that you want to read.
  • Replica Set: (Optional) Comma-separated list of secondary servers in the replica set, each specified by address and port.
    A replica set is a group of MongoDB processes that maintain the same data set. Replica sets provide redundancy and high availability and are the basis for all production deployments. For more information, see https://docs.mongodb.com/manual/replication/.
  • Secondary Reads: Enable this checkbox if you want to read from secondary (slave) servers.
  • Connect String Options: (Optional) Additional connection options, specified as a string value. The following option sets the connection timeout in seconds:
      Timeout=0;
    The default value is 0, which disables connection timeouts. See below for more information.
  • Test Connection: After you have defined the connection credentials type, credentials, and connection string, you can verify that the Trifacta application can use them to connect to the database.
  • Default Column Data Type Inference: Set to disabled to prevent the product from applying its own type inference to each column on import. The default value is enabled.
  • Connection Name: Display name of the connection.
  • Connection Description: (Optional) Description of the connection, which appears in the application.

For more information on these settings, see http://cdn.cdata.com/help/RCF/jdbc/default.htm.

Create connection via API

Depending on your product edition, you can create connections of this type through the API. Use the following vendor and type values in the connection definition:

MongoDB:

"vendor": "mongodb",
"vendorName": "MongoDB",
"type": "jdbc"

MongoDB Atlas:

"vendor": "mongodb_atlas",
"vendorName": "MongoDB Atlas",
"type": "jdbc"


Trifacta SaaS API Reference docs: Premium | Standard
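As a sketch of how these values fit into a request body, the Python below merges the documented vendor fields into a dictionary and serializes it as JSON. Only "vendor", "vendorName", and "type" come from this page; the helper name is hypothetical, and the remaining connection properties (name, host, credentials, and so on) are left to the API reference because their exact keys are not listed here.

```python
import json

# Vendor fields documented for each connection type.
VENDOR_FIELDS = {
    "mongodb": {"vendor": "mongodb", "vendorName": "MongoDB", "type": "jdbc"},
    "mongodb_atlas": {
        "vendor": "mongodb_atlas",
        "vendorName": "MongoDB Atlas",
        "type": "jdbc",
    },
}


def connection_body(vendor, other_fields=None):
    """Merge the documented vendor fields into a request body.

    other_fields is a placeholder for the remaining connection properties;
    their key names must be taken from the API reference.
    """
    if vendor not in VENDOR_FIELDS:
        raise ValueError(f"unknown vendor: {vendor!r}")
    body = dict(other_fields or {})
    body.update(VENDOR_FIELDS[vendor])  # documented values always take precedence
    return body


print(json.dumps(connection_body("mongodb_atlas"), indent=2))
```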

Connect string options

Connection timeout

By default, the supported driver sets the connection timeout to MongoDB to 0, which disables the timeout. As needed, you can modify the connection timeout through Connect String Options:

Timeout=<value_in_seconds>;

where:

<value_in_seconds> is the number of seconds to wait before the connection attempt times out. A value of 0 disables the timeout.
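Connect String Options are written as Key=value pairs, each terminated by a semicolon, as in Timeout=0;. The hypothetical helper below builds such a string from a dictionary, for example to combine Timeout with the flattening options described below. The check that Timeout is a non-negative whole number of seconds is an assumption of this sketch, not a documented rule.

```python
def connect_string_options(options):
    """Serialize a dict as Key=value; pairs, e.g. {"Timeout": 30} -> "Timeout=30;"."""
    timeout = options.get("Timeout")
    # Assumption: Timeout is a non-negative integer number of seconds.
    if timeout is not None and (
        isinstance(timeout, bool) or not isinstance(timeout, int) or timeout < 0
    ):
        raise ValueError("Timeout must be a non-negative integer number of seconds")
    parts = []
    for key, value in options.items():
        if isinstance(value, bool):
            # Render booleans in lowercase, as in FlattenObjects=false
            value = str(value).lower()
        parts.append(f"{key}={value};")
    return "".join(parts)


print(connect_string_options({"Timeout": 30}))
# Timeout=30;
print(connect_string_options({"Timeout": 0, "FlattenArrays": -1}))
# Timeout=0;FlattenArrays=-1;
```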

Flattening Documents

Documents can contain other documents, which enables the storage of nested data. You can control how the CData driver flattens nested objects and arrays through Connect String Options.

NOTE: Columns that have been flattened can be accessed or referenced using custom SQL queries. Additional information is below.


Flatten Objects:

By default, the CData driver flattens nested objects. As needed, you can set FlattenObjects to false (FlattenObjects=false;) to disable this behavior.

For more information, see http://cdn.cdata.com/help/DGF/jdbc/RSBMongodb_p_FlattenObjects.htm.
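To illustrate what object flattening produces, the sketch below turns nested fields into dotted column names, the same naming used by [address.city] in the custom SQL example later on this page. It is a simplified model for explanation only, not the CData driver's implementation; the function name and sample document are hypothetical. With FlattenObjects=false, the nested object is not split into these columns.

```python
def flatten_object(doc, prefix=""):
    """Flatten nested dicts into dotted column names, e.g. address.city."""
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into the nested object, extending the column-name prefix
            row.update(flatten_object(value, name + "."))
        else:
            row[name] = value
    return row


doc = {"_id": 1, "name": "Ana", "address": {"city": "Lisbon", "geo": {"lat": 38.7}}}
print(flatten_object(doc))
# {'_id': 1, 'name': 'Ana', 'address.city': 'Lisbon', 'address.geo.lat': 38.7}
```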

Flatten Arrays:

By default, the CData driver does not flatten arrays.

  • As needed, set FlattenArrays to the number of array elements that you want returned as flattened columns.
  • To flatten all elements of all arrays, set FlattenArrays to -1.

For more information, see http://cdn.cdata.com/help/DGF/jdbc/RSBMongodb_p_FlattenArrays.htm.
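The effect of the FlattenArrays setting can be modeled as adding one indexed column per array element, named like [hobbies.0] in the query example below. In this hypothetical sketch, limit plays the role of FlattenArrays: -1 expands every element and a positive N expands only the first N. Treating 0 as "expand nothing" and keeping the unexpanded array in its own column are assumptions of the sketch, not documented driver behavior.

```python
def flatten_arrays(doc, limit):
    """Add name.0, name.1, ... columns for list-valued fields.

    limit mirrors FlattenArrays: -1 expands every element, 0 expands none
    (assumed), and a positive N expands only the first N elements.
    """
    if limit < -1:
        raise ValueError("limit must be -1 or a non-negative integer")
    row = {}
    for key, value in doc.items():
        row[key] = value  # this sketch also keeps the unexpanded value
        if isinstance(value, list) and limit != 0:
            items = value if limit == -1 else value[:limit]
            for i, item in enumerate(items):
                row[f"{key}.{i}"] = item
    return row


doc = {"name": "Ana", "hobbies": ["cricket", "chess", "go"]}
print(flatten_arrays(doc, 2))
# {'name': 'Ana', 'hobbies': ['cricket', 'chess', 'go'], 'hobbies.0': 'cricket', 'hobbies.1': 'chess'}
```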

Referencing flattened columns:

If you have flattened Objects or Arrays, you can reference these columns using square brackets in your custom SQL queries.

Example of flattened Object:

SELECT [address.city] FROM my_table;

Example of flattened Array:

SELECT * FROM my_table WHERE [hobbies.0]='cricket';
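If you generate custom SQL from a script, the square-bracket convention can be wrapped in small helpers. The sketch below reproduces the two example queries above. The helper names are hypothetical; doubling single quotes inside the string literal assumes standard SQL escaping, and names that themselves contain brackets are rejected because their escaping rules are not covered here.

```python
def bracket(column):
    """Wrap a flattened column name in square brackets: address.city -> [address.city]."""
    if "[" in column or "]" in column:
        raise ValueError(f"unsupported column name: {column!r}")
    return f"[{column}]"


def select_columns(table, columns):
    """Build SELECT [col1], [col2] FROM table;"""
    return f"SELECT {', '.join(bracket(c) for c in columns)} FROM {table};"


def select_where_equals(table, column, value):
    """Build SELECT * FROM table WHERE [column]='value'; (single quotes doubled)."""
    literal = value.replace("'", "''")
    return f"SELECT * FROM {table} WHERE {bracket(column)}='{literal}';"


print(select_columns("my_table", ["address.city"]))
# SELECT [address.city] FROM my_table;
print(select_where_equals("my_table", "hobbies.0", "cricket"))
# SELECT * FROM my_table WHERE [hobbies.0]='cricket';
```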


Driver Information

For more information on CData JDBC drivers, see http://cdn.cdata.com/help/DGF/jdbc/default.htm.

Using MongoDB

MongoDB is a NoSQL document database that provides high performance, availability, and scalability. 

MongoDB Data Organization Hierarchy

MongoDB has a two-level data hierarchy:

+ Schema1
  + Collection1
  + Collection2
+ Schema2
  + Collection3
  + Collection4
  • Schema roughly corresponds to a database.
  • Collection roughly corresponds to a table.
    • A collection is composed of documents. A document is a binary JSON (BSON) representation of the fields and values of a row.

Database Uses

For more information on interacting with databases, see Using Databases.

Read Data

You can import datasets from MongoDB through the Import Data page. For more information, see Import Data Page.

Data Type Mappings

NOTE: The Trifacta® data types listed in this section reflect the raw data type of the converted column. Depending on the contents of the column, the Transformer Page may re-infer a different data type when a dataset using this type of source is loaded.

Access/Read

When data is imported from MongoDB, the supported data types from the source are converted to corresponding data types supported by the application. For more information, see Type Conversions.

Source Data Type    Supported    Trifacta Data Type
ObjectId            Y            String
RegEx               Y            String
String              Y            String
Binary              Y            String
Integer             Y            Integer
Timestamp           Y            Datetime
Double              Y            Float
Array               Y            String
Bool                Y            Boolean
Null                Y            String
Date                Y            Datetime
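For scripts that check imported schemas, the mapping in this table can be held in a lookup. The dictionary below transcribes the table; the function name is hypothetical. It returns only the raw type assigned on import, which the Transformer Page may later re-infer as noted above.

```python
# Raw MongoDB-to-Trifacta type mapping, transcribed from the table above.
MONGODB_TO_TRIFACTA = {
    "ObjectId": "String",
    "RegEx": "String",
    "String": "String",
    "Binary": "String",
    "Integer": "Integer",
    "Timestamp": "Datetime",
    "Double": "Float",
    "Array": "String",
    "Bool": "Boolean",
    "Null": "String",
    "Date": "Datetime",
}


def raw_trifacta_type(source_type):
    """Return the raw Trifacta type for a MongoDB source type, or raise if unlisted."""
    try:
        return MONGODB_TO_TRIFACTA[source_type]
    except KeyError:
        raise ValueError(f"source type not listed: {source_type!r}") from None


print(raw_trifacta_type("Timestamp"))  # Datetime
```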

Write/Publish

Not supported.
