
Outdated release! Latest docs are Release 8.2: Initialize the Databases

   


Initialize

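Use the following steps to initialize the databases required by the platform. These include the Trifacta® database and the Jobs database; the script in the final step also builds the Scheduling and Time-Based Trigger databases.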

NOTE: These steps assume that the Trifacta node is the host of these databases. Please modify the following steps if you are connecting to databases on other nodes.

Pre-requisites:

  • The installing user must have write permissions to the directory from which the commands are executed.
  • The installing user must have sudo privileges.

Steps:

 

  1. Initialize the DB:

    1. For CentOS 7.x:

      sudo /usr/pgsql-9.3/bin/postgresql93-setup initdb
    2. For CentOS 6.x, RHEL 6.x:

      sudo service postgresql-9.3 initdb
    3. For RHEL 7.x:

      sudo /usr/pgsql-9.3/bin/postgresql93-setup initdb
    4. For Ubuntu 14.04 / 16.04: 

      pg_createcluster -d /var/lib/postgresql/9.3/main 9.3 main
  2. To set custom database names, usernames, and passwords:

    1. Edit /opt/trifacta/conf/trifacta-conf.json.

    2. For each database below, you can review the database name, username, and password. 

      Database                       Property
      Main database                  webapp.db.name
                                     webapp.db.username
                                     webapp.db.password
      Jobs database                  batch-job-runner.db.name
                                     batch-job-runner.db.username
                                     batch-job-runner.db.password
      Scheduling database            scheduling-service.database.name
                                     scheduling-service.database.user
                                     scheduling-service.database.password
      Time-Based Trigger database    time-based-trigger-service.database.name
                                     time-based-trigger-service.database.user
                                     time-based-trigger-service.database.password



    3. Make changes in the file as needed and save.
  3. Locate the sample Postgres configuration file:

    /opt/trifacta/bin/setup-utils/db/pg_hba.conf.SAMPLE
  4. If you are upgrading and have customizations in your existing pg_hba.conf file, you must apply the edits from the SAMPLE file above to the following file. Otherwise, overwrite the following file with the SAMPLE file, based on your operating system:
    1. CentOS/RHEL: /var/lib/pgsql/9.3/data/pg_hba.conf
    2. Ubuntu: /etc/postgresql/9.3/main/pg_hba.conf

  5. From the SAMPLE file, copy the following declarations and paste them into the production pg_hba.conf file above any other declarations:

    NOTE: You can substitute different database usernames and groups for the ones listed below (trifacta and trifacta). These values may be needed for other configuration.

     

    1. Trifacta database:

      local   trifacta         trifacta                               md5
      host    trifacta         trifacta         127.0.0.1/32          md5
      host    trifacta         trifacta         ::1/128               md5
    2. Jobs database:

      local   trifacta-activiti         trifactaactiviti                               md5
      host    trifacta-activiti         trifactaactiviti         127.0.0.1/32          md5
      host    trifacta-activiti         trifactaactiviti         ::1/128               md5



    3. Scheduling database: 

      local   trifactaschedulingservice         trifactaschedulingservice                               md5
      host    trifactaschedulingservice         trifactaschedulingservice         127.0.0.1/32          md5
      host    trifactaschedulingservice         trifactaschedulingservice         ::1/128               md5

      For more information on scheduling, see Configure Scheduling.

    4. Time-based Trigger database:

      local   trifactatimebasedtriggerservice         trifactatimebasedtriggerservice                               md5
      host    trifactatimebasedtriggerservice         trifactatimebasedtriggerservice         127.0.0.1/32          md5
      host    trifactatimebasedtriggerservice         trifactatimebasedtriggerservice         ::1/128               md5

      For more information on scheduling, see Configure Scheduling.


    5. Save the file.
  6. Restart the databases:

    1. If you have also restarted the operating system, execute the following command first, followed by the O/S-specific commands:

      NOTE: This command is valid only if the Postgres DB is also hosted in the Trifacta node.


      chkconfig postgresql-9.3 on

       

    2. CentOS/RHEL:

      sudo service postgresql-9.3 start
    3. Ubuntu:

      sudo service postgresql start
  7. Run the following script, which builds the four databases and specifies the appropriate roles for each database, based on the parameters you have specified in trifacta-conf.json and in pg_hba.conf:

    NOTE: This script must be run as the root user or through sudo.

    /opt/trifacta/bin/setup-utils/db/trifacta-create-postgres-roles-dbs.sh
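
The declarations in step 5 follow one pattern per database: a local socket rule plus IPv4 and IPv6 loopback rules, all using md5. If you substitute custom database names or usernames in step 2, the pg_hba.conf entries need to match them. As a rough sketch, the helper below generates the entries from database/user pairs. It is hypothetical and not part of the platform tooling; the pairs listed are simply the defaults shown in step 5.

```python
# Hypothetical helper, not shipped with the platform: builds pg_hba.conf
# declarations in the same pattern as the SAMPLE file entries in step 5.

# (database name, database user) pairs; these are the defaults shown above.
DEFAULT_DATABASES = [
    ("trifacta", "trifacta"),
    ("trifacta-activiti", "trifactaactiviti"),
    ("trifactaschedulingservice", "trifactaschedulingservice"),
    ("trifactatimebasedtriggerservice", "trifactatimebasedtriggerservice"),
]


def pg_hba_lines(database, user, method="md5"):
    """Return the three declarations (socket, IPv4, IPv6) for one database."""
    return [
        f"local   {database}   {user}   {method}",
        f"host    {database}   {user}   127.0.0.1/32   {method}",
        f"host    {database}   {user}   ::1/128   {method}",
    ]


if __name__ == "__main__":
    for db_name, db_user in DEFAULT_DATABASES:
        print("\n".join(pg_hba_lines(db_name, db_user)))
```

Review the printed lines before pasting them into the production pg_hba.conf file; fields in that file are separated by whitespace, so the column alignment does not matter.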

Backup

For more information on backup recommendations and commands, see Backup and Recovery.

Configure through Admin Settings

If you have installed or upgraded the software on the Trifacta node and verified that the software is connected to the database, you can begin using the Admin Settings page in the web application for further configuration. 

NOTE: If a setting is not available in the Admin Settings page, it must be modified through trifacta-conf.json.

Do not modify settings through the Admin Settings page and through trifacta-conf.json at the same time. Saving changes in one interface wipes out any unsaved changes in the other interface. Each requires a platform restart to apply the changes.

Steps:

  1. Start the platform. See Start and Stop the Platform.
  2. Log in to the application with an administrator account. See Login.
  3. In the application menu, select Settings menu > Admin Settings.

For more information, see Admin Settings Page.

Configure non-default connections

If you have used non-default values for the username, password, host, or port value for any of these databases, you must update platform configuration. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.

NOTE: Do not modify the other properties in these sections unless necessary.

Trifacta database

"webapp.db.username": "trifacta",
"webapp.db.logging": false,
"webapp.db.name": "trifacta",
"webapp.db.host": "localhost",
"webapp.db.password": "<pwd_trifactaDB>",
"webapp.db.type": "postgresql",
"webapp.db.port": 5432,
"webapp.db.pool.maxIdleTimeInMillis": 30000,
"webapp.db.pool.maxConnections": 10,

The following parameters apply to the Trifacta database only:

Parameter                   Description
logging                     Set this value to true to enable logging on the Trifacta database.
pool.maxIdleTimeInMillis    Specifies the maximum permitted idle time for a database connection before it is automatically closed.
pool.maxConnections         Defines the maximum permitted database connections for the Trifacta database.

Additional parameters are described below.
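
To review the current values without opening the file by hand, something like the following can work. It is only a sketch: it assumes the dotted property names correspond either to flat keys or to nested objects in trifacta-conf.json (check your file to see which applies), and `get_setting` and `load_conf` are hypothetical helper names, not platform APIs.

```python
import json
import os

# Path taken from the initialization steps on this page.
CONF_PATH = "/opt/trifacta/conf/trifacta-conf.json"


def get_setting(conf, dotted_key, default=None):
    """Look up 'a.b.c' as a flat key first, then by walking nested objects."""
    if dotted_key in conf:
        return conf[dotted_key]
    node = conf
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def load_conf(path=CONF_PATH):
    """Parse the configuration file as JSON."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Only runs on a node where the file exists; the password is deliberately not printed.
if __name__ == "__main__" and os.path.exists(CONF_PATH):
    conf = load_conf()
    for key in ("webapp.db.name", "webapp.db.username",
                "webapp.db.host", "webapp.db.port"):
        print(key, "=", get_setting(conf, key))
```

The script only reads the file, so it does not conflict with the warning above about editing through two interfaces at once.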

Jobs database

Modify the batch-job-runner.db settings:

"batch-job-runner.db.username": "trifactaactiviti", 
"batch-job-runner.db.name": "trifacta-activiti", 
"batch-job-runner.db.driver": "org.postgresql.Driver", 
"batch-job-runner.db.host": "localhost", 
"batch-job-runner.db.password": "<pwd_trifactaactivitiDB>", 
"batch-job-runner.db.port": 5432,

Jobs database thread pool size

You can modify the following settings to specify the minimum, initial, and maximum sizes of the connection pool for the Jobs database:

"batch-job-runner.db.minPoolSize": 3,
"batch-job-runner.db.initialPoolSize": 3,
"batch-job-runner.db.maxPoolSize": 50,

Parameter Name                         Description
batch-job-runner.db.minPoolSize        Integer representing the minimum size of the database connection pool
batch-job-runner.db.initialPoolSize    Integer representing the initial size of the database connection pool
batch-job-runner.db.maxPoolSize        Integer representing the maximum size of the database connection pool
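
This page does not state how the three values must relate to each other, but connection pools generally expect minimum <= initial <= maximum. A minimal sanity check under that assumption might look like the following; the function name is hypothetical.

```python
def check_pool_sizes(min_size, initial_size, max_size):
    """Return a list of problems; an empty list means the values look consistent."""
    problems = []
    for name, value in (("minPoolSize", min_size),
                        ("initialPoolSize", initial_size),
                        ("maxPoolSize", max_size)):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value < 0:
            problems.append(f"{name} must be a non-negative integer, got {value!r}")
    if problems:
        return problems
    # Assumed ordering rule: min <= initial <= max.
    if min_size > max_size:
        problems.append("minPoolSize is larger than maxPoolSize")
    if not min_size <= initial_size <= max_size:
        problems.append("initialPoolSize is outside the min/max range")
    return problems


print(check_pool_sizes(3, 3, 50))  # the defaults shown above; prints []
```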

Scheduling service database 

"scheduling-service.database.type": "POSTGRESQL",
"scheduling-service.database.host": "localhost",
"scheduling-service.database.port": "5432",
"scheduling-service.database.name": "trifactaschedulingservice",
"scheduling-service.database.user": "trifactaschedulingservice",
"scheduling-service.database.password": "<pwd_schedulingserviceDB>" 

Time-based trigger service database 

"time-based-trigger-service.database.type": "POSTGRESQL",
"time-based-trigger-service.database.host": "localhost",
"time-based-trigger-service.database.port": "5432",
"time-based-trigger-service.database.name": "trifactatimebasedtriggerservice",
"time-based-trigger-service.database.user": "trifactatimebasedtriggerservice",
"time-based-trigger-service.database.password": "<pwd_triggerserviceDB>"

Database Parameter Reference

The following generalized parameters apply to one or more of the databases. 

Parameter           Description
host                Host of the database. Default value is localhost, meaning the database is hosted on the Trifacta node.
port                Port number for the database. Default value is 5432 for all databases.
name                Name of the database. This value should match what was used during installation.
user or username    The username to use to connect to the database.
password            Password to use to connect to the database.
type                This value should be set to POSTGRESQL. Do not modify.
driver              Name of the database driver. Do not modify.
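
For ad hoc checks with a client such as psql, the host, port, name, user, and password values can be assembled into a standard PostgreSQL connection URI. The helper below is a sketch rather than something the platform provides; its name is hypothetical and the password in the example is a made-up placeholder. Special characters in the credentials are percent-encoded.

```python
from urllib.parse import quote


def postgres_uri(host, port, name, user, password):
    """Build a postgresql:// URI from the generalized database parameters."""
    # safe="" so characters such as '/' and '@' in credentials are percent-encoded.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    # int() accepts both 5432 and "5432", since the snippets above use both forms.
    return f"postgresql://{credentials}@{host}:{int(port)}/{quote(name, safe='')}"


print(postgres_uri("localhost", 5432, "trifacta", "trifacta", "p@ss/word"))
# prints: postgresql://trifacta:p%40ss%2Fword@localhost:5432/trifacta
```

Avoid pasting a URI that contains a real password into shell history or logs.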

Save your changes and restart the platform.
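
After the restart, it can be useful to confirm that each configured database host and port accepts connections. The sketch below only opens a TCP connection; it does not authenticate or speak the Postgres protocol, so a success means the port is reachable, not that the credentials are valid. Both function names are hypothetical.

```python
import socket


def normalize_port(value):
    """Accept 5432 or "5432" and return an int in the valid TCP port range."""
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, normalize_port(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Default host and port from this page; adjust if your databases are remote.
    print("database reachable:", can_connect("localhost", "5432"))
```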
