The platform can be hosted within Amazon and supports integrations with multiple services from Amazon Web Services (AWS), including combinations of services for hybrid deployments. This section provides an overview of the integration options, as well as links to related configuration topics.
For an overview of AWS deployment scenarios, see Supported Deployment Scenarios for AWS.
The following database scenarios are supported.
Database Host | Description
---|---
Cluster node | By default, the platform databases are installed on the cluster node where the software is deployed.
Amazon RDS | For Amazon-based installations, you can install the platform databases on Amazon RDS.
The following configuration topics apply to AWS in general.
NOTE: The base storage layer must be set during initial configuration and cannot be modified after it is set.
S3: Most of these integrations require the use of S3 as the base storage layer, which means that data uploads, the default location for writing results, and sample generation all occur on S3.
HDFS: In on-premises installations, it is possible to use S3 as a read-only option for a Hadoop-based cluster when the base storage layer is HDFS. You can configure the platform to read from and write to S3 buckets during job execution and sampling. For more information, see S3 Access.
For more information on setting the base storage layer, see Set Base Storage Layer.
For more information, see Storage Deployment Options.
For Amazon integrations, you can configure the platform to connect to Amazon datastores located in different regions.
NOTE: This configuration is required under certain deployment conditions.
Set the value of the following property to the region where your S3 datastores are located:
aws.region
If the above value is not set, the platform attempts to infer the region based on the default S3 bucket location.
Save your changes.
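As an illustration only, if your S3 datastores reside in the Oregon region, the property might be set as follows. The region value is an assumption; substitute the region identifier for your own buckets:

aws.region = us-west-2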
If your instance of the platform is deployed in AWS GovCloud, additional configuration is required.
NOTE: GovCloud authentication is completely isolated from Amazon.com. Key-secret combinations for AWS do not apply to AWS GovCloud.
For more information, see https://docs.aws.amazon.com/govcloud-us/latest/UserGuide/govcloud-differences.html.
NOTE: In AWS GovCloud, the AWS S3 endpoint must be configured in the private subnet or VPC to be able to make outbound requests to S3 resources.
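If the S3 endpoint has not yet been provisioned in your VPC, a gateway VPC endpoint for S3 can be created through the AWS CLI. The following is a sketch only; the VPC ID and route table ID are hypothetical placeholders, and us-gov-west-1 is an assumed region:

aws ec2 create-vpc-endpoint --vpc-id vpc-0123456789abcdef0 --service-name com.amazonaws.us-gov-west-1.s3 --route-table-ids rtb-0123456789abcdef0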
You must configure the region and endpoint settings to communicate with GovCloud S3 resources.
Steps:
In the Admin Settings page, specify the following parameters:
Parameter | Description
---|---
aws.region | Specify the AWS GovCloud region where your S3 resources are located.
aws.s3.endpoint | Specify the private subnet or VPC endpoint through which the platform makes outbound requests to your S3 resources.
Save your changes.
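For example, in a deployment in the US-Gov West region, the parameters might be set as follows. Both values are illustrative only; the endpoint in particular depends on how the private subnet or VPC endpoint is configured in your environment:

aws.region = us-gov-west-1
aws.s3.endpoint = s3.us-gov-west-1.amazonaws.com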
Edit the following file:
/opt/trifacta/conf/env.sh
Add the following line:
AWS_REGION=<aws.region>
where <aws.region> is the value you specified for the AWS GovCloud region in the Admin Settings page.
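For example, if you specified us-gov-west-1 (an illustrative region identifier) in the Admin Settings page, the added line would be:

AWS_REGION=us-gov-west-1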
Accessing an AWS secret region using an emulation platform:
Add the following line to the env.sh file:
NODE_EXTRA_CA_CERTS=/home/trifacta/ca-chain.cert.pem
NOTE: The above certificate must be a valid TLS/SSL certificate for the emulation platform.
For AWS requests from Java services, you must add the custom CA certificate to the system's CA certificates. For more information, see the documentation for your emulation platform.
Save your changes.
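To satisfy the Java services requirement above, one common approach is to import the certificate chain into the truststore used by the JVM on the node. The following is a sketch only; the truststore path, alias, and password are assumptions that vary by Java version and environment, so verify them against your emulation platform's documentation:

# Import the emulation platform's CA chain into the JVM truststore (path, alias, and password are assumptions)
sudo keytool -importcert -trustcacerts -alias emulation-ca -file /home/trifacta/ca-chain.cert.pem -keystore $JAVA_HOME/lib/security/cacerts -storepass changeit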
For more information, see Configure for AWS Authentication.
To integrate with S3, additional configuration is required. See S3 Access.
Users can create secondary connections to specific S3 buckets. For more information, see External S3 Connections.
You can create connections to one or more Redshift databases, from which you can read database sources and to which you can write job results. Samples are still generated on S3.
For more information, see Amazon Redshift Connections.
Through your AWS deployment, you can access your Snowflake databases. For more information, see Snowflake Connections.
The platform can integrate with one instance of either of the following running environments.
When the platform is installed through AWS, you can integrate with an EMR cluster for Spark-based job execution. For more information, see Configure for EMR.
If you have installed the platform on-premises or directly onto an EC2 instance, you can integrate with a Hadoop cluster for Spark-based job execution. See Configure for Hadoop.