The Designer Cloud Powered by Trifacta® platform can be configured to integrate with fully compressed Hadoop clusters. The following cluster compression methods are supported:
- Gzip
- Bzip2
- Snappy
Supported running environments:
- Photon
- Spark
For more information, see Running Environment Options.
Hadoop clusters can be configured to compress intermediate and/or final output data by default. The settings that control this behavior are typically found in mapred-site.xml and core-site.xml.
Pre-requisites
NOTE: If you have not done so already, you must retrieve cluster configuration files and store them on the Alteryx node. For more information, see Configure for Hadoop.
Enable integration with compression
Steps:
- Edit the local version of mapred-site.xml. This file is typically located in /etc/conf/hadoop. Add the following properties:

<configuration>
  ...
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
  <property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.output.fileoutputformat.compress.type</name>
    <value>BLOCK</value>
  </property>
  <property>
    <name>mapreduce.output.fileoutputformat.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
  ...
</configuration>

- Save the file and complete the following steps.
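Before moving on, it can help to confirm that the edited file actually contains the properties listed above. The following is a minimal sketch in Python, not part of the platform: the function names `read_properties` and `find_mismatches` are hypothetical, and the expected values are copied from the example on this page (adjust them if your cluster uses a different codec).

```python
import xml.etree.ElementTree as ET

# Property names and values copied from the mapred-site.xml example above.
# Assumption: your cluster uses Snappy; change the codec values otherwise.
EXPECTED = {
    "mapreduce.map.output.compress": "true",
    "mapreduce.map.output.compress.codec": "org.apache.hadoop.io.compress.SnappyCodec",
    "mapreduce.output.fileoutputformat.compress": "true",
    "mapreduce.output.fileoutputformat.compress.type": "BLOCK",
    "mapreduce.output.fileoutputformat.compress.codec": "org.apache.hadoop.io.compress.SnappyCodec",
}


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def find_mismatches(xml_text, expected=EXPECTED):
    """Return {name: (expected, actual)} for properties that are missing or differ."""
    actual = read_properties(xml_text)
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }
```

A possible usage, assuming the file sits at the typical location mentioned above, is to pass the contents of the local mapred-site.xml to `find_mismatches` and review any entries it returns. An empty result means all five properties match the example.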
Specify codecs
One or more compression/decompression methods (codecs) must be specified in core-site.xml.
Steps:
- Edit the local version of core-site.xml. This file is typically located in /etc/conf/hadoop.
- Specify the codecs to use in the io.compression.codecs property. Supported values:

  Codec    Value
  Gzip     org.apache.hadoop.io.compress.GzipCodec
  Bzip2    org.apache.hadoop.io.compress.BZip2Codec
  Snappy   org.apache.hadoop.io.compress.SnappyCodec

  In the following example, all three codecs have been specified:

<configuration>
  ...
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
  ...
</configuration>

- Save the file.
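The codec list is a single comma-separated value, so a typo or an unsupported class name is easy to miss. As a rough illustration, the Python sketch below extracts the io.compression.codecs value from a configuration document and reports any entry that is not one of the three supported classes listed above. The helper names `configured_codecs` and `unsupported_codecs` are hypothetical and are not part of Hadoop or the platform.

```python
import xml.etree.ElementTree as ET

# The three codec classes this page lists as supported.
SUPPORTED_CODECS = {
    "org.apache.hadoop.io.compress.GzipCodec": "Gzip",
    "org.apache.hadoop.io.compress.BZip2Codec": "Bzip2",
    "org.apache.hadoop.io.compress.SnappyCodec": "Snappy",
}


def configured_codecs(xml_text):
    """Return the codec class names listed in io.compression.codecs, in order."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "io.compression.codecs":
            value = prop.findtext("value") or ""
            # Split the comma-separated list and drop empty entries.
            return [c.strip() for c in value.split(",") if c.strip()]
    return []


def unsupported_codecs(xml_text):
    """Return configured codec class names that are not in the supported list."""
    return [c for c in configured_codecs(xml_text) if c not in SUPPORTED_CODECS]
```

If `configured_codecs` returns an empty list, the property is missing or empty. If `unsupported_codecs` returns anything, those entries fall outside the compression methods this page describes as supported.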
Configure platform
Apply the following changes from within the application to enable the Designer Cloud Powered by Trifacta platform to communicate with the compressed cluster.
Steps:
- Log in to the application.
- In the Admin Settings page, set the following settings:

  Setting                                       Description
  hadoopDefaultClusterCompression.enabled       To enable integration with a compressed cluster, set this value to true.
  hadoopDefaultClusterCompression.compression   Set this value to the type of compression applied on the cluster:
                                                none - (default) no cluster compression
                                                gzip
                                                bzip2
                                                snappy

- Save your changes and restart the platform.
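When choosing a value for hadoopDefaultClusterCompression.compression, it may be useful to see which Hadoop codec class each value would correspond to. The Python sketch below is only an illustration: the mapping is an assumption inferred from the value names and the codec classes listed on this page, not a documented platform API, and `codec_for_setting` is a hypothetical helper.

```python
# Assumed correspondence between the platform setting values and the Hadoop
# codec classes listed on this page. This pairing is inferred from the names,
# not taken from platform documentation.
SETTING_TO_CODEC = {
    "none": None,  # default: no cluster compression
    "gzip": "org.apache.hadoop.io.compress.GzipCodec",
    "bzip2": "org.apache.hadoop.io.compress.BZip2Codec",
    "snappy": "org.apache.hadoop.io.compress.SnappyCodec",
}


def codec_for_setting(compression):
    """Return the codec class assumed to match a setting value, or None for 'none'."""
    if compression not in SETTING_TO_CODEC:
        allowed = ", ".join(sorted(SETTING_TO_CODEC))
        raise ValueError(f"Unknown compression value {compression!r}; expected one of: {allowed}")
    return SETTING_TO_CODEC[compression]
```

For example, a cluster configured with SnappyCodec in mapred-site.xml would presumably pair with the setting value snappy. Treat that pairing as a reasonable guess to verify in your own environment.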