The Trifacta® platform can be configured to integrate with fully compressed Hadoop clusters. The following cluster compression methods are supported:

  • Gzip
  • Bzip2
  • Snappy

Supported running environments: 

  • Photon
  • Spark

For more information, see Running Environment Options.

Hadoop clusters can be configured to compress intermediate data, final output data, or both by default. The settings that control this behavior are typically found in mapred-site.xml and core-site.xml.

Pre-requisites

NOTE: If you have not done so already, you must retrieve cluster configuration files and store them on the Trifacta node. For more information, see Configure for Hadoop.

Enable integration with compression

Steps:

  1. Edit the local version of mapred-site.xml. This file is typically located in /etc/conf/hadoop.
  2. Add the following properties:

    <configuration>
      ...
      <property>
        <name>mapreduce.map.output.compress</name>
        <value>true</value>
      </property>
    
      <property>
        <name>mapreduce.map.output.compress.codec</name>
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
      </property>
    
      <property>
        <name>mapreduce.output.fileoutputformat.compress</name>
        <value>true</value>
      </property>
    
      <property>
        <name>mapreduce.output.fileoutputformat.compress.type</name>
        <value>BLOCK</value>
      </property>
    
      <property>
        <name>mapreduce.output.fileoutputformat.compress.codec</name>
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
      </property>
      ...
    </configuration>
  3. Save the file, and then complete the steps in the next section.
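
Before continuing, it can help to confirm that the edited file contains the properties listed above. The following is a minimal sketch in Python, not part of the platform: the function names read_properties and find_mismatches are hypothetical, and the expected values are copied from the example in step 2. If your cluster uses a different codec, adjust the expected values accordingly.

```python
import xml.etree.ElementTree as ET

# Expected names and values, copied from the example in step 2.
# Assumption: Snappy is the codec in use; change these if yours differs.
EXPECTED = {
    "mapreduce.map.output.compress": "true",
    "mapreduce.map.output.compress.codec": "org.apache.hadoop.io.compress.SnappyCodec",
    "mapreduce.output.fileoutputformat.compress": "true",
    "mapreduce.output.fileoutputformat.compress.type": "BLOCK",
    "mapreduce.output.fileoutputformat.compress.codec": "org.apache.hadoop.io.compress.SnappyCodec",
}


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> in a Hadoop-style XML file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def find_mismatches(props, expected=EXPECTED):
    """Return {name: (expected, actual)} for properties that are missing or differ."""
    return {
        name: (want, props.get(name))
        for name, want in expected.items()
        if props.get(name) != want
    }
```

To use the sketch, read the contents of your local mapred-site.xml into a string, pass it to read_properties, and pass the result to find_mismatches. An empty result means that all five properties match the example.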

Specify codecs

One or more compression/decompression methods (codecs) must be specified in core-site.xml.

Steps:

  1. Edit the local version of core-site.xml. This file is typically located in /etc/conf/hadoop.
  2. Specify the codecs to use in the io.compression.codecs property. Supported values:

    Codec     Value
    Gzip      org.apache.hadoop.io.compress.GzipCodec
    Bzip2     org.apache.hadoop.io.compress.BZip2Codec
    Snappy    org.apache.hadoop.io.compress.SnappyCodec
  3. In the following example, all three codecs have been specified:

    <configuration>
      ...
      <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
      </property>
      ...
    </configuration>
  4. Save the file and restart the platform. See Start and Stop the Platform.
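
A typo in a codec class name is easy to miss in a long comma-separated value. The following minimal Python sketch, which is not part of the platform, reads the io.compression.codecs property and reports any entry that is not one of the three supported values listed above. The function names configured_codecs and unsupported_codecs are hypothetical. An entry flagged here may still be a valid Hadoop codec; the check only compares against the values supported in this section.

```python
import xml.etree.ElementTree as ET

# Supported codec class names, copied from the table in step 2.
SUPPORTED_CODECS = {
    "org.apache.hadoop.io.compress.GzipCodec": "Gzip",
    "org.apache.hadoop.io.compress.BZip2Codec": "Bzip2",
    "org.apache.hadoop.io.compress.SnappyCodec": "Snappy",
}


def configured_codecs(xml_text):
    """Return the list of class names in io.compression.codecs, or [] if unset."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "io.compression.codecs":
            value = prop.findtext("value") or ""
            # Entries are comma-separated; tolerate stray whitespace.
            return [c.strip() for c in value.split(",") if c.strip()]
    return []


def unsupported_codecs(codecs):
    """Return the entries that are not in the supported table."""
    return [c for c in codecs if c not in SUPPORTED_CODECS]
```

To use the sketch, read the contents of your local core-site.xml into a string and pass it to configured_codecs. An empty list from unsupported_codecs means that every configured entry matches a supported value.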

 
