The  can be configured to integrate with fully compressed Hadoop clusters. The following cluster compression methods are supported:

Supported running environments: 

For more information, see Running Environment Options.

Hadoop clusters can be configured to enable compression of intermediate and/or final output data by default. The settings that are usually used to do so can be found in mapred-site.xml and core-site.xml

Pre-requisites

NOTE: If you have not done so already, you must retrieve cluster configuration files and store them on the . For more information, see Configure for Hadoop.

Enable integration with compression

Steps:

  1. Edit the local version of mapred-site.xml. This file is typically located in /etc/conf/hadoop
  2. Add the following properties:

    <configuration>
      ...
      <property>
        <name>mapreduce.map.output.compress</name>
        <value>true</value>
      </property>
    
      <property>
        <name>mapreduce.map.output.compress.codec</name>
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
      </property>
    
      <property>
        <name>mapreduce.output.fileoutputformat.compress</name>
        <value>true</value>
      </property>
    
      <property>
        <name>mapreduce.output.fileoutputformat.compress.type</name>
        <value>BLOCK</value>
      </property>
    
      <property>
        <name>mapreduce.output.fileoutputformat.compress.codec</name>
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
      </property>
      ...
    </configuration>
  3. Save the file and complete the following steps.

Specify codecs

One or more compression/decompression methods (codecs) must be specified in core-site.xml

Steps:

  1. Edit the local version of mapred-site.xml. This file is typically located in /etc/conf/hadoop
  2. Specify the codecs to use in the io.compression.codecs property. Supported values:

    CodeValue
    Gzip
    org.apache.hadoop.io.compress.GzipCodec
    Bzip2
    org.apache.hadoop.io.compress.BZip2Codec
    Snappy
    org.apache.hadoop.io.compress.SnappyCodec
  3. In the following example, all three codecs have been specified:

    <configuration>
      ...
      <property>
        <name>io.compression.codecs</name>
       <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
      </property>
      ...
    </configuration>
  4.  Save the file.

Configure platform

Apply the following changes from within the application to enable the  to communicate with the compressed cluster.

Steps:

  1. Login to the application. 
  2. In the Admin Settings page, set the following settings:

    SettingDescription
    hadoopDefaultClusterCompression.enabledTo enable integration with a compressed cluster, set this value to true.
    hadoopDefaultClusterCompression.compression

    Set this value to the type of compression applied on the cluster:

    none - (default) no cluster compression

    gzip

    bzip2

    snappy

  3. Save your changes and restart the platform.