...

  1. Hadoop cluster: The Hadoop cluster should already be installed and operational. As part of the install preparation, you should have prepared the Hadoop platform for integration with the 
    D s platform
    . See Prepare Hadoop for Integration with the Platform.
    1. For more information on the components supported in your Hadoop distribution, see Install Reference.
  2. Storage: on-premises, cloud, or hybrid.
    1. The 
      D s platform
       can interact with storage that is in the local environment, in the cloud, or in some combination. How your storage is deployed affects your configuration scenarios. See Storage Deployment Options.
  3. Base storage layer: You must configure one storage platform to be the base storage layer. Details are described later.

    Info

    NOTE: Some deployments require that you select a specific base storage layer.


    Warning

    After you have defined the base storage layer, it cannot be changed. Please review your Storage Deployment Options carefully. The required configuration is described later. 
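As a sketch of what this selection involves: in platform versions whose configuration is stored as a JSON file, the base storage layer is controlled by a single protocol setting. The parameter name and value below are assumptions and may differ in your release; see Storage Deployment Options for the authoritative values for your deployment.

```
"webapp.storageProtocol": "hdfs"
```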


Hadoop versions

The 

D s platform
 supports integration only with the versions of Hadoop that are supported for your version of the platform. 

Info

NOTE: The versions of your Hadoop software and the libraries in use by the

D s platform
must match. Unless specifically directed by
D s support
, integrating with your Hadoop cluster using Hadoop libraries from a different Hadoop version is not supported.

For more information, see System Requirements.

Platform configuration

After the

D s platform
 and its databases have been installed, you can perform platform configuration. 
D s config

...

For smaller datasets, the platform recommends using the 

d-s-serverphoton
 running environment.

For larger datasets, or if size information is unavailable, the platform by default recommends running the job on the Hadoop cluster. For these jobs, the default publishing action is set to run on the Hadoop cluster, generating output in the format defined by this parameter. Publishing actions, including output format, can always be changed as part of the job specification.

...

Acquire Hadoop cluster configuration files

Info

NOTE: If the

d-s-item
item
node
has been properly configured as a Hadoop Edge node, these files should already exist on the local node. The location of these files on the Hadoop cluster may vary based on Hadoop distribution, version, and enabled components. For more information, please contact your Hadoop administrator.

...

Info

NOTE: If these configuration files change in the Hadoop cluster, the versions installed on the

d-s-item
item
node
should be updated, or components may fail to work. You may be better served by setting permissions on these files so that they can be read by the
D s defaultuser
Typehadoop
Fulltrue
user and then creating a symlink from the
D s platform
node.
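The permissions-plus-symlink approach above can be sketched as follows. Real paths vary by distribution: the cluster configuration directory is typically /etc/hadoop/conf, and the platform's Hadoop config directory depends on your install (both are assumptions here). Temporary directories stand in for them below so the commands are safe to run as-is.

```shell
# Stand-ins for the real directories (assumptions; substitute your own paths):
CLUSTER_CONF=$(mktemp -d)     # e.g. /etc/hadoop/conf on a real node
PLATFORM_CONF=$(mktemp -d)    # the platform's Hadoop config directory
touch "$CLUSTER_CONF/core-site.xml" "$CLUSTER_CONF/hdfs-site.xml"

# 1. Make the cluster configuration files readable by the platform's hadoop user:
chmod 644 "$CLUSTER_CONF"/*-site.xml

# 2. Symlink each file from the platform node, so that cluster-side updates
#    are picked up without re-copying the files:
for f in core-site.xml hdfs-site.xml; do
  ln -sf "$CLUSTER_CONF/$f" "$PLATFORM_CONF/$f"
done
```

Because the platform node reads through the symlinks, a change made on the cluster side is visible immediately, which addresses the update problem described in the note above.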

...

If you are using Hortonworks, you must complete the following modification to the site configuration file that is hosted on the 

d-s-item
item
node:

Info

NOTE: Before you begin, you must acquire the full version and build number of your Hortonworks distribution. On any of the Hadoop nodes, navigate to /usr/hdp. The version and build number appear as a directory in this location, named in the following form: A.B.C.D-XXXX.
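Once you have the directory name, it can be split into its version and build components with standard shell parameter expansion. The value below is an example only; use the directory name you actually find under /usr/hdp on your cluster.

```shell
# Example directory name from /usr/hdp (example value only; use your own):
hdp_full="2.6.5.0-292"
hdp_version="${hdp_full%-*}"   # the A.B.C.D portion -> 2.6.5.0
hdp_build="${hdp_full##*-}"    # the XXXX portion   -> 292
echo "$hdp_version $hdp_build"
```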

...

Restart services. See Start and Stop the Platform.

Configure Snappy publication

If you are publishing using Snappy compression, you may need to perform the following additional configuration.

Steps:

  1. Verify that the snappy and snappy-devel packages have been installed on the 

    D s node
    . For more information, see https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/NativeLibraries.html.

  2. From the 

    D s node
    , execute the following command:

    Code Block
    hadoop checknative


  3. The above command identifies where the native libraries are located on the 
    D s node
    .
  4. Cloudera: 
    1. On the cluster, locate the libsnappy.so file. Verify that this file has been installed on all nodes of the cluster, including the 
      D s node
      . Retain the path to the file on the 
      D s node
      .
    2. D s config
    3. Locate the spark.props configuration block. Insert the following properties and values inside the block:

      Code Block
      "spark.driver.extraLibraryPath": "/path/to/file",
      "spark.executor.extraLibraryPath": "/path/to/file",


  5. Hortonworks:
    1. Verify on the 

      D s node
       that the following locations are available:

      Info

      NOTE: The asterisk below is a wildcard. Please collect the entire path of both values.


      Code Block
      /hadoop-client/lib/snappy*.jar
      /hadoop-client/lib/native/


    2. D s config
    3. Locate the spark.props configuration block. Insert the following properties and values inside the block:

      Code Block
      "spark.driver.extraLibraryPath": "/hadoop-client/lib/snappy*.jar;/hadoop-client/lib/native/",
      "spark.executor.extraLibraryPath": "/hadoop-client/lib/snappy*.jar;/hadoop-client/lib/native/",


  6. Save your changes and restart the platform.
  7. Verify that the /tmp directory has the proper permissions for publication. For more information, see Supported File Formats.
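Putting the Cloudera steps together: if `hadoop checknative` reports that the Snappy native library resolves to a location such as /usr/lib/hadoop/lib/native (a hypothetical path; use the location reported on your node), the spark.props block would contain entries like the following.

```
"spark.driver.extraLibraryPath": "/usr/lib/hadoop/lib/native",
"spark.executor.extraLibraryPath": "/usr/lib/hadoop/lib/native",
```

Both properties should point at the same location, since the Spark driver and executors each need to load the native Snappy library at runtime.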

Debugging

You can review system services and download log files through the

D s webapp
.

...