...

  • This feature is supported for Trifacta Wrangler Enterprise only.
  • The Trifacta platform must be deployed on an AWS EC2 instance that is joined to the same domain as the EMR cluster.
  • The EMR cluster must be kerberized using the Cross-Realm Trust method. Additional information is below.

 





    Create EMR Cluster

    Use the following section to set up your EMR cluster for use with the Trifacta platform.

    • Via AWS EMR UI: This method is assumed in this documentation.
    • Via AWS command line interface: For this method, it is assumed that you know the required steps to perform the basic configuration. For custom configuration steps, additional documentation is provided below.
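
    For reference, creating a comparable cluster from the CLI might look like the following sketch. The cluster name, release label, instance type and count, and log bucket are placeholder values; substitute your own and apply the configuration described in the sections below.

      Code Block
      aws emr create-cluster \
        --name "emr-for-trifacta" \
        --release-label emr-5.13.0 \
        --applications Name=Hadoop Name=Hue Name=Ganglia Name=Spark \
        --instance-type m4.xlarge \
        --instance-count 3 \
        --use-default-roles \
        --log-uri "s3://<YOUR-LOG-BUCKET>/logs/"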


    Info

    NOTE: If you are integrating with a kerberized EMR cluster, the cluster must be kerberized using the Cross-Realm Trust method. The KDC on the EMR cluster must establish a cross-realm trust with the external KDC. No other Kerberos method is supported.

    For more information, see https://docs.amazonaws.cn/en_us/emr/latest/ManagementGuide/emr-kerberos-options.html.



    Info

    NOTE: It is recommended that you set up your cluster for exclusive use by the Trifacta platform.

    Cluster options

    In the Amazon EMR console, click Create Cluster. Click Go to advanced options. Complete the sections listed below.

    Info

    NOTE: Please be sure to read all of the cluster options before setting up your EMR cluster.


    Info

    NOTE: Please perform your configuration through the Advanced Options workflow.

    For more information on setting up your EMR cluster, see http://docs.aws.amazon.com/cli/latest/reference/emr/create-cluster.html.

    Advanced Options

    In the Advanced Options screen, please select the following:

    • Software Configuration:

      • Release: Select the EMR version to use.

      • Select:
        • Hadoop 2.8.3
        • Hue 3.12.0
        • Ganglia 3.7.2

          Tip

          Tip: Although it is optional, Ganglia is recommended for monitoring cluster performance.


        • Spark version should be set accordingly. See "Supported Spark Versions" above.

      • Deselect everything else.
    • Edit the software settings:
      • Copy and paste the following into Enter Configuration:

        Code Block
        [
          {
            "Classification": "capacity-scheduler",
            "Properties": {
              "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
            }
          }
        ]


    • Auto-terminate cluster after the last step is completed: Leave this option disabled.
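
    If you are creating the cluster from the command line, the software settings shown above can be saved to a JSON file and passed with the --configurations flag, as in this sketch (the file name is arbitrary):

      Code Block
      aws emr create-cluster ... --configurations file://./capacity-scheduler.json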

    Hardware configuration

    Info

    NOTE: Please apply the sizing information for your EMR cluster that was recommended for you. If you have not done so, please contact your Trifacta representative.

    General Options

    • Cluster name: Provide a descriptive name.
    • Logging: Enable logging on the cluster. 
      • S3 folder: Please specify the S3 bucket and path to the logging folder.

        Info

        NOTE: Please verify that this location is read accessible to all users of the platform. See below for details.


    • Debugging: Enable.
    • Termination protection: Enable.
    • Tags:
      • No options required.
    • Additional Options:
      • EMRFS consistent view: Do not enable. The platform can generate its own job output manifests. See Enable S3 Access.
      • Custom AMI ID: None.
      • Bootstrap Actions:
        • If you are using a custom credential provider JAR, you must create a bootstrap action. 

          Info

          NOTE: This configuration must be completed before you create the EMR cluster. For more information, see Authentication below.


    Security Options

    • EC2 key pair: Select a key pair to use if you wish to access EMR nodes via SSH. 
    • Permissions: Set to Custom to reduce the scope of permissions. For more information, see EMR cluster policies below.

      Info

      NOTE: Default permissions give access to everything in the cluster.


    • Encryption Options
      • No requirements.
    • EC2 Security Groups:

      • The selected security group for the master node on the cluster must allow TCP traffic from the Trifacta instance on port 8088. For more information, see System Ports.
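
      As a sketch, this rule can be added from the command line. Both security group IDs are placeholders: one for the cluster master node, one for the EC2 instance that hosts the platform.

        Code Block
        aws ec2 authorize-security-group-ingress \
          --group-id <MASTER-NODE-SECURITY-GROUP-ID> \
          --protocol tcp \
          --port 8088 \
          --source-group <PLATFORM-INSTANCE-SECURITY-GROUP-ID>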

    Create cluster and acquire cluster ID

    If you performed all of the configuration, including the sections below, you can create the cluster.

    Info

    NOTE: You must acquire your EMR cluster ID for use in configuration of the Trifacta platform.
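
    The cluster ID, a value of the form j-XXXXXXXXXXXXX, is displayed in the EMR console. As a sketch, you can also list the IDs of active clusters from the command line:

      Code Block
      aws emr list-clusters --active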

    Specify cluster roles

    The following cluster roles and their permissions are required. For more information on the specifics of these policies, see EMR cluster policies below.

    • EMR Role: 
      • Read/write access to log bucket
      • Read access to resource bucket
    • EC2 instance profile:
      • If using instance mode: 
        • The EC2 profile should have read/write access for all users. 
        • The EC2 profile should have the same permissions as the EC2 edge node role. 
      • Read/write access to log bucket
      • Read access to resource bucket
    • Auto-scaling role:
      • Read/write access to log bucket
      • Read access to resource bucket
      • Standard auto-scaling permissions

    Authentication

    You can use one of two methods for authenticating the EMR cluster:

    • Role-based IAM authentication (recommended): This method leverages your IAM roles on the EC2 instance. 
    • Custom credential provider JAR file: This method utilizes a JAR file provided with the platform. This JAR file must be deployed to all nodes on the EMR cluster through a bootstrap action script.

    Role-based IAM authentication

    You can leverage your IAM roles to provide role-based authentication to the S3 buckets.

    Info

    NOTE: The IAM role that is assigned to the EMR cluster and to the EC2 instances on the cluster must have access to the data of all users on S3.


    For more information, see Configure for EC2 Role-Based Authentication.

    Specify the custom credential provider JAR file

    If you are not using IAM roles for access, you can manage access using either of the following:

    • AWS key and secret values specified in trifacta-conf.json
    • AWS user mode

    In either scenario, you must use the custom credential provider JAR provided in the installation. This JAR file must be available to all nodes of the EMR cluster.

    After you have installed the platform and configured the S3 buckets, please complete the following steps to deploy this JAR file.

    Info

    NOTE: These steps must be completed before you create the EMR cluster.


    Info

    NOTE: This section applies if you are using the default credential provider mechanism for AWS and are not using the IAM instance-based role authentication mechanism.

     


    Steps:

    1. From the installation of the Trifacta platform, retrieve the following file:

      Code Block
      [TRIFACTA_INSTALL_DIR]/aws/credential-provider/build/libs/trifacta-aws-emr-credential-provider.jar


    2. Upload this JAR file to an S3 bucket location where the EMR cluster can access it:

      1. Via AWS Console S3 UI: See http://docs.aws.amazon.com/cli/latest/reference/s3/index.html.
      2. Via AWS command line:

        Code Block
        aws s3 cp trifacta-aws-emr-credential-provider.jar s3://<YOUR-BUCKET>/


    3. Create a bootstrap action script named configure_emrfs_lib.sh. The contents must be the following:

      Code Block
      sudo aws s3 cp s3://<YOUR-BUCKET>/trifacta-aws-emr-credential-provider.jar  /usr/share/aws/emr/emrfs/auxlib/


    4. This script must be uploaded into S3 in a location that can be accessed from the EMR cluster. Retain the full path to this location.
    5. Add bootstrap action to EMR cluster configuration.
      1. Via AWS Console S3 UI: Create the bootstrap action to point to the script you uploaded on S3.

         


      2. Via AWS command line: 
        1. Upload the configure_emrfs_lib.sh file to the accessible S3 bucket.
        2. In the command line cluster creation script, add a custom bootstrap action, such as the following:

          Code Block
          --bootstrap-actions '[
          {"Path":"s3://<YOUR-BUCKET>/configure_emrfs_lib.sh","Name":"Custom action"}
          ]'
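
    For reference, the fragment above might fit into a complete cluster creation call as in the following sketch; all other values are placeholders:

      Code Block
      aws emr create-cluster \
        --name "emr-for-trifacta" \
        --release-label emr-5.13.0 \
        --applications Name=Hadoop Name=Spark \
        --instance-type m4.xlarge \
        --instance-count 3 \
        --use-default-roles \
        --bootstrap-actions '[{"Path":"s3://<YOUR-BUCKET>/configure_emrfs_lib.sh","Name":"Custom action"}]'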


    When the EMR cluster is launched with the above custom bootstrap action, the cluster does one of the following:

    • Interacts with S3 using the credentials specified in trifacta-conf.json.
    • If aws.mode = user, interacts with S3 using the credentials registered by the individual user.

    For more information about the AWSCredentialsProvider for EMRFS, see the Amazon EMR documentation.

    Set up S3 Buckets

    Bucket setup

    You must set up S3 buckets for read and write access. 

    Info

    NOTE: Within the Trifacta platform, you must enable use of S3 as the default storage layer. This configuration is described later.

    For more information, see Enable S3 Access.
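
    If the resource and log buckets do not already exist, they can be created from the command line, as in this sketch (bucket names are placeholders):

      Code Block
      aws s3 mb s3://<YOUR-RESOURCE-BUCKET>
      aws s3 mb s3://<YOUR-LOG-BUCKET>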

    Set up EMR resources buckets


    Info

    NOTE: If you are connecting to a kerberized EMR cluster, please skip to the next section. This section is not required.

    On the EMR cluster, all users of the platform must have access to the following locations:


    Location | Description | Required Access
    EMR Resources bucket and path | The S3 bucket and path where resources can be stored by the Trifacta platform for execution of Spark jobs on the cluster. The locations are configured separately in the Trifacta platform. | Read/Write
    EMR Logs bucket and path | The S3 bucket and path where logs are written for cluster job execution. | Read

    These locations are configured on the Trifacta platform later.

    Access Policies

    EC2 instance profile

    Trifacta users require the following policies to run jobs on the EMR cluster:

    Code Block
    {
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "elasticmapreduce:AddJobFlowSteps",
                    "elasticmapreduce:DescribeStep",
                    "elasticmapreduce:DescribeCluster",
                    "elasticmapreduce:ListInstanceGroups"
                ],
                "Resource": [
                    "*"
                ]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:*"
                ],
                "Resource": [
                    "arn:aws:s3:::__EMR_LOG_BUCKET__",
                    "arn:aws:s3:::__EMR_LOG_BUCKET__/*",
                    "arn:aws:s3:::__EMR_RESOURCE_BUCKET__",
                    "arn:aws:s3:::__EMR_RESOURCE_BUCKET__/*"
                ]
            }
        ]
    }
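
    After replacing the __EMR_LOG_BUCKET__ and __EMR_RESOURCE_BUCKET__ tokens with your bucket names, the policy can be attached as an inline policy, as in this sketch (the role name, policy name, and file name are placeholders):

      Code Block
      aws iam put-role-policy \
        --role-name <EC2-INSTANCE-PROFILE-ROLE> \
        --policy-name trifacta-emr-access \
        --policy-document file://./emr-policy.json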

    EMR roles

    The following policies should be assigned to the EMR roles listed below for read/write access:

    Code Block
    {
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:*"
                ],
                "Resource": [
                    "arn:aws:s3:::__EMR_LOG_BUCKET__",
                    "arn:aws:s3:::__EMR_LOG_BUCKET__/*",
                    "arn:aws:s3:::__EMR_RESOURCE_BUCKET__",
                    "arn:aws:s3:::__EMR_RESOURCE_BUCKET__/*"
                ]
            }
        ]
    }

    General configuration for the Trifacta platform

    Please complete the following sections to configure the Trifacta platform to communicate with the EMR cluster.

    Change admin password

    As soon as you have installed the software, you should log in to the application and change the admin password. The initial admin password is the instance ID of the EC2 instance. For more information, see Change Password.

    Verify S3 as base storage layer

    EMR integration requires use of S3 as the base storage layer.

    Info

    NOTE: The base storage layer must be set during initial installation and setup of the Trifacta node.

    See Set Base Storage Layer.

    Set up S3 integration

    To integrate with S3, additional configuration is required. See Enable S3 Access.

    Configure EMR authentication mode

    Authentication to AWS and to EMR supports the following basic modes:

    • System: A single set of credentials is used to connect to resources. 
    • User: Each user has a separate set of credentials and can provide either a key-secret combination or role-based authentication.
    Info

    NOTE: Your method of authentication to AWS should already be configured. For more information, see Configure for AWS.

    The authentication mode for your access to EMR can be configured independently from the base authentication mode for AWS, with the following exception:

    Info

    NOTE: If aws.emr.authMode is set to user, then aws.mode must also be set to user.

    Authentication mode configuration matrix:

    EMR mode (aws.emr.authMode) | AWS mode (aws.mode) = system | AWS mode (aws.mode) = user
    system | AWS and EMR use a single key-secret combination. Parameters to set: "aws.s3.key" and "aws.s3.secret". See Configure for AWS. | EMR access uses a single key-secret combination, while AWS access is governed by per-user credentials, which can be provided from one of several different providers. NOTE: Per-user access requires additional configuration for EMR; see the following section. For more information, see Configure for AWS.
    user | Not supported. | AWS and EMR use the same per-user credentials for access. Per-user credentials can be provided from one of several different providers. NOTE: Per-user access requires additional configuration for EMR; see the following section. For more information, see Configure AWS Per-User Authentication.

    EMR authentication for the Trifacta platform

    If you have enabled per-user authentication for EMR (aws.emr.authMode=user), you must set the following properties based on the credential provider for your AWS per-user credentials. You can apply these changes through the Admin Settings page or trifacta-conf.json.

    For each authentication method below, set the listed properties and values.

    • Method: Use the default credential provider for all Trifacta access, including EMR.

    Info

    NOTE: This method requires the deployment of a custom credential provider JAR.



    Code Block
    "aws.credentialProvider":"default",
    "aws.emr.forceInstanceRole":false,


    • Method: Use the default credential provider for all Trifacta access. However, EC2 role-based IAM authentication is used for EMR.


    Code Block
    "aws.credentialProvider":"default",
    "aws.emr.forceInstanceRole":true,


    • Method: Use EC2 role-based IAM authentication for all Trifacta access.


    Code Block
    "aws.credentialProvider":"instance",


    Configure the Trifacta platform for EMR


    Info

    NOTE: This section assumes that you are integrating with an EMR cluster that has not been kerberized. If you are integrating with a Kerberized cluster, please skip to "Configure for EMR with Kerberos".


    Enable EMR integration

    After you have configured S3 to be the base storage layer, you must enable EMR integration.

    Steps:

    Open the platform configuration (Admin Settings page or trifacta-conf.json).

    1. Set the following value:

      Code Block
      "webapp.runInEMR": true,


    2. Set the following value:

      Code Block
      "webapp.runWithSparkSubmit": false,


    3. Verify the following property values:

      Code Block
      "webapp.runInTrifactaServer": true,
      "webapp.runWithSparkSubmit": false,
      "webapp.runInDataflow": false,


    Apply EMR cluster ID

    The Trifacta platform must be made aware of the EMR cluster to which to connect. 

    Steps:

    1. Login to the application as an administrator and open the Admin Settings page.

    2. Under External Service Settings, enter your AWS EMR Cluster ID. Click the Save button below the textbox.

    For more information, see Admin Settings Page.
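
    As a sketch, you can confirm from the command line that the cluster ID you entered refers to a running cluster:

      Code Block
      aws emr describe-cluster --cluster-id <YOUR-CLUSTER-ID> --query "Cluster.Status.State"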

    Extract IP address of master node in private sub-net

    If you have deployed your EMR cluster on a private subnet that is accessible outside of AWS, you must enable this property, which permits extraction of the IP address of the master cluster node through DNS.

    Info

    NOTE: This feature must be enabled if your EMR cluster is accessible outside of AWS on a private network.

    Steps:

    1. Open the platform configuration (Admin Settings page or trifacta-conf.json).
    2. Set the following property to true:

      Code Block
      "emr.extractIPFromDNS": false,


    3. Save your changes and restart the platform.

    Configure authentication mode

    You can authenticate to the EMR cluster using either of the following authentication modes:

    • System: A single set of credentials is used to connect to EMR. 
    • User: Each user has a separate set of credentials.

    Steps:

    1. Open the platform configuration (Admin Settings page or trifacta-conf.json).
    2. Locate the following settings and apply the appropriate values. See the table below:

      Code Block
      "aws.emr.authMode":  "user",


      Setting: aws.emr.authMode

      Configure the mode to use to authenticate to the EMR cluster:

      • system - In system mode, the specified AWS key and secret combination are used to authenticate to the EMR cluster. These credentials are used for all users.

      • user - In user mode, user configuration is retrieved from the database.

      Info

      NOTE: User mode for EMR authentication requires that aws.mode be set to user.



    3. Save your changes.
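
    For example, per the note above, a per-user EMR configuration pairs the two settings as follows (sketch):

      Code Block
      "aws.mode": "user",
      "aws.emr.authMode": "user",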

    Configure Spark for EMR

    For EMR, you can configure a set of Spark-related properties to manage the integration and its performance.

    Configure Spark version

    Depending on the version of EMR with which you are integrating, the Trifacta platform must be modified to use the appropriate version of Spark to connect to EMR.

    Info

    NOTE: You should have already acquired the value to apply. See "Supported Spark Versions" above.

    Steps:

    1. Open the platform configuration (Admin Settings page or trifacta-conf.json).
    2. Locate the following:

      Code Block
      "spark.version": "<SparkVersionForMyEMRVersion>",


    3. Save your changes.

    Use vendor libraries

    If you are using EMR 5.20 or later (Spark 2.4 or later), you must configure the vendor libraries provided by the cluster. Please set the following parameter.

    Steps:

    1. Open the platform configuration (Admin Settings page or trifacta-conf.json).
    2. Locate the following:

      Code Block
      "spark.useVendorSparkLibraries": true,


    3. Save your changes.

    Disable Spark job service

    The Spark job service is not used for EMR job execution. Please complete the following to disable it:

    Steps:

    1. Open the platform configuration (Admin Settings page or trifacta-conf.json).
    2. Locate the following and set it to false:

      Code Block
      "spark-job-service.enabled": false,


    3. Locate the following and set it to false:

      Code Block
      "spark-job-service.enableHiveSupport": false,


    4. Save your changes.

    Specify YARN queue for Spark jobs

    Through the Admin Settings page, you can specify the YARN queue to which to submit your Spark jobs. All Spark jobs from the Trifacta platform are submitted to this queue.

    Steps:

    1. In platform configuration, locate the following:

      Code Block
      "spark.props.spark.yarn.queue"


    2. Specify the name of the queue. 
    3. Save your changes.
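
    For example, to submit all Spark jobs to a queue named spark_jobs (the queue name is illustrative):

      Code Block
      "spark.props.spark.yarn.queue": "spark_jobs",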

    Allocation properties

    The following properties must be passed from the Trifacta platform to Spark for proper execution on the EMR cluster. 

    Edit trifacta-conf.json directly.

    Info

    NOTE: Do not modify these properties through the Admin Settings page. These properties must be added as extra properties through the Spark configuration block. Ignore any references in trifacta-conf.json to these properties and their settings.


    Code Block
    "spark": { 
      ...
      "props": { 
        "spark.dynamicAllocation.enabled": "true",
        "spark.shuffle.service.enabled": "true", 
        "spark.executor.instances": "0", 
        "spark.executor.memory": "2048M", 
        "spark.executor.cores": "2",
        "spark.driver.maxResultSize": "0"
      }
      ...
    }


    Property | Description | Value
    spark.dynamicAllocation.enabled | Enable dynamic allocation on the Spark cluster, which allows Spark to dynamically adjust the number of executors. | true
    spark.shuffle.service.enabled | Enable the Spark shuffle service, which manages the shuffle data for jobs instead of the executors. | true
    spark.executor.instances | Default count of executor instances. | See Sizing Guidelines.
    spark.executor.memory | Default memory allocation of executor instances. | See Sizing Guidelines.
    spark.executor.cores | Default count of executor cores. | See Sizing Guidelines.
    spark.driver.maxResultSize | Enable serialized results of unlimited size by setting this parameter to zero (0). | 0


    Configure the Trifacta platform for EMR with Kerberos

    Info

    NOTE: This section applies only if you are integrating with a kerberized EMR cluster. If you are not, please skip to "Additional Configuration for EMR".

    Disable standard EMR integration

    When running jobs against a kerberized EMR cluster, you utilize the Spark-submit method of job submission. You must disable the standard EMR integration.

    Steps:

    Open the platform configuration (Admin Settings page or trifacta-conf.json).

    1. Search for the following setting and set it to false:

      Code Block
      "webapp.runInEMR": false,


    2. Set the following value:

      Code Block
      "webapp.runWithSparkSubmit": true,


    3. Disable use of Hive, which is not supported with EMR:

      Code Block
      "spark-job-service.enableHiveSupport": false,


    4. Verify the following property values:

      Code Block
      "webapp.runInTrifactaServer": true,
      "webapp.runInDataflow": false,


    5. Save your changes.

    Enable YARN

    To use Spark-submit, the Spark master must be set to use YARN.

    Steps:

    Open the platform configuration (Admin Settings page or trifacta-conf.json).

    1. Search for the following setting and set it to yarn:

      Code Block
      "spark.master": "yarn",


    2. Save your changes.

    Acquire site config files

    For integrating with an EMR cluster with Kerberos, the EMR cluster site XML configuration files must be downloaded from the EMR master node to the Trifacta node.

    Info

    NOTE: This step is not required for non-Kerberized EMR clusters.


    Info

    NOTE: When these files change, you must update the local copies.

    1. Download the Hadoop Client Configuration files from the EMR master node (a download sketch is provided after these steps). The required files are the following:
      1. core-site.xml
      2. hdfs-site.xml
      3. mapred-site.xml
      4. yarn-site.xml
    2. These configuration files must be moved to the Trifacta deployment. By default, these files are in /etc/hadoop/conf:

      Code Block
      sudo cp <location>/*.xml /opt/trifacta/conf/hadoop-site/
      sudo chown trifacta:trifacta /opt/trifacta/conf/hadoop-site/*.xml


    3. (Optional) To support impersonation, you must also copy the *.keytab files from the /etc directory on the EMR master node to the same directory on the EC2 instance.
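
    As a sketch of the download in step 1, the four files can be copied from the master node with scp, assuming SSH access with your EC2 key pair (the key path and host name are placeholders):

      Code Block
      scp -i <YOUR-KEY>.pem hadoop@<EMR-MASTER-NODE>:/etc/hadoop/conf/{core-site,hdfs-site,mapred-site,yarn-site}.xml /tmp/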

    Unused properties for EMR with Kerberos

    When integrating with a kerberized EMR cluster, the following Trifacta settings are unused:

    1. External Service Settings: In the Admin Settings page, this section of configuration does not apply to EMR with Kerberos.
    2. Unused EMR settings: In the Admin Settings page, the following EMR settings do not apply to EMR with Kerberos:

      Code Block
      aws.emr.tempfilesCleanupAge
      aws.emr.proxyUser
      aws.emr.maxLogPollingRetries
      aws.emr.jobTagPrefix
      aws.emr.getLogsOnFailure
      aws.emr.getLogsForAllJobs
      aws.emr.extractIPFromDNS
      aws.emr.connectToRmEnabled




    Additional Configuration for EMR

    Default Hadoop job results format

    For smaller datasets, the platform recommends using the Trifacta Photon running environment.

    For larger datasets, if the size information is unavailable, the platform recommends by default that you run the job on the Hadoop cluster. For these jobs, the default publishing action for the job is specified to run on the Hadoop cluster, generating the output format defined by this parameter. Publishing actions, including output format, can always be changed as part of the job specification. 

    As needed, you can change this default format. 

    D s config

    Code Block
    "webapp.defaultHadoopFileFormat": "csv",

    Accepted values: csv, json, avro, pqt

    For more information, see Run Job Page.

    Configure Snappy publication

    If you are publishing using Snappy compression for jobs run on an EMR cluster, you may need to perform the following additional configuration.

    Steps:



    1. SSH into EMR cluster (master) node:

      Code Block
      ssh <EMR master node>


    2. Create tarball of native Hadoop libraries:

      Code Block
      tar -C /usr/lib/hadoop/lib -czvf emr-hadoop-native.tar.gz native


    3. Copy the tarball into the /tmp directory of the EC2 instance used by the Trifacta platform:

      Code Block
      scp -p emr-hadoop-native.tar.gz <EC2 instance>:/tmp


    4. SSH to the EC2 instance:

      Code Block
      ssh <EC2 instance>


    5. Create the path for the libraries:

      Code Block
      sudo -u trifacta mkdir -p /opt/trifacta/services/batch-job-runner/build/libs


    6. Untar the tarball to the installation path:

      Code Block
      sudo -u trifacta tar -C /opt/trifacta/services/batch-job-runner/build/libs -xzf /tmp/emr-hadoop-native.tar.gz


    7. Verify that the libhadoop.so* and libsnappy.so* libraries exist and are owned by the trifacta user:

      Code Block
      ls -l /opt/trifacta/services/batch-job-runner/build/libs/native/


    8. Verify that the /tmp directory has the proper permissions for publication. For more information, see Supported File Formats.

    9. A platform restart is not required.

    Additional parameters

    You can set the following parameters as needed:

    Steps:

    Open the platform configuration (Admin Settings page or trifacta-conf.json).

    Property | Required | Description
    aws.emr.resource.bucket | Y | S3 bucket name where Trifacta executables, libraries, and other resources required for Spark execution can be stored.
    aws.emr.resource.path | Y | S3 path within the bucket where resources can be stored for job execution on the EMR cluster. NOTE: Do not include leading or trailing slashes for the path value.
    aws.emr.proxyUser | Y | This value defines the user for Trifacta users to use for connecting to the cluster. NOTE: Do not modify this value.
    aws.emr.maxLogPollingRetries | N | Configure the maximum number of retries when polling for log files from EMR after job success or failure. Minimum value is 5.
    aws.emr.tempfilesCleanupAge | N | Defines the number of days that temporary files in the /trifacta/tempfiles directory on EMR HDFS are permitted to age. By default, this value is set to 0, which means that cleanup is disabled. If needed, you can set this to a positive integer value. During each job run, the platform scans this directory for temp files older than the specified number of days and removes any that are found. This cleanup provides an additional level of system hygiene.

    Before enabling this secondary cleanup process, please execute the following command to clear the tempfiles directory:

    Code Block
    hdfs dfs -rm -r -skipTrash /trifacta/tempfiles


    Optional Configuration

    Configure for Redshift

    For more information on configuring the platform to integrate with Redshift, see Create Redshift Connections.

    Switch EMR Cluster

    If needed, you can switch to a different EMR cluster through the application. For example, if the original cluster suffers a prolonged outage, you can switch clusters by entering the cluster ID of a new cluster. For more information, see Admin Settings Page.

    Configure Batch Job Runner

    Batch Job Runner manages jobs executed on the EMR cluster. You can modify aspects of how jobs are executed and how logs are collected. For more information, see Configure Batch Job Runner.

    Modify Job Tag Prefix

    In environments where the EMR cluster is shared with other job-executing applications, you can review and specify the job tag prefix, which is prepended to job identifiers to avoid conflicts with other applications.

    Steps:

    1. Open the platform configuration (Admin Settings page or trifacta-conf.json).
    2. Locate the following and modify if needed:

      Code Block
      "aws.emr.jobTagPrefix": "TRIFACTA_JOB_",


    3. Save your changes and restart the platform.

    ...