...

{
    "Sid" : "accessToAllTables",
    "Effect" : "Allow",
    "Principal" : {
      "AWS" : [  "arn:aws:iam::<accountId>:role/glue-read-all" ]
    },
    "Action" : [ "glue:GetDatabases", "glue:GetDatabase", "glue:GetTables", "glue:GetTable", "glue:GetUserDefinedFunctions", "glue:GetPartitions" ],
    "Resource" : [ "arn:aws:glue:us-west-2:<accountId>:catalog", "arn:aws:glue:us-west-2:<accountId>:database/default", "arn:aws:glue:us-west-2:<accountId>:database/global_temp", "arn:aws:glue:us-west-2:<accountId>:database/mydb", "arn:aws:glue:us-west-2:<accountId>:table/mydb/*" ]
}
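
The statement above hard-codes the region, database, and role name. As a minimal sketch, the same structure can be generated for other values. The helper name `glue_read_statement` and its parameters are hypothetical and are not part of AWS Glue or the platform:

```python
import json

# Read-only Glue actions, copied from the statement above.
GLUE_READ_ACTIONS = [
    "glue:GetDatabases", "glue:GetDatabase", "glue:GetTables",
    "glue:GetTable", "glue:GetUserDefinedFunctions", "glue:GetPartitions",
]


def glue_read_statement(account_id, region, database, role_name="glue-read-all"):
    """Build a policy statement granting read access to one Glue database.

    Hypothetical helper: it only reproduces the structure shown above.
    """
    prefix = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Sid": "accessToAllTables",
        "Effect": "Allow",
        "Principal": {"AWS": [f"arn:aws:iam::{account_id}:role/{role_name}"]},
        "Action": GLUE_READ_ACTIONS,
        "Resource": [
            f"{prefix}:catalog",
            f"{prefix}:database/default",
            f"{prefix}:database/global_temp",
            f"{prefix}:database/{database}",
            f"{prefix}:table/{database}/*",
        ],
    }


if __name__ == "__main__":
    # The account ID below is a placeholder value for illustration only.
    statement = glue_read_statement("123456789012", "us-west-2", "mydb")
    print(json.dumps(statement, indent=4))
```

Generating the statement this way keeps the database and table ARNs in sync, which is easy to get wrong when editing the JSON by hand.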

S3 access

AWS Glue crawls available data that is stored on S3. When you import a dataset through AWS Glue:

  • Any samples of your data that are generated by the platform are stored in S3. The platform reads sample data directly from S3.
  • Source data is read through AWS Glue. 
Warning

Review your IAM policies and, if needed, apply additional read restrictions so that users are limited to reading data from their own S3 directories. If all users have access to the same areas of the same S3 bucket, users may be able to access datasets through the platform even when access to those datasets is forbidden through AWS Glue.
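
One way to apply such a restriction is to scope each user's IAM role to its own prefix in the bucket. The following is a minimal sketch, assuming a layout in which each user's data lives under a dedicated prefix. The helper name `s3_read_policy_for_prefix`, the bucket name, and the prefix are hypothetical. Only read permissions are shown; any other permissions that the platform requires are not covered here:

```python
import json


def s3_read_policy_for_prefix(bucket, prefix):
    """Build an IAM policy document that limits reads to one S3 prefix.

    Hypothetical helper; assumes one dedicated prefix per user.
    """
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing only the keys under the user's prefix.
                "Sid": "ListOwnPrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Allow reading only the objects under the user's prefix.
                "Sid": "ReadOwnPrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket and prefix for illustration only.
    print(json.dumps(s3_read_policy_for_prefix("example-bucket", "users/alice"), indent=4))
```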

Limitations

  • Access is read-only. Publishing to Glue hosted on EMR is not supported.

  • When using per-user IAM role-based authentication, EMR Spark jobs on AWS Glue datasources may fail if the job is still running after the session limit defined for the IAM role has elapsed, measured from job submission time.
    • In the AWS Console, this limit is defined in hours as the Maximum CLI/API session duration assigned to the IAM role. 
    • In the AWS Glue catalog client for the Hive Metadata store, the temporary credentials generated for the IAM role expire after this limit in hours and cannot be renewed.
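
The session-limit condition above amounts to a simple time comparison. The following sketch assumes that the session clock starts at job submission, as described above, and that you can estimate the job's runtime. The function name and parameters are hypothetical and do not correspond to any platform or AWS API:

```python
from datetime import datetime, timedelta, timezone


def job_outlives_session(submitted_at, expected_runtime, max_session_hours):
    """Return True if the job is expected to run past the role's session limit.

    submitted_at:      datetime at which the job was submitted
    expected_runtime:  timedelta estimate of how long the job runs
    max_session_hours: the role's maximum CLI/API session duration, in hours
    """
    credentials_expire_at = submitted_at + timedelta(hours=max_session_hours)
    expected_end = submitted_at + expected_runtime
    return expected_end > credentials_expire_at


if __name__ == "__main__":
    submitted = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    # A two-hour job with a one-hour session limit is at risk of failing.
    print(job_outlives_session(submitted, timedelta(hours=2), 1))
```

If the check returns True for typical jobs, consider raising the maximum session duration on the IAM role or splitting the job into shorter runs.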

...