Using External Buckets

Distributed-CellProfiler (DCP) can read from and write to an external S3 bucket (i.e. a bucket that is not in the account where you are running DCP). To do so, set your configuration in run.py as described in the sections that follow. You may also need additional configuration in AWS Identity and Access Management (IAM).

Config setup

  • AWS_PROFILE: The profile you use must have appropriate permissions for running DCP as well as read/write permissions for the external bucket. See below for more information.

  • AWS_BUCKET: The bucket to which you would like to write log files. This is generally the bucket in the account in which you are running compute.

  • SOURCE_BUCKET: The bucket that holds the files you will read. Often, this is the same as AWS_BUCKET.

  • DESTINATION_BUCKET: The bucket where you want to write your output files. Often, this is the same as AWS_BUCKET.

  • UPLOAD_FLAGS: If you need to add flags to the AWS CLI command that uploads files to your DESTINATION_BUCKET, enter them here. This is typically only needed when you are writing to a bucket that is not yours. If you don't need any upload flags, keep the default ''.
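To make the role of UPLOAD_FLAGS concrete, here is a minimal sketch of how a flag string could be appended to an upload command. The helper name `build_upload_command`, the example paths, and the choice of `aws s3 cp` are assumptions for illustration only; DCP's own worker code may build its upload command differently.

```python
import shlex


def build_upload_command(local_path, destination_bucket, key, upload_flags=''):
    """Return an argv list for uploading one file (hypothetical helper)."""
    cmd = ['aws', 's3', 'cp', local_path, f's3://{destination_bucket}/{key}']
    # shlex.split('') returns an empty list, so the default '' adds nothing.
    cmd.extend(shlex.split(upload_flags))
    return cmd


# Hypothetical file and key names, with the ACL flag from the example config.
cmd = build_upload_command(
    'output/Image.csv',
    'collaborator-bucket',
    'project/Image.csv',
    '--acl bucket-owner-full-control',
)
print(' '.join(cmd))
```

Splitting the flag string with `shlex.split` keeps quoted values intact and leaves the command unchanged when UPLOAD_FLAGS is left at its default.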

Example configs

Read/Write to a collaborator’s bucket

AWS_REGION = 'your-region'              # e.g. 'us-east-1'
AWS_PROFILE = 'role-permissions'        # A profile with the permissions setup described below
SSH_KEY_NAME = 'your-key-file.pem'      # Expected to be in ~/.ssh
AWS_BUCKET = 'bucket-name'              # Your bucket
SOURCE_BUCKET = 'collaborator-bucket'  
DESTINATION_BUCKET = 'collaborator-bucket'    
UPLOAD_FLAGS = '--acl bucket-owner-full-control --metadata-directive REPLACE'

Permissions setup

If you are reading from a public bucket, no additional setup is necessary.

If you are reading from a non-public bucket or writing to any external bucket, you will need further permissions setup. Often, access to someone else’s AWS account is handled through a role that can be assumed; see the AWS IAM documentation to learn more about roles. Your collaborator defines the access limits of the role within their AWS IAM. You also need to define the role's limits within your own AWS IAM so that when you assume the role (which gives you access to your collaborator’s resource), the role also has the permissions required to run DCP.

In your AWS account

In AWS IAM, for the role that has external bucket access, you will need to add all of the DCP permissions described in Step 0.

You will also need to edit the trust relationship for the role so that ECS and EC2 can assume the role. A template is as follows:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::123456789123:user/image_analyst",
                    "arn:aws:iam::123456789123:user/image_expert"
                ],
                "Service": [
                    "ecs-tasks.amazonaws.com",
                    "ec2.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
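If you maintain this trust relationship for several users, generating the JSON can help avoid hand-editing mistakes. The following is a minimal sketch; the function name `build_trust_policy` is hypothetical, and the account ID and user names are the placeholders from the template above.

```python
import json


def build_trust_policy(user_arns,
                       services=('ecs-tasks.amazonaws.com', 'ec2.amazonaws.com')):
    """Build a trust policy letting the given users and services assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": list(user_arns),
                    "Service": list(services),
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Placeholder ARNs copied from the template; replace with your own users.
policy = build_trust_policy([
    "arn:aws:iam::123456789123:user/image_analyst",
    "arn:aws:iam::123456789123:user/image_expert",
])
print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the role's trust relationship editor in the IAM console.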

In your DCP instance

DCP reads your AWS_PROFILE from your control node. On the control node, edit your AWS CLI configuration files so that the profile assumes that role, as follows:

In ~/.aws/config, copy in the following text block at the bottom of the file, editing to your specifications, and save.

[profile access-collaborator]
role_arn = arn:aws:iam::123456789123:role/access-to-other-bucket
source_profile = my-account-profile
region = us-east-1
output = json

In ~/.aws/credentials, copy in the following text block at the bottom of the file (filling in your access key info) and save.

[my-account-profile]
aws_access_key_id = ACCESS_KEY
aws_secret_access_key = SECRET_ACCESS_KEY
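A common mistake is a `source_profile` that does not match any section in the credentials file. The sketch below checks that link using only Python's standard `configparser`. The helper name `check_profile_chain` is hypothetical, and the file contents are embedded as strings here for illustration; in practice you would read the text of ~/.aws/config and ~/.aws/credentials instead.

```python
import configparser

# Placeholder contents mirroring the two files described above.
CONFIG_TEXT = """
[profile access-collaborator]
role_arn = arn:aws:iam::123456789123:role/access-to-other-bucket
source_profile = my-account-profile
region = us-east-1
output = json
"""

CREDENTIALS_TEXT = """
[my-account-profile]
aws_access_key_id = ACCESS_KEY
aws_secret_access_key = SECRET_ACCESS_KEY
"""


def check_profile_chain(config_text, credentials_text, profile):
    """Return 'ok' if the role profile points at an existing credentials section."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    credentials = configparser.ConfigParser()
    credentials.read_string(credentials_text)

    # Named profiles in the config file use a "profile " prefix.
    section = f'profile {profile}'
    if section not in config:
        return f'no [{section}] section in config'
    role_arn = config[section].get('role_arn')
    source = config[section].get('source_profile')
    if not role_arn or not source:
        return 'missing role_arn or source_profile'
    if source not in credentials:
        return f'no [{source}] section in credentials'
    return 'ok'


print(check_profile_chain(CONFIG_TEXT, CREDENTIALS_TEXT, 'access-collaborator'))
```

This only checks that the two files are consistent with each other; it does not contact AWS or confirm that the role can actually be assumed.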