config.py configuration examples#

We have a handful of standard workflows we follow in a stereotyped fashion when running the Cell Painting Assay. We have listed below the standard way that we configure config.py for each workflow. You can read more information about the pipelines in the context of the Cell Painting Assay here.

  • Z-projection creates a new image with each pixel containing the maximum value from any of the z-planes, effectively condensing the contents of multiple focal planes into one. Generally, we perform projection on all images with multiple z-planes and downstream processing and analysis is performed on the projected images.

  • Illumination Correction is batched by plate and generates a function that corrects for light path irregularities as described here. Note that this pipeline depends upon having a large number of images. A standard pipeline can be found here.

  • Quality Control provides metrics on the quality of the input images. It is not a necessary step but can provide helpful information, particularly for improving wetlab workflows and for comparing across datasets. A standard pipeline can be found here.

  • Assay Dev/Segmentation is a quick pipeline that outputs segmentation outlines overlaid on a multichannel image rescaled for visual inspection. We often stitch the output into a pseudo-plate view as described here to confirm we have chosen segmentation parameters that work across our dataset. A standard pipeline can be found here.

  • Analysis is where illumination correction is applied, actual segmentation occurs, and all of the measurements used for generating image-based profiles are taken. Note that large images may require more memory than our default parameters listed below. If you don’t have enough memory, reduce the number of copies of CellProfiler running at one time by decreasing DOCKER_CORES. A standard pipeline can be found here.

Our internal configurations for each pipeline are as follows:

Z-Projection

Illumination Correction

Quality Control

Assay Dev

Analysis

Notes

APP_NAME

‘PROJECT_NAME_Zproj’

‘PROJECT_NAME_Illum’

‘PROJECT_NAME_QC’

‘ PROJECT_NAME_AssayDev’

‘PROJECT_NAME_Analysis’

If the PROJECT_NAME is excessively long you can enter a truncated version of it here but you will need to be careful to use the correct version in subsequent steps in the protocol. (e.g. 2021_06_08_WCPC_Zproj)

LOG_GROUP_NAME

APP_NAME

APP_NAME

APP_NAME

APP_NAME

APP_NAME

We never change this.

DOCKERHUB_TAG

‘cellprofiler/distributed-cellprofiler:2.2.0_4.2.8’

‘cellprofiler/distributed-cellprofiler:2.2.0_4.2.8’

‘cellprofiler/distributed-cellprofiler:2.2.0_4.2.8’

‘cellprofiler/distributed-cellprofiler:2.2.0_4.2.8’

‘cellprofiler/distributed-cellprofiler:2.2.0_4.2.8’

Ensure the CP tag number matches the version of CellProfiler for your pipeline (can easily see by opening the pipeline in a text editor and looking for the 3rd line “DateRevision: 413”).

AWS_REGION

‘us-east-1’

‘us-east-1’

‘us-east-1’

‘us-east-1’

‘us-east-1’

AWS_PROFILE

‘default’

‘default’

‘default’

‘default’

‘default’

SSH_KEY_NAME

‘YOURPEM.pem’

‘YOURPEM.pem’

‘YOURPEM.pem’

‘YOURPEM.pem’

‘YOURPEM.pem’

AWS_BUCKET

‘BUCKET’

‘BUCKET’

‘BUCKET’

‘BUCKET’

‘BUCKET’

Usually a bucket in the account that is running DCP.

SOURCE_BUCKET

‘BUCKET’

‘BUCKET’

‘BUCKET’

‘BUCKET’

‘BUCKET’

Can be a public bucket like cellpainting-gallery.

WORKSPACE_BUCKET

‘BUCKET’

‘BUCKET’

‘BUCKET’

‘BUCKET’

‘BUCKET’

If reading images from a public bucket, you might still want to read metadata from your bucket.

DESTINATION_BUCKET

‘BUCKET’

‘BUCKET’

‘BUCKET’

‘BUCKET’

‘BUCKET’

Usually a bucket in the account that is running DCP.

UPLOAD_FLAGS

‘’

‘’

‘’

‘’

‘’

ECS_CLUSTER

‘default’

‘default’

‘default’

‘default’

‘default’

Most of the time we all just use the default cluster but if there are multiple jobs being run at once you can create your own cluster by changing default to YOURNAME so that the correct dockers go on the correct machines.

CLUSTER_MACHINES

100-200

number of plates / CPUs and rounded up

25-100

25-100

100-200

AWS has limits on the number of machines you can request at a time. 200 is generally the largest we request for a single job to ensure there is some capacity for other users in the team.

TASKS_PER_MACHINE

1

1

1

1

1

MACHINE_TYPE

[‘c5.xlarge’]

[‘c5.xlarge’]

[‘c5.xlarge’]

[‘c5.xlarge’]

[‘c5.xlarge’]

Historically we have used m4.xlarge and then m5.xlarge however very recently we have been having a hard time getting m class machines so we have switched to c class. Note that they have different memory sizes so you need to make sure MEMORY is set correctly if changing between classes.

MACHINE_PRICE

.20

.20

.20

.20

.20

Will be different for different size/classes of machines.

EBS_VOL_SIZE (if using S3 mounted as a file system)

22

22

22

22

22

Files are read directly off of S3, mounted as a file system when DOWNLOAD_FILES = False.

EBS_VOL_SIZE (if downloading files)

22

200

22

22

40

Files are downloaded to the EBS volume when DOWNLOAD_FILES = True.

DOWNLOAD_FILES

‘False’

‘False’

‘False’

‘False’

‘False’

ASSIGN_IP

‘False’

‘False’

‘False’

‘False’

‘False’

DOCKER_CORES

4

4

4

4

3

If using c class machines and large images (2k + pixels) then you might need to reduce this number.

CPU_SHARES

DOCKER_CORES * 1024

DOCKER_CORES * 1024

DOCKER_CORES * 1024

DOCKER_CORES * 1024

DOCKER_CORES * 1024

We never change this.

MEMORY

7500

7500

7500

7500

7500

This must match your machine type. m class use 15000, c class use 7500.

SECONDS_TO_START

60

3*60

60

3*60

3*60

SQS_QUEUE_NAME

APP_NAME + ‘Queue’

APP_NAME + ‘Queue’

APP_NAME + ‘Queue’

APP_NAME + ‘Queue’

APP_NAME + ‘Queue’

We never change this.

SQS_MESSAGE_VISIBILITY

3*60

240*60

15*60

10*60

120*60

About how long you expect a job to take * 1.5 in seconds

SQS_DEAD_LETTER_QUEUE

‘YOURNAME_DEADMESSAGES’

‘YOURNAME_DEADMESSAGES’

‘YOURNAME_DEADMESSAGES’

‘YOURNAME_DEADMESSAGES’

‘YOURNAME_DEADMESSAGES’

JOB_RETRIES

3

3

3

3

3

AUTO_MONITOR

‘True’

‘True’

‘True’

‘True’

‘True’

Can be turned off if manually running Monitor.

CREATE_DASHBOARD

‘True’

‘True’

‘True’

‘True’

‘True’

CLEAN_DASHBOARD

‘True’

‘True’

‘True’

‘True’

‘True’

CHECK_IF_DONE_BOOL

‘False’

‘True’

‘True’

‘True’

‘True’

Can be turned off if wanting to overwrite old data.

EXPECTED_NUMBER_FILES

1 (can be anything, False above)

number channels + 1 (an .npy for each channel and isdone)

3 (Experiment.csv, Image.csv, and isdone)

1 (an image)

5 (Experiment, Image, Cells, Nuclei, and Cytoplasm .csvs)

Better to underestimate than overestimate.

MIN_FILE_SIZE_BYTES

1

1

1

1

1

Count files of any size.

NECESSARY_STRING

‘’

‘’

‘’

‘’

‘’

Not necessary for standard workflows.

ALWAYS_CONTINUE

‘False’

‘False’

‘False’

‘False’

‘False’

Use with caution.

USE_PLUGINS

‘False’

‘False’

‘False’

‘False’

‘False’

Not necessary for standard workflows.

UPDATE_PLUGINS

‘False’

‘False’

‘False’

‘False’

‘False’

Not necessary for standard workflows.

PLUGINS_COMMIT

‘’

‘’

‘’

‘’

‘’

Not necessary for standard workflows.

INSTALL_REQUIREMENTS

‘False’

‘False’

‘False’

‘False’

‘False’

Not necessary for standard workflows.

REQUIREMENTS_FILE

‘’

‘’

‘’

‘’

‘’

Not necessary for standard workflows.