
# (PART) Configuration {-} 

# Configure environment 

## Set up a virtual machine

This example assumes that AWS infrastructure has been set up using https://github.com/broadinstitute/aws-infrastructure-cellpainting

Launch an EC2 node using AMI `cytomining/images/hvm-ssd/cytominer-ubuntu-trusty-14.04-amd64-server-*`, created using https://github.com/broadinstitute/imaging-vms/blob/master/cytominer/cytominer_ami.json. You will need to create an AMI for your own infrastructure because the provisioning includes mounting S3 and EFS, which is account-specific. See `module "ec2_login"` in the Terraform [configuration](https://github.com/broadinstitute/aws-infrastructure-cellpainting/blob/master/main.tf) for how to configure the security groups and IAM roles for this instance. The simplest approach is to launch another node identical to `ec2_login`, which is already set up in this infrastructure. We recommend an `m4.xlarge` instance with a 110 GB EBS volume.

After starting the instance, ensure that the S3 bucket is mounted at `~/bucket`. If it is not, run `sudo mount -a`.

*Troubleshooting tip:* With the configuration in this AMI, EFS can only be mounted from availability zones us-east-1a or us-east-1b. To change this, edit the EFS configuration via Terraform.

Log in to the EC2 instance 

Check the available disk space on the instance:

```sh
df -h
```

Ensure that the `Avail` column shows at least 30 GB x `p`, where `p` is the number of plates you will process in parallel when creating the database backend.
We recommend setting `p` to one less than the number of cores (`p` = 3 for `m4.xlarge`, so 90 GB should be available).
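
The space requirement above can be checked with a few lines of shell. This is a minimal sketch, not part of the original workflow: the variable names `P`, `GB_PER_PLATE`, and `CHECK_DIR` are illustrative, and it assumes temporary files will be written to the root filesystem.

```sh
# Number of plates processed in parallel (p) and the per-plate estimate from the text
P=3
GB_PER_PLATE=30

# Filesystem to check (assumption: the root volume; change this if you
# plan to write temporary files to a different volume)
CHECK_DIR=/

# Space required, in GB
REQUIRED_GB=$((P * GB_PER_PLATE))

# Space available, in GB (-P forces one output line per filesystem,
# -k reports 1 KB blocks; column 4 is the available space)
AVAILABLE_GB=$(df -Pk "$CHECK_DIR" | awk 'NR == 2 { print int($4 / 1024 / 1024) }')

if [ "$AVAILABLE_GB" -ge "$REQUIRED_GB" ]; then
  echo "OK: ${AVAILABLE_GB} GB available, ${REQUIRED_GB} GB required"
else
  echo "Insufficient space: ${AVAILABLE_GB} GB available, ${REQUIRED_GB} GB required"
fi
```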

Enter your AWS credentials

```sh
aws configure
```

The infrastructure is configured with one S3 bucket. Mount this bucket if it was not mounted automatically:

```sh
sudo mount -a
```

Check that the bucket was mounted. This path should exist:

```sh
ls ~/bucket/projects
```
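
An empty `~/bucket` directory can exist even when nothing is mounted on it, so it can help to test the mount point itself as well. The sketch below assumes the `mountpoint` utility (part of util-linux on Ubuntu) is available; the helper name `is_mounted` is illustrative.

```sh
# Return success only if the given directory is an active mount point
is_mounted() {
  mountpoint -q "$1"
}

if is_mounted "$HOME/bucket" && [ -d "$HOME/bucket/projects" ]; then
  echo "S3 bucket is mounted at $HOME/bucket"
else
  echo "S3 bucket is not mounted; try: sudo mount -a"
fi
```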


## Define variables 


```sh
PROJECT_NAME=2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad

BATCH_ID=2016_04_01_a549_48hr_batch1

BUCKET=imaging-platform

MAXPROCS=3 # m4.xlarge has 4 cores; keep 1 free
```
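
If you use a different instance type, `MAXPROCS` can be derived from the core count instead of being hard-coded. This is a sketch that assumes `nproc` (GNU coreutils) is available; the intermediate variable `CORES` is illustrative.

```sh
# Number of cores visible to this shell
CORES=$(nproc)

# Keep one core free, but never go below 1
MAXPROCS=$((CORES > 1 ? CORES - 1 : 1))

echo "MAXPROCS=${MAXPROCS}"
```

On an `m4.xlarge` (4 cores) this gives `MAXPROCS=3`, matching the value set above.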

## Create directories

*Troubleshooting tip:* See the note above about EFS: it can only be mounted from us-east-1a or us-east-1b.

```sh
mkdir -p ~/efs/${PROJECT_NAME}/workspace/

cd ~/efs/${PROJECT_NAME}/workspace/

mkdir -p log/${BATCH_ID}

```
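
If `~/efs` is not available, one way to check whether the instance is in a supported availability zone is to query the EC2 instance metadata service. This is a sketch under two assumptions: the instance allows unauthenticated (IMDSv1) metadata requests, and the supported zones are the two named in the note above. The names `EFS_ZONES` and `az_supports_efs` are illustrative.

```sh
# Availability zones from which EFS can be mounted with this configuration
EFS_ZONES="us-east-1a us-east-1b"

# Return success if the given availability zone is listed in EFS_ZONES
az_supports_efs() {
  for zone in $EFS_ZONES; do
    [ "$zone" = "$1" ] && return 0
  done
  return 1
}

# Ask the instance metadata service for this instance's availability zone.
# -f makes curl fail on HTTP errors; --max-time stops it from hanging
# when the command is not run on EC2.
AZ=$(curl -sf --max-time 2 http://169.254.169.254/latest/meta-data/placement/availability-zone || true)

if [ -z "$AZ" ]; then
  echo "Could not determine the availability zone"
elif az_supports_efs "$AZ"; then
  echo "EFS can be mounted from $AZ"
else
  echo "EFS cannot be mounted from $AZ with this configuration"
fi
```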


Create a temporary directory, which is required when creating the database backend using `cytominer-database` (discussed later).
This is also useful if you decide to run CellProfiler directly on this node, because running the Cell Painting analysis pipeline produces large temporary files.

```sh
mkdir ~/ebs_tmp
```

*Troubleshooting tip:* If at this point you realize that the EC2 instance doesn't have enough space (which you can check using `df -h`),
create and attach an EBS volume, and then mount it.

```sh
# check the name of the disk
lsblk

#> NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
#> xvda    202:0    0     8G  0 disk
#> └─xvda1 202:1    0     8G  0 part /
#> xvdf    202:80   0   100G  0 disk

# check if it has a file system
sudo file -s /dev/xvdf
# ...likely not, in which case you get:
#> /dev/xvdf: data

# if no file system, then create it (this erases any data on the volume)
sudo mkfs -t ext4 /dev/xvdf

# mount it
sudo mount /dev/xvdf /home/ubuntu/ebs_tmp/

# make the mount point readable and writable by all users
sudo chmod 777 ~/ebs_tmp/
```
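
After mounting, it is worth confirming that the temporary directory is backed by the new volume and is writable. The following is a minimal sketch; the helper name `can_write` and the variable `EBS_TMP` are illustrative.

```sh
EBS_TMP="$HOME/ebs_tmp"

# Return success if a file can be created (and then removed) inside the given directory
can_write() {
  probe=$(mktemp "$1/.write_test.XXXXXX" 2>/dev/null) && rm -f "$probe"
}

if can_write "$EBS_TMP"; then
  # The Filesystem column should show the attached volume
  # (/dev/xvdf in the example above)
  df -h "$EBS_TMP"
  echo "$EBS_TMP is writable"
else
  echo "$EBS_TMP is missing or not writable"
fi
```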