forked from cytomining/profiling-handbook
# (PART) Configuration {-}
# Configure environment
## Set up a virtual machine
This example assumes that AWS infrastructure has been set up using https://github.com/broadinstitute/aws-infrastructure-cellpainting.
Launch an EC2 node using AMI `cytomining/images/hvm-ssd/cytominer-ubuntu-trusty-14.04-amd64-server-*`, created using https://github.com/broadinstitute/imaging-vms/blob/master/cytominer/cytominer_ami.json. You will need to create an AMI for your own infrastructure because the provisioning includes mounting S3 and EFS, which is account-specific. See `module "ec2_login"` in the Terraform [configuration](https://github.com/broadinstitute/aws-infrastructure-cellpainting/blob/master/main.tf) for how to configure the security groups and IAM roles for this instance. The simplest approach is to launch another node identical to `ec2_login`, which is already set up in this infrastructure. We recommend an `m4.xlarge` instance with a 110 GB EBS volume.
After starting the instance, ensure that the S3 bucket is mounted at `~/bucket`. If it is not, run `sudo mount -a`.
Troubleshooting: With the configuration in this AMI, EFS can only be mounted from us-east-1a or us-east-1b. To change this, edit the EFS configuration in Terraform.
Log in to the EC2 instance
Check available space on the instance
```sh
df -h
```
Ensure that the available space (the `Avail` column) is at least 30 GB x `p`, where `p` is the number of plates you will process in parallel when creating the database backend.
We recommend setting `p` to one less than the number of cores (`p` = 3 for `m4.xlarge`, so 90 GB should be available).
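As a rough sketch of the space check above, the following computes the recommended free space from `p` and compares it with what `df` reports. The helper names `required_gb` and `available_gb` are hypothetical, and the 30 GB-per-plate figure is simply the rule of thumb stated above.

```sh
# Hypothetical helper: recommended free space in GB for p parallel plates,
# using the 30 GB-per-plate rule of thumb.
required_gb() {
  echo $(( 30 * $1 ))
}

# Hypothetical helper: available space, in whole GB, on the file system
# that holds the given path (POSIX df output, 1024-byte blocks).
available_gb() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

p=3   # one less than the 4 cores of an m4.xlarge
need=$(required_gb "$p")
have=$(available_gb "${HOME:-/}")

if [ "$have" -lt "$need" ]; then
  echo "Only ${have} GB available; ${need} GB recommended for p=${p}"
else
  echo "OK: ${have} GB available; ${need} GB recommended for p=${p}"
fi
```

If the check reports too little space, either lower `p` or attach a larger EBS volume before continuing.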
Enter your AWS credentials
```sh
aws configure
```
The infrastructure is configured with one S3 bucket. Mount this S3 bucket (if it is not automatically mounted):
```sh
sudo mount -a
```
Check that the bucket was mounted. This path should exist:
```sh
ls ~/bucket/projects
```
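A minimal sketch of the same check as a reusable function, assuming the bucket is mounted at `~/bucket` as described above. The name `check_bucket` is hypothetical.

```sh
# Hypothetical helper: succeed if the expected bucket path exists.
# $1: path to check (defaults to ~/bucket/projects)
check_bucket() {
  dir=${1:-"$HOME/bucket/projects"}
  if [ -d "$dir" ]; then
    echo "found: $dir"
  else
    echo "missing: $dir (try 'sudo mount -a')" >&2
    return 1
  fi
}

# Guarded call so that a missing mount only prints a message.
check_bucket || echo "bucket not mounted yet"
```

Running this before the later steps gives an early, explicit failure instead of confusing "no such file" errors further on.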
## Define variables
```sh
PROJECT_NAME=2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad
BATCH_ID=2016_04_01_a549_48hr_batch1
BUCKET=imaging-platform
MAXPROCS=3 # m4.xlarge has 4 cores; keep 1 free
```
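If you use a different instance type, `MAXPROCS` could be derived from the core count rather than hard-coded. This is a sketch and not part of the original workflow: `maxprocs_for` is a hypothetical helper, and `nproc` is assumed to be available (it ships with GNU coreutils).

```sh
# Hypothetical helper: number of parallel processes for a given core count,
# keeping one core free but never going below 1.
maxprocs_for() {
  cores=$1
  if [ "$cores" -gt 1 ]; then
    echo $(( cores - 1 ))
  else
    echo 1
  fi
}

MAXPROCS=$(maxprocs_for "$(nproc)")
echo "MAXPROCS=${MAXPROCS}"
```

On an `m4.xlarge` (4 cores) this yields the same value as the hard-coded setting above.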
## Create directories
*Troubleshooting tip:* See the note above about EFS: it can only be mounted from us-east-1a or us-east-1b.
```sh
mkdir -p ~/efs/${PROJECT_NAME}/workspace/
cd ~/efs/${PROJECT_NAME}/workspace/
mkdir -p log/${BATCH_ID}
```
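To preview the layout these commands produce without touching EFS, the same steps can be run against a throwaway directory. This is only an illustration: `make_workspace` is a hypothetical wrapper, and the scratch directory stands in for `~/efs`.

```sh
# Hypothetical wrapper around the mkdir steps above.
# $1: root directory (~/efs in the real workflow)
# $2: project name, $3: batch ID
make_workspace() {
  workspace="$1/$2/workspace"
  mkdir -p "$workspace/log/$3" && echo "$workspace"
}

# Dry run in a temporary directory instead of ~/efs.
scratch=$(mktemp -d)
workspace=$(make_workspace "$scratch" \
  2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad \
  2016_04_01_a549_48hr_batch1)

# List the directories that were created.
find "$workspace" -type d
```

The output should show the `workspace` directory with a `log/<batch ID>` subdirectory beneath it.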
Create a temp directory, which is required when creating the database backend using `cytominer-database` (discussed later).
This directory is also useful if you decide to run CellProfiler directly on this node, because running the Cell Painting analysis pipeline produces large temporary files.
```sh
mkdir ~/ebs_tmp
```
*Troubleshooting tip:* If at this point you realize that the EC2 instance doesn't have enough space (which you can check using `df -h`),
create and attach an EBS volume, and then mount it.
```sh
# check the name of the disk
lsblk
#> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
#> xvda 202:0 0 8G 0 disk
#> └─xvda1 202:1 0 8G 0 part /
#> xvdf 202:80 0 100G 0 disk
# check if it has a file system
sudo file -s /dev/xvdf
# ...likely not, in which case you get:
#> /dev/xvdf: data
# if no file system, then create it
sudo mkfs -t ext4 /dev/xvdf
# mount it
sudo mount /dev/xvdf /home/ubuntu/ebs_tmp/
# make the mounted directory readable and writable by all users
sudo chmod 777 ~/ebs_tmp/
```
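Because `mkfs` erases whatever is on the device, it may be worth guarding that step. The sketch below classifies the output of `file -s`. The name `has_filesystem` is a hypothetical helper, and it relies only on the convention shown above that a bare `data` result means no file system was found.

```sh
# Hypothetical helper: decide from `file -s` output whether a device
# already carries a file system.
# $1: one line of output from `sudo file -s <device>`
has_filesystem() {
  case "$1" in
    *": data") return 1 ;;   # bare "data": no file system detected
    *)         return 0 ;;   # anything else: treat as already formatted
  esac
}

# Sample strings for illustration only (the second is abbreviated);
# on the instance you would pass "$(sudo file -s /dev/xvdf)" instead.
for line in "/dev/xvdf: data" "/dev/xvdf: Linux rev 1.0 ext4 filesystem data"; do
  if has_filesystem "$line"; then
    echo "formatted, skip mkfs: $line"
  else
    echo "no file system, mkfs is needed: $line"
  fi
done
```

Treating every non-`data` result as "formatted" errs on the side of not running `mkfs`, which is the safer failure mode for a volume that may hold data.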